Transformers in music recommendation


Users have more choices for listening to music than ever before. Popular services boast massive and varied catalogs. The YouTube Music catalog, for example, has over 100M songs globally. With catalogs this large, item recommendations are a core part of these products. Recommender systems make sense of the item catalog and are critical for tailoring it to the user’s tastes and needs. In products that provide recommendations, user actions on the recommended items (such as skip, like, or dislike) provide an important signal about user preferences. Observing and learning from these actions can lead to better recommendation systems. In YouTube Music, leveraging this signal is critical to understanding a user’s musical taste.

Consider a scenario where a user typically likes slow-tempo songs. When presented with an uptempo song, the user would typically skip it. During a workout session at the gym, however, they prefer more uptempo music. In such a situation, we want to continue learning from their prior history to understand their musical preferences. At the same time, we want to discount prior skips of uptempo songs when recommending workout music.

Below we illustrate a user’s music listening experience, with songs shown as items and the user’s actions shown as text beneath each one. A recommendation system that doesn’t consider the broader context would predict that the user will skip an uptempo song, and would therefore demote a potentially relevant and valuable song.