From Your Year in Music: A Technical Guide to Generating Personalized Listening Stories

Overview

Every December, millions of Spotify users eagerly open their Wrapped experience, which reveals not just top artists and genres but also personalized narratives about their most interesting listening moments of the year. This guide pulls back the curtain on the engineering behind those stories—how we transform raw streaming data into engaging, data-driven highlights for your 2025 Wrapped. You'll learn the core pipeline: from data collection and feature extraction to anomaly detection, clustering, and narrative generation. By the end, you'll understand how to build a similar system that can identify and tell stories about unique listening patterns.

Source: engineering.atspotify.com

Prerequisites

Data: A rich dataset of user listening history, including timestamps, track IDs, artist IDs, play duration, and skip events.
Tools: Python 3.8+, common data science libraries (pandas, numpy, scikit-learn), a time-series library (e.g., tsfresh), and an LLM API (e.g., OpenAI or a local model) for text generation.
Knowledge: Basic understanding of clustering (e.g., DBSCAN), anomaly detection (e.g., Isolation Forest), and natural language generation.

Step-by-Step Instructions

1. Data Collection & Preprocessing

The first step is to gather and clean listening data for a given user over the entire year. In production, this lives in a data warehouse (e.g., BigQuery), but for prototyping, you can use a CSV export of your own listening history.

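import pandas as pd

# Load listening data; parse timestamps so the .dt accessor works below
listening_data = pd.read_csv('user_listening_history.csv', parse_dates=['timestamp'])

# Filter for year 2025 (copy so new columns can be added safely)
listening_data = listening_data[listening_data['timestamp'].dt.year == 2025].copy()

# Create basic time features: hour of day, day of week, month
# (track duration is assumed to already be a column in the export)
listening_data['hour'] = listening_data['timestamp'].dt.hour
listening_data['day_of_week'] = listening_data['timestamp'].dt.dayofweek  # Monday = 0
listening_data['month'] = listening_data['timestamp'].dt.month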

We also compute per-session features: session length, diversity of artists, and skip patterns. This forms the basis for detecting interesting moments.
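The per-session features described above can be sketched as follows. This is a minimal illustration rather than the production method: the 30-minute inactivity gap used to split sessions, the column names (`timestamp`, `artist_id`, `play_duration`, `skipped`), and the helper name `build_session_features` are all assumptions, and the data is made up.

```python
import pandas as pd

SESSION_GAP = pd.Timedelta(minutes=30)  # assumed inactivity threshold between sessions

def build_session_features(plays: pd.DataFrame) -> pd.DataFrame:
    """Aggregate one user's plays into one feature row per session."""
    plays = plays.sort_values("timestamp").reset_index(drop=True)
    # A new session starts whenever the gap since the previous play exceeds the threshold.
    new_session = plays["timestamp"].diff() > SESSION_GAP
    plays["session_id"] = new_session.cumsum()
    return plays.groupby("session_id").agg(
        start=("timestamp", "min"),
        n_plays=("artist_id", "size"),
        listening_seconds=("play_duration", "sum"),
        unique_artists=("artist_id", "nunique"),   # artist diversity
        skip_rate=("skipped", "mean"),             # share of plays that were skipped
    )

# Tiny synthetic example (made-up data)
plays = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-03-01 08:00", "2025-03-01 08:04", "2025-03-01 08:09",
        "2025-03-01 23:30", "2025-03-01 23:34",
    ]),
    "artist_id": ["a1", "a2", "a1", "a3", "a3"],
    "play_duration": [240, 200, 30, 210, 215],
    "skipped": [False, False, True, False, False],
})
sessions = build_session_features(plays)
print(sessions)
```

The gap threshold is the main design choice here: a shorter gap splits listening into more, smaller sessions, which changes every downstream feature.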

2. Feature Engineering for Listening Moments

To identify what makes a listening moment “interesting,” we need to transform raw data into meaningful signals. Key features include:

Deviation from baseline: How much does a session’s genre mix differ from the user’s average?
Novelty score: Were tracks played that the user has never heard before?
Time anomaly: Did the user listen at a highly unusual hour?
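One way to turn the three signals above into numbers is sketched below. The specific formulas (total variation distance for genre deviation, share of unheard tracks for novelty, one minus the hour's historical share for time anomaly) and the function names are assumptions chosen for illustration; the source does not specify how these are computed. All inputs are assumed to be non-empty.

```python
from collections import Counter

def genre_deviation(session_genres, baseline_genres):
    """Total variation distance between two genre mixes (0 = identical, 1 = disjoint)."""
    s, b = Counter(session_genres), Counter(baseline_genres)
    s_total, b_total = sum(s.values()), sum(b.values())
    genres = set(s) | set(b)
    return 0.5 * sum(abs(s[g] / s_total - b[g] / b_total) for g in genres)

def novelty_score(session_tracks, previously_heard):
    """Share of the session's distinct tracks the user had not played before."""
    tracks = set(session_tracks)
    return len(tracks - set(previously_heard)) / len(tracks)

def hour_rarity(session_hour, historical_hours):
    """1 minus the share of past plays that started in the same hour of day."""
    counts = Counter(historical_hours)
    return 1.0 - counts[session_hour] / len(historical_hours)

# Made-up example values
baseline = ["rock"] * 8 + ["pop"] * 2
session = ["jazz", "jazz", "rock", "rock"]
deviation = genre_deviation(session, baseline)
novelty = novelty_score(["t1", "t2", "t3", "t4"], {"t1", "t2"})
rarity = hour_rarity(3, [8] * 6 + [18] * 3 + [3] * 1)
print(deviation, novelty, rarity)
```

Each score lies between 0 and 1, which makes the three signals comparable before they are combined with other features.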

We use tsfresh to extract hundreds of statistical features from time windows (e.g., length of consecutive listens, autocorrelation).

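from sklearn.decomposition import PCA
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# One feature row per session; 'session_id' and 'play_duration' are assumed column names
ts = listening_data[['session_id', 'timestamp', 'play_duration']]
features = extract_features(ts, column_id='session_id', column_sort='timestamp')
features = impute(features)  # replace NaN/inf values that some feature calculators produce

# Reduce dimensionality with PCA (keeping 95% variance)
features_pca = PCA(n_components=0.95).fit_transform(features)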

3. Anomaly Detection – Finding the Outliers

Interesting moments are often anomalies. We apply an Isolation Forest model on the feature matrix to flag sessions that significantly deviate from the norm. These flagged sessions become candidate highlights.

from sklearn.ensemble import IsolationForest

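model = IsolationForest(contamination=0.05, random_state=42)
# fit_predict returns labels, not scores: -1 for anomalies, 1 for normal sessions
anomaly_labels = model.fit_predict(features_pca)

# Keep only the sessions flagged as anomalies (label == -1)
anomalies = features_pca[anomaly_labels == -1]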

Remember: the contamination rate (0.05) tells the model to flag roughly 5% of sessions as anomalies. These are candidates, not final highlights. Tune this value based on user feedback.
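To see what the contamination setting does in practice, the sketch below fits an Isolation Forest on synthetic data and compares the hard labels with the continuous scores from `score_samples`. The data, the cluster of outlying rows, and the choice to rank the five lowest scores are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
typical = rng.normal(0.0, 1.0, size=(200, 3))   # synthetic "ordinary" sessions
unusual = rng.normal(6.0, 0.5, size=(10, 3))    # synthetic outlying sessions
X = np.vstack([typical, unusual])

model = IsolationForest(contamination=0.05, random_state=42).fit(X)
labels = model.predict(X)          # -1 = flagged, 1 = normal
scores = model.score_samples(X)    # lower = more anomalous

flagged_share = (labels == -1).mean()
top_k = np.argsort(scores)[:5]     # row indices of the five most anomalous sessions
print(f"flagged share: {flagged_share:.3f}")
print("most anomalous rows:", top_k)
```

Ranking by score rather than relying only on the -1/1 labels makes it possible to cap the number of highlights per user without refitting the model for each contamination value.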

4. Clustering – Grouping Similar Highlights

Anomalies can be scattered; we want to group them into coherent stories. Use DBSCAN (density-based) because it doesn’t require specifying the number of clusters and handles noise well.

from sklearn.cluster import DBSCAN

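# eps is measured in the units of the feature space, so 0.3 assumes scaled features; tune it per dataset
clustering = DBSCAN(eps=0.3, min_samples=2)
cluster_labels = clustering.fit_predict(anomalies)

# Each non-negative label is one candidate theme (e.g., 'late-night explorations', 'genre jump moments');
# -1 marks noise points that belong to no cluster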

To interpret clusters, compute the centroid’s top features (e.g., high novelty, high diversity) and assign a human-readable label.
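The interpretation step can be sketched as below. It assumes the clusters are described with the original, named session features rather than PCA components (which have no direct meaning), and that "top features" means the highest z-scores of the cluster centroid. The feature names, the `THEME_NAMES` lookup, and `describe_clusters` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from a dominant feature to a human-readable theme
THEME_NAMES = {
    "novelty": "new discoveries",
    "artist_diversity": "genre jump moments",
    "late_night_share": "late-night explorations",
}

def describe_clusters(features: pd.DataFrame, labels, top_n: int = 2) -> dict:
    """Return the top_n most elevated features (z-score of the centroid) per cluster."""
    z = (features - features.mean()) / features.std(ddof=0)
    z = z.assign(cluster=np.asarray(labels))
    summary = {}
    for cluster_id, group in z.groupby("cluster"):
        if cluster_id == -1:  # DBSCAN marks noise points with -1
            continue
        centroid = group.drop(columns="cluster").mean()
        summary[cluster_id] = centroid.sort_values(ascending=False).head(top_n).index.tolist()
    return summary

# Made-up feature values for five flagged sessions
features = pd.DataFrame({
    "novelty": [0.9, 0.8, 0.1, 0.2, 0.1],
    "artist_diversity": [0.7, 0.8, 0.2, 0.1, 0.3],
    "late_night_share": [0.1, 0.0, 0.9, 1.0, 0.2],
})
labels = [0, 0, 1, 1, -1]

summary = describe_clusters(features, labels, top_n=1)
themes = {cid: THEME_NAMES[top[0]] for cid, top in summary.items()}
print(themes)
```

A lookup table keeps the labels predictable; a feature with zero variance would need special handling, since its z-score is undefined.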

5. Narrative Generation – Telling the Story

Finally, we need to convert each cluster into a short, engaging text. We structure a prompt for an LLM that includes:

The cluster’s label (e.g., “Weekend deep dives”)
Key statistics (e.g., 15 hours of listening, 3 new genres discovered)
A representative track or artist

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

prompt = f"""Write a fun, personal recap of a user's music listening moment. They had a cluster of listens called '{cluster_label}' with {hours_listened} hours of listening. They explored {num_new_artists} new artists, and the most played track was '{top_track}'. Make it sound like a story."""

# openai>=1.0 replaced openai.ChatCompletion.create with client.chat.completions.create
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
narrative = response.choices[0].message.content

In production, we fine-tune the model or use controlled templates to ensure consistency and avoid hallucination.
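A controlled-template approach might look like the sketch below, which uses only the standard library. The template wording, the `TEMPLATES` and `FALLBACK` names, and the statistic keys are all hypothetical; the source does not describe the actual templates.

```python
from string import Template

# Hypothetical templates keyed by cluster theme; wording is illustrative only.
TEMPLATES = {
    "late-night explorations": Template(
        "After midnight you logged $hours hours of listening and met "
        "$num_new_artists new artists. '$top_track' was on repeat."
    ),
}
FALLBACK = Template(
    "You spent $hours hours in '$label' and discovered "
    "$num_new_artists new artists, led by '$top_track'."
)

def render_highlight(label: str, stats: dict) -> str:
    """Fill the template for this theme using only the supplied statistics."""
    template = TEMPLATES.get(label, FALLBACK)
    # substitute() raises KeyError if a statistic is missing, so gaps fail loudly
    return template.substitute(label=label, **stats)

stats = {"hours": 15, "num_new_artists": 3, "top_track": "Example Song"}
text = render_highlight("late-night explorations", stats)
print(text)
```

Because every number in the output comes straight from `stats`, a template cannot invent facts; the trade-off is less varied prose than free-form generation.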

Common Mistakes

Overfitting anomaly detection: Setting contamination too high floods the user with every little variation; too low misses real highlights. Use hold-out validation.
Ignoring time context: An anomaly that occurs at 3 AM may just be a sleep track playing; filter out non-waking hours or use contextual windowing.
Poor cluster explainability: DBSCAN produces arbitrary cluster IDs. Always add post-processing to extract top descriptive features for each cluster.
LLM hallucinations: The narrative might invent facts (e.g., “You discovered this new genre in January” when the data shows February). Mitigate by providing strict constraints and using a fact-checking step.
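The fact-checking step mentioned for LLM hallucinations could start with something as simple as the sketch below, which flags any number in a generated narrative that does not match a supplied statistic. The function name and the statistic keys are hypothetical, and this check only covers numerals; invented dates, months, or names would need additional rules.

```python
import re

def unsupported_numbers(narrative: str, stats: dict) -> list:
    """Return numbers mentioned in the narrative that match no supplied statistic."""
    allowed = {float(v) for v in stats.values() if isinstance(v, (int, float))}
    found = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", narrative)]
    return [n for n in found if n not in allowed]

# Made-up statistics and narratives
stats = {"hours_listened": 15, "num_new_artists": 3, "top_track": "Example Song"}
good = "You spent 15 hours exploring and found 3 new artists."
bad = "You spent 15 hours exploring and found 7 new artists."

print(unsupported_numbers(good, stats))
print(unsupported_numbers(bad, stats))
```

A narrative that returns a non-empty list can be regenerated or replaced with a template-based fallback before it reaches the user.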

Summary

This guide walked you through the technical pipeline behind generating personalized listening stories for Spotify Wrapped 2025. Starting from raw listening data, we engineered features, detected anomalies with Isolation Forest, grouped them with DBSCAN, and finally generated human-readable narratives using an LLM. The key is balancing statistical rigor with creative storytelling—making users feel seen and surprised. By implementing these steps, you can create your own system to identify and narrate unique moments from any time-series behavioral data.