Encoding Your Domain Expert: The Context Layer Behind Spotify's Data Assistant
Spotify's data assistant, Vedder, addresses the problem of delivering data insights at scale by introducing a context layer that captures the nuances of the data domain. The layer is organized into clusters, each owned by domain experts. A cluster has three components: datasets, with full schema and profiling, which capture the relevant data; pairs, vetted question-and-SQL examples that teach the model about the domain; and docs, which provide additional business context. Because domain experts curate and approve the examples that influence the assistant's behavior, Spotify keeps the generated insights accurate and trustworthy. Curation was selective: only 12.5% of proposed pairs were accepted, which highlights the importance of human judgment and expert review. This approach lets the assistant scale while its recommendations remain reliable. The system continuously monitors cluster health, reflecting changes in data and context, allowing data experts to focus their c
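
The three-part cluster model described above (datasets, pairs, docs) can be pictured as a small data structure. The following is a minimal sketch, not Spotify's actual schema: every class, field, and method name here (`Dataset`, `Pair`, `Cluster`, `approved`, `acceptance_rate`) is a hypothetical chosen for illustration, and the sample questions and SQL are invented. It only shows how expert approval could gate which pairs reach the assistant, and how an acceptance rate such as 12.5% (one pair in eight) would be computed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset in a cluster (hypothetical shape)."""
    name: str
    schema: dict[str, str]  # column name -> column type
    profile: dict[str, str] = field(default_factory=dict)  # profiling notes


@dataclass
class Pair:
    """A question-and-SQL example; usable only once an expert approves it."""
    question: str
    sql: str
    approved: bool = False


@dataclass
class Cluster:
    """A unit of context owned by domain experts (hypothetical shape)."""
    name: str
    owners: list[str]
    datasets: list[Dataset] = field(default_factory=list)
    pairs: list[Pair] = field(default_factory=list)
    docs: list[str] = field(default_factory=list)  # business context

    def approved_pairs(self) -> list[Pair]:
        """Return only the pairs an expert has approved."""
        return [p for p in self.pairs if p.approved]

    def acceptance_rate(self) -> float:
        """Fraction of proposed pairs that were approved (0.0 if none proposed)."""
        if not self.pairs:
            return 0.0
        return len(self.approved_pairs()) / len(self.pairs)


# Illustrative data only: eight proposed pairs, one approved by the owner.
example = Cluster(
    name="listening",
    owners=["domain-expert@example.com"],
    datasets=[Dataset("streams", {"user_id": "STRING", "ms_played": "INT64"})],
    pairs=[
        Pair(f"Question {i}", "SELECT COUNT(*) FROM streams", approved=(i == 0))
        for i in range(8)
    ],
    docs=["A stream is counted per track play."],
)

print(example.acceptance_rate())
```

Keeping approval as an explicit flag on each pair, rather than deleting rejected proposals, is one possible design choice: it lets the acceptance rate be measured while ensuring that only `approved_pairs()` would ever be handed to the assistant.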