Large-Scale Embedding Feature Generation at Uber
Explore how Uber uses embeddings for ML systems, covering their lifecycle, from creation to deployment, and their impact on performance.
Embeddings are integral to many top-tier models at Uber, powering critical machine learning (ML) systems such as UberEats, HomeFeed, and Ads platforms. This talk explores in depth how embeddings are generated at scale for entities such as eaters and restaurants. These embeddings are used extensively as features in critical downstream models and in nearest-neighbor retrieval systems. We will discuss the entire embedding lifecycle, from creation to deployment, along with essential aspects such as versioning, analytics, and monitoring, which ensure that embeddings are used safely and consistently in both offline and online environments. We will also showcase ongoing enhancements to Michelangelo, Uber’s central ML platform, that add support for embeddings as a new data type alongside the numerical and categorical data types. These upgrades make embeddings first-class citizens, promoting embedding reuse and significantly improving ML systems. Finally, through a detailed case study of our HomeFeed ML system, we will demonstrate the tangible benefits of using embeddings and their impact on business metrics and performance.