15 min
Oct 15, 2024
11:40 am

Immutable KV Store on Cassandra

Learn how Uber's Michelangelo team uses Cassandra for online prediction, where that setup falls short, and how an immutable store offers a solution.

About this session

The Michelangelo (MA) feature store is a service that stores and serves ML features for Uber's AI platform at a scale of millions of QPS. For online prediction, it uses Cassandra as the online store and the Marmaray dispersal job to ingest data daily from Hadoop into Cassandra through a direct data dump. The ingested data then serves real-time services with low-latency read requirements. If the dispersal job fails to upload data, MA can still make model predictions using the previous day's data. The MA team uses TTL to recycle old data from Cassandra. However, if the dispersal job keeps failing to ingest data, whether because of bad code or config or because there is no data to ingest, the data in Cassandra eventually expires, leaving no features to serve model predictions. Cassandra Immutable Store is an effort to implement version-based data recycling on top of Cassandra to avoid the data expiration caused by TTL.

In this talk, we will go through:

- How Michelangelo uses Cassandra to serve features for online prediction, and the issue with that setup
- How the concept of an immutable store fixes the issue we are seeing
- A deep dive into the detailed implementation of the immutable store
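
The difference between TTL-based and version-based recycling can be illustrated with a small in-memory sketch. This is not the Immutable Store implementation, which the abstract does not describe, and it uses no Cassandra API. The class names (`TTLStore`, `VersionedStore`), the `keep_versions` parameter, and the feature key are all hypothetical, chosen only to show why expiry tied to time loses data when ingestion stalls while expiry tied to newer versions does not.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TTLStore:
    """Toy model: a row disappears ttl_days after it was written."""
    ttl_days: int
    rows: dict = field(default_factory=dict)  # key -> (value, day_written)

    def ingest(self, day: int, batch: dict) -> None:
        for key, value in batch.items():
            self.rows[key] = (value, day)

    def read(self, day: int, key: str):
        entry = self.rows.get(key)
        if entry is None:
            return None
        value, written = entry
        # An expired row is invisible even if nothing newer replaced it.
        return value if day - written < self.ttl_days else None


@dataclass
class VersionedStore:
    """Toy model: old data is dropped only after a newer version lands."""
    keep_versions: int = 2  # hypothetical retention setting, must be >= 1
    versions: dict = field(default_factory=dict)  # version -> {key: value}
    active: Optional[int] = None

    def ingest(self, version: int, batch: dict) -> None:
        if not batch:
            # An empty ingestion never replaces the active version.
            raise ValueError("refusing to publish an empty version")
        self.versions[version] = dict(batch)
        self.active = version
        # Recycle by version count, not by wall-clock age.
        excess = max(0, len(self.versions) - self.keep_versions)
        for old in sorted(self.versions)[:excess]:
            del self.versions[old]

    def read(self, key: str):
        if self.active is None:
            return None
        return self.versions[self.active].get(key)


ttl_store = TTLStore(ttl_days=3)
ttl_store.ingest(day=0, batch={"rider:1": 0.7})

versioned = VersionedStore(keep_versions=2)
versioned.ingest(version=1, batch={"rider:1": 0.7})

# Ingestion then stalls for several days.
print(ttl_store.read(day=5, key="rider:1"))  # row has aged out
print(versioned.read("rider:1"))             # last good version still served
```

In the sketch, the versioned store keeps serving the last successfully ingested version for as long as no newer version arrives, which mirrors the failure mode the abstract describes for TTL-based recycling.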

Moderator

Session Speaker
