20 min
Oct 15, 2024
8:40 am

From Feature Store to AI Lakehouse

Hopsworks introduces the AI Lakehouse: an extension of the Lakehouse with MLOps capabilities, real-time data support, support for LLMs, and more.

About this session

The feature store has been the data layer for MLOps platforms, consuming data from both historical data sources (data warehouses, data lakes, etc.) and real-time data sources (message buses). Historical data, however, is increasingly found in the Lakehouse: an open transactional data layer (Apache Iceberg, Apache Hudi, Delta Lake) that works with any query engine. Just as the cloud separated storage and compute, the Lakehouse separates data from query engines.

In this talk, we introduce the AI Lakehouse: extensions to the Lakehouse that add MLOps capabilities. These include a native query engine for Python powered by Arrow (a 10-45X improvement in read throughput over what existing Lakehouse providers offer) and new support for real-time data. You can't build TikTok's real-time recommender system on today's Lakehouse; AI systems like it need real-time data so they can use the freshest features. The AI Lakehouse also supports LLMs, acting as the data layer for instruction datasets, prompt logs, and more. Finally, the AI Lakehouse should support building and operating all types of AI systems, from batch to real-time to LLM-powered. We will present the work we have done in Hopsworks to realize the vision of the AI Lakehouse.
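To make the "Arrow-powered Python reads from the Lakehouse" idea concrete, here is a minimal illustrative sketch using plain PyArrow, not the Hopsworks API; the table path and column names are hypothetical placeholders.

```python
# Sketch: Arrow-native access to lakehouse table files from a Python client.
# Assumes a hypothetical table stored as Parquet under /data/lakehouse/transactions/.
import pyarrow.dataset as ds

transactions = ds.dataset("/data/lakehouse/transactions/", format="parquet")

# Column pruning and predicate pushdown keep the data transferred to Python small.
recent = transactions.to_table(
    columns=["customer_id", "amount", "event_time"],
    filter=ds.field("amount") > 100.0,
)

# Convert to pandas for feature engineering or model training (zero-copy where possible).
df = recent.to_pandas()
```

The point of the sketch is the access pattern: the client reads open table formats directly as Arrow, rather than routing every training or batch-inference read through a JVM-based query engine.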


Join our Slack channel to stay up to date on all the latest feature store news, including early notifications for conference updates.