Shepherd: High-Scale, Low-Latency Machine Learning with Flink at Stripe
Stripe explores Shepherd's architecture: Flink, tiled data storage, and an automated control plane for faster feature development.
Stripe, a leading financial infrastructure platform, uses ML extensively across its products, for use cases such as fraud prevention in Radar. To support ML engineering, we developed Shepherd, a low-latency, real-time ML feature computation and serving platform built on the open-source Chronon project. Shepherd processes tens of thousands of events per second in under 150 ms at p99, serves feature values in under 30 ms at p95, and maintains uptime above 99.9%. This talk delves into Shepherd's online architecture, including its use of Flink, its tiled data storage strategy, and the control plane built to automate infrastructure provisioning and accelerate feature development.