We present StreamBed, a capacity planning system for stream processing. StreamBed predicts, ahead of any production deployment, the resources that a query will require to process an incoming data rate sustainably, and the appropriate configuration of these resources. For this purpose, StreamBed builds a capacity planning model by piloting a series of runs of the target query in a small-scale, controlled testbed. We implement StreamBed for Apache Flink. Our evaluation with large-scale queries of the Nexmark benchmark demonstrates that StreamBed can accurately predict capacity requirements for jobs spanning more than 1,000 cores using a model built with a 48-core testbed.
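The abstract does not spell out the form of the capacity planning model, so the following is only a minimal sketch of the general idea of extrapolating from small-scale runs. It does not reproduce StreamBed's actual method. It assumes, hypothetically, that each operator's maximum sustainable input rate grows linearly with its parallelism. It fits that line by least squares to testbed measurements and then solves for the parallelism needed at a production target rate. The names `Run`, `fit_linear`, `required_parallelism`, and `plan`, the linear-scaling assumption, and the sample numbers are all illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical testbed measurement for a single operator."""
    parallelism: int         # number of parallel tasks used in the testbed run
    sustainable_rate: float  # highest input rate (events/s) handled without a growing backlog


def fit_linear(runs):
    """Least-squares fit of sustainable_rate = slope * parallelism + intercept."""
    n = len(runs)
    xs = [r.parallelism for r in runs]
    ys = [r.sustainable_rate for r in runs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need runs at two or more distinct parallelism levels")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def required_parallelism(runs, target_rate):
    """Smallest parallelism whose extrapolated sustainable rate reaches target_rate."""
    slope, intercept = fit_linear(runs)
    if slope <= 0:
        raise ValueError("no positive scaling observed in the testbed runs")
    return max(1, math.ceil((target_rate - intercept) / slope))


def plan(testbed_runs, target_rate):
    """Map each operator name to its predicted parallelism at target_rate."""
    return {op: required_parallelism(runs, target_rate) for op, runs in testbed_runs.items()}


# Hypothetical measurements from a small testbed (not data from the paper).
testbed_runs = {
    "source": [Run(2, 20_000), Run(4, 40_000), Run(6, 60_000)],
    "join": [Run(2, 10_000), Run(4, 20_000), Run(6, 30_000)],
}
prediction = plan(testbed_runs, target_rate=1_000_000)
print(prediction)  # {'source': 100, 'join': 200}
```

If one further assumes one core per parallel task, summing the predicted parallelisms gives a rough core count, here 300. A real planner would also have to account for non-linear scaling, inter-operator interference, and configuration parameters, which this sketch ignores.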
Rosinosky, G., Schmitz, D., & Riviere, E. (2024). StreamBed: Capacity Planning for Stream Processing. In Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems. https://hdl.handle.net/2078.5/256340