Abstract: Machine learning (ML) systems need to be optimized end to end as systems, not just at the level of ML algorithms and models. The role of software systems and the underlying distributed computing platforms, and their intersections with ML, has therefore recently been discussed intensively in the systems and ML communities. Various qualities of ML systems, such as robustness, reliability and responsiveness, rely on the capabilities of the underlying computing and data platforms, and optimizing ML systems requires strong integration with software and systems engineering techniques. In our work, we are interested in addressing challenging runtime issues in the robustness, reliability, resilience and elasticity (R3E) of end-to-end ML systems. In this talk, we will discuss the view of quality of analytics (QoA) and the principles of elasticity engineering for big data and cloud computing that can be applied to ML systems. We will present initial results of applying QoA to ML pipelines and discuss a long-term view of research activities to support QoA for runtime optimization of ML systems.
Speaker: Hong-Linh Truong
Affiliation: Associate professor, Department of Computer Science, Aalto University
Place of Seminar: Lecture Hall T6, Aalto University