2022-11-21 Paper Report - Photon: A Fast Query Engine for Lakehouse Systems

Talk title: Photon: A Fast Query Engine for Lakehouse Systems
Paper source: SIGMOD 2022

Authors: Alexander Behm, Shoumik Palkar, Utkarsh Agarwal, Timothy Armstrong, David Cashman, Ankur Dave, Todd Greenstein, Shant Hovsepian, Ryan Johnson, Arvind Sai Krishnan, Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman van Hovell, Maryann Xue, Reynold Xin, Matei Zaharia

Affiliation: Databricks
Speaker: 彭顺华
Time: November 21, 2022, 13:00

Venue: Room 624, Boxue Building, North Campus, Guizhou University

Abstract: Many organizations are shifting to a data management paradigm called the Lakehouse, which implements the functionality of structured data warehouses on top of unstructured data lakes. This presents new challenges for query execution engines. The engine needs to provide good performance on the raw uncurated datasets that are ubiquitous in data lakes, and excellent performance on structured data stored in popular columnar file formats like Apache Parquet. Toward these goals, we present Photon, a vectorized query engine for Lakehouse environments that we developed at Databricks. Photon can outperform existing warehouses on SQL workloads and also supports the Apache Spark API. We discuss the design choices we made in Photon (e.g., vectorization vs. code generation) and describe its integration with our existing SQL and Apache Spark runtimes, its task model, and its memory manager. Photon has accelerated some customer workloads by over 10x and has recently allowed Databricks to set a new audited performance record for the official 100TB TPC-DS benchmark.

