Talk title: CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases
Paper source: SIGMOD 2022
Authors: Feng Zhang⋄, Weitao Wan⋄, Chenyang Zhang⋄, Jidong Zhai★, Yunpeng Chai⋄, Haixiang Li+, Xiaoyong Du⋄
Affiliations: ⋄Key Laboratory of Data Engineering and Knowledge Engineering (MOE), and School of Information, Renmin University of China
★Department of Computer Science and Technology, Tsinghua University
+Tencent Inc, China
Speaker: 邹晓锋
Time: October 10, 2022, 1:00 PM
Venue: Room 624, Boxue Building, North Campus, Guizhou University
Abstract: In modern data management systems, directly performing operations on compressed data has been proven to be a big success facing big data problems. These systems have demonstrated significant compression benefits and performance improvement for data analytics applications. However, current systems only focus on data queries, while a complete big data system must support both data query and data manipulation. We develop a new storage engine, called CompressDB, which can support data processing for databases without decompression. CompressDB has the following advantages. First, CompressDB utilizes context-free grammar to compress data, and supports both data query and data manipulation. Second, for adaptability, we integrate CompressDB to file systems so that a wide range of databases can directly use CompressDB without any change. Third, we enable operation pushdown to storage so that we can perform data query and manipulation in storage systems without bringing large data to memory for high efficiency. We validate the efficacy of CompressDB supporting various kinds of database systems, including SQLite, LevelDB, MongoDB, and ClickHouse. We evaluate our method using six real-world datasets with various lengths, structures, and content in both single node and cluster environments. Experiments show that CompressDB achieves 40% throughput improvement and 44% latency reduction, along with 1.81 compression ratio on average.
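
Illustrative note: the abstract says that CompressDB compresses data with a context-free grammar and answers queries without decompression, but it does not describe the construction. The Python sketch below shows only the general idea, under stated assumptions: it uses a simple Re-Pair-style pair replacement as a stand-in, and all names (`compress`, `expand`, `count_token`) are hypothetical. It is not CompressDB's engine or API.

```python
from collections import Counter


def compress(tokens):
    """Build a toy grammar: repeatedly replace the most frequent adjacent
    pair of symbols with a new rule until no pair occurs twice.
    (Assumed stand-in technique, not the method from the paper.)"""
    seq = list(tokens)
    rules = {}  # rule symbol ("R", n) -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        rule = ("R", next_id)
        next_id += 1
        rules[rule] = pair
        # Rewrite the sequence left to right, replacing non-overlapping matches.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(rule)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Decompress one symbol back to its tokens (used only for checking)."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]


def count_token(root, rules, token):
    """Count occurrences of `token` by walking the grammar, visiting each
    rule once, instead of expanding the data."""
    memo = {}

    def count(sym):
        if sym not in rules:
            return 1 if sym == token else 0
        if sym not in memo:
            left, right = rules[sym]
            memo[sym] = count(left) + count(right)
        return memo[sym]

    return sum(count(s) for s in root)


tokens = "a b c a b c a b d".split()
root, rules = compress(tokens)
restored = [t for s in root for t in expand(s, rules)]
assert restored == tokens
assert count_token(root, rules, "a") == 3
```

The query reuses the result for each rule, so repeated content is processed once no matter how often it appears; this reuse is the intuition behind operating directly on grammar-compressed data.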