
Optimizing the Read Path in Hybrid Row-Column Storage Formats


beichen

Dec 27, 2024

1. Introduction

In the field of big data, hybrid row-columnar storage has become the predominant storage format. Compared to purely row-based or column-based storage, it strikes a balance: point queries remain fast, while column pruning reduces I/O and data decoding overhead to improve scan performance. Hybrid row-columnar storage also enables highly efficient compression: storing data from the same column together improves data locality and similarity, yielding higher compression ratios.

Relyt implements a Parquet-like hybrid row-columnar storage format and introduces extensive optimizations tailored to customer needs. This article will detail Relyt’s optimizations in the data read path, focusing on two key areas: data pruning and data encoding/decoding.

2. Data Pruning

The most direct way to improve query performance is to reduce the amount of data scanned, and different optimization strategies apply at different stages of query execution. For tables with a cluster key, Relyt incorporates cluster-key-related filters when obtaining file lists from the meta service, filtering out unnecessary files based on file-level cluster key Min/Max statistics. For data that must be read before filtering, Relyt pushes certain filters down as meta filters, performing coarse filtering at the block and page levels within the file reader. For scenarios such as late materialization and reads involving delete vectors, selectivity can be pushed down into the decoding pipeline. Together, these pruning strategies allow Relyt to minimize I/O and data decoding overhead, improving scan performance. The following sections focus on three key aspects: block pruning, page pruning, and selectivity pushdown.

2.1 Block Pruning

In a hybrid row-columnar storage file, the file meta in the footer stores block-level column statistics, mainly including Min/Max values, null counts, and similar fields.
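A minimal sketch of what such footer metadata might look like (the structure and field names here are illustrative assumptions, not Relyt's actual on-disk definitions):

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ColumnBlockStats:
    """Statistics for one column within one block."""
    min_value: Any   # smallest value in the block
    max_value: Any   # largest value in the block
    null_count: int  # number of NULLs in the block

@dataclass
class FileMeta:
    """Footer metadata: per-block statistics for every column."""
    num_blocks: int
    # column name -> statistics indexed by block number
    column_stats: Dict[str, List[ColumnBlockStats]]
```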

When opening a file, Relyt first retrieves the block-level column statistics from the file's footer and evaluates the pushed-down meta filter expressions to compute the set of block indexes hit for each column. Since the data for each column is row-aligned at the block level, determining the final block hits is simply a matter of performing intersection, union, or difference operations on the per-column block index sets according to the logical relationships of the filters, which greatly simplifies block pruning. For example, consider the following query:
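(The query is paraphrased here from the description below; the table name t is an assumption.)

```sql
SELECT *
FROM t
WHERE id < 100
  AND dt BETWEEN '2024-12-01' AND '2024-12-04';
```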

Suppose a file contains N blocks (N > 4), where the condition id < 100 matches the 1st and 3rd blocks, and the condition dt between '2024-12-01' and '2024-12-04' matches the 2nd and 3rd blocks.

Since the two conditions are combined with AND, only block 3 is ultimately matched, so reading the entire file requires just a single block, significantly reducing I/O overhead. Additionally, Relyt implements a lightweight cache for the file footer's metadata, further reducing the cost of opening files and improving query efficiency.
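To make this concrete, here is a minimal sketch of the evaluation, assuming made-up per-block Min/Max statistics (block indexes are 0-based below, so the text's "block 3" is index 2):

```python
def blocks_matching_range(stats, lo=None, hi=None):
    """Return indexes of blocks whose [min, max] range may contain a match."""
    hits = set()
    for i, (mn, mx) in enumerate(stats):
        if (lo is None or mx >= lo) and (hi is None or mn <= hi):
            hits.add(i)
    return hits

# Made-up per-block (min, max) statistics for two columns, four blocks each.
id_stats = [(1, 50), (200, 300), (60, 90), (400, 500)]
dt_stats = [("2024-11-01", "2024-11-30"),
            ("2024-12-01", "2024-12-02"),
            ("2024-12-03", "2024-12-10"),
            ("2025-01-01", "2025-01-31")]

id_hits = blocks_matching_range(id_stats, hi=100)                      # id < 100
dt_hits = blocks_matching_range(dt_stats, "2024-12-01", "2024-12-04")  # dt BETWEEN ...

print(id_hits & dt_hits)  # AND combines by intersection -> {2}
```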

2.2 Page Pruning

Considering the impact on write performance and data compression, blocks are typically not configured to be very small (generally 64 MB to 256 MB), so block pruning alone cannot fully eliminate unnecessary I/O and data decoding. To address this, Relyt extends pruning beyond the block level by leveraging page-level statistics, enabling more fine-grained data pruning.

In Relyt, a page is a further subdivision of a column chunk within a block based on size. The page-level statistical information is stored in a dedicated area between the file footer and the data region. In addition to Min/Max values, this information includes details such as the row number of the first record in the page relative to the block, the offset of the data within the file, and the length of the data. Unlike blocks, pages across different columns are not guaranteed to be row-aligned, which adds complexity to the pruning process. When filtering based on page-level statistics, the following steps are performed:

  1. Compute the pages hit by each column individually based on the filter conditions;

  2. Determine the common row set across multiple columns by applying the filter's logical relationships to the rows matched in each column;

  3. Use the resulting row set to identify the pages hit for each column;

  4. Finally, read the specific data within the hit pages based on the row set for each column (steps 2 and 3 are sketched below).
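A minimal sketch of steps 2 and 3, assuming each hit page is represented by the half-open row range it covers (the numbers match the example that follows):

```python
def intersect_ranges(a, b):
    """Intersect two sorted lists of half-open row ranges [start, end)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# Row ranges covered by the pages each filter hit:
a_hits = [(1000, 2200), (3000, 3990)]   # pages of column a matching a = 'x'
b_hits = [(0, 500), (1500, 3000)]       # pages of column b matching b = 'y'

rows = intersect_ranges(a_hits, b_hits)  # AND combines by intersection
print(rows)                              # [(1500, 2200)]
```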

For example, consider the following query:
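(As with the earlier example, this query is a paraphrase consistent with the description below; the table name t is assumed.)

```sql
SELECT *
FROM t
WHERE a = 'x' AND b = 'y';
```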

When reading the block, the condition a = 'x' matches the 1st and 3rd pages, while the condition b = 'y' matches the 0th and 2nd pages. The row ranges matched by column a are [1000, 2200) and [3000, 3990), while the row ranges matched by column b are [0, 500) and [1500, 3000). The range of rows matched by a = 'x' AND b = 'y' is therefore [1500, 2200). Based on each column's page statistics, this row range corresponds to the second half of the first page in column a and the first half of the second page in column b.

Since pages are the smallest unit of decompression in Relyt, both pages must still be fully loaded at the I/O layer. Within each loaded page, the row range is then used to "trim" the page by discarding the irrelevant parts (head-tail trimming). If there are gaps between the matched row ranges within a page, those gaps are skipped as well. Finally, the data across columns is aligned by row and output.

During this process, Relyt also implements extensive I/O optimizations. In addition to common techniques such as I/O merging, Relyt uses a fixed-buffer circular sliding window for data loading, which reduces the number of memory copies performed while reading data, improving both efficiency and performance.
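A minimal sketch of the idea, using a simplified compacting variant (this is illustrative, not Relyt's actual reader; a true circular buffer would wrap its indexes instead of compacting):

```python
import io

class SlidingWindowReader:
    """A fixed buffer reused as a sliding window over a file. Callers receive
    zero-copy memoryview slices instead of freshly allocated bytes objects."""

    def __init__(self, f, capacity=64 * 1024):
        self.f = f
        self.buf = bytearray(capacity)
        self.view = memoryview(self.buf)
        self.start = 0  # read cursor
        self.end = 0    # write cursor

    def read(self, n):
        """Return a zero-copy view of the next n bytes (n <= capacity)."""
        if self.end - self.start < n:
            # Slide the window: move the unread tail to the front, then
            # refill the freed space with a single readinto() call.
            unread = self.end - self.start
            self.buf[:unread] = self.buf[self.start:self.end]
            self.start, self.end = 0, unread
            self.end += self.f.readinto(self.view[self.end:])
        chunk = self.view[self.start:self.start + n]
        self.start += n
        return chunk

r = SlidingWindowReader(io.BytesIO(bytes(range(256)) * 16), capacity=4096)
print(bytes(r.read(4)))  # b'\x00\x01\x02\x03'
```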

2.3 Selectivity Pushdown

Data deletions and updates require filtering out invisible data based on the delete vector during reads. Additionally, with late materialization, some columns may be loaded first for computation, and the computed results then determine which data is read from the remaining columns.

In these scenarios, Relyt pushes selectivity down to the table scan operator. This allows Relyt to prune blocks based on the selectivity information and, depending on the hit rate, decide whether to convert the selectivity into a hit row set for further pruning at the page level. Within a single page, selectivity can also be applied directly to the decoding pipeline, allowing Relyt to skip irrelevant data during decoding and significantly reducing decoding overhead as well as unnecessary computation and I/O.
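A sketch of the last step (the encoding and API here are illustrative assumptions, not Relyt's): a selection vector, with one visibility flag per row, is pushed into a run-length decoder so that filtered or deleted rows are skipped rather than materialized and discarded afterwards.

```python
def decode_rle_with_selection(runs, selection):
    """runs: (value, count) pairs; selection: one visibility flag per row."""
    out, row = [], 0
    for value, count in runs:
        sel = selection[row:row + count]
        if any(sel):  # runs with no selected rows are skipped outright
            out.extend(value for keep in sel if keep)
        row += count
    return out

runs = [("a", 3), ("b", 2), ("c", 4)]  # encodes: a a a b b c c c c
selection = [True, False, False,       # e.g. derived from a delete vector
             False, False,
             True, True, False, False]
print(decode_rle_with_selection(runs, selection))  # ['a', 'c', 'c']
```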

3. Data Encoding/Decoding

The next step in improving data read performance is to optimize data encoding/decoding and compression, and Relyt has made many optimizations in this area as well. The following sections focus on decoding optimization, dictionary read optimization, and adaptive encoding.

3.1 Decoding Optimization

Relyt leverages SIMD instructions to reduce decoding and conversion overhead. In addition, Relyt employs a more precise buffer size estimation algorithm in the decoding pipeline, minimizing the copying caused by frequent buffer resizing. On critical decoding paths, Relyt further reduces branching and applies techniques such as manual loop unrolling, effectively improving decoding efficiency.
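As a rough analogue (NumPy's vectorized kernels stand in here for the SIMD instructions a native decoder would use), decoding dictionary codes with a single vectorized gather avoids a branchy per-element loop:

```python
import numpy as np

dictionary = np.array(["red", "green", "blue"])
codes = np.array([0, 2, 2, 1, 0, 2], dtype=np.int32)  # dictionary-encoded column

# Scalar-style decode: one branchy iteration per value.
decoded_scalar = [dictionary[c] for c in codes]

# Vectorized decode: a single gather over the whole code array.
decoded_simd = dictionary[codes]

assert list(decoded_simd) == decoded_scalar
print(decoded_simd)  # ['red' 'blue' 'blue' 'green' 'red' 'blue']
```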

3.2 Dictionary Read Optimization

Dictionary encoding is a common data encoding technique that can effectively reduce storage space and improve query efficiency when used appropriately. In Relyt, during data reads, the system dynamically decides whether to load a column directly in its dictionary-encoded form based on the data type and dictionary-related metadata.

Industry standard approach: when a column uses dictionary encoding and the query also requires dictionary-based output, the decoding process often involves reconstructing the dictionary in memory. Dictionary encoding operates at the column chunk level: if, partway through a write, the incoming data for a dictionary-encoded column is deemed unsuitable for dictionary encoding, the data already encoded is flushed to disk in dictionary form, while subsequent data is stored with plain encoding. This results in a mixed format within a block, where the earlier pages are dictionary-encoded and later pages are plain. During reads, this inconsistency often forces the plain pages to be re-encoded into dictionary form, which can significantly degrade performance.

Relyt optimizes this scenario in two ways. First, when reading data, it considers both the dictionary-encoded and the plain pages, evaluates the overall read cost, and decides whether to output the column in dictionary or non-dictionary form. Second, during reads, Relyt leverages collected statistics to read data directly wherever possible, avoiding the costly overhead of reconstructing the in-memory dictionary.
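A minimal sketch of the first decision, with a deliberately crude cost model (the heuristic below is an assumption, not Relyt's actual model):

```python
def choose_output_form(pages):
    """pages: list of dicts with 'encoding' ('dict' or 'plain') and 'rows'."""
    dict_rows = sum(p["rows"] for p in pages if p["encoding"] == "dict")
    plain_rows = sum(p["rows"] for p in pages if p["encoding"] == "plain")
    # Emitting dictionary output forces re-encoding every plain row against an
    # in-memory dictionary; emitting plain output forces decoding every
    # dictionary-encoded row. Choose the side with less conversion work.
    return "dict" if plain_rows < dict_rows else "plain"

pages = [{"encoding": "dict", "rows": 8000},
         {"encoding": "dict", "rows": 8000},
         {"encoding": "plain", "rows": 1000}]
print(choose_output_form(pages))  # 'dict': only 1000 rows need re-encoding
```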

3.3 Adaptive Encoding

Choosing an appropriate data encoding and compression strategy requires weighing storage costs against query performance. Relyt keeps this transparent to users through adaptive encoding. When a new table is created, if the user does not specify an encoding for a column, Relyt enables adaptive encoding by default. Once active, Relyt selects a default encoding and compression method for each column based on its data type, then uses real-time feedback during writes to adjust the encoding and compression dynamically, optimizing how local data is stored. To minimize the impact of local data variations, adaptive encoding and compression are further integrated with AutoTableService, enabling global encoding and compression optimization: as files are compacted, the data on disk gradually converges on the most efficient encoding and compression, reducing storage costs and enhancing query performance.
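A simplified sketch of the write-side feedback loop, with assumed defaults and thresholds (not Relyt's actual policy):

```python
DEFAULTS = {"string": "dict", "int": "delta", "float": "plain"}

def pick_encoding(col_type, values, current, max_dict_ratio=0.5):
    """Re-evaluate the encoding to use for the next column chunk."""
    if not values:
        return current or DEFAULTS[col_type]
    if col_type == "string":
        distinct_ratio = len(set(values)) / len(values)
        # Dictionary encoding only pays off when values repeat often.
        return "dict" if distinct_ratio <= max_dict_ratio else "plain"
    return current or DEFAULTS[col_type]

enc = DEFAULTS["string"]
for chunk in (["us", "us", "eu", "us"], ["id-%d" % i for i in range(100)]):
    enc = pick_encoding("string", chunk, enc)
    print(enc)  # 'dict' for the repetitive chunk, then 'plain' for unique ids
```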

Taking dictionary encoding as an example: without adaptive encoding, different pages within the same column chunk may use different encoding methods, some dictionary-encoded and others stored with plain encoding. With adaptive encoding enabled, all pages within a column chunk use a consistent encoding. Across multiple column chunks of the same column, the encoding may still vary with data characteristics, but after AutoTableService reorganizes the files, the encodings across column chunks of the same column are aligned as much as possible. Adaptive encoding also yields a stable compression ratio for written data.

4. Conclusion

To enhance query performance and storage efficiency, Relyt has implemented extensive optimizations in data pruning, encoding/decoding, and compression, delivering tangible value and superior experiences to customers. Moving forward, Relyt will continue to focus on these areas, striving to provide customers with exceptional query performance, cost efficiency, and an even better user experience.