Products

Use Cases

Customer Stories

Resources

Company

/

Product & Technology

How to Build Auto Table Service to Ensure High Performance Querying

How to Build Auto Table Service to Ensure High Performance Querying

How to Build Auto Table Service to Ensure High Performance Querying

qianzhen

Dec 23, 2024

Background

In database and lakehouse systems, different methods are applied to obtain a well-organized data layout, such as partitioning, bucketing, sorting, indexing, etc. to ensure optimal and stable query performance. However, in production environments, data streams into the system continuously at different times and in various ways, disrupting the balance of existing data layout and leading to query performance degradation. Therefore, the system needs data reorganization capabilities to continuously optimize data layout while data is constantly flowing in, providing consistent and predictable high performance.

In Relyt, the capability of data layout optimization is provided by the Table Service module. This module is responsible for data maintenance, handling tasks such as data lifecycle (i.e., time-to-live), garbage cleanup, and statistics collection. Table Service can be triggered manually by users or adaptively by the system background worker (AutoTableService).

This article will introduce the Relyt Table Service module from three aspects: overall architecture, data layout optimization strategies, and Conflict-Free transaction model.

Architecture

Relyt abstracts various goals in the Table Service as independent tasks and orchestrates them uniformly. For example, data layout optimization tasks are implemented by two independent tasks: PLAN Task and MERGE Task. The PLAN Task decides which part of the data files' data layout needs optimization based on data metadata and strategy configuration. The MERGE Task carries out the decision made by the PLAN task, i.e., rewriting data files and switching the data version. Besides the PLAN & MERGE Task, Relyt also implements several other Tasks, such as:

  • GC Task: Old version (meta) data recycling

  • STATS Task: Statistics collection

  • INDEX Task: Index building

Fine-grained task management not only helps with clear and concise scheduling logic but also supports flexible task distribution and execution. The following diagram illustrates the scheduling of Relyt Table Service, where the Table Service Scheduler on the Master is responsible for overall scheduling, distributing tasks to Hybrid DPS (HDPS) or Extreme DPS (EDPS) for execution based on task characteristics and overall cluster load:

  • HDPS (resident resources): Suitable for executing lightweight tasks, such as PLAN/GC Tasks, etc.

  • EDPS (elastic resources): Suitable for executing heavy IO/heavy CPU tasks, such as MERGE/INDEX Task, etc.

EDPS resources are elastically initiated, isolating the execution of Table Service Tasks without affecting users' online queries. After execution, they are autonomously reclaimed in seconds, achieving on-demand support for Table Service execution.

Data Layout Optimization Strategies

The PLAN Task aims to optimize query performance, generating MERGE Tasks according to several predefined strategies and the current state of data layout.

Small File Strategy

Small files directly affect query performance because of open file overhead and metadata blowup. This strategy checks the current number of small files in the table and generates a MERGE Task to merge them when the threshold is reached.

Cluster Key Strategy

The orderliness of data directly affects query pruning efficiency, thereby affecting query performance. Relyt stores the min/max of the table's Cluster Key in metadata and uses it for query pruning. Based on this metadata, the PLAN Task can calculate the overlap depth on the Cluster Key of the table to determine the overall orderliness of the data. As shown in the figure below, the max overlap depth on the id column of data files is 3.

The Cluster Key strategy generates MERGE Tasks for overlapped files (such as File 0, File 1, and File 2 in the figure above) where the overlap depth exceeds the threshold, reducing the overlap and improving the overall orderliness of the data, thereby obtaining a better filter rate in query.

Deleted Tuple Strategy

In Relyt, deleted rows are not immediately removed from data files. Instead, they are represented in a marker (called visimap) and filtered during reading. When there are too many deleted rows in a data file, it affects query performance. The Deleted Tuple strategy monitors the proportion of deleted rows, and if a threshold is reached, it generates a MERGE Task to rewrite the file, physically erasing the deleted data.

Other Strategies and Constraints

Task IO Limitation

Constrain the total IO amount of data file rewriting by a MERGE Task, controlling the maximum execution time of a single task. Gradually optimize the data layout of the table to its best state with a "small steps" approach.

Time Factor

Add a time factor to thresholds used in various strategies. For example, in the Deleted Tuple strategy, use stricter thresholds (e.g., 5%) for older data files (e.g., data files generated 7 days ago) to ensure that all deleted rows in the table can be cleaned over time.

Scheduling Weight

Every MERGE Task has a score and is used for global task scheduling weight, ensuring that more valuable MERGE Tasks are prioritized for execution. The score is based on a theoretical query time improvement by executing the MERGE task.

Strategy Combination

In the implementation, the PLAN Task supports the combination of various strategies and new strategies can be customized and developed according to business scenarios and installed as plugins.

Conflict-Free Transaction Model

Relyt supports ACID transaction semantics, ensuring data consistency, integrity, and reliability in large-scale concurrent scenarios, correctly handling concurrent writes, deletions, etc. As a background task, Table Service logically does not modify user data, only performs data reorganization, cleanup, etc. However, when user SQL and background Table Service Task are concurrent, the following scenarios may occur:

  • File 1 and File 2 are being rewritten into a new file Target File by the MERGE Task

  • A user-initiated SQL concurrently deletes some rows in File 1 and File 2

If this concurrent scenario is not handled correctly, it will result in the loss of the user's delete: after the MERGE Task is completed, the new and old version data switch (Target File replacesFile 1&File 2), but the data that should be deleted (red rows) is still in the data file.

For this scenario, common solutions include:

  1. Abort one of the transactions, for example, the later submitted transaction checks the modification set and aborts if a conflict is found;

  2. Table Service Task detects concurrent user SQL and tries to wait for the user SQL to finish before committing.

However, these two solutions have some drawbacks: the former affects user experience, and the latter leads to task starvation.

Relyt designed a compensation mechanism for this scenario, achieving Conflict-Free between TableService and user transactions, allowing them to run completely parallel without conflict.

The core logic of compensation is to Replay Delete for the later submitted transaction, applying the deleted rows to the latest version of the data file. As shown in the figure below, in Case 1, when a delete transaction is doing commit, the rows deleted by this transaction are found and deleted in the Target File generated by the MERGE Task. In Case 2, when the MERGE Task is committing, the deletions of concurrent transactions on the input files (File 1 & File 2) during the execution of this transaction are applied to the Target File.

Intuitively, Reply Delete is a heavy IO operation which may block transaction commit. However, relying on Relyt's metadata design and mark-based deletion mechanism, Reply Delete is a relatively lightweight metadata operation in implementation. Meanwhile, the IO size of a MERGE Task is considered during generation, so the probability of conflict with concurrent transactions and the number of conflict rows are limited. In our production environment, the compensation mechanism can achieve complete transparency of Table Service to users.

Conclusion

In modern database & lakehouse systems, data layout and data reorganization are core elements to ensure efficient data management and query performance. The deepening of Table Service capabilities is also one of the focus points of various products in the market, such as Snowflake's micro-partition, DeltaLake's Liquid Clustering, etc.

Relyt's Table Service module dynamically adjusts data layout progressively through fine-grained task division and optimization strategies, ensuring the system always maintains efficient query performance. In addition, the compensation mechanism effectively resolves conflicts between Table Service and user transactions, achieving imperceptible background data maintenance. This design not only enhances system stability and performance but also provides users with a smoother experience.