AWS Partner | DataCloud Invited to Share Serverless AI-ready Data Cloud

ningyu

Jan 10, 2025

Recently, as an Amazon Web Services (AWS) technology partner, DataCloud was invited to take part in an AWS data partner event. Ning Yu, product manager at DataCloud, delivered a keynote titled "Serverless AI-ready Data Cloud." Highlights from the talk follow.

Evolution of Data Platforms

Looking at the evolution of data platforms: in the 1990s, BI analysis ran on separate, standalone databases. As data volumes grew and multiple data sources had to be combined, data warehouse products emerged, with Oracle and Teradata as typical vendors. With the rise of big data came the Hadoop ecosystem, which handled ETL, stream processing, and batch processing. As the technology evolved further, the lakehouse concept (the integration of data lake and data warehouse) emerged. With the advent of LLMs, enterprise customers have begun to run real-time, integrated analysis that fuses structured and unstructured data.

From the perspective of data engine technology, the earliest architectures, from SMP to MPP, were deployed in IDCs or in customers' own data centers, with typical products from Oracle, Teradata, and Greenplum. After the Hadoop ecosystem appeared, products based on the BSP (bulk synchronous parallel) computing model, such as Hive and Spark, became popular. With the emergence of the cloud, these products were first cloud-hosted, that is, moved from IDCs to the cloud. As cloud technologies such as Kubernetes and object storage (OSS) matured, products gradually transitioned from cloud-hosted to cloud-native. With the arrival of LLMs, AI and data technologies have steadily converged, giving rise to new AI-ready data clouds and AI-native data analysis products.

Over the course of this evolution in data platforms and data engine technologies, DataCloud has developed its core product: Relyt AI-ready Data Cloud.

Relyt AI-ready Data Cloud: A Cloud Infrastructure Centered on Data

Relyt AI-ready Data Cloud is built on top of public clouds, including mainstream global providers such as AWS. At the bottom is a data service layer that integrates structured and unstructured data (tables, text, graphs, vectors, files, and data lakes). Above the storage sits DPS, a stateless computing service. The top layer is an AI data analysis service powered by large models. From bottom to top, these layers form a complete AI-ready Data Cloud product system, exposed through standard SQL and Python interfaces. Relyt AI-ready Data Cloud, which has been verified by millions of AI data analysis users, is now available for purchase through the AWS Marketplace: <https://aws.amazon.com/marketplace/pp/prodview-sj3gjqpgqdqq4>

10X TCO Savings

In a traditional architecture where compute, metadata, and data are coupled, resource utilization is low and scalability is limited. Relyt instead uses a layered architecture that decouples metadata, data, cache, and compute. Above the storage layer, it separates out operators such as Scan, Projection, Filter, and Join. These operators are optimized and paired with resources to form DPS, a stateless serverless computing service. This brings two benefits. First, stateless compute scales well: scheduling a new DPS compute resource takes only a few tens of milliseconds. Second, compute density improves: with operators broken down to this granularity, Relyt can take full advantage of cloud ECS instances and improve the cost-effectiveness of heterogeneous resources, including ARM and x86. Together, these yield a 10X reduction in TCO.
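To make the idea of separated, stateless operators concrete, here is a minimal Python sketch. It is not Relyt's implementation: the function names (`scan`, `filter_rows`, `project`, `hash_join`, `run_plan`) and the sample tables are hypothetical, chosen only to show that each operator depends solely on its inputs and keeps no state between calls, which is the property that lets a scheduler place it on any available worker.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Scan: stream rows out of storage (here, an in-memory list)."""
    for row in table:
        yield row


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter: keep only the rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Projection: keep only the requested columns."""
    return ({col: row[col] for col in columns} for row in rows)


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Join: build a hash table on the right input, then probe it with the left."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**r, **l}


# Hypothetical sample data, for illustration only.
ORDERS = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 90.0},
]
CUSTOMERS = [
    {"customer_id": 10, "name": "Acme"},
    {"customer_id": 11, "name": "Globex"},
]


def run_plan(orders: List[Row], customers: List[Row]) -> List[Row]:
    """Compose the operators into one plan: Scan -> Filter -> Join -> Projection."""
    large_orders = filter_rows(scan(orders), lambda r: r["amount"] >= 50.0)
    joined = hash_join(large_orders, scan(customers), "customer_id")
    return list(project(joined, ["order_id", "name"]))


if __name__ == "__main__":
    print(run_plan(ORDERS, CUSTOMERS))
```

Because no operator holds anything beyond its arguments, any step in the plan could in principle be retried or moved to another worker without coordination. A real engine would add columnar batches, vectorized execution, and distributed shuffles, none of which this sketch attempts.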

Zero Maintenance, 99.9% Query Success Rate

In mixed workloads that combine high-concurrency real-time queries (such as reports) with high-throughput ETL queries, the high-throughput tasks in an MPP architecture can occupy all computing resources, leaving real-time queries without a response. Relyt's AQS (Adaptive Query Scaling) component automatically identifies high-throughput tasks that put heavy load on the system and schedules them to an elastic resource pool, which customers pay for by usage. The BSP model is applied to ensure the query success rate. This architecture has two advantages. First, mixed workloads, both high-concurrency and high-throughput queries, can run on the same platform. Second, it keeps the system highly reliable and available. Because AQS routes queries automatically, it acts as a fallback when resources are not fully provisioned or when temporary traffic arrives. This greatly reduces customers' cost and risk and lets them focus on business development rather than on maintaining underlying resources.
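The routing behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the AQS implementation: the `Query` and `AdaptiveRouter` classes, the byte-based cost estimate, and the threshold values are all assumptions made for the example. It shows the two decisions the text describes: heavy queries go to the elastic pool, and light queries fall back to the elastic pool when the fixed pool has no free capacity.

```python
from dataclasses import dataclass

FIXED_POOL = "fixed"      # always-on resources serving real-time queries
ELASTIC_POOL = "elastic"  # on-demand resources, billed by usage


@dataclass(frozen=True)
class Query:
    sql: str
    estimated_scan_bytes: int  # hypothetical cost estimate from a planner


class AdaptiveRouter:
    """Toy router: sends heavy or overflow queries to the elastic pool."""

    def __init__(self, heavy_threshold_bytes: int, fixed_pool_slots: int) -> None:
        self.heavy_threshold_bytes = heavy_threshold_bytes
        self.fixed_pool_slots = fixed_pool_slots
        self.fixed_in_use = 0

    def route(self, query: Query) -> str:
        # Heavy, high-throughput work never competes with real-time queries.
        if query.estimated_scan_bytes >= self.heavy_threshold_bytes:
            return ELASTIC_POOL
        # Fallback: the fixed pool is full, so overflow to elastic resources.
        if self.fixed_in_use >= self.fixed_pool_slots:
            return ELASTIC_POOL
        self.fixed_in_use += 1
        return FIXED_POOL

    def finish(self, pool: str) -> None:
        """Release a fixed-pool slot when a query routed there completes."""
        if pool == FIXED_POOL and self.fixed_in_use > 0:
            self.fixed_in_use -= 1


if __name__ == "__main__":
    router = AdaptiveRouter(heavy_threshold_bytes=10 * 1024**3, fixed_pool_slots=1)
    report = Query("SELECT count(*) FROM daily_report", 1 * 1024**2)
    etl = Query("INSERT INTO fact SELECT * FROM staging", 50 * 1024**3)
    print(router.route(report))  # light query, free slot
    print(router.route(etl))     # heavy query
    print(router.route(report))  # light query, but the fixed pool is full
```

A production router would base the decision on richer signals (planner statistics, observed memory pressure, queue depth) and could reroute a query mid-flight; the sketch keeps only the threshold and the fallback to show the shape of the decision.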

Operator Optimization

Relyt has decomposed operators layer by layer and has also optimized how software and hardware work together. For the TPC-H benchmark, we grouped all queries into three classes (Filter/Projection, Join, and Aggregation) and compared Relyt with Trino, Spark, and ClickHouse. In the Filter/Projection scenario, Relyt's performance is 7.6 times that of Spark. In the Join scenario, its performance is 5 times that of Trino. In the Aggregation scenario, it also outperforms the other products. Across all of these operator scenarios, Relyt has a 100% query success rate.

Vector Query Performance

On a dataset of 10 million 512-dimensional face vectors, we ran a query test with 8 cores, 32 GB of RAM, and a concurrency of 32, requiring a query accuracy of 99%. Relyt's QPS is between 12,000 and 14,000, which is 1.8X to 5X better than other products.
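The talk does not say how "query accuracy" was measured. In vector search benchmarks it is commonly reported as recall@k: the fraction of the true nearest neighbors that the index actually returns. The sketch below assumes that interpretation and cosine similarity as the distance measure; the function names and the tiny two-dimensional vectors are hypothetical and stand in for the 512-dimensional data.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force ground truth: indices of the k most similar vectors."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(returned_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the true top-k neighbors present in the returned result."""
    return len(set(returned_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    truth = exact_top_k([1.0, 0.0], vectors, k=2)
    # Pretend an approximate index returned ids 0 and 2 instead of 0 and 1.
    print(truth, recall_at_k([0, 2], truth))
```

Under this reading, a 99% accuracy requirement means the index may miss on average at most one true neighbor in every hundred, and QPS figures are only comparable between products when measured at the same recall level.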

Real-time Analytics of PB-level Data

Today, customers run more and more real-time workloads, such as BI, search, recommendation, risk control, and operations, and demand for real-time data analysis keeps growing. Real-time data has two aspects: real-time writes and real-time queries. For real-time writes, Relyt provides ACID capabilities, supports write throughput of up to one million per second, and offers two interfaces: ODBC/JDBC and OpenAPI. For real-time queries, it supports high-concurrency KV point queries, with a maximum concurrency of over 1,000 queries.
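As a small illustration of what ACID writes and KV point queries mean in practice, the sketch below uses Python's built-in `sqlite3` module as a stand-in database. It says nothing about Relyt's own interfaces or performance; the `events` table and the helper names are hypothetical. The point is the behavior: a batch either commits in full or leaves no trace, and a point query fetches a single row by key.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in database with one keyed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def write_batch(conn: sqlite3.Connection, rows: List[Tuple[int, str]]) -> bool:
    """Insert all rows atomically; on any failure, none of them is kept."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def point_query(conn: sqlite3.Connection, key: int) -> Optional[str]:
    """KV-style lookup: fetch the payload for a single primary key."""
    row = conn.execute("SELECT payload FROM events WHERE id = ?", (key,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = make_store()
    print(write_batch(store, [(1, "a"), (2, "b")]))    # batch commits
    print(write_batch(store, [(3, "c"), (1, "dup")]))  # duplicate key, batch rolled back
    print(point_query(store, 3))                       # row 3 was never committed
```

The same pattern, a transaction around a batch plus a keyed lookup, carries over to any SQL engine reached through ODBC or JDBC; only the driver and connection setup differ.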

End-to-end Security, Privacy, and Compliance Assurance

Built on public clouds, Relyt provides data governance, federated analysis, and other capabilities. It supports a multi-cloud, multi-region security and compliance system that conforms to the policies of multiple countries. It offers end-to-end security assurance, including database security, data encryption and privacy protection, data leakage and loss prevention, and user login and database connection authentication. It is certified against the ISO/IEC 27001 information security standard, the ISO/IEC 20000 IT service management standard, and the SOC 2 data security controls of the American Institute of Certified Public Accountants (AICPA). It also meets the compliance requirements of the EU General Data Protection Regulation (GDPR) and the Singapore Personal Data Protection Act (PDPA).

20+ Ecosystem Connections

Built on the PostgreSQL protocol, Relyt is compatible with more than 20 ecosystem products across DataOps, data pipelines, and BI and data visualization, and it supports smooth migration from Redshift and Greenplum.
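One practical consequence of PostgreSQL protocol compatibility is that a standard PostgreSQL client driver should be able to connect. The sketch below uses the third-party `psycopg2` driver and is an assumption-laden example, not documented Relyt usage: the host name, database, user, and port 5432 (the PostgreSQL default) are placeholders, and whether `SELECT version()` is supported on a given deployment should be checked against the product documentation.

```python
from typing import Dict, Union


def connection_params(
    host: str, dbname: str, user: str, password: str, port: int = 5432
) -> Dict[str, Union[str, int]]:
    """Collect standard PostgreSQL connection keywords (all values are placeholders)."""
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }


def fetch_server_version(params: Dict[str, Union[str, int]]) -> str:
    """Connect with a stock PostgreSQL driver and run a trivial query."""
    import psycopg2  # third-party driver, imported lazily so the helpers above work without it

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical endpoint; replace with real connection details before running.
    params = connection_params("relyt.example.invalid", "analytics", "report_user", "change-me")
    print(fetch_server_version(params))
```

BI tools and pipeline products that speak the PostgreSQL wire protocol connect in the same way, which is what makes migration from other PostgreSQL-compatible warehouses comparatively smooth.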

The Data+AI integrated architecture of Relyt AI-ready Data Cloud fundamentally solves the problems of scale, real-time performance, accuracy, and cost in enterprise private data analysis scenarios. On this basis, enterprise customers can quickly build personalized self-service data analysis applications.