
Strategy & Insights

Building AI's Memory: A Conversation with DataCloud's Founder


MemoryLake

Mar 2, 2026


"Memory will take center stage in the AI era, and multimodal memory platforms will become the new enterprise AI infrastructure."

The first time I met Ethan, Founder and CEO of DataCloud, he kept gesturing with his hands, as if gathering and connecting invisible fragments in the air. "Human memory is fragmented," he said. "Fragmented pieces of knowledge are like countless streams flowing into a lake." This image vividly captures the origin of DataCloud's core product, "MemoryLake," and points to the most critical challenges and opportunities in today's enterprise AI evolution.

Today, enterprise AI stands at a watershed: on one side are "smart toys" that can only handle conversations; on the other lies the promise of a "business partner" capable of continuous understanding, learning, and actionable decision-making. The essential difference between the two does not depend solely on model size, but on whether the AI can understand, connect, and reason over complex real-world information—that is, whether it has multimodal memory.

This is not a simple addition of features, but a cognitive paradigm revolution: it requires AI to move from merely handling conversational scenarios to understanding the continuous "decision trajectory" interwoven with text, tables, audio, video, and workflows in enterprise production environments. Ethan and his team are committed to becoming key builders of this quiet revolution through MemoryLake.

01 Fragmented Memories, Converging into a Lake (MemoryLake)

Q1: Why did you want to work on memory?

E: It's related to the development trend of the AI industry. We can divide this round of AI development into three stages:

The first stage was before 2024. The prevailing view was that AI could unlock value once connected to enterprises, leading many to build vector databases and knowledge bases—an attempt to clear the first threshold of entering enterprises and reaching production value. Since many applications, like Q&A, didn't dramatically improve production efficiency, the first step was to build a bridge between large models and data through vector databases. Although that bridge was still far from production, it did solve the problem of the first stage: connecting AI with data.

During this period, we found huge room for development. On one hand, data is not just a vector representation; on the other, human knowledge is divided into tacit and explicit forms. Take a news article: if you're a media professional, you can judge its newsworthiness far more easily than an ordinary person—that's your tacit knowledge. But AI doesn't know this, making effective deployment difficult.

The second stage started around 2024, driven by two factors: first, the drop in model costs and improvement in performance; second, the emergence of demonstration applications, like general agent applications. So from 2024 to 2025, the focus was more on solving the problem of demonstration apps—the first layer beyond simple chat tools. The biggest issue with these demonstration apps isn't that they're not useful, but that they haven't fully integrated into enterprise workflows; they can't be properly evaluated, guaranteed, or held accountable.

The third stage began in the second half of 2025. If the previous two years were about exploring production efficiency, we're now moving toward an enterprise productivity platform, with high demands for trustworthiness, reliability, and complexity. Enterprises are starting to judge AI by the standards of a "production system," not by "demo effect." In the past, everyone talked about improvements, but it was always about "time savings." Compressing time can speed up the trial-and-error process, but in highly constrained scenarios like chip manufacturing or risk control, the bottleneck often isn't time, but physical and risk boundaries. This also relates to several issues that need to be solved when entering enterprises this year: first, truly centering on value; then later, efficiency, accuracy, complexity, and so on.

How do we improve these things? Especially, how do we make a lot of tacit knowledge explicit? For example, in venture capital: faced with the same three financial statements of a portfolio company, why can an ordinary person see little, while a VC can uncover many "insights"? Because those "insights" are logic they have internalized. In other words, internalized tacit knowledge is the most valuable. This is different from the "digital human" concept we often hear about now, which merely digitizes a person's appearance but doesn't achieve internalization—it doesn't digitize the tacit aspects. How do we make these tacit things "manifest"? That requires memory.

Q2: At this stage, for enterprises to leap from "exploring production efficiency" to a "productivity platform," what do you think is the most critical breakthrough needed?

E: The answer lies in multimodal memory. Multimodal memory will become an enterprise necessity, because decision trajectories are inherently multimodal. In a corporate procurement decision, clues might come from a PDF report (text), key arguments in a meeting recording (audio), a historical price trend chart (visual), and annotations in an approval workflow (structured data). Traditional "conversation-level" memory is merely an isolated slice of this continuous, interwoven trajectory, losing most of the context and the chain of reasoning. The goal of a multimodal memory platform is to completely replicate this "decision trajectory," allowing AI to reason based on a comprehensive corpus of memory.

Building multimodal memory poses a high technical bar. It requires a full-stack memory engineering stack and an independent multimodal data large model to handle:

  • Multimodal Representation and Alignment: Mapping information from different modalities—text, images, tables—into a unified semantic space and establishing cross-modal associations (e.g., aligning the text "sales surged" in a report with the peak of a line chart in a PPT).

  • Deep Understanding and Structured Extraction: Using specialized models (like MemoryLake-D1) to extract logical relationships and structured knowledge from complex documents and charts, rather than simply transcribing words.

  • Memory State Management: Handling logical conflicts, updates, enhancement, reflection, and synthesis of memories—a dynamic, ongoing process.
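The three capabilities above can be illustrated with a toy sketch. The `MemoryUnit` and `MemoryStore` names below are invented for illustration, not MemoryLake's actual data model: units from any modality live in one shared embedding space (representation and alignment), and a newer memory about the same key supersedes the old one while history is kept (state management).

```python
from dataclasses import dataclass, field
import math
import time

@dataclass
class MemoryUnit:
    key: str              # the entity or fact this memory is about
    modality: str         # "text", "table", "image", "audio", ...
    content: str
    embedding: tuple      # vector in the shared semantic space
    timestamp: float = field(default_factory=time.time)

def cosine(a, b):
    # Similarity in the unified semantic space, across modalities.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Toy store: cross-modal retrieval by embedding, last-write-wins conflicts."""

    def __init__(self):
        self.units = []

    def add(self, unit):
        # Conflict policy: newer memories about the same key supersede older
        # ones, but old versions are kept for traceability.
        self.units.append(unit)

    def current(self, key):
        matches = [u for u in self.units if u.key == key]
        return max(matches, key=lambda u: u.timestamp) if matches else None

    def align(self, query_embedding, top_k=1):
        # Cross-modal association: nearest units in the shared space,
        # regardless of modality (text next to charts, audio, ...).
        ranked = sorted(self.units,
                        key=lambda u: cosine(query_embedding, u.embedding),
                        reverse=True)
        return ranked[:top_k]
```

In the spirit of the example from the list above, a text unit for "sales surged" and a chart unit for the line-chart peak would land near each other in the shared space, so a query about Q3 sales retrieves both, whatever their modality.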

This also explains why general-purpose large model vendors or traditional data platforms struggle: the former lack deep structured understanding and system-level memory management capabilities; the latter lack top-level multimodal cognitive and reasoning abilities. From this perspective, multimodal memory is not a feature upgrade, but a revolution in the AI paradigm.

Q3: Does this mean that the success of a multimodal memory platform lies in establishing a system for data understanding, representation, storage, management, and computation that is different from traditional text processing?

E: Yes, that's the core. We trained MemoryLake-D1 not to build a better OCR or speech-to-text tool—that's feature optimization. Our goal is to establish a unified "multimodal memory framework" where the logic of tables, the semantics of images, and the emotion in speech can all be structurally understood and associated, becoming reasoning-capable memory units. This indeed requires comprehensive innovation from the underlying models to a memory-centric storage and computation architecture.

Q4: Why did you choose the name MemoryLake?

E: Essentially because human memory is also fragmented, multi-sourced, and multi-type. For example, when I see you today, there might be multiple angles: first, your high visibility in the industry; second, you came to our company; third, you're from the media; fourth, we have a conversation; fifth, our faces, the audio during the conversation, etc. In short, it's fragmented knowledge, like countless streams flowing into a lake. It's a dynamic, flowing collection. Based on the user's intent, we can dynamically "fish out" the relevant pieces of memory, or when you need it, we build it for you in real-time according to your intent and context window size.

A special note: although everyone talks about short-term, medium-term, and long-term memory, requiring static compression, forgetting, and so on, the main reason is the limited storage capacity and computational power of the human brain. The real world shouldn't be statically pre-compressed. Instead, we should adopt new distributed multimodal storage and computation capabilities, storing and organizing as much as possible, and then dynamically building dedicated, refined, and complete memories on-demand based on real problems.
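The "dynamically build memories on demand" idea can be sketched as a greedy packing step: given candidate memory fragments already scored for relevance, select as many as fit the caller's context-window budget. This is a simplified stand-in with invented names, not MemoryLake's actual assembly algorithm.

```python
def build_context(candidates, budget_tokens):
    """Greedy on-demand assembly: take fragments in descending relevance
    until the context-window budget is exhausted.

    candidates: iterable of (relevance, token_count, text) tuples.
    """
    chosen, used = [], 0
    for relevance, tokens, text in sorted(candidates, reverse=True):
        if used + tokens <= budget_tokens:
            chosen.append(text)
            used += tokens
    return chosen
```

The point of the sketch mirrors Ethan's argument: nothing is statically pre-compressed; the same large pool of stored fragments yields a different, purpose-built context for every query and every window size.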

Q5: What is the product form of MemoryLake? How do you view this product form?

E: MemoryLake has multiple forms. One of the most common is an API compatible with existing specifications (such as mem0, MCP, and OpenMemory). This lets users keep the large models and agents they're familiar with, connect to us easily, and get multimodal memory by default, backed by massive data.

In the global market, the vast majority of MemoryLake scenarios involve being integrated, such as with ChatGPT and Claude. Our memory platform can transform any data into a memory format supported by any large model or agent. So whether MemoryLake is a plugin or some other form doesn't really matter. MemoryLake will serve as a persistent memory layer, not locked into any single model or tool.
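As a rough illustration of what a mem0-style memory API surface looks like, here is a minimal in-process client. The class and method names (`MemoryClient`, `add`, `search`) are assumptions modeled on the mem0 convention, not MemoryLake's documented API, and keyword matching stands in for real semantic retrieval.

```python
class MemoryClient:
    """Hypothetical mem0-style memory interface, for illustration only."""

    def __init__(self):
        self._store = []

    def add(self, content, user_id, metadata=None):
        # Persist one memory record scoped to a user.
        record = {"content": content, "user_id": user_id,
                  "metadata": metadata or {}}
        self._store.append(record)
        return record

    def search(self, query, user_id):
        # Naive keyword match standing in for semantic retrieval.
        return [r for r in self._store
                if r["user_id"] == user_id
                and query.lower() in r["content"].lower()]
```

The design point is the thin surface itself: because agents talk to memory through a small `add`/`search`-style contract, the memory layer can sit behind any model or tool rather than being locked into one.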

Q6: Specifically regarding MemoryLake-D1, what problems does it mainly solve? What is the invocation cost like?

E: MemoryLake-D1 mainly solves the problem of data understanding—how to better understand multimodal content like Excel, PDF, and other formats, because personalized business spreadsheets are very complex (Excel is arguably the best and most complex software). To solve this, we invested significant resources in data annotation and synthesis, combined with user feedback, to train our own multimodal data understanding model, MemoryLake-D1.

As for the invocation cost of MemoryLake-D1, it's considerably lower than calling OCR models and multimodal vision models yourself. However, there's a trade-off: do you prioritize speed, flexibility, or accuracy? Different choices lead to different outcomes. For "extremely fast" scenarios, we can use a statically pre-compiled Skills model to generate code for continuous reuse, achieving high-performance, low-flexibility, low-cost parsing.
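The "compile once, reuse many times" trade-off can be sketched with a cached parser generator: the expensive step (here, a stand-in for a model generating parsing code for one spreadsheet schema) runs once per schema, and every subsequent file with that layout reuses the compiled parser—fast and cheap, but inflexible if the layout changes. All names here are illustrative.

```python
import functools

@functools.lru_cache(maxsize=None)
def compile_parser(schema):
    """Stand-in for expensively generating parsing code for one exact
    spreadsheet layout; cached so it runs once per schema."""
    def parse(row):
        return dict(zip(schema, row))
    return parse

def parse_rows(schema, rows):
    # Compiled once per schema, then reused for every row/file.
    parser = compile_parser(tuple(schema))
    return [parser(r) for r in rows]
```

A new column layout forces a recompilation, which is exactly the low-flexibility half of the trade-off: speed comes from assuming the input's structure is stable.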

Q7: What are the future update directions for MemoryLake? What are the difficulties?

E: Currently, MemoryLake-D1 mainly handles text, tables, images, documents, databases, and audio. The next focus is enhancing video and audio.

Compared to images, audio and video are more challenging because they involve speech rate and emotion, making processing more complex. For example, if a gamer is very angry and you transcribe their angry speech to text, you easily lose the speech rate and emotion, potentially distorting the player's original intent.

This is actually a problem AI faces today: losing a lot of important information during conversion, because much information is tacit while models' data-understanding capabilities remain limited.

02 Internalizing Tacit Knowledge, Building Decision Intelligence

Q8: You've repeatedly mentioned "tacit knowledge." How can it be internalized within enterprises?

E: I believe any enterprise looking to implement AI must first capture and formalize decision trajectories from its employees' key workflows—trajectorizing multimodal elements like speech, video, text, documents, approvals, and so on. Only then can there be efficiency improvements and breakthroughs.

Q9: Specifically, how does DataCloud do this?

E: First, it's important to clarify that perfecting decision trajectories doesn't happen overnight; they grow stronger and more complete over time. Moreover, drawing on our past practice, we believed from day one that the ultimate form of intelligence must be action intelligence and decision intelligence—and only with decision intelligence can you have action intelligence. So we started building decision agents from the beginning, starting R&D in 2024. The core philosophy was "Every chat is software": every interaction can generate executable, composable code. The architecture used general large models to generate thought trees, then iteratively generated local code through self-evolution. Only this way can decisions be made explainable, intervenable, trustworthy, reliable, and executable.
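A minimal sketch of the "every chat is software" idea, under the assumption that a decision is decomposed into named, individually executable steps that leave a trace (the `Step` and `run_pipeline` names are invented for illustration, not DataCloud's architecture). Because each step records its output, the resulting decision is explainable and intervenable:

```python
class Step:
    """One node of a decision: a named, executable, traceable transform."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, state, trace):
        state = self.fn(state)
        trace.append((self.name, dict(state)))  # record every intermediate state
        return state

def run_pipeline(steps, state):
    # Execute the decision as composable code; the trace makes it auditable.
    trace = []
    for step in steps:
        state = step.run(state, trace)
    return state, trace

# Hypothetical procurement decision compiled into two steps.
steps = [
    Step("normalize_price", lambda s: {**s, "unit_price": s["total"] / s["qty"]}),
    Step("approve", lambda s: {**s, "approved": s["unit_price"] <= 10.0}),
]
```

If the outcome is wrong, the trace shows exactly which step produced which intermediate state—the property Ethan names when he says decisions must be explainable and intervenable.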

For example, we built a related decision agent (https://powerdrill.ai/), which involves a very complex decision system. Essentially, if a decision can be validated quickly, it is relatively easier to build; but much tacit information remains "hidden."

Q10: Does this mean the decision intelligence/AI personalized decision market has become a red ocean? And what development challenges does it currently face?

E: Not really. The AI personalized decision market is still very large. Is decision intelligence difficult to build? Yes, it is. But the difficulty often stems from the inability to validate or incentivize, or because the validation cycle is too long.

How do we make these things "manifest"? It requires the deep integration of memory and the deep thinking capabilities of large models. Building vast static memories at the foundation—like entity extraction, knowledge "skillification"—and then dynamically layering them when needed—this is actually the first type of product we built ourselves.

As for the second type of product, those are some office scenarios and gaming scenarios we later served.

Q11: Gaming scenarios? How should we understand that?

E: I've always believed that games are essentially a projection of real society, or even an evolutionary, richer social experiment field.

Games used to be static—once you logged off, the world stopped and waited for you to return. But it's different now. In many AI games, after you log off, the world doesn't pause; it continues to operate at a speed close to, or even many times faster than, the real world. The characters in the game continue to live, make choices, and change. In a sense, it's no longer just a "system for people to play," but a continuously running virtual society, using a higher time density to map and amplify the operating logic of the real world.

Furthermore, besides being naturally close to users and close to value assessment, games have another advantage: high tolerance. In games, local errors in memory or AI understanding don't cause severe consequences. But AI tolerance in enterprise scenarios is extremely low.

Q12: Can you elaborate on the low tolerance for AI on the enterprise side?

E: Regarding tolerance, in many real-world scenarios, it's far more difficult than people imagine, because once certain errors occur, the consequences are irreversible. For example, in e-commerce and customer service, anything involving large-scale financial losses like returns and refunds is very complex. Another example is insurance: premium rates vary for different people and different symptoms.

Q13: What is the biggest impact of low enterprise tolerance on AI development?

E: I think the biggest impact of "low tolerance" on AI development isn't simply "dare not use it," but that enterprises cannot accept a system whose behavior is inexplicable, whose results are untraceable, and where problems recur without the ability to optimize.

In recent years, the reason AI adoption in enterprises has seen "much talk but little action" isn't that the models aren't smart enough. It's that many systems make every judgment as if it's their first decision—they don't remember why they made a judgment before, and they can't fully reproduce the basis for the decision.

Once a problem occurs, besides efficiency loss, enterprises fear three things: Why did it go wrong? Where did it go wrong? Can it be avoided in the future? If these questions can't be answered, no matter how smart the system, enterprises won't dare put it into their real production and decision-making chains.

From this perspective, low enterprise tolerance for AI is essentially forcing AI to evolve from "being able to answer" to a system that has memory, context, can explain its own behavior, and can solve problems. This is why I believe memory isn't just icing on the cake; it's a prerequisite for AI to truly enter enterprises.

Q14: What is DataCloud's current user composition like?

E: Mainly divided into three categories: first, office-oriented; second, finance-oriented; third, emerging industries like AI gaming and embodied intelligence.

In the consumer market, MemoryLake serves over 1.5 million professional data users globally. In industry practice, MemoryLake serves globally significant enterprises, including a hyperscale document office platform (with over 10 trillion records and 100 million documents in its production system), leading enterprise mobile office software providers, and large model companies. In competition with global cloud giants and prominent AI vendors, MemoryLake demonstrates multiple-fold advantages in performance metrics like cost, accuracy, recall, and latency. For instance, in a rigorous office scenario end-to-end evaluation, it achieved 99.8% accuracy.

03 Generalization May Defeat Vertical Specialization

Q15: Looking at domestic and international markets and platforms, what aspects of demonstration applications are you currently paying attention to?

E: Two categories: general and vertical. General applications are still mostly at the chat level. If we categorize by business depth, ChatGPT and Claude might be at the first layer—insufficient understanding of many enterprise businesses and data, execution not yet reliable. Agent companies might be at the second layer. A large number of vertical platforms might be at the third layer. And deeper customization with long delivery cycles, like Palantir, might be at the fourth layer.

Although these platforms are all doing demonstration applications of a certain type or depth, there's also a process of gradual cannibalization at play: as general large models continuously strengthen, over time, they may cannibalize more and more of the vertical depth. After a certain point, today's so-called FDE+ platforms, the emerging BPO business models, and so on, might not necessarily exist.

Q16: Can you elaborate on the relationship between generalization and vertical specialization?

E: I think generalization will likely defeat vertical specialization.

What many companies today—especially many vertical startups—call "vertical" lacks real barriers (except for those with proprietary data or data models). Many companies, at different stages of adopting AI—from adaptation to integration to value upgrade—simply need roles like today's so-called verticals to help complete phased tasks. During this period, verticals feel valuable and seem to improve efficiency because everyone's starting point is relatively low. Once everyone develops further, the value of verticals won't be particularly obvious.

Q17: What is the basis for your conclusion that "generalization will defeat vertical specialization"?

E: When serving many overseas clients, we clearly feel their dependence on ChatGPT and Claude far exceeds that on vertical solutions. This is mainly because general large models evolve quickly, and their ecosystems are very powerful. Current tools are all adapting upwards. During this adaptation process, their capabilities also get stronger. You'll notice that after each new release from ChatGPT or Claude, some shallow verticals are easily eliminated.

For example, Claude recently launched Interactive Tools. This is a landmark event that might overturn how software is developed, because it signals that any future software can be headless, with no interface of its own. Moreover, on January 26th, they defined and released a specification—MCP Apps—an LLM-centered integrated UI and cross-application interaction standard. This supplies the missing link for a SaaS revolution.

LLMs handle thinking, Agent Skills handle injecting domain knowledge, MemoryLake handles connecting and organizing multimodal data, and MCP handles communication/invocation/local UI generation (MCP Apps). This new generation of application paradigms will realize a shift in the software industry. When apps integrate into the MCP Apps ecosystem, the ones most harmed will be verticals. Before this, small verticals might say they do better than big companies; after Interactive Tools, current verticals might all face significant disruption.

Q18: You judge that "generalization may defeat vertical specialization," and that memory has a "gravitational effect." Can this be understood as the multimodal memory platform becoming an infrastructure paradigm for the AI era, much like data platforms did in the cloud era?

E: Yes. Memory will take center stage in the AI era. What memory platforms solve isn't just "remembering," but the paradigm issues of "how to deeply understand," "how to deeply organize," and "how to dynamically construct based on a query." When the capabilities of general large models are deeply integrated with multimodal memory platforms through specifications like MCP/Agent Skills/OpenMemory, they gain continuously evolving, trustworthy "experience" and "knowledge." This changes the paradigm of software construction. We firmly believe that companies that define and implement this new "memory-driven intelligence" paradigm will have the opportunity to become the cornerstone enterprises of the AI era.

Q19: If an opportunity like Manus came along in the future, would DataCloud consider selling?

E: We are not for sale. Although many companies want to acquire us now, we believe that memory has huge development potential in the future and is one of the core technological infrastructures of the AI era. Because memory has a gravitational effect—the more you use it, the better it gets, the more valuable it becomes. Models can be switched on demand, but memory is a core asset that enterprises need to continuously build. Coupled with our advantages in platform capabilities, memory capabilities, and best practices, we have the opportunity to build a company like Databricks or Snowflake.

Q20: While maintaining independence, in which directions will DataCloud focus its efforts?

E: In terms of core technology: We will continue to build multimodal capabilities, such as supporting images, video, audio, and more data sources; enhance the accuracy of the MemoryLake-D1 multimodal data model; strengthen distributed memory computation capabilities; and improve product end-to-end precision, explainability, intervenability, and security.

In terms of market expansion: We will focus on exploring highly promising market areas such as gaming, office, embodied intelligence, and finance.

In terms of technical research: We will delve deeper into distributed memory computation capabilities (memory scale will continue to accelerate) and the construction of an end-to-end memory evaluation system.