zeming
Jan 10, 2025
Introduction
With the development of large models and generative AI, AI Agents became one of the hottest topics in the AI field by the end of 2024. LangChain's survey "State of AI Agents in 2024" (https://www.langchain.com/stateofaiagents) shows that the adoption of Agent technology in real business scenarios is steadily accelerating. Zeming, a data scientist and AI-ready data cloud evangelist at DataCloud, has shared a series of talks titled "Agentic AI," covering the development trajectory of Agentic AI, its technical architecture, the support available in the current open-source community, and concrete commercial applications. This article is the first in the Agentic AI series.
Basic Characteristics of AI Agents
The concept of AI Agents is not new; at its core, it is a framework for exploring how to compute and simulate human intellectual activity. AI Agents have the following basic characteristics:
1. Autonomous action: Once trained and deployed, AI Agents can make decisions and take actions on their own without further human intervention.
2. Environmental perception: AI Agents can (actively or passively) acquire various contextual information from their execution environment. They can even accept and process environmental information in multiple modalities.
3. Tool usage: In the process of completing tasks, AI Agents can interact with various systems (including other Agents) through their tool usage capabilities.
As AI technology has developed, the capabilities of Agents have expanded further:
4. Multi-party coordination: AI Agents can autonomously cooperate with other intelligent systems or Agents to generate global plans to complete various tasks.
5. Autonomous learning: AI Agents can obtain the necessary data from past task experiences and feedback to improve themselves, thereby further enhancing task performance.
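The first three characteristics above can be sketched as a minimal perceive-decide-act loop. Everything here (the counter environment, the class names, the stopping condition) is a hypothetical illustration, not part of any particular Agent framework:

```python
# Minimal sketch of an agent loop: perceive -> decide -> act.
# All names (Environment, SimpleAgent) are hypothetical.

class Environment:
    """Toy environment: the agent must drive a counter up to a target."""
    def __init__(self, target):
        self.state = 0
        self.target = target

    def observe(self):          # environmental perception
        return self.state

    def apply(self, action):    # tool usage: acting on the environment
        self.state += action

class SimpleAgent:
    def decide(self, observation, target):  # autonomous action
        return 1 if observation < target else 0

def run(agent, env):
    while env.observe() < env.target:
        action = agent.decide(env.observe(), env.target)
        env.apply(action)
    return env.observe()

print(run(SimpleAgent(), Environment(target=3)))  # -> 3
```

A real Agent replaces `decide` with a learned policy or an LLM call, but the loop structure (observe, decide, act) is the same.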
In the history of artificial intelligence, various technologies have emerged to support AI Agents. Expert systems in the 1970s and 1980s simulated the decision-making of human experts in specific fields through rule-based reasoning. After the rise of statistical machine learning, AI Agents could automatically learn and discover patterns and regularities from large amounts of data, reducing the dependence on (manual) rules. The rise of neural networks and deep learning further improved the performance of AI Agents on multi-modal data such as natural language, speech, and images.
Intelligence Enhancement of AI Agents in the Era of Large Models
With the rise of pre-trained large models, AI Agents have been further enhanced, mainly in the following aspects:
1. CoT & Planning: Chain-of-Thought (CoT) and its derivatives have further improved the generalization ability of Agent planning. By breaking complex problems into steps, ReAct opened up new ideas for using large models in behavioral planning; variants such as Tree-of-Thought, Graph-of-Thought, and Forest-of-Thought followed. This in turn has driven optimization of execution paths, for example through the introduction of MCMC (Markov chain Monte Carlo) and related techniques.
2. Memory: Traditional Agent memory often relies on predefined schemas, which limits the state space and expressive power of Agent memory. GenAI provides a unified semantic representation, giving Agent memory an almost "free form" capability and greatly expanding its range of applications.
3. Multi-modal capabilities: Pre-trained large model technologies essentially create a unified semantic space, allowing various modal data to be trained in this unified space, and applications can thus uniformly process data in multiple modalities.
4. Self-reflection: With the support of LLMs, Agents can analyze their own plans and execution results, derive improvements, and further enhance metrics such as accuracy.
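The ReAct pattern mentioned in point 1 interleaves reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation"). The sketch below is a toy version: the `fake_llm` function and the `lookup` tool are rule-based stand-ins, since a real implementation would prompt an LLM at each step:

```python
# Minimal ReAct-style loop (Thought -> Action -> Observation),
# using a rule-based stand-in for the LLM so the sketch is runnable.

def fake_llm(question, history):
    """Stand-in policy: look the question up once, then finish."""
    if not history:
        return ("Thought: I should look this up.", ("lookup", question))
    return ("Thought: I have an observation; answer now.", ("finish", history[-1]))

TOOLS = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}

def react(question, max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, (action, arg) = fake_llm(question, history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # act, then observe
        history.append(observation)
    return None

print(react("capital of France"))  # -> Paris
```

The key structural point is that the model's output alternates with feedback from the environment, rather than producing a single answer in one shot.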
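The "free form" memory described in point 2 can be contrasted with schema-based memory in a small sketch. Notes are stored as plain text and retrieved by relevance to a query; the keyword-overlap scoring here is a hypothetical stand-in for the embedding-based retrieval a production system would use:

```python
# Sketch of "free form" agent memory: notes stored as plain text and
# retrieved by word overlap with the query, instead of fixed schema fields.
# A real system would score relevance with embeddings, not keyword overlap.

class FreeFormMemory:
    def __init__(self):
        self.notes = []

    def remember(self, text):
        self.notes.append(text)

    def recall(self, query, k=1):
        q = set(query.lower().split())
        scored = sorted(self.notes,
                        key=lambda n: len(q & set(n.lower().split())),
                        reverse=True)
        return scored[:k]

mem = FreeFormMemory()
mem.remember("the user prefers concise answers")
mem.remember("the deployment target is Kubernetes")
print(mem.recall("how should I format answers for the user?"))
# -> ['the user prefers concise answers']
```

Because nothing constrains what a note may contain, the memory's "state space" is open-ended, which is exactly the property the unified semantic representation of GenAI makes practical.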
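The self-reflection loop in point 4 can be sketched as generate, critique, revise, repeated until the critique passes. The three "model" functions below are rule-based stand-ins; in a real Agent each would be an LLM call:

```python
# Sketch of self-reflection: generate a draft, critique it, revise,
# and stop when the critique passes. All three functions are toy
# stand-ins for LLM calls.

def generate(task):
    return f"draft answer to: {task}"

def critique(answer, required_word):
    """Return None if acceptable, else a feedback string."""
    return None if required_word in answer else f"missing '{required_word}'"

def revise(answer, feedback):
    word = feedback.split("'")[1]
    return answer + f" ({word})"

def solve_with_reflection(task, required_word, max_rounds=3):
    answer = generate(task)
    for _ in range(max_rounds):
        feedback = critique(answer, required_word)
        if feedback is None:      # critique passed: stop reflecting
            return answer
        answer = revise(answer, feedback)
    return answer

print(solve_with_reflection("summarize the report", "sources"))
```

The point of the pattern is that the Agent's own output becomes input for a second pass, trading extra inference for higher accuracy.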
The combination of these capabilities completes a general multi-agent system (MAS) scenario. With the support of GenAI, Agents now share a unified semantic medium for communication, can dynamically generate plans, and can reflect on both their plans and their results. Together this forms a recursive problem-solving pattern, one of the core characteristics of intelligence (cf. Gödel, Escher, Bach).
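The recursive problem-solving pattern can be shown in its barest form: a task is either atomic and solved directly, or split into subtasks handled by the same procedure. The splitting rule below is a toy stand-in for an LLM planner:

```python
# Sketch of recursive problem solving: a task is either atomic
# (solved directly) or split into subtasks handled the same way.
# The split rule is a toy stand-in for an LLM-driven decomposition.

def is_atomic(task):
    return len(task) <= 4

def split(task):
    mid = len(task) // 2
    return [task[:mid], task[mid:]]

def solve(task):
    if is_atomic(task):
        return [task]             # base case: solve directly
    subresults = []
    for sub in split(task):       # recursive case: same procedure per part
        subresults.extend(solve(sub))
    return subresults

print(solve("abcdefgh"))  # -> ['abcd', 'efgh']
```

In an Agentic setting, each recursive call may even be delegated to a different Agent, which is what makes the pattern naturally multi-agent.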
As the limitations of plain LLMs have gradually been exposed in commercial scenarios, Agentic AI has continued to deepen in the wake of the generative AI wave. The Deloitte report "How AI Agents Are Reshaping the Future of Work" (https://www2.deloitte.com/content/dam/Deloitte/us/Documents/consulting/us-ai-institute-generative-ai-agents-multiagent-systems.pdf) explores the clear advantages of Agentic AI in complex (and unpredictable) scenarios.
Andrew Ng, founder of deeplearning.ai, has discussed the importance of Agentic AI workflows in a variety of settings (Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote, https://www.youtube.com/watch?v=KrRD7r7y7NY). In his view, Agentic AI represents a new generation of intelligent-agent technology whose core is solving problems more efficiently and accurately through step-by-step planning, iterative workflows, and division of roles. He has also proposed several core design patterns in current Agentic AI:
1. Reflection
2. Tool Use/API Call
3. Planning
4. Multi-Agent Collaboration
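The fourth pattern, multi-agent collaboration, can be sketched as a planner Agent that decomposes a task and a worker Agent that executes each step. Both roles here are rule-based stand-ins for LLM-backed Agents, and the decomposition is hypothetical:

```python
# Sketch of the multi-agent collaboration pattern: a planner agent
# decomposes a task, a worker agent executes each sub-step.
# Both roles are rule-based stand-ins for LLM-backed agents.

def planner(task):
    """Break the task into ordered sub-steps (toy decomposition)."""
    return [f"{task}: step {i}" for i in (1, 2)]

def worker(step):
    """Execute one sub-step and report the result."""
    return f"done [{step}]"

def collaborate(task):
    return [worker(step) for step in planner(task)]

for result in collaborate("prepare the quarterly report"):
    print(result)
```

Real systems add a shared message channel and feedback from worker back to planner, but the division of roles is the essential structure.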
As technical understanding deepens and practice drives further optimization, Agentic AI has been applied and developed across a variety of scenarios. 2025 is expected to be the year of AI Agents (Why 2025 Will Be The Year of AI Agents, https://www.youtube.com/watch?v=kHPXbo2OkzA). In the future, Agentic AI will evolve from single Agents to a "group collaboration" model, in which Agents cooperate, and even compete, to complete more complex tasks.
As an explorer of the data-infrastructure wilderness in the AI era, DataCloud has released Powerdrill AI, a data-centered AI analysis service. Built on a large-model-based interpretable analysis engine, end-to-end (E2E) data understanding and management technology, and core technology supporting PB-scale computing, Powerdrill AI provides capabilities including dataset-grounded AI Q&A and natural-language data analysis, exploration, and insight. These functions are widely used in work scenarios with extremely high accuracy requirements.
Currently, the Powerdrill AI analysis service has been used and validated by more than 1.1 million users worldwide, processing over 2 million files and 12 million data tasks. On the QuALITY benchmark for long-text Q&A, Powerdrill AI ranks first globally, demonstrating its ability to handle real-world data scenarios. (QuALITY is known for its challenging test cases, which assess a model's ability to understand long passages and answer questions accurately, i.e., advanced comprehension and complex problem solving.)
The Data+AI integrated architecture of the Relyt AI-ready Data Cloud fundamentally addresses the scale, real-time performance, accuracy, and cost challenges Powerdrill AI faces when serving enterprise customers. On this foundation, personalized, autonomous data-analysis applications can be built for enterprise customers.
DataCloud looks forward to working with global customers to activate the data potential of everyone.