Overview

  • AI agents are autonomous systems designed to perform tasks by making decisions based on their environment and inputs. These decisions are typically made using AI techniques such as machine learning and natural language processing, and can incorporate multiple modalities.
  • AI agents can be both proactive and reactive: they can initiate actions on their own and respond to changes in their environment. Their functionality is often complex and involves a degree of learning or adaptation to new situations.
  • The tasks an agent carries out are determined by the agent itself, based on the data it gathers and processes, which makes AI agents valuable tools for efficiency and automation across many sectors.
  • AI agents distinguish themselves from ordinary software by their ability to make rational decisions. They process data received from their environments, whether through physical sensors or digital inputs, and use this information to predict and execute actions that align with set goals. This could range from a chatbot handling customer inquiries to a self-driving car navigating obstacles on the road.

“While there isn’t a widely accepted definition for LLM-powered agents, they can be described as a system that can use an LLM to reason through a problem, create a plan to solve the problem, and execute the plan with the help of a set of tools.” (source)

Why use Agents?

  • The image (source) above shows a scenario where the number of possible actions from the current state grows exponentially.
  • In such a scenario, encoding every possible combination as rules in a rule-based system is impractical, so we rely on agents instead.
  • LLMs are very good at making simple, well-scoped decisions and rarely hallucinate when making them. We can leverage different agentic architectures to achieve our goals.
  • Agents can also converse with each other, e.g., a code-assistant agent that writes code paired with a code-executor agent that runs it.

Evaluation

  • Quantifying and objectively evaluating LLM-based agents remains challenging despite their strong performance across various domains. Benchmarks designed to evaluate LLM agents include:
    • AgentBench
    • IGLU
    • ClemBench
    • ToolBench
    • GentBench
    • MLAgentBench
  • Evaluation dimensions include:
    • Utility: Task completion effectiveness and efficiency, measured by success rate and task outcomes.
    • Sociability: Language communication proficiency, cooperation, negotiation abilities, and role-playing capability.
    • Values: Adherence to moral and ethical guidelines, honesty, harmlessness, and contextual appropriateness.
    • Ability to Evolve Continually: Continual learning, autotelic learning ability, and adaptability to new environments.
    • Adversarial Robustness: Susceptibility to adversarial attacks; techniques such as adversarial training and human-in-the-loop supervision are employed to improve robustness.
    • Trustworthiness: Calibration problems and biases in training data undermine trustworthiness. One mitigation is to guide models to expose their thought processes or explanations, which enhances credibility.
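The Utility dimension is usually reported as a success rate plus efficiency measures. The sketch below shows one way to summarize logged runs into such a report. It assumes each run is logged as a hypothetical `Episode` record with a success flag, a step count, and token usage. These field names are illustrative and do not come from any of the benchmarks listed above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One agent run on one task (hypothetical log format)."""
    task_id: str
    success: bool
    steps: int        # number of actions / tool calls taken
    tokens_used: int  # total LLM tokens consumed


def utility_report(episodes: list[Episode]) -> dict[str, float]:
    """Summarize task-completion effectiveness and efficiency."""
    if not episodes:
        raise ValueError("need at least one episode")
    solved = [e for e in episodes if e.success]
    return {
        # Effectiveness: fraction of tasks completed successfully
        "success_rate": len(solved) / len(episodes),
        # Efficiency: steps are averaged over successful runs only, so failed
        # runs that hit a step limit do not distort the number
        "avg_steps_when_solved": mean(e.steps for e in solved) if solved else float("nan"),
        # Cost: tokens are averaged over all runs, failures included
        "avg_tokens": mean(e.tokens_used for e in episodes),
    }


runs = [
    Episode("t1", True, 4, 1200),
    Episode("t2", False, 10, 3000),
    Episode("t3", True, 6, 1800),
    Episode("t4", True, 2, 600),
]
print(utility_report(runs))
```

The other dimensions (sociability, values, robustness) generally need task-specific judges or human review rather than a simple counter like this.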

Core Components of AI Agents

  • The image above (source) simplifies the architecture of a traditional end-to-end agent pipeline.
  • Let’s dive deeper into each component of AI agents to understand their structure and functionality at a more detailed, technical level.

Agent Core (LLM)

  • Decision-Making Engine: Analyzes data from memory and inputs to make informed decisions.
  • Goal Management System: Maintains and updates the goals of the AI agent.
  • Integration Bus: Facilitates communication between memory modules, planning module, and tools.
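The toy sketch below makes the three roles of the agent core concrete. It assumes a rule-based `decide` function stands in for the LLM's decision-making, and that tools and memory are plain Python callables and lists. All names here (`AgentCore`, `register_tool`, `rule_based_decide`) are hypothetical.

```python
from typing import Callable, Optional


class AgentCore:
    """Toy agent core: goal management, a decision step, and a tool bus."""

    def __init__(self, decide: Callable[[str, list[str]], tuple[str, str]]):
        self.decide = decide  # stand-in for the LLM decision-making engine
        self.goals: list[str] = []  # goal management: pending goals, oldest first
        self.memory: list[str] = []  # observations shared with the memory module
        self.tools: dict[str, Callable[[str], str]] = {}  # integration bus registry

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def add_goal(self, goal: str) -> None:
        self.goals.append(goal)

    def step(self) -> Optional[str]:
        """Advance the oldest goal by one action; return an answer when it is done."""
        if not self.goals:
            return None
        action, arg = self.decide(self.goals[0], self.memory)
        if action == "finish":
            self.goals.pop(0)  # goal satisfied, drop it
            return arg
        # Integration bus: route the chosen action to the matching tool
        observation = self.tools[action](arg)
        self.memory.append(f"{action}({arg}) -> {observation}")
        return None


def rule_based_decide(goal: str, memory: list[str]) -> tuple[str, str]:
    # Hypothetical policy: look the goal up once, then finish with the observation
    if not memory:
        return "lookup", goal
    return "finish", memory[-1]


core = AgentCore(rule_based_decide)
core.register_tool("lookup", lambda q: f"facts about {q}")
core.add_goal("XYZ phone")
answer = None
while answer is None and core.goals:
    answer = core.step()
print(answer)
```

In a real agent, `decide` would prompt the LLM with the goal and memory and parse its chosen action.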

Memory Modules

  • Short-term Memory (STM):
    • Data Structure: Implemented using stacks, queues, or temporary databases for fast access and modification.
    • Volatility: Data in STM is transient and systematically cleared to free up space and processing power.
    • Functionality: Crucial for tasks requiring immediate but temporary recall.
  • Long-term Memory (LTM):
    • Data Storage: More permanent data storage solutions ensure data persistence.
    • Indexing and Retrieval Systems: Sophisticated indexing mechanisms facilitate quick retrieval of relevant information.
    • Learning and Updating Mechanisms: Updates stored data based on new information and learning outcomes.
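The sketch below illustrates the two memory types under simplifying assumptions:

- **Short-term memory** is a bounded queue that evicts the oldest entries automatically.
- **Long-term memory** is a record store with a keyword inverted index. Keyword matching stands in for the "sophisticated indexing" described above; production systems often use vector embeddings instead.

The class names are hypothetical.

```python
from collections import defaultdict, deque


class ShortTermMemory:
    """Bounded FIFO buffer: the oldest items are evicted automatically."""

    def __init__(self, capacity: int = 5):
        self.buffer: deque[str] = deque(maxlen=capacity)

    def add(self, item: str) -> None:
        self.buffer.append(item)

    def recent(self) -> list[str]:
        return list(self.buffer)


class LongTermMemory:
    """Record store with a keyword inverted index for retrieval."""

    def __init__(self):
        self.records: dict[int, str] = {}
        self.index: defaultdict[str, set[int]] = defaultdict(set)

    def _index(self, rec_id: int, text: str) -> None:
        for word in set(text.lower().split()):
            self.index[word].add(rec_id)

    def store(self, text: str) -> int:
        rec_id = len(self.records)
        self.records[rec_id] = text
        self._index(rec_id, text)
        return rec_id

    def update(self, rec_id: int, text: str) -> None:
        # Learning/updating: replace the record and re-index it
        for ids in self.index.values():
            ids.discard(rec_id)
        self.records[rec_id] = text
        self._index(rec_id, text)

    def retrieve(self, query: str) -> list[str]:
        # Rank records by how many distinct query words they contain
        scores: defaultdict[int, int] = defaultdict(int)
        for word in set(query.lower().split()):
            for rec_id in self.index.get(word, ()):
                scores[rec_id] += 1
        ranked = sorted(scores, key=lambda r: (-scores[r], r))
        return [self.records[r] for r in ranked]


stm = ShortTermMemory(capacity=2)
for turn in ["hi", "show laptops", "under $1000"]:
    stm.add(turn)
print(stm.recent())

ltm = LongTermMemory()
ltm.store("User prefers lightweight laptops")
budget_id = ltm.store("User's budget is 1000 dollars")
print(ltm.retrieve("lightweight laptops budget"))
ltm.update(budget_id, "User's budget is 1500 dollars")
print(ltm.retrieve("1500"))
```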

Tools

  • Executable Workflows: Scripted actions or processes defined in a high-level language for specific tasks.
  • APIs: External and internal APIs for secure and efficient communication and modular design.
  • Middleware: Bridges the agent’s core logic and tools, handling data formatting, error handling, and security checks.
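The middleware role can be sketched as a wrapper around each tool. The wrapper parses the model's JSON arguments (data formatting), rejects undeclared arguments (a simple security check), and turns exceptions into error messages the LLM can read (error handling). This is a minimal illustration: `middleware` and `check_inventory` are hypothetical names, and the inventory data is made up.

```python
import json
from typing import Any, Callable


def middleware(tool: Callable[..., Any], allowed_args: set[str]) -> Callable[[str], str]:
    """Wrap a tool so the agent core can call it with a JSON string from the LLM."""

    def call(raw_args: str) -> str:
        # Data formatting: parse the model's JSON arguments
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError as exc:
            return json.dumps({"error": f"invalid JSON: {exc.msg}"})
        if not isinstance(args, dict):
            return json.dumps({"error": "arguments must be a JSON object"})
        # Security check: reject arguments the tool does not declare
        unexpected = set(args) - allowed_args
        if unexpected:
            return json.dumps({"error": f"unexpected arguments: {sorted(unexpected)}"})
        # Error handling: a failing tool must not crash the agent loop
        try:
            result = tool(**args)
        except Exception as exc:  # report any tool failure back to the model
            return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
        return json.dumps({"result": result})

    return call


def check_inventory(product: str) -> int:
    # Hypothetical inventory lookup
    stock = {"XYZ": 12}
    return stock[product]


safe_inventory = middleware(check_inventory, allowed_args={"product"})
print(safe_inventory('{"product": "XYZ"}'))
print(safe_inventory('{"product": "ABC"}'))
print(safe_inventory('{"product": "XYZ", "drop_table": true}'))
```

Returning errors as data rather than raising them lets the model see what went wrong and retry with corrected arguments.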

Workflows vs. Agents

  • This section draws on lessons from Anthropic’s recent blog post on agents.
  • Key Distinctions
    • Workflows: Depend on predefined, code-driven paths for LLM calls (e.g., prompt chaining, routing, parallelization).
      • When to use: Structured tasks that can be broken down into predictable steps, or when you need strict control over execution.
    • Agents: Grant the model autonomy to decide which tools to invoke, when to loop for more context, and how to finalize results.
      • When to use: Open-ended or unpredictable tasks (e.g., multi-file coding tasks, complex support flows) requiring flexible reasoning and decision-making.
  • Complexity vs. Value
    • Start Simple: A single LLM call with retrieval/context often suffices for many use cases.
    • Add Steps Only as Needed: Escalate to multi-step workflows or agentic loops only if they demonstrably improve accuracy, coverage, or user satisfaction.
    • Tradeoffs: Agents and multi-step workflows can increase cost, latency, and error propagation—ensure gains justify the added complexity.
  • Common Workflow Patterns
    • Augmented LLM: An LLM enriched with retrieval, memory, and tool usage—forms the foundation for more complex patterns.
    • Prompt Chaining: Breaks a larger task into sequential sub-steps, with potential checks or validations between calls.
    • Routing: Classifies or dispatches requests to specialized prompts/tools based on input type.
    • Parallelization:
      • Sectioning: Partition tasks into parallelizable chunks to reduce overall time.
      • Voting: Run multiple attempts in parallel, then compare or combine results for higher confidence.
    • Orchestrator-Workers: Allows one “orchestrator” LLM to dynamically identify subtasks and assign them to “worker” LLMs, ideal for tasks where subtasks aren’t known upfront.
    • Evaluator-Optimizer: Iterates between generating and critiquing outputs, refining results through repeated feedback loops.
  • Agents in Practice
    • Continuous Planning & Acting: Agents repeatedly decide next steps, call tools, and evaluate outcomes.
    • Powerful but Risky: Good for tasks where the path is unclear, but require robust guardrails, sandbox testing, and careful oversight to manage costs and prevent runaway errors.
  • Tooling and Best Practices
    • Tool Design: Provide consistent, low-overhead formats (avoid tricky JSON escapes, complex diffs, etc.) and clear usage examples to reduce LLM confusion.
    • Frameworks: Options like LangGraph or Amazon Bedrock’s AI Agent framework can speed up development but may obscure the underlying prompts, making debugging harder. Some teams prefer manually orchestrated workflows with direct API calls for greater transparency.
    • Measure & Iterate: Continuously test and track performance to ensure each added layer (routing, chaining, autonomy) delivers concrete benefits.
  • Bottom Line
    • Balance simplicity, transparency, and measurable performance.
    • Build up from single calls to complex agents only as needed.
    • Success is about finding the right approach for the problem—rather than overbuilding a sophisticated system.
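The workflow patterns listed above are described in prose. As a rough illustration, four of them can be written as plain Python control flow around any prompt-to-text function:

- prompt chaining
- routing
- parallelization by voting
- evaluator-optimizer

This is a minimal sketch, not code from Anthropic’s post. `fake_llm` is a hypothetical deterministic stand-in for a model call, and the prompts, labels, and handler names are made up for the demo.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def prompt_chain(llm: LLM, topic: str) -> str:
    """Prompt chaining: outline -> programmatic gate -> draft."""
    outline = llm(f"Outline: {topic}")
    if not outline.strip():  # check between steps
        raise ValueError("empty outline; stopping the chain")
    return llm(f"Draft from outline: {outline}")


def route(llm: LLM, request: str, handlers: dict[str, Callable[[str], str]]) -> str:
    """Routing: classify the request, then dispatch to a specialized handler."""
    label = llm(f"Classify as one of {sorted(handlers)}: {request}").strip()
    return handlers.get(label, handlers["general"])(request)  # "general" is the fallback


def vote(llm: LLM, question: str, n: int = 3) -> str:
    """Parallelization (voting): run n attempts concurrently, keep the majority answer."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(llm, [question] * n))
    return Counter(answers).most_common(1)[0][0]


def evaluate_optimize(generate: LLM, critique: Callable[[str], Optional[str]],
                      task: str, max_rounds: int = 3) -> str:
    """Evaluator-optimizer: revise with the critic's feedback until it approves."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # critic is satisfied
            break
        draft = generate(f"{task}\nRevise using feedback: {feedback}")
    return draft


def fake_llm(prompt: str) -> str:
    # Hypothetical deterministic stand-in for a model call, so the demo runs offline
    if prompt.startswith("Classify"):
        return "billing" if "refund" in prompt else "general"
    if prompt.startswith("Outline"):
        return "1. intro 2. body"
    return "42"


handlers = {"billing": lambda r: "billing team", "general": lambda r: "general FAQ"}
print(prompt_chain(fake_llm, "agents"))
print(route(fake_llm, "I want a refund", handlers))
print(vote(fake_llm, "What is 6 * 7?"))
print(evaluate_optimize(lambda p: "short" if "Revise" in p else "long text",
                        lambda d: "too long" if d == "long text" else None,
                        task="Summarize"))
```

Keeping each pattern as an ordinary function makes it easy to measure whether adding a step actually helps, in line with the "Measure & Iterate" advice.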

Example Flow Chart for an LLM Agent: Handling a Customer Inquiry

  • The image above (source) shows an example flow of an AI agent.
  1. Customer Interaction
    • Input: “Is the new XYZ smartphone available, and what are its features?”
    • Action: Customer types the query into the e-commerce platform’s chat interface.
  2. Query Reception and Parsing
    • Agent Core Reception: Receive text input.
    • Natural Language Understanding: Parse the text to extract intent and relevant entities.
  3. Intent Classification and Information Retrieval
    • Intent Classification: Classify the query intent.
    • Memory Access: Retrieve stored data on product inventory and specifications.
    • External API Calls: Fetch additional data if not available in memory.
  4. Data Processing and Response Planning
    • Planning Module: Split the query into “check availability” and “retrieve features”.
    • Data Synthesis: Combine information from memory.
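Steps 2 to 4 of this flow can be sketched under the following assumptions:

- Keyword rules stand in for the LLM's natural language understanding and intent classification.
- A dict stands in for the memory module.
- `external_catalog_api` is a hypothetical stub for the external API call.

The product data is placeholder text.

```python
import re

# Hypothetical agent memory: availability is cached, features are not
MEMORY = {"XYZ": {"in_stock": True}}


def external_catalog_api(product: str) -> dict:
    # Stand-in for a real API call returning product specifications
    return {"features": ["feature A", "feature B"]}


def parse(query: str) -> dict:
    """Step 2: extract intents and the product entity (keyword rules stand in for NLU)."""
    match = re.search(r"new (\w+) smartphone", query)
    intents = []
    if "available" in query.lower():
        intents.append("check_availability")
    if "feature" in query.lower():
        intents.append("retrieve_features")
    return {"product": match.group(1) if match else None, "intents": intents}


def retrieve(product: str, intent: str) -> dict:
    """Step 3: check memory first, fall back to the external API."""
    key = "in_stock" if intent == "check_availability" else "features"
    cached = MEMORY.get(product, {})
    if key in cached:
        return {key: cached[key]}
    fetched = external_catalog_api(product)
    MEMORY.setdefault(product, {}).update(fetched)  # remember for next time
    return {key: fetched[key]}


def handle(query: str) -> str:
    """Step 4: plan one sub-task per intent, then synthesize the results."""
    parsed = parse(query)
    facts: dict = {}
    for intent in parsed["intents"]:
        facts.update(retrieve(parsed["product"], intent))
    status = "in stock" if facts.get("in_stock") else "not in stock"
    features = ", ".join(facts.get("features", []))
    return f"The {parsed['product']} is {status}. Key features: {features}."


print(handle("Is the new XYZ smartphone available, and what are its features?"))
```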

Use Cases

  • Let’s look at a few agent use cases below:

Data Agent for Data Analysis

  • The image above (source) explains the flow we will use below.
  1. Identify the Use Case:
    • Define specific data analysis tasks, such as querying databases or analyzing financial reports.
  2. Select the Appropriate LLM:
    • Choose an LLM that handles the complexity of data queries and analysis.
  3. Agent Components:
    • Develop the agent with tools for data handling, a memory module for tracking interactions, and a planning module for strategic execution of tasks.
  4. Design the Data Interaction Tools:
    • Implement tools for interacting with databases or other data sources.

Tools Setup

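class SQLExecutor:
    """Placeholder tool: a real version would run the query against the database."""
    def __init__(self, database_url):
        self.database_url = database_url

    def execute_query(self, query):
        print(f"Executing SQL query: {query}")
        return "Query results"  # stand-in for the rows returned by the database

class Calculator:
    """Placeholder tool for computations (e.g., totals) on query results."""
    @staticmethod
    def perform_calculation(data):
        print(f"Performing calculation on data: {data}")
        return "Calculation results"  # stand-in for the computed value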

Agent Core Logic

class DataAgent:
    def __init__(self, sql_executor, calculator):
        self.sql_executor = sql_executor
        self.calculator = calculator
        self.memory = []  # records every intermediate result for later steps

    def analyze_data(self, query, calculation_needed=True):
        # Step 1: fetch the raw data with the SQL tool
        results = self.sql_executor.execute_query(query)
        self.memory.append(results)

        # Step 2 (optional): post-process the data with the calculator tool
        if calculation_needed:
            calculation_results = self.calculator.perform_calculation(results)
            self.memory.append(calculation_results)
            return calculation_results

        return results

database_url = "your_database_url_here"
sql_executor = SQLExecutor(database_url)
calculator = Calculator()

agent = DataAgent(sql_executor, calculator)
query = "SELECT * FROM sales_data WHERE year = 2021"
print(agent.analyze_data(query))
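The `SQLExecutor` and `Calculator` above are placeholders that return fixed strings. The variant below actually runs a query and computes a result, using only the standard-library `sqlite3` module and an in-memory database. The `sales_data` schema and rows are hypothetical demo data. The "calculation" is assumed to be a sum over one numeric column.

```python
import sqlite3


class SQLiteExecutor:
    """Runs queries against a SQLite database (standard library only)."""

    def __init__(self, database_url: str = ":memory:"):
        self.conn = sqlite3.connect(database_url)

    def execute_query(self, query: str, params: tuple = ()) -> list[tuple]:
        # Parameter binding keeps LLM-supplied values out of the SQL text
        return self.conn.execute(query, params).fetchall()


class Calculator:
    @staticmethod
    def perform_calculation(rows: list[tuple], column: int = 1) -> float:
        # Example calculation: total of one numeric column
        return float(sum(row[column] for row in rows))


class DataAgent:
    def __init__(self, sql_executor, calculator):
        self.sql_executor = sql_executor
        self.calculator = calculator
        self.memory = []  # log of intermediate results

    def analyze_data(self, query, params=(), calculation_needed=True):
        results = self.sql_executor.execute_query(query, params)
        self.memory.append(results)
        if calculation_needed:
            total = self.calculator.perform_calculation(results)
            self.memory.append(total)
            return total
        return results


executor = SQLiteExecutor()
# Hypothetical demo table; real schemas will differ
executor.conn.execute("CREATE TABLE sales_data (region TEXT, amount REAL, year INTEGER)")
executor.conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [("east", 120.0, 2021), ("west", 80.5, 2021), ("east", 99.0, 2020)],
)
agent = DataAgent(executor, Calculator())
print(agent.analyze_data("SELECT region, amount FROM sales_data WHERE year = ?", (2021,)))
```

In a full data agent, the LLM would write the SQL and decide whether a calculation is needed. The tool interfaces stay the same.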

LLM-Powered API Agent for Task Execution

  1. Choose an LLM:
    • Select a suitable LLM for handling task execution.
  2. Select a Use Case:
    • Define the tasks the agent will execute.
  3. Build the Agent:
    • Develop the components required for the API agent: tools, planning module, and agent core.
  4. Define API Functions:
    • Create a class for each model API the agent can call.

Python Code Example

class ImageGenerator:
    def __init__(self, api_key):
        self.api_key = api_key

    def generate_image(self, description, negative_prompt=""):
        print(f"Generating image with description: {description}")
        return "Image URL or data"

class TextGenerator:
    def __init__(self, api_key):
        self.api_key = api_key

    def generate_text(self, text_prompt):
        print(f"Generating text with prompt: {text_prompt}")
        return "Generated text"

class CodeGenerator:
    def __init__(self, api_key):
        self.api_key = api_key

    def generate_code(self, problem_description):
        print(f"Generating code for: {problem_description}")
        return "Generated code"

Plan-and-Execute Approach

def plan_and_execute(question):
    if 'marketing' in question:
        plan = [
            {
                "function": "ImageGenerator",
                "arguments": {
                    "description": "A bright and clean laundry room with a large bottle of WishyWash detergent, featuring the new UltraClean formula and softener, placed prominently.",
                    "negative_prompt": "No clutter, no other brands, only WishyWash."
                }
            },
            {
                "function": "TextGenerator",
                "arguments": {
                    "text_prompt": "Compose a tweet to promote the new WishyWash detergent with the UltraClean formula and softener at $4.99. Highlight its benefits and competitive pricing."
                }
            },
            {
                "function": "TextGenerator",
                "arguments": {
                    "text_prompt": "Generate ideas for marketing campaigns to increase WishyWash detergent sales, focusing on the new UltraClean formula and softener."
                }
            }
        ]
        return plan
    else:
        # No plan template matches; return an empty plan so execute_plan() is a no-op
        return []

def execute_plan(plan):
    results = []
    for step in plan:
        if step["function"] == "ImageGenerator":
            generator = ImageGenerator(api_key="your_api_key")
            result = generator.generate_image(**step["arguments"])
            results.append(result)
        elif step["function"] == "TextGenerator":
            generator = TextGenerator(api_key="your_api_key")
            result = generator.generate_text(**step["arguments"])
            results.append(result)
        elif step["function"] == "CodeGenerator":
            generator = CodeGenerator(api_key="your_api_key")
            result = generator.generate_code(**step["arguments"])
            results.append(result)
    return results

question = "How can we create a marketing campaign for our new detergent?"
plan = plan_and_execute(question)
results = execute_plan(plan)
for result in results:
    print(result)
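In the example above, the plan is hard-coded. In a plan-and-execute agent, the planner LLM usually emits the plan as structured text, which should be validated before anything runs. The sketch below does this in three steps:

1. It parses a JSON plan. `planner_output` is a hypothetical stand-in for an LLM response.
2. It checks every step against a registry of known tools.
3. It dispatches through the registry instead of an `if`/`elif` chain.

The lambda tools are stubs mirroring `ImageGenerator` and `TextGenerator`.

```python
import json
from typing import Any, Callable


def make_registry() -> dict[str, Callable[..., str]]:
    # Stub tools standing in for the ImageGenerator/TextGenerator classes above
    return {
        "ImageGenerator": lambda description, negative_prompt="": f"image:{description}",
        "TextGenerator": lambda text_prompt: f"text:{text_prompt}",
    }


def parse_plan(llm_output: str, registry: dict[str, Callable[..., str]]) -> list[dict[str, Any]]:
    """Validate a JSON plan before running any step."""
    plan = json.loads(llm_output)
    for i, step in enumerate(plan):
        if step.get("function") not in registry:
            raise ValueError(f"step {i}: unknown function {step.get('function')!r}")
        if not isinstance(step.get("arguments"), dict):
            raise ValueError(f"step {i}: arguments must be a JSON object")
    return plan


def execute(plan: list[dict[str, Any]], registry: dict[str, Callable[..., str]]) -> list[str]:
    # Dispatch each step through the registry
    return [registry[step["function"]](**step["arguments"]) for step in plan]


# Hypothetical planner output (what an LLM might return when asked for a JSON plan)
planner_output = json.dumps([
    {"function": "ImageGenerator", "arguments": {"description": "laundry room"}},
    {"function": "TextGenerator", "arguments": {"text_prompt": "tweet about WishyWash"}},
])
registry = make_registry()
print(execute(parse_plan(planner_output, registry), registry))
```

Validating the whole plan up front means a hallucinated tool name fails before any side-effecting step has run.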

Build your own LLM Agent

  • Here’s a detailed explanation including some Python code examples as outlined in the NVIDIA blog for building a question-answering LLM agent:
  1. Set Up the Agent’s Components:
    • Tools: Include tools like a Retrieval-Augmented Generation (RAG) pipeline and mathematical tools necessary for data analysis.
    • Planning Module: A module to decompose complex questions into simpler parts for easier processing.
    • Memory Module: A system to track and remember previous interactions and solutions.
    • Agent Core: The central processing unit of the agent that uses the other components to solve user queries.
  2. Python Code Example for the Memory Module:
    class Ledger:
        def __init__(self):
            self.question_trace = []
            self.answer_trace = []
    
        def add_question(self, question):
            self.question_trace.append(question)
    
        def add_answer(self, answer):
            self.answer_trace.append(answer)
    
  3. Python Code Example for the Agent Core:
    • This part of the code defines how the agent processes questions, interacts with the planning module, and retrieves or computes answers.
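      def agent_core(question, context, depth=0, max_depth=5):
        # Assumes LLM() and RAG_Pipeline() are defined elsewhere and return strings
        # (LLM returns a list of sub-questions when asked to decompose).
        if depth >= max_depth:
            # Recursion guard: answer with the context gathered so far
            return LLM(context + question)
        action = LLM(context + question)

        if action == "Decomposition":
            sub_questions = LLM(question)
            for sub_question in sub_questions:
                sub_answer = agent_core(sub_question, context, depth + 1, max_depth)
                if sub_answer:
                    context += sub_answer  # fold sub-answers into the context
            return agent_core(question, context, depth + 1, max_depth)
        elif action == "Search Tool":
            answer = RAG_Pipeline(question)
            context += answer
            return agent_core(question, context, depth + 1, max_depth)
        elif action == "Generate Final Answer":
            return LLM(context)
        elif action == "<Another Tool>":
            # Execute another specific tool
            pass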
      
  4. Execution Flow:
    • The agent receives a question, and based on the context and internal logic, decides if it needs to decompose the question, search for information, or directly generate an answer.
    • The agent can recursively handle sub-questions until a final answer is generated.
  5. Using the Components Together:
    • All the components are used in tandem to manage the flow of data and information processing within the agent. The memory module keeps track of all queries and responses, which aids in contextual understanding for the agent.
  6. Deploying and Testing the Agent:
    • Once all components are integrated, the agent is tested with sample queries to ensure it functions correctly and efficiently handles real-world questions.
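To trace the decompose/search/answer flow end to end without a real model, here is a simplified iterative variant that uses a work queue and a step budget instead of recursion. Several pieces are hypothetical:

- `ScriptedLLM` replays a fixed list of outputs in place of an LLM.
- `rag_pipeline` looks answers up in a tiny made-up corpus.
- The `Ledger` is the memory module from step 2, here also recording sub-questions.

```python
from collections import deque


class Ledger:
    """Memory module from step 2: traces of questions and answers."""

    def __init__(self):
        self.question_trace = []
        self.answer_trace = []

    def add_question(self, question):
        self.question_trace.append(question)

    def add_answer(self, answer):
        self.answer_trace.append(answer)


class ScriptedLLM:
    """Hypothetical stand-in for a model: replays a fixed list of outputs in order."""

    def __init__(self, script):
        self.script = deque(script)

    def __call__(self, prompt):
        return self.script.popleft()


def rag_pipeline(query):
    # Hypothetical retrieval over a tiny in-memory corpus (illustrative data)
    corpus = {"revenue 2022": "Revenue was 10M.", "revenue 2023": "Revenue was 15M."}
    return corpus.get(query, "")


def agent_core(question, llm, ledger, max_steps=10):
    """Iterative variant of the recursive agent core, with a step budget."""
    ledger.add_question(question)
    pending, context = deque([question]), ""
    for _ in range(max_steps):
        if not pending:
            break
        current = pending.popleft()
        action = llm(context + current)
        if action == "Decomposition":
            sub_questions = llm(current)
            for sub_question in sub_questions:
                ledger.add_question(sub_question)
            pending.extendleft(reversed(sub_questions))  # answer sub-questions first
            pending.append(current)  # then revisit the original question
        elif action == "Search Tool":
            context += rag_pipeline(current) + " "
        elif action == "Generate Final Answer":
            answer = llm(context)
            ledger.add_answer(answer)
            return answer
    return None  # budget exhausted or nothing left to do


ledger = Ledger()
llm = ScriptedLLM([
    "Decomposition",
    ["revenue 2022", "revenue 2023"],
    "Search Tool",
    "Search Tool",
    "Generate Final Answer",
    "Revenue grew from 10M to 15M.",
])
answer = agent_core("How did revenue change from 2022 to 2023?", llm, ledger)
print(answer)
print(ledger.question_trace)
```

Swapping `ScriptedLLM` for a real model client requires parsing its output into one of the action labels and, for decomposition, a list of sub-questions.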

Multi-agent Collaboration

Background

  • Multi-agent collaboration is increasingly used as a key AI design pattern for managing complex tasks. This approach divides large tasks into smaller subtasks assigned to specialized agents, such as software engineers, product managers, designers, and QA engineers. Each agent performs specific functions and can be built using the same or different Large Language Models (LLMs).
  • This concept parallels multi-threading in software development, where tasks are broken down to be handled efficiently by different processors or threads.
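A minimal sketch of dividing a task among role-specialized agents follows. Each "agent" is a role-specific system prompt plus an `llm` callable, so each role could use the same or a different model. The output of one stage becomes the input of the next. `RoleAgent`, `run_pipeline`, and `echo_llm` are hypothetical; `echo_llm` just tags its input so the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoleAgent:
    role: str
    system_prompt: str
    llm: Callable[[str, str], str]  # (system_prompt, message) -> reply; may differ per role

    def act(self, message: str) -> str:
        return self.llm(self.system_prompt, message)


def run_pipeline(agents: list[RoleAgent], task: str) -> list[tuple[str, str]]:
    """Pass each agent's output to the next, like stages in a team workflow."""
    transcript, message = [], task
    for agent in agents:
        message = agent.act(message)
        transcript.append((agent.role, message))
    return transcript


def echo_llm(system_prompt: str, message: str) -> str:
    # Hypothetical stand-in: tags the input with the first word of the role's prompt
    return f"[{system_prompt.split()[0]}] {message}"


team = [
    RoleAgent("product manager", "Spec: turn the request into requirements.", echo_llm),
    RoleAgent("software engineer", "Code: implement the requirements.", echo_llm),
    RoleAgent("QA engineer", "Review: test the implementation.", echo_llm),
]
for role, output in run_pipeline(team, "Build a to-do app"):
    print(f"{role}: {output}")
```

A linear pipeline is the simplest topology. Frameworks such as AutoGen also support feedback loops and group chats among agents.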

Motivation

  • The motivation for using multi-agent systems is threefold:
    1. Proven Effectiveness: Teams have reported positive results using this approach. Studies like those mentioned in the AutoGen paper have shown that multi-agent systems can outperform single-agent systems in complex tasks.
    2. Optimized Task Handling: Despite advancements in LLMs, focusing on specific, simpler tasks can yield better performance. This method allows developers to optimize each component by specifying critical aspects of subtasks.
    3. Complex Task Decomposition: This design pattern provides a framework to break down complex tasks into manageable subtasks, simplifying the development process and enhancing workflow and interaction among agents.

Further Reading

Practical Uses of LLM Agents

  • Customer Support: Automate and manage customer service interactions, offering 24/7 support.
  • Content Creation: Aid in generating articles, blog posts, and social media content.
  • Education: Act as virtual tutors to aid students and support language learning.
  • Coding Assistance: Offer coding suggestions and debugging help to developers.
  • Healthcare: Provide medical information, interpret medical literature, and offer counseling.
  • Accessibility: Enhance accessibility for individuals with disabilities by vocalizing written text.

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

  • This paper by Wu et al. from Microsoft Research, Pennsylvania State University, University of Washington, and Xidian University, introduces AutoGen, an open-source framework designed to facilitate the development of multi-agent large language model (LLM) applications. The framework allows the creation of customizable, conversable agents that can operate in various modes combining LLMs, human inputs, and tools.
  • AutoGen agents can be easily programmed using both natural language and computer code to define flexible conversation patterns for different applications. The framework supports hierarchical chat, joint chat, and other conversation patterns, enabling agents to converse and cooperate to solve tasks. The agents can hold multiple-turn conversations with other agents or solicit human inputs, enhancing their ability to solve complex tasks.

  • Key technical details include the design of conversable agents and conversation programming. Conversable agents can send and receive messages, maintain internal context, and be configured with various capabilities such as LLMs, human inputs, and tools. These agents can also be extended to include more custom behaviors. Conversation programming involves defining agent roles and capabilities and programming their interactions using a combination of natural and programming languages. This approach simplifies complex workflows into intuitive multi-agent conversations.
  • Implementation details:
    1. Conversable Agents: AutoGen provides a generic design for agents, enabling them to leverage LLMs, human inputs, tools, or a combination. The agents can autonomously hold conversations and solicit human inputs at certain stages. Developers can easily create specialized agents with different roles by configuring built-in capabilities and extending agent backends.
    2. Conversation Programming: AutoGen adopts a conversation programming paradigm to streamline LLM application workflows. This involves defining conversable agents and programming their interactions via conversation-centric computation and control. The framework supports various conversation patterns, including static and dynamic flows, allowing for flexible agent interactions.
    3. Unified Interfaces and Auto-Reply Mechanisms: Agents in AutoGen have unified interfaces for sending, receiving, and generating replies. An auto-reply mechanism enables conversation-driven control, where agents automatically generate and send replies based on received messages unless a termination condition is met. Custom reply functions can also be registered to define specific behavior patterns.
    4. Control Flow: AutoGen allows control over conversations using both natural language and programming languages. Natural language prompts guide LLM-backed agents, while Python code specifies conditions for human input, tool execution, and termination. This flexibility supports diverse multi-agent conversation patterns, including dynamic group chats managed by the GroupChatManager class.

  • The framework’s architecture defines agents with specific roles and capabilities, interacting through structured conversations to process tasks efficiently. This approach improves task performance, reduces development effort, and enhances application flexibility. Key technical aspects include using a unified interface for agent interaction, conversation-centric computation for defining agent behaviors, and conversation-driven control flows that manage interactions among agents.
  • Applications demonstrate AutoGen’s capabilities in various domains:
    • Math Problem Solving: AutoGen builds systems for autonomous and human-in-the-loop math problem solving, outperforming other approaches on the MATH dataset.
    • Retrieval-Augmented Code Generation and Question Answering: The framework enhances retrieval-augmented generation systems, improving performance on question-answering tasks through interactive retrieval mechanisms.
    • Decision Making in Text World Environments: AutoGen implements effective interactive decision-making applications using benchmarks like ALFWorld.
    • Multi-Agent Coding: The framework simplifies coding tasks by dividing responsibilities among agents, improving code safety and efficiency.
    • Dynamic Group Chat: AutoGen supports dynamic group chats, enabling collaborative problem-solving without predefined communication orders.
    • Conversational Chess: The framework creates engaging chess games with natural language interfaces, ensuring valid moves through a board agent.
  • The empirical results indicate that AutoGen significantly outperforms existing single-agent and some multi-agent systems in complex task environments by effectively integrating and managing multiple agents’ capabilities. The paper includes a figure illustrating the use of AutoGen to program a multi-agent conversation, showing built-in agents, a two-agent system with a custom reply function, and the resulting automated agent chat.
  • The authors highlight the potential for AutoGen to improve LLM applications by reducing development effort, enhancing performance, and enabling innovative uses of LLMs. Future work will explore optimal multi-agent workflows, agent capabilities, scaling, safety, and human involvement in multi-agent conversations. The open-source library invites contributions from the broader community to further develop and refine AutoGen.
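The pure-Python sketch below makes three of the conversable-agent ideas concrete:

- a unified send/receive interface
- registered reply functions
- auto-reply until a termination condition is met

It is not AutoGen's API: `ChatAgent`, its methods, and the scripted `assistant_reply` are hypothetical. The "executor" only evaluates arithmetic through a restricted `ast`-based evaluator rather than running arbitrary generated code.

```python
import ast
import operator
from typing import Callable, Optional

ReplyFn = Callable[[str], Optional[str]]


class ChatAgent:
    """Conversable-agent sketch: send, receive, and auto-reply (not AutoGen's API)."""

    def __init__(self, name: str, is_termination: Callable[[str], bool] = lambda m: False):
        self.name = name
        self.is_termination = is_termination
        self.reply_fns: list[ReplyFn] = []
        self.history: list[tuple[str, str]] = []  # (sender name, message)

    def register_reply(self, fn: ReplyFn) -> None:
        # Newest reply function is tried first, so custom behavior overrides defaults
        self.reply_fns.insert(0, fn)

    def generate_reply(self, message: str) -> Optional[str]:
        for fn in self.reply_fns:
            reply = fn(message)
            if reply is not None:
                return reply
        return None

    def send(self, message: str, recipient: "ChatAgent", max_turns: int = 10) -> None:
        self.history.append((self.name, message))
        recipient.receive(message, self, max_turns)

    def receive(self, message: str, sender: "ChatAgent", max_turns: int) -> None:
        self.history.append((sender.name, message))
        # Auto-reply unless a termination condition or the turn limit is hit
        if max_turns <= 0 or self.is_termination(message):
            return
        reply = self.generate_reply(message)
        if reply is not None:
            self.send(reply, sender, max_turns - 1)


OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_eval(expr: str) -> float:
    """Evaluate + - * / arithmetic only: a safe stand-in for real code execution."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def assistant_reply(message: str) -> Optional[str]:
    # Hypothetical scripted stand-in for an LLM-backed assistant
    if message.startswith("Result:"):
        return "TERMINATE: the answer is " + message.split(": ", 1)[1]
    if message.startswith("What is "):
        return "CODE: " + message[len("What is "):].rstrip("?")
    return None


def executor_reply(message: str) -> Optional[str]:
    if message.startswith("CODE:"):
        expr = message[len("CODE:"):].strip()
        try:
            return f"Result: {safe_eval(expr)}"
        except (ValueError, SyntaxError, ZeroDivisionError) as exc:
            return f"Error: {exc}"
    return None


assistant = ChatAgent("assistant")
assistant.register_reply(assistant_reply)
executor = ChatAgent("executor", is_termination=lambda m: m.startswith("TERMINATE"))
executor.register_reply(executor_reply)
executor.send("What is 6 * 7 + 1?", assistant)
for sender, message in executor.history:
    print(f"{sender}: {message}")
```

The conversation drives the control flow: neither agent calls the other's logic directly, and the exchange stops once the executor sees a message its termination check accepts.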


Citation

If you found our work useful, please cite it as:

@article{Chadha2020DistilledAgents,
  title   = {Agents},
  author  = {Chadha, Aman and Jain, Vinija},
  journal = {Distilled AI},
  year    = {2020},
  note    = {\url{https://vinija.ai}}
}