AI Agents

How to Build an AI Agent from Scratch in Python

Build a real AI agent from scratch in Python — tools, memory, reasoning loops, and multi-agent systems. No framework hand-holding.

Everyone is shipping "AI agents" right now. Most of them are lying.

What they've actually shipped is a chatbot with a system prompt, or a pipeline that calls an LLM three times in a fixed sequence, or a workflow with a few API integrations bolted on. Useful, maybe. But agents? Not quite.

An actual agent does something different: it observes its environment, reasons about what to do next, takes action, evaluates the result, and loops — until the task is done. The LLM doesn't just respond. It decides. What tool to call. Whether to call one at all. When it's finished. That decision-making loop is what makes something an agent versus a very fancy API wrapper.

This article walks through how that loop actually works, from the minimal 50-line version up through a proper tool registry. By the end, you'll have working code and a clear mental model of what every production agent is doing under the hood — regardless of which framework you eventually reach for.


The Four-Part Definition

Before we write a single line of code, let's establish the definition precisely. An agent has four components:

The LLM (the brain). The language model is the decision-maker. It reads the current state of the conversation — your messages, previous tool results, context — and decides what to do next. Call a tool? Respond to the user? Ask a clarifying question? The LLM makes that call. It does not execute anything directly.

Tools (the hands). An LLM alone can only produce text. Tools are what let the agent actually do things: read a file, query a database, call an API, search the web. Critically, the LLM never executes tools itself — it outputs a structured request, and your code executes it. The LLM is the decision-maker. You are the executor.

Memory (the filing cabinet). Short-term memory is the conversation history within a session — the rolling context that lets the LLM see what it already tried and what came back. Long-term memory is anything that persists across sessions: vector stores, databases, files on disk. Most agents start with short-term memory only. That gets you further than you'd expect.
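The short-term case can be as simple as a rolling window over the message list. A minimal sketch, assuming a window of 20 messages (an arbitrary choice — real trimming also has to keep tool results paired with the tool calls that produced them):

```python
# Keep the system prompt plus a rolling window of recent messages.
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the system message plus the most recent max_messages entries."""
    if len(messages) <= max_messages + 1:
        return messages
    system, rest = messages[0], messages[1:]
    return [system] + rest[-max_messages:]

history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(30)]
trimmed = trim_history(history)
print(len(trimmed))  # 21: the system prompt plus the last 20 messages
```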

The reasoning loop (the nervous system). The loop ties everything together. Send the current state to the LLM. Check if it wants to use a tool. Execute the tool. Append the result. Loop. When the LLM produces a final text response instead of a tool call, the loop exits.

That loop is the heartbeat. Everything else is elaboration.


The Spectrum: What Is and Isn't an Agent

The terminology is polluted enough that it's worth drawing the lines clearly:

LESS AUTONOMOUS                                              MORE AUTONOMOUS
      |                                                              |
      v                                                              v

  Chatbot -----> Pipeline -----> Workflow -----> Agent -----> Autonomous Agent

  - Text in/out   - Fixed steps   - Branching     - Tool use     - Self-directed
  - No tools      - No decisions  - Pre-defined   - Loop         - Goal-seeking
  - No loop       - No loop         paths         - Evaluates    - Minimal human
  - Stateless     - Stateless    - LLM powers      results        oversight
                                   steps, not    - Decides next
                                   flow            action

A pipeline that calls summarize → extract → report in that order every time is not an agent. The sequence is hardcoded. There's no decision. An agent reads the current state and decides the next step — including whether there is a next step at all.
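For contrast, here is what that hardcoded pipeline looks like in code — summarize, extract, and report are hypothetical placeholders. Every step runs, in the same order, every time:

```python
# A fixed pipeline: no state inspection, no decision, no loop.
def summarize(text: str) -> str:
    return f"summary({text})"

def extract(summary: str) -> str:
    return f"facts({summary})"

def report(facts: str) -> str:
    return f"report({facts})"

def run_pipeline(document: str) -> str:
    # The sequence is baked in at write time, not decided at run time
    return report(extract(summarize(document)))

print(run_pipeline("quarterly earnings"))
# report(facts(summary(quarterly earnings)))
```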


The Simplest Possible Agent

Here's a complete, working agent in roughly fifty lines of Python. It can answer questions directly, or search the web when it decides it needs current information.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment
model = "gpt-4o"

# --- Tool Implementation ---
def web_search(query: str) -> str:
    """Simulate a web search. Replace with a real search API in production."""
    fake_results = [
        {"title": "Result 1", "snippet": f"Relevant information about: {query}"},
        {"title": "Result 2", "snippet": f"More details regarding: {query}"},
    ]
    return json.dumps(fake_results, indent=2)

available_tools = {"web_search": web_search}

# --- Tool Schema ---
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information on a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query to look up."
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# --- Agent Loop ---
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Use web_search for current info."},
        {"role": "user", "content": user_message}
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message

        # No tool calls means the model is done — return the answer
        if message.tool_calls is None:
            return message.content

        # Process each tool call the model requested
        messages.append(message)
        for tool_call in message.tool_calls:
            name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)
            print(f"  [iteration {i+1}] calling {name}({args})")
            result = available_tools[name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Agent reached maximum iterations without producing a final answer."

if __name__ == "__main__":
    answer = run_agent("What is the current population of Tokyo?")
    print(answer)

Walk through what happens with "What is the current population of Tokyo?":

  1. The agent sends the system prompt and user message to the model.
  2. The model decides it needs current data and outputs a structured tool call: web_search({"query": "current population of Tokyo"}).
  3. Your code catches that, executes web_search, gets back JSON results.
  4. Those results get appended to the conversation history and sent back to the model.
  5. The model reads the results, synthesizes an answer, and returns plain text — no more tool calls.
  6. The loop sees no tool calls, returns the answer, exits.

Now try "What is 2 + 2?" The model knows this, skips the tool call entirely, and responds with "4." The loop runs once. No tools are called.

That decision — to act or not — is what makes this an agent. A script always searches. A chatbot never searches. The agent evaluates and chooses.

The Anthropic Version

If you prefer the Anthropic SDK, the structure is nearly identical. The main differences are how tool results are formatted and how stop_reason works:

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment
model = "claude-sonnet-4-20250514"

def web_search(query: str) -> str:
    fake_results = [
        {"title": "Result 1", "snippet": f"Relevant information about: {query}"},
    ]
    return json.dumps(fake_results, indent=2)

available_tools = {"web_search": web_search}

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information on a topic.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"]
        }
    }
]

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    system = "You are a helpful assistant. Use web_search for current info."

    for _ in range(max_iterations):
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            system=system,
            messages=messages,
            tools=tools,
        )

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  -> calling {block.name}({block.input})")
                    result = available_tools[block.name](**block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            messages.append({"role": "user", "content": tool_results})
        else:
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""  # no text block in the final response

    return "Agent reached maximum iterations without producing a final answer."
The loop is the same loop. Send messages, check for tool calls, execute tools, feed results back, repeat. The API surface differs between providers; the pattern doesn't.


Building a Proper Tool Registry

One tool hardcoded into the script is fine for a prototype. Real agents need dozens of tools, and adding a new one shouldn't require touching the agent loop at all.

A ToolRegistry solves this. It stores tool functions, auto-generates their JSON schemas from Python type hints, and handles dispatch. Here's the implementation:

import json
import inspect
from typing import Callable, Any, get_type_hints

class ToolRegistry:
    """Stores tools, their schemas, and handles execution."""

    def __init__(self):
        self._tools: dict[str, Callable] = {}
        self._schemas: dict[str, dict] = {}

    def tool(self, description: str = "", **param_descriptions):
        """Decorator to register a function as an agent tool."""
        def decorator(func: Callable) -> Callable:
            name = func.__name__
            schema = self._generate_schema(func, description, param_descriptions)
            self._tools[name] = func
            self._schemas[name] = schema
            return func
        return decorator

    def _generate_schema(self, func, description, param_descriptions):
        hints = get_type_hints(func)
        sig = inspect.signature(func)
        desc = description or (func.__doc__ or "").strip() or f"Call {func.__name__}"

        properties = {}
        required = []

        for param_name, param in sig.parameters.items():
            if param_name == "self":
                continue
            param_type = hints.get(param_name, str)
            json_type = self._python_type_to_json(param_type)
            prop = {"type": json_type}
            if param_name in param_descriptions:
                prop["description"] = param_descriptions[param_name]
            properties[param_name] = prop
            if param.default is inspect.Parameter.empty:
                required.append(param_name)

        return {
            "type": "function",
            "function": {
                "name": func.__name__,
                "description": desc,
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            },
        }

    @staticmethod
    def _python_type_to_json(python_type: type) -> str:
        type_map = {
            str: "string", int: "integer", float: "number",
            bool: "boolean", list: "array", dict: "object",
        }
        return type_map.get(python_type, "string")

    def get_schemas(self) -> list[dict]:
        return list(self._schemas.values())

    def call(self, name: str, arguments: dict[str, Any]) -> str:
        if name not in self._tools:
            return json.dumps({"error": f"Unknown tool: {name}"})
        try:
            result = self._tools[name](**arguments)
            return str(result)
        except TypeError as e:
            return json.dumps({"error": f"Invalid arguments for {name}: {e}"})
        except Exception as e:
            return json.dumps({"error": f"Tool {name} failed: {e}"})

Now registering a tool is one decorator:

registry = ToolRegistry()

@registry.tool(
    description="Search the web for current information on a topic.",
    query="The search query string."
)
def web_search(query: str) -> str:
    # Real implementation here
    return json.dumps([{"snippet": f"Results for: {query}"}])

@registry.tool(
    description="Read the contents of a file from the local filesystem.",
    filepath="Absolute or relative path to the file."
)
def read_file(filepath: str) -> str:
    with open(filepath, "r") as f:
        return f.read()

The agent loop uses the registry without knowing about any specific tool:

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}
    ]

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=registry.get_schemas(),
        )
        message = response.choices[0].message

        if message.tool_calls is None:
            return message.content

        messages.append(message)
        for tool_call in message.tool_calls:
            name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)
            result = registry.call(name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Max iterations reached."

Add a new tool with a decorator. The loop never changes. That's the design goal.


The Five Problems Every Agent Must Eventually Solve

The 50-line version works, but it's fragile in predictable ways. Here's what breaks next and where the solutions live:

Too few tools, all hardcoded. The tool registry above is the fix. A mature agent might have dozens of tools loaded from a plugin directory or discovered dynamically via MCP (the Model Context Protocol, Anthropic's emerging standard for tool connectivity).

No memory across conversations. Every session starts from zero. Short-term fixes involve persisting the message history to disk between runs. Long-term fixes involve vector stores — embed past interactions, retrieve semantically relevant ones, inject them into the context.
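The short-term fix can be sketched in a few lines — the file path here is an arbitrary example, and assistant message objects returned by an SDK would need converting to plain dicts (e.g. via a pydantic `model_dump()` in the OpenAI SDK) before serializing:

```python
import json
from pathlib import Path

HISTORY_PATH = Path("agent_history.json")  # example path

def save_history(messages: list[dict]) -> None:
    # Persist the conversation between runs
    HISTORY_PATH.write_text(json.dumps(messages, indent=2))

def load_history() -> list[dict]:
    # Resume a previous session, or start fresh with just the system prompt
    if HISTORY_PATH.exists():
        return json.loads(HISTORY_PATH.read_text())
    return [{"role": "system", "content": "You are a helpful assistant."}]

messages = load_history()
messages.append({"role": "user", "content": "hello"})
save_history(messages)
```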

Can't plan multi-step tasks. Ask the simple agent to "research five EV competitors and write a comparison" and it tries to do everything in one search. Planning agents decompose the task first, then execute each step. The ReAct pattern (Reason → Act → Observe, repeat) makes this explicit and debuggable.
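The ReAct format itself is just structured text. A sketch of what a hypothetical model output looks like and how your code might parse it (the output string below is invented for illustration):

```python
import re

# Hypothetical model output in the ReAct format
model_output = """Thought: I need current data on Tokyo's population.
Action: web_search(query="Tokyo population 2024")"""

# Parse out the reasoning and the requested action
thought = re.search(r"Thought: (.*)", model_output).group(1)
action = re.search(r"Action: (\w+)\((.*)\)", model_output).group(1, 2)
print(thought)  # I need current data on Tokyo's population.
print(action)   # ('web_search', 'query="Tokyo population 2024"')
```

The explicit Thought line is what you log and debug against; the Action line is what you dispatch to the tool registry.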

No error handling. An exception in any tool call crashes the agent. Production agents need retry logic, fallback tools, and graceful degradation — plus a way to report errors back to the LLM so it can try a different approach.
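A sketch of that pattern — retry with exponential backoff, then report a structured error back to the model instead of raising. Here flaky_tool is a stand-in that fails twice before succeeding:

```python
import json
import time

def call_with_retry(func, arguments: dict, retries: int = 3, delay: float = 0.0) -> str:
    for attempt in range(retries):
        try:
            return str(func(**arguments))
        except Exception as e:
            if attempt == retries - 1:
                # Surface the failure to the LLM so it can try another approach
                return json.dumps({"error": f"{type(e).__name__}: {e}"})
            time.sleep(delay * (2 ** attempt))  # exponential backoff

flaky_calls = {"count": 0}

def flaky_tool(query: str) -> str:
    flaky_calls["count"] += 1
    if flaky_calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return f"results for {query}"

print(call_with_retry(flaky_tool, {"query": "Tokyo"}))  # results for Tokyo
```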

Single-agent bottleneck. One LLM can't be an expert at everything. Multi-agent systems solve this with a supervisor that delegates subtasks to specialized worker agents — one for research, one for writing, one for code review.

The minimal loop in this article is the foundation that all of those patterns build on. The skeleton doesn't change. What changes is the sophistication of what happens inside it.


Writing Good Tool Descriptions

The single highest-leverage thing you can do to improve agent behavior — outside of the loop design itself — is write better tool descriptions.

The LLM doesn't pattern-match on function names. It reads your description, understands the intent, and makes a judgment call about whether this is the right tool for the current situation. A vague description produces inconsistent tool selection. A precise description produces reliable tool selection.

Compare these two descriptions for a file-reading tool:

# Vague
"Read a file."

# Precise
"Read the complete contents of a text file from the local filesystem. 
Returns the raw file content as a string. Use when the user refers to 
a specific file by path, or when you need to examine code, configuration, 
or documentation stored locally."

The second version tells the model when to use the tool, not just what it does. The model has much better signal about whether the current task warrants this tool versus a different one.

The same logic applies to parameter descriptions. Don't write "The file path." Write "Absolute or relative path to the file. Use forward slashes. Include the file extension." The model reads those descriptions when generating arguments — the more precise they are, the more precisely the arguments come back.


What to Read Next

The patterns here — the loop, the tool registry, the function calling protocol — are the complete foundation. Every agent framework (LangChain, LangGraph, CrewAI, AutoGen, the Anthropic agent SDK) is a more elaborate version of these same pieces.

If you want to go deeper on any of this:

  • MCP (Model Context Protocol) is the emerging standard for tool connectivity. Instead of registering tools directly in your code, MCP lets you connect to external tool servers using a standardized protocol. Anthropic introduced it in late 2024.
  • ReAct (Reason + Act) is the prompting pattern that makes multi-step reasoning explicit. The model outputs its reasoning before each tool call, which both improves accuracy and makes the agent's logic transparent.
  • Vector memory is the standard solution for long-term agent memory. Embed text into vectors, store them, retrieve by semantic similarity. You can build a working version from scratch with Python and NumPy.
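To make the vector-memory idea concrete, here is a toy version using a bag-of-words "embedding" and cosine similarity. A real system would swap in a proper embedding model, but the store-and-retrieve mechanics are the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorMemory:
    def __init__(self):
        self.entries: list[tuple[Counter, str]] = []

    def store(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        # Rank stored memories by semantic (here: lexical) similarity
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

memory = VectorMemory()
memory.store("The user prefers Python over JavaScript")
memory.store("The user's deployment target is AWS Lambda")
memory.store("The user asked about Tokyo's population")
print(memory.retrieve("python or javascript preference", top_k=1))
# ['The user prefers Python over JavaScript']
```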

The companion repository for these patterns is at github.com/grizzlypeaksoftware/gps-ai-agent-fundamentals. Every pattern above — the minimal agent, the tool registry, the Anthropic version, the verbose debugging loop — exists as a runnable script there.

If you want the complete treatment, from the minimal agent through multi-agent systems and production error handling, I've written it all up in Build Your Own AI Agent From Scratch, available on Amazon.

The architecture isn't complicated once you see it. The loop is the loop. Build it yourself once and you'll never be confused by agent frameworks again.
