AI Agent Architecture

interactive technical overview

Interface

channels · routing · message surfaces

layer 1
Messages arrive from any connected channel, get normalized into a common format, routed to the right session, and delivered to the agent loop.
Mattermost · Telegram · Discord · Signal · Slack · WhatsApp · iMessage · Google Chat · IRC · Webhooks
Channel Plugin
Normalize
Session Router
Agent Loop

Inbound

Platform-specific events converted to unified message format with metadata: author, channel, attachments, reply context.

Outbound

Replies route back through originating channel. Sessions track the source surface. Cross-channel messaging supported.
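The inbound normalization step above can be sketched as a small adapter. This is a minimal sketch, assuming a Telegram-style event; the unified field names (`conversationId`, `replyTo`, etc.) are illustrative, not the project's actual schema.

```javascript
// Convert a platform-specific event into a unified message.
// Field names are hypothetical stand-ins for the common format.
function normalizeTelegram(event) {
  return {
    channel: "telegram",
    conversationId: String(event.chat.id),
    author: event.from.username,
    text: event.text ?? "",
    attachments: event.photo ? [{ type: "photo" }] : [],
    replyTo: event.reply_to_message?.message_id ?? null, // reply context
  };
}

const msg = normalizeTelegram({
  chat: { id: 42 },
  from: { username: "alice" },
  text: "hello",
});
```

Each channel plugin ships one such adapter; everything downstream (router, session, agent loop) only ever sees the unified shape.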

Session

conversation state · message history · thread binding

layer 2
Each conversation is a session — an isolated thread with its own message history. A Mattermost DM, a Telegram chat, and a Discord channel are three separate sessions, each maintaining independent context for the agent.

One conversation = one session

The session holds the full message thread: user messages, agent replies, and tool call results. This is the conversation history the agent sees each turn.

Cross-channel isolation

Talking to the agent on Telegram and Mattermost creates two sessions. Each has its own history — context from one doesn't leak into the other.

Session routing

When a message arrives, the router matches it to an existing session (by channel + conversation ID) or creates a new one. The session then feeds its history into the agent loop.
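The routing rule above, match on channel + conversation ID or create a new session, fits in a few lines. A minimal sketch, assuming an in-memory map; a real router would persist sessions.

```javascript
// One session per (channel, conversation ID) pair.
const sessions = new Map();

function routeToSession(msg) {
  const key = `${msg.channel}:${msg.conversationId}`;
  if (!sessions.has(key)) {
    sessions.set(key, { key, history: [] }); // unknown conversation → new session
  }
  const session = sessions.get(key);
  session.history.push(msg); // append to this session's isolated history
  return session;
}

routeToSession({ channel: "telegram", conversationId: "42", text: "hi" });
routeToSession({ channel: "mattermost", conversationId: "42", text: "hi" });
// Same conversation ID on different channels → two separate sessions.
```

Note how the key includes the channel: this is what makes cross-channel isolation fall out of the routing scheme for free.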

Sub-sessions

The agent can spawn isolated sub-sessions for background work (cron jobs, delegated tasks). These run independently and report results back to the parent session.
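A sub-session can be sketched as a child with its own empty history that only communicates with the parent by appending a result. The function names here are hypothetical.

```javascript
// Sub-session sketch: isolated history, result reported back to the parent.
function spawnSubSession(parent, task) {
  return { parent, task, history: [] }; // no shared history with the parent
}

function reportBack(sub, result) {
  // The parent only ever sees the final result, not the sub-session's turns.
  sub.parent.history.push({ role: "tool", text: `[${sub.task}] ${result}` });
}

const parent = { history: [] };
const sub = spawnSubSession(parent, "daily report");
reportBack(sub, "report generated");
```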

Inbound Message
Route to Session
Append to History
Agent Loop

Agent

core loop · tool execution · decision cycle

layer 3
The agent is a while loop. Each iteration sends the full conversation to the API, inspects the response, and either executes tool calls or returns the final text.
loop active
messages = [system_prompt, ...history]
Build message array with system prompt, context injections, conversation history
response = api.chat(messages, tools)
Send to LLM API with tool definitions. Stream response tokens.
response.has_tool_calls?
Model decides: call tools or respond with text
yes → results = execute(tool_calls)
Execute tools respecting permissions. Append results to messages.
messages.push(tool_results) → continue
Loop back — model sees results and decides next action
no → return response.text
No more tool calls — deliver final response to interface
while (conversation.active) {
  const response = await api.chat(messages, toolDefs)
  if (response.toolCalls.length > 0) {
    const results = await executeTools(response.toolCalls)
    messages.push(...results)
    continue // loop back with tool results
  }
  return response.text // done — send to interface
}

Context

📋
Skills
Matched by description before each reply. If a skill applies, its SKILL.md is loaded and followed. Contains instructions, scripts, and domain-specific references.
🔧
Tools
Functions the model can invoke: exec (shell), read/write/edit (files), browser (automation), web_search, message (channels), nodes (devices).
🔒
Permissions
Tool policies gate access: deny, allowlist, full. Elevated mode runs on host. Ask modes (off, on-miss, always) require user approval for sensitive actions.
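The three policy modes above can be expressed as a single gate function. A minimal sketch, assuming this policy shape; the real policy format may differ, and ask modes would add an approval step before execution rather than a boolean here.

```javascript
// Gate a tool call against a policy: deny, allowlist, or full access.
function isAllowed(tool, policy) {
  switch (policy.mode) {
    case "deny":      return false;                       // nothing runs
    case "full":      return true;                        // everything runs
    case "allowlist": return policy.allow.includes(tool); // only listed tools
    default:          return false;                       // fail closed
  }
}

const readOnly = { mode: "allowlist", allow: ["read", "web_search"] };
```

Defaulting to `false` for unknown modes keeps the gate fail-closed, which is the safe choice for anything that can reach a shell.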
🧠
Memory
MEMORY.md — curated long-term memory. memory/*.md — daily notes. memory_search — semantic vector search across all files. Workspace files injected as context each session.
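Semantic search over memory files can be sketched as cosine similarity over embeddings. The vectors below are toy placeholders; in practice each file chunk is embedded by a model and the query is embedded the same way.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank memory files by similarity to the query vector, return the top K.
function memorySearch(queryVec, files, topK = 1) {
  return files
    .map(f => ({ path: f.path, score: cosine(queryVec, f.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const files = [
  { path: "memory/2024-01-01.md", embedding: [1, 0, 0] }, // placeholder vectors
  { path: "MEMORY.md",            embedding: [0, 1, 0] },
];
const hits = memorySearch([0.9, 0.1, 0], files);
```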

API

provider APIs · tool schemas · streaming

layer 4
Standardized communication with LLM providers. Messages, tool definitions, and responses follow a common schema.

Request

messages[] — conversation + system prompt
tools[] — function schemas
model — target model
stream: true — token streaming

Response

content — text (streamed or complete)
tool_calls[] — function invocations
thinking — reasoning trace
usage — token counts

// tool definition schema
{
  "name": "exec",
  "description": "Execute shell commands",
  "parameters": {
    "command": { "type": "string" },
    "workdir": { "type": "string" },
    "timeout": { "type": "number" }
  }
}

Model

inference · context window · reasoning

layer 5
The LLM processes the full message array and generates the next response. Stateless — all state lives in the messages sent each turn.

Context Window

Fixed token budget (e.g. 200k). Messages, tools, system prompt, and results must fit. Older messages truncated or summarized.
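Truncation can be sketched as evicting the oldest non-system messages until the estimated total fits the budget. This is a sketch under two assumptions: a crude chars/4 token estimate stands in for a real tokenizer, and messages are simply dropped rather than summarized.

```javascript
// Drop oldest non-system messages until the estimate fits the token budget.
function fitToBudget(messages, budget) {
  const tokens = m => Math.ceil(m.text.length / 4); // rough estimate, not a tokenizer
  const out = [...messages];
  while (out.reduce((n, m) => n + tokens(m), 0) > budget) {
    const i = out.findIndex(m => m.role !== "system"); // always keep the system prompt
    if (i === -1) break;
    out.splice(i, 1); // evict the oldest evictable message
  }
  return out;
}

const trimmed = fitToBudget([
  { role: "system", text: "s".repeat(40) },   // ~10 tokens, never evicted
  { role: "user",   text: "old ".repeat(100) }, // ~100 tokens, oldest → evicted
  { role: "user",   text: "new ".repeat(10) },  // ~10 tokens, kept
], 30);
```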

Inference

Autoregressive generation. Each token sampled conditioned on all prior tokens. Temperature and top-p control randomness.

Tool Use

Models trained to emit structured tool calls. Chooses which tools to invoke based on task and available schemas.

Thinking

Extended chain-of-thought reasoning before the final response. Toggleable. Produces reasoning trace alongside output.