AI Agent Architecture

interactive technical overview

Interface

channels · routing · message surfaces

layer 1
Messages arrive from any connected channel, get normalized into a common format, routed to the right session, and delivered to the agent loop.
Mattermost · Telegram · Discord · Signal · Slack · WhatsApp · iMessage · Google Chat · IRC · Webhooks
Channel Plugin
Normalize
Session Router
Agent Loop

Inbound

Platform-specific events converted to unified message format with metadata: author, channel, attachments, reply context.

Outbound

Replies route back through originating channel. Sessions track the source surface. Cross-channel messaging supported.
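The inbound normalization step above can be sketched as a small adapter. This is a minimal sketch, assuming a Telegram-style event; the unified field names (`conversationId`, `replyTo`, etc.) are illustrative, not the project's actual schema.

```javascript
// Convert a platform-specific event into a unified message.
// Field names are hypothetical stand-ins for the common format.
function normalizeTelegram(event) {
  return {
    channel: "telegram",
    conversationId: String(event.chat.id),
    author: event.from.username,
    text: event.text ?? "",
    attachments: event.photo ? [{ type: "photo" }] : [],
    replyTo: event.reply_to_message?.message_id ?? null, // reply context
  };
}

const msg = normalizeTelegram({
  chat: { id: 42 },
  from: { username: "alice" },
  text: "hello",
});
```

Each channel plugin ships one such adapter; everything downstream (router, session, agent loop) only ever sees the unified shape.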

Session

conversation state · message history · thread binding

layer 2
Each conversation is a session — an isolated thread with its own message history. A Mattermost DM, a Telegram chat, and a Discord channel are three separate sessions, each maintaining independent context for the agent.

One conversation = one session

The session holds the full message thread: user messages, agent replies, and tool call results. This is the conversation history the agent sees each turn.

Cross-channel isolation

Talking to the agent on Telegram and Mattermost creates two sessions. Each has its own history — context from one doesn't leak into the other.

Session routing

When a message arrives, the router matches it to an existing session (by channel + conversation ID) or creates a new one. The session then feeds its history into the agent loop.
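The routing rule above, match on channel + conversation ID or create a new session, fits in a few lines. A minimal sketch, assuming an in-memory map; a real router would persist sessions.

```javascript
// One session per (channel, conversation ID) pair.
const sessions = new Map();

function routeToSession(msg) {
  const key = `${msg.channel}:${msg.conversationId}`;
  if (!sessions.has(key)) {
    sessions.set(key, { key, history: [] }); // unknown conversation → new session
  }
  const session = sessions.get(key);
  session.history.push(msg); // append to this session's isolated history
  return session;
}

routeToSession({ channel: "telegram", conversationId: "42", text: "hi" });
routeToSession({ channel: "mattermost", conversationId: "42", text: "hi" });
// Same conversation ID on different channels → two separate sessions.
```

Note how the key includes the channel: this is what makes cross-channel isolation fall out of the routing scheme for free.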

Sub-sessions

The agent can spawn isolated sub-sessions for background work (cron jobs, delegated tasks). These run independently and report results back to the parent session.
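A sub-session can be sketched as a child with its own empty history that only communicates with the parent by appending a result. The function names here are hypothetical.

```javascript
// Sub-session sketch: isolated history, result reported back to the parent.
function spawnSubSession(parent, task) {
  return { parent, task, history: [] }; // no shared history with the parent
}

function reportBack(sub, result) {
  // The parent only ever sees the final result, not the sub-session's turns.
  sub.parent.history.push({ role: "tool", text: `[${sub.task}] ${result}` });
}

const parent = { history: [] };
const sub = spawnSubSession(parent, "daily report");
reportBack(sub, "report generated");
```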

Inbound Message
Route to Session
Append to History
Agent Loop

Agent

core loop · tool execution · decision cycle

layer 3
The agent is a while loop. Each iteration sends the full conversation to the API, inspects the response, and either executes tool calls or returns the final text.
loop active
messages = [system_prompt, ...history]
Build message array with system prompt, context injections, conversation history
response = api.chat(messages, tools)
Send to LLM API with tool definitions. Stream response tokens.
response.has_tool_calls?
Model decides: call tools or respond with text
yes → results = execute(tool_calls)
Execute tools respecting permissions. Append results to messages.
messages.push(tool_results) → continue
Loop back — model sees results and decides next action
no → return response.text
No more tool calls — deliver final response to interface
while (conversation.active) {
  const response = await api.chat(messages, toolDefs)
  if (response.toolCalls.length > 0) {
    const results = await executeTools(response.toolCalls)
    messages.push(...results)
    continue // loop back with tool results
  }
  return response.text // done — send to interface
}

Context

📋
Skills
Matched by description before each reply. If a skill applies, its SKILL.md is loaded and followed. Contains instructions, scripts, and domain-specific references.
🔧
Tools
Functions the model can invoke: exec (shell), read/write/edit (files), browser (automation), web_search, message (channels), nodes (devices).
🔒
Permissions
Tool policies gate access: deny, allowlist, full. Elevated mode runs on host. Ask modes (off, on-miss, always) require user approval for sensitive actions.
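The three policy modes above can be expressed as a single gate function. A minimal sketch, assuming this policy shape; the real policy format may differ, and ask modes would add an approval step before execution rather than a boolean here.

```javascript
// Gate a tool call against a policy: deny, allowlist, or full access.
function isAllowed(tool, policy) {
  switch (policy.mode) {
    case "deny":      return false;                       // nothing runs
    case "full":      return true;                        // everything runs
    case "allowlist": return policy.allow.includes(tool); // only listed tools
    default:          return false;                       // fail closed
  }
}

const readOnly = { mode: "allowlist", allow: ["read", "web_search"] };
```

Defaulting to `false` for unknown modes keeps the gate fail-closed, which is the safe choice for anything that can reach a shell.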
🧠
Memory
MEMORY.md — curated long-term memory. memory/*.md — daily notes. memory_search — semantic vector search across all files. Workspace files injected as context each session.
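Semantic search over memory files can be sketched as cosine similarity over embeddings. The vectors below are toy placeholders; in practice each file chunk is embedded by a model and the query is embedded the same way.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank memory files by similarity to the query vector, return the top K.
function memorySearch(queryVec, files, topK = 1) {
  return files
    .map(f => ({ path: f.path, score: cosine(queryVec, f.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const files = [
  { path: "memory/2024-01-01.md", embedding: [1, 0, 0] }, // placeholder vectors
  { path: "MEMORY.md",            embedding: [0, 1, 0] },
];
const hits = memorySearch([0.9, 0.1, 0], files);
```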

API

provider APIs · tool schemas · streaming

layer 4
Standardized communication with LLM providers. Messages, tool definitions, and responses follow a common schema.

Request

messages[] — conversation + system prompt
tools[] — function schemas
model — target model
stream: true — token streaming

Response

content — text (streamed or complete)
tool_calls[] — function invocations
thinking — reasoning trace
usage — token counts

// tool definition schema
{
  "name": "exec",
  "description": "Execute shell commands",
  "parameters": {
    "command": { "type": "string" },
    "workdir": { "type": "string" },
    "timeout": { "type": "number" }
  }
}

Model

inference · context window · reasoning

layer 5
The LLM processes the full message array and generates the next response. Stateless — all state lives in the messages sent each turn.

Context Window

Fixed token budget (e.g. 200k). Messages, tools, system prompt, and results must fit. Older messages truncated or summarized.
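Truncation can be sketched as evicting the oldest non-system messages until the estimated total fits the budget. This is a sketch under two assumptions: a crude chars/4 token estimate stands in for a real tokenizer, and messages are simply dropped rather than summarized.

```javascript
// Drop oldest non-system messages until the estimate fits the token budget.
function fitToBudget(messages, budget) {
  const tokens = m => Math.ceil(m.text.length / 4); // rough estimate, not a tokenizer
  const out = [...messages];
  while (out.reduce((n, m) => n + tokens(m), 0) > budget) {
    const i = out.findIndex(m => m.role !== "system"); // always keep the system prompt
    if (i === -1) break;
    out.splice(i, 1); // evict the oldest evictable message
  }
  return out;
}

const trimmed = fitToBudget([
  { role: "system", text: "s".repeat(40) },   // ~10 tokens, never evicted
  { role: "user",   text: "old ".repeat(100) }, // ~100 tokens, oldest → evicted
  { role: "user",   text: "new ".repeat(10) },  // ~10 tokens, kept
], 30);
```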

Inference

Autoregressive generation. Each token sampled conditioned on all prior tokens. Temperature and top-p control randomness.

Tool Use

Models trained to emit structured tool calls. Chooses which tools to invoke based on task and available schemas.

Thinking

Extended chain-of-thought reasoning before the final response. Toggleable. Produces reasoning trace alongside output.