Cognitive Sidecar
Always-on memory layer that auto-recalls context before every LLM call and auto-learns from every exchange. No opt-in tools required — memory just works.
How It Works
The Cognitive Sidecar wraps your LLM calls with two phases:
Before your LLM call, the sidecar searches memory for relevant context and injects it as a system message. Your agent gets personal context without you writing retrieval code.
After the exchange, the sidecar analyzes the conversation and extracts learnings — preferences, facts, decisions — storing them as beliefs for future recall.
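The two phases can be pictured with a small self-contained toy. This is only a sketch of the pattern under stated assumptions, not the SDK's implementation: `ToySidecar`, its keyword-overlap `recall`, and the "I prefer" extraction rule are all hypothetical stand-ins for the real semantic search and learning engine.

```python
from typing import Callable

Message = dict[str, str]


class ToySidecar:
    """Illustrative stand-in for the wrap pattern; not the SDK's CognitiveSidecar."""

    def __init__(self) -> None:
        self.beliefs: list[str] = []

    def recall(self, query: str) -> list[str]:
        # Naive keyword overlap; a real memory layer would use semantic search.
        words = set(query.lower().split())
        return [b for b in self.beliefs if words & set(b.lower().split())]

    def wrap(self, messages: list[Message]) -> tuple[list[Message], Callable[[str], None]]:
        # Phase 1 (pre-flight): search memory using the latest message,
        # then prepend any hits as a system message.
        hits = self.recall(messages[-1]["content"])
        enriched = list(messages)
        if hits:
            context = "## Memory Context\n" + "\n".join(f"- {h}" for h in hits)
            enriched.insert(0, {"role": "system", "content": context})

        # Phase 2 (post-flight): called once the LLM has replied.
        def learn(assistant_reply: str) -> None:
            # Toy extraction: keep user statements that start with "I prefer".
            # The assistant reply is accepted but unused in this toy.
            for m in messages:
                if m["role"] == "user" and m["content"].lower().startswith("i prefer"):
                    self.beliefs.append(m["content"])

        return enriched, learn


toy = ToySidecar()

# First exchange: nothing in memory yet, so nothing is injected.
first = [{"role": "user", "content": "I prefer Kubernetes for deploy targets"}]
first_enriched, learn = toy.wrap(first)
learn("Noted.")

# Second exchange: the stored preference is recalled and injected.
second = [{"role": "user", "content": "How should I deploy to production?"}]
enriched, _ = toy.wrap(second)
print(enriched[0]["content"])
```

The design point is that the caller's LLM call sits between the two phases untouched: `wrap` returns plain messages plus a callback, so any client library can be used in the middle.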
Quick Start
from memgraph_sdk import MemgraphClient
from memgraph_sdk.middleware import CognitiveSidecar
import openai
client = MemgraphClient(api_key="mg_your_key")
sidecar = CognitiveSidecar(
    client=client,
    user_id="alice",
    agent_id="support-bot",
    token_budget=4000,  # Max tokens of memory context to inject
    auto_learn=True,    # Extract learnings after each exchange
)
# Your normal messages
messages = [
    {"role": "user", "content": "How should I deploy to production?"}
]
# Wrap adds memory context + returns a learn callback
enriched_messages, learn = sidecar.wrap(messages)
# Call your LLM as usual — enriched_messages has memory injected
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=enriched_messages,
)
# After getting the response, let sidecar learn from the exchange
learn(response.choices[0].message.content)
Pre-flight Only
If you only want context injection without auto-learning:
# Just inject memory context into messages
enriched = sidecar.pre_flight(messages)
# enriched[0] is now a system message with memory context:
# {"role": "system", "content": "## Memory Context\n...beliefs and episodes..."}
print(f"Found {sidecar.memories_found} relevant memories")
print(f"Context: {sidecar.last_context[:200]}...")
Post-flight Only
Extract learnings from an exchange without pre-flight:
# After your LLM exchange, extract and store learnings
full_messages = [
    {"role": "user", "content": "I prefer Kubernetes over Docker Compose"},
    {"role": "assistant", "content": "Noted! I'll recommend K8s for future deployments."},
]
sidecar.post_flight(full_messages)
# Extracts: preference "prefers Kubernetes over Docker Compose"
# Stored as belief with confidence 0.90
Combined Process
Use process() for a single API call that does both pre-flight and post-flight:
enriched = sidecar.process(messages)
# Single call to /v1/sidecar/process
# Returns enriched messages with context injected
# AND triggers learning in the background
REST API
Use the sidecar endpoints directly if you're not using the Python SDK:
# Pre-flight: Get memory context
curl -X POST https://api.memgraph.ai/v1/sidecar/pre-flight \
  -H "X-API-Key: mg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "agent_id": "support-bot",
    "message": "How should I deploy to production?",
    "token_budget": 4000,
    "include_profile": true,
    "include_prospective": true
  }'
# Post-flight: Learn from exchange
curl -X POST https://api.memgraph.ai/v1/sidecar/post-flight \
  -H "X-API-Key: mg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "agent_id": "support-bot",
    "messages": [
      {"role": "user", "content": "I prefer Kubernetes"},
      {"role": "assistant", "content": "Noted, will recommend K8s"}
    ]
  }'
# Combined: Pre-flight + Post-flight in one call
curl -X POST https://api.memgraph.ai/v1/sidecar/process \
-H "X-API-Key: mg_your_key" \
-H "Content-Type: application/json" \
-d '{
"user_id": "alice",
"agent_id": "support-bot",
"messages": [
{"role": "user", "content": "How should I deploy?"}
],
"token_budget": 4000
}'
What Gets Learned
The sidecar's auto-learning engine detects 10 signal types from conversations.
Each signal is extracted with a confidence score and stored as a typed belief. Extraction uses gpt-4o-mini to balance accuracy against cost.
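As a rough picture of what a typed belief with a confidence score might look like on the client side, here is a minimal sketch. The `Belief` dataclass, its field names, and the `min_confidence` threshold are assumptions made for illustration; the actual storage schema is not shown on this page. The first record reuses the Kubernetes preference and the 0.90 confidence from the post-flight example; the second record is invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    kind: str          # hypothetical type label, e.g. "preference"
    statement: str     # the extracted learning in plain text
    confidence: float  # score between 0.0 and 1.0


def recallable(beliefs: list[Belief], min_confidence: float = 0.5) -> list[Belief]:
    """Keep beliefs at or above the threshold, highest confidence first."""
    kept = [b for b in beliefs if b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: b.confidence, reverse=True)


beliefs = [
    Belief("fact", "deploys happen on Fridays", 0.40),  # made-up low-confidence record
    Belief("preference", "prefers Kubernetes over Docker Compose", 0.90),
]

ranked = recallable(beliefs)
for b in ranked:
    print(f"[{b.kind}] {b.statement} ({b.confidence:.2f})")
```

Keeping the type and the score on each record is what lets a recall step filter or rank beliefs instead of treating every extracted sentence as equally reliable.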
Configuration
| Parameter | Default | Description |
|---|---|---|
| token_budget | 4000 | Max tokens of memory context to inject |
| include_profile | true | Include user profile summary in context |
| include_prospective | true | Include forward-looking suggestions |
| auto_learn | true | Extract and store learnings after each exchange |
| thread_id | None | Thread ID for conversation continuity |
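For callers not using the SDK, the defaults above can be applied when building a pre-flight request by hand. The sketch below uses only the Python standard library and only the body fields that appear in the pre-flight curl example; `build_pre_flight_request` is a hypothetical helper, and the request is constructed but never sent. Whether `auto_learn` or `thread_id` are accepted by this endpoint is not stated here, so they are left out.

```python
import json
import urllib.request

# Defaults taken from the configuration table.
DEFAULTS = {
    "token_budget": 4000,
    "include_profile": True,
    "include_prospective": True,
}


def build_pre_flight_request(api_key, user_id, agent_id, message, **overrides):
    """Build (but do not send) a POST request for /v1/sidecar/pre-flight."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    body = {
        "user_id": user_id,
        "agent_id": agent_id,
        "message": message,
        **DEFAULTS,
        **overrides,  # caller-supplied values win over defaults
    }
    return urllib.request.Request(
        "https://api.memgraph.ai/v1/sidecar/pre-flight",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_pre_flight_request(
    "mg_your_key",
    "alice",
    "support-bot",
    "How should I deploy to production?",
    token_budget=2000,
)
payload = json.loads(req.data)
print(req.get_method(), req.full_url)
print(payload)
```

Passing the result to `urllib.request.urlopen(req)` would send it; that step is omitted so the sketch runs without network access or a real API key.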
Next steps
- Decisions — Record and debug agent reasoning traces
- OpenAI Agents — Automatic sidecar with agent hooks
- MCP Server — Use memory tools with Claude, Cursor, and other MCP clients
- Python SDK — Full SDK reference
