MemgraphAI

Cognitive Sidecar

Always-on memory layer that auto-recalls context before every LLM call and auto-learns from every exchange. No opt-in tools required — memory just works.

How It Works

The Cognitive Sidecar wraps your LLM calls with two phases:

Pre-flight (Recall)

Before your LLM call, the sidecar searches memory for relevant context and injects it as a system message. Your agent gets personal context without you writing retrieval code.

Post-flight (Learn)

After the exchange, the sidecar analyzes the conversation and extracts learnings — preferences, facts, decisions — storing them as beliefs for future recall.
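To make the two phases concrete, here is a minimal, self-contained sketch of the wrap pattern. It is not the MemgraphAI implementation: `ToySidecar`, its in-memory `beliefs` list, and the naive "I prefer" extraction rule are all hypothetical stand-ins for the real recall and learning services. Only the method names `pre_flight`, `post_flight`, and `wrap` mirror the SDK calls documented on this page.

```python
from typing import Callable

Message = dict[str, str]


class ToySidecar:
    """Illustrative stand-in for the two-phase flow; not the real SDK class."""

    def __init__(self) -> None:
        # Hypothetical in-memory store in place of the hosted memory service.
        self.beliefs: list[str] = []

    def pre_flight(self, messages: list[Message]) -> list[Message]:
        # Recall: prepend stored beliefs as a system message, if any exist.
        if not self.beliefs:
            return list(messages)
        context = "## Memory Context\n" + "\n".join(f"- {b}" for b in self.beliefs)
        return [{"role": "system", "content": context}, *messages]

    def post_flight(self, messages: list[Message]) -> None:
        # Learn: a deliberately naive rule that keeps user preference statements.
        for message in messages:
            text = message["content"]
            if message["role"] == "user" and text.lower().startswith("i prefer"):
                self.beliefs.append(text)

    def wrap(self, messages: list[Message]) -> tuple[list[Message], Callable[[str], None]]:
        enriched = self.pre_flight(messages)

        def learn(assistant_reply: str) -> None:
            # The callback closes over the original messages and adds the reply.
            exchange = [*messages, {"role": "assistant", "content": assistant_reply}]
            self.post_flight(exchange)

        return enriched, learn


toy = ToySidecar()
first_turn = [{"role": "user", "content": "I prefer Kubernetes over Docker Compose"}]
enriched_first, learn = toy.wrap(first_turn)
learn("Noted! I'll recommend K8s for future deployments.")

second_turn = [{"role": "user", "content": "How should I deploy to production?"}]
enriched_second, _ = toy.wrap(second_turn)
print(enriched_second[0]["content"])
```

The first turn passes through unchanged because nothing has been learned yet. The second turn gains a leading system message carrying the stored preference.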

Quick Start

python
from memgraph_sdk import MemgraphClient
from memgraph_sdk.middleware import CognitiveSidecar
import openai

client = MemgraphClient(api_key="mg_your_key")

sidecar = CognitiveSidecar(
    client=client,
    user_id="alice",
    agent_id="support-bot",
    token_budget=4000,    # Max tokens of memory context to inject
    auto_learn=True,      # Extract learnings after each exchange
)

# Your normal messages
messages = [
    {"role": "user", "content": "How should I deploy to production?"}
]

# wrap() adds memory context and returns a learn callback
enriched_messages, learn = sidecar.wrap(messages)

# Call your LLM as usual — enriched_messages has memory injected
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=enriched_messages,
)

# After getting the response, let sidecar learn from the exchange
learn(response.choices[0].message.content)

Pre-flight Only

If you only want context injection without auto-learning:

python
# Just inject memory context into messages
enriched = sidecar.pre_flight(messages)

# enriched[0] is now a system message with memory context:
# {"role": "system", "content": "## Memory Context\n...beliefs and episodes..."}

print(f"Found {sidecar.memories_found} relevant memories")
print(f"Context: {sidecar.last_context[:200]}...")

Post-flight Only

Extract learnings from an exchange without pre-flight:

python
# After your LLM exchange, extract and store learnings
full_messages = [
    {"role": "user", "content": "I prefer Kubernetes over Docker Compose"},
    {"role": "assistant", "content": "Noted! I'll recommend K8s for future deployments."},
]

sidecar.post_flight(full_messages)
# Extracts: preference "prefers Kubernetes over Docker Compose"
# Stored as belief with confidence 0.90

Combined Process

Use process() for a single API call that does both pre-flight and post-flight:

python
enriched = sidecar.process(messages)
# Single call to /v1/sidecar/process
# Returns enriched messages with context injected
# AND triggers learning in the background

REST API

Use the sidecar endpoints directly if you're not using the Python SDK:

bash
# Pre-flight: Get memory context
curl -X POST https://api.memgraph.ai/v1/sidecar/pre-flight \
  -H "X-API-Key: mg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "agent_id": "support-bot",
    "message": "How should I deploy to production?",
    "token_budget": 4000,
    "include_profile": true,
    "include_prospective": true
  }'

# Post-flight: Learn from exchange
curl -X POST https://api.memgraph.ai/v1/sidecar/post-flight \
  -H "X-API-Key: mg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "agent_id": "support-bot",
    "messages": [
      {"role": "user", "content": "I prefer Kubernetes"},
      {"role": "assistant", "content": "Noted, will recommend K8s"}
    ]
  }'

# Combined: Pre-flight + Post-flight in one call
curl -X POST https://api.memgraph.ai/v1/sidecar/process \
  -H "X-API-Key: mg_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alice",
    "agent_id": "support-bot",
    "messages": [
      {"role": "user", "content": "How should I deploy?"}
    ],
    "token_budget": 4000
  }'
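If you want to call these endpoints from Python without the SDK, the same requests can be built with the standard library. The sketch below constructs the request but does not send it. The helper name `build_sidecar_request` is hypothetical, and the response format is not documented on this page, so no response parsing is shown.

```python
import json
import urllib.request

BASE_URL = "https://api.memgraph.ai/v1/sidecar"


def build_sidecar_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for a sidecar endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the pre-flight curl example above.
request = build_sidecar_request(
    "pre-flight",
    "mg_your_key",
    {
        "user_id": "alice",
        "agent_id": "support-bot",
        "message": "How should I deploy to production?",
        "token_budget": 4000,
        "include_profile": True,
        "include_prospective": True,
    },
)

# To send it, pass the request to urllib.request.urlopen(request, timeout=10)
# inside a `with` block and read the body from the returned response object.
```

Swapping the first argument for `"post-flight"` or `"process"` and passing the matching payload covers the other two endpoints.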

What Gets Learned

The sidecar's auto-learning engine detects 10 signal types in conversations:

preference
fact
decision
correction
capability
goal
relationship
emotion
feedback
context

Each signal is extracted with a confidence score and stored as a typed belief. Extraction uses gpt-4o-mini to balance accuracy against cost.
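One way to picture a typed belief is as a record holding a signal type, the extracted content, and a confidence score. The sketch below is an assumption about shape only: the `SignalType` enum, the `Belief` dataclass, and the `keep_confident` filter with its 0.8 threshold are hypothetical and are not part of the documented SDK.

```python
from dataclasses import dataclass
from enum import Enum


class SignalType(str, Enum):
    # The ten signal types listed above.
    PREFERENCE = "preference"
    FACT = "fact"
    DECISION = "decision"
    CORRECTION = "correction"
    CAPABILITY = "capability"
    GOAL = "goal"
    RELATIONSHIP = "relationship"
    EMOTION = "emotion"
    FEEDBACK = "feedback"
    CONTEXT = "context"


@dataclass(frozen=True)
class Belief:
    """Hypothetical record for one extracted signal."""
    signal_type: SignalType
    content: str
    confidence: float  # 0.0 to 1.0


def keep_confident(beliefs: list[Belief], threshold: float = 0.8) -> list[Belief]:
    # Hypothetical client-side filter; the threshold is an arbitrary example value.
    return [b for b in beliefs if b.confidence >= threshold]


extracted = [
    Belief(SignalType.PREFERENCE, "prefers Kubernetes over Docker Compose", 0.90),
    Belief(SignalType.EMOTION, "may be frustrated with deploys", 0.40),
]
confident = keep_confident(extracted)
```

Here only the preference survives the filter, since the emotion signal falls below the example threshold.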

Configuration

Parameter            Default  Description
token_budget         4000     Max tokens of memory context to inject
include_profile      true     Include user profile summary in context
include_prospective  true     Include forward-looking suggestions
auto_learn           true     Extract and store learnings after each exchange
thread_id            None     Thread ID for conversation continuity
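To give a feel for what a token budget does, the sketch below greedily packs the most relevant memories into a fixed budget. This illustrates the general idea and does not describe how the sidecar actually selects context: the `fit_to_budget` function, the relevance scores, and the rough four-characters-per-token estimate are all assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (an assumption): about four characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(memories: list[tuple[float, str]], token_budget: int) -> list[str]:
    """Pick memories by descending relevance until the budget is used up."""
    selected: list[str] = []
    used = 0
    for _score, text in sorted(memories, key=lambda m: m[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # Too large for the remaining budget; a smaller one may still fit.
        selected.append(text)
        used += cost
    return selected


# (relevance score, memory text) pairs with hypothetical values.
memories = [
    (0.9, "a" * 40),  # about 10 tokens
    (0.5, "b" * 40),  # about 10 tokens
    (0.7, "c" * 8),   # about 2 tokens
]
picked = fit_to_budget(memories, token_budget=12)
```

With a budget of 12, the top-ranked memory and the short one fit, while the lowest-ranked memory is dropped. A larger `token_budget` admits more context at the cost of a longer prompt.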

Next steps

  • Decisions — Record and debug agent reasoning traces
  • OpenAI Agents — Automatic sidecar with agent hooks
  • MCP Server — Use memory tools with Claude, Cursor, and other MCP clients
  • Python SDK — Full SDK reference