Stop Overengineering Agentic Memory
Why complex graph-based memory architectures may be overkill, and how simple, highly optimized RAG achieves state-of-the-art results on the major long-term memory benchmarks, LoCoMo and LongMemEval.
Why we built Memanto differently
Long-term memory breaks at scale
AI systems today forget key facts over extended interactions, losing continuity. When conversations span thousands of messages across dozens of distinct sessions, extracting the right fragments from that massive historical corpus to fit within a constrained context window remains the primary bottleneck for true agentic memory.
Even with today's massive context windows, indiscriminately dumping months of history degrades reasoning capabilities, leading to hallucinations and exorbitant inference costs.
The rush toward graph complexity
To solve retrieval at scale, competitors turned to complex graph databases. These systems explicitly map relationships between entities in knowledge graphs or hybrid semantic-graph engines, letting agents traverse nodes to find answers.
While effective in theory, this approach introduces massive overhead: managing schemas, handling graph ingestion latency, and writing complex traversal queries. Every time new conversational data is ingested, the framework must invoke an LLM to extract entities and define relationships, turning the simple act of saving a memory into a compute-heavy, synchronous process.
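To make that overhead concrete, here is a minimal sketch of a graph-style write path. The `extract_graph` stub and its `Edge` schema are hypothetical illustrations, not any particular framework's API:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str
    relation: str
    target: str

def extract_graph(message: str) -> list[Edge]:
    """Stand-in for the LLM extraction step graph-memory frameworks
    run synchronously on every write. A real framework would prompt
    an LLM here to emit entities and typed relations."""
    # Stubbed so the sketch runs without an API key.
    return [Edge("user", "mentioned", message)]

def save_memory_graph_style(message: str, graph_db: dict) -> None:
    # The write path blocks on extraction: at least one LLM
    # round-trip plus schema handling before anything is persisted.
    for edge in extract_graph(message):
        graph_db.setdefault(edge.source, []).append((edge.relation, edge.target))

graph_db: dict = {}
save_memory_graph_style("I moved to Berlin last March.", graph_db)
```

Every saved message pays that extraction tax up front, before the memory is usable.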
Challenging the status quo with pure semantics
At Moorcheh, we built a highly optimized Semantic Engine from the ground up to solve the "RAM tax" of traditional vector databases. By moving away from bloated, in-memory HNSW + cosine distance stacks in favor of information-theoretic retrieval, we created an engine capable of sub-40ms serverless latency across 100k+ namespaces.
Before Memanto, we asked a simple question: is all this graph-database complexity really necessary? We wanted to see just how far we could push a standard RAG pipeline, stripped of complexity, relying solely on high-quality semantic search.
How we got to SOTA
A step-by-step benchmarking progression, each iteration building on the last, with no graph databases added at any stage.
Naive RAG Baseline
Simple semantic search against ingested memory chunks: a top-10 retrieval limit, a 0.15 ITS (Information Theoretic Score) similarity threshold, and Claude Sonnet 4 as the inference model.
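Below is a minimal sketch of this baseline; the `client.search` interface and its parameter names are illustrative assumptions, not Moorcheh's documented SDK:

```python
# Baseline retrieval sketch. `client.search` and its parameters are
# illustrative assumptions, not the real SDK surface.
def retrieve(client, namespace: str, question: str,
             top_k: int = 10, threshold: float = 0.15) -> list[str]:
    """Return the text of chunks scoring above `threshold`, capped at `top_k`."""
    hits = client.search(namespace=namespace, query=question,
                         top_k=top_k, threshold=threshold)
    return [hit.text for hit in hits]
```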
Relaxed Retrieval (k=40)
Multi-session questions reference events scattered across disjoint sessions. Raising the retrieval limit to 40 chunks and dropping the threshold to 0.10 delivered immediate gains.
Recall matters far more than precision for agentic memory.
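With the hypothetical `retrieve` sketch from the baseline, this step reduces to two knob changes:

```python
# Relaxed step: same pipeline, wider net. `client`, `namespace`, and
# `question` are assumed set up as in the baseline sketch above.
chunks = retrieve(client, namespace, question, top_k=40, threshold=0.10)
```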
Optimized Prompts
Adopted and modified prompts from the Hindsight evaluation framework to ensure an apples-to-apples comparison with other systems.
Prompt engineering is a band-aid for structural retrieval deficits.
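For a sense of what "optimized prompts" means here, the template below is a purely hypothetical illustration of the shape of an answer-synthesis prompt; the prompts we actually used are adapted from Hindsight:

```python
# Hypothetical answer-synthesis template, for illustration only; the
# real prompts are adapted from the Hindsight evaluation framework.
ANSWER_PROMPT = """\
You are answering a question about a user's long conversation history.
Use ONLY the retrieved memory chunks below. If they do not contain
the answer, say so rather than guessing.

Memory chunks:
{chunks}

Question: {question}
Answer:"""

# `chunks` and `question` as set up in the retrieval sketches above.
prompt = ANSWER_PROMPT.format(chunks="\n---\n".join(chunks),
                              question=question)
```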
Dynamic Retrieval (k=100)
Expanded the dynamic retrieval limit to a maximum of 100 chunks at a 0.05 threshold. Retrieval is dynamic via Moorcheh's Information Theoretic Score: the engine pulls up to 100 chunks, but only those that clear the threshold.
Giving the semantic engine a wider runway surfaces disparate fragments more effectively than hyper-precise vector queries.
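Using the same hypothetical interface, the dynamic step widens the cap and lowers the cutoff, and the ITS threshold decides how many chunks actually come back:

```python
# Dynamic step: up to 100 candidates, but the 0.05 ITS cutoff
# determines how many return, so easy questions retrieve little
# while scattered multi-session questions get the full runway.
chunks = retrieve(client, namespace, question, top_k=100, threshold=0.05)
```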
Gemini 3 Inference
Switched the underlying model from Claude Sonnet 4 to Gemini 3 for advanced multi-hop reasoning and complex context synthesis, establishing parity with other leading benchmarked systems.
State of the art. No graphs. No ingestion tax. Single unified query.
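In pipeline terms, this final step is a one-line swap of the synthesis model; the model identifiers and `call_llm` helper below are placeholders, not exact API names:

```python
# Final step: identical retrieval and prompt, different synthesizer.
# Model ids and `call_llm` are illustrative placeholders.
INFERENCE_MODEL = "gemini-3"  # was "claude-sonnet-4"
answer = call_llm(model=INFERENCE_MODEL, prompt=prompt)
```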
Benchmark results
Final evaluation using Gemini 3 as the inference model, the same configuration used by competing systems.
Memanto leads both LoCoMo and LongMemEval benchmarks. Architecture details reflect each system's configuration at time of evaluation.
Full system comparison
| System | LoCoMo | LongMemEval | Architecture | Retrieval | Query Method |
|---|---|---|---|---|---|
| Memanto (SOTA) | 87.1% | 89.8% | Vector Only | RAG | Single Query |
| EmergenceMem | – | 86.0% | Graph + Vector | Parallel | Multi-Query |
| Supermemory | – | 85.2% | Graph + Vector | Parallel | Multi-Query |
| Memobase | 75.8% | – | Graph + Vector | Parallel | Single Query |
| Zep | 75.1% | 71.2% | Graph + Vector | Parallel | Single Query |
| Letta | 74.0% | – | Local Filesystem | RAG | Recursive |
| Full context | 72.9% | 60.2% | Full Context | Full Context | Single Query |
| Mem0 G | 68.4% | – | Graph + Vector | Parallel | Single Query |
| Mem0 | 66.9% | – | Vector Only | Parallel | Single Query |
| LangMem | 58.1% | – | Vector Only | RAG | Single Query |
The future of agentic memory
Memanto hitting 89.8% on LongMemEval and 87.1% on LoCoMo with a vector-only architecture proves that highly optimized retrieval can achieve state-of-the-art accuracy without forcing developers to pay for complex graph orchestration.
Recall over precision
In agentic memory, the semantic search recall/precision trade-off skews heavily toward recall. It is far better to retrieve noisy chunks and let a capable LLM filter context than to miss critical fragments entirely.
Zero ingestion tax
Memanto ingests raw conversational chunks directly into the vector store, bypassing the LLM extraction overhead entirely. Memories are semantically available for retrieval immediately after write.
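As a sketch, the entire write path reduces to a single upload; the `client.upload` method and document shape are assumptions for illustration, not the documented SDK:

```python
# Direct-ingestion sketch: raw chunks go straight to the vector
# store. `client.upload` and the document shape are illustrative.
def save_memory(client, namespace: str, chunk_id: str, text: str) -> None:
    client.upload(namespace=namespace,
                  documents=[{"id": chunk_id, "text": text}])
    # No LLM call, no schema, no graph build: the chunk is
    # semantically retrievable as soon as the write returns.
```

Contrast this with the graph-style write path sketched earlier, which blocks on an LLM round-trip before anything is persisted.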
Architecture simplicity wins
The next phase of agentic AI will not be won by the most complicated memory diagram. Strong, highly optimized semantic infrastructure can already deliver SOTA performance without graph orchestration.
Start building with Memanto
Explore the docs and get started with Memanto, the universal memory layer for agentic AI.