EigenRun: How EigenLake Lets Agents Compute, Not Just Retrieve

By the EigenLake Team

In our launch post, we introduced EigenLake as a single SDK with two primitives: EigenStore for storage and EigenRun for execution. In our follow-up, we argued that trillion-scale embeddings demand computation over the space, not just retrieval of neighbors. Here, we show what that computation looks like when the consumer is an autonomous agent — and why the difference between "searching for context" and "commissioning a workload" is the defining architectural shift of this era.

The agent-tool gap

Four years ago, Yao et al. introduced ReAct (2022), a framework in which LLMs interleave reasoning traces with tool actions. The core insight was that reasoning alone suffers from hallucination, while action alone lacks interpretability. ReAct showed that an agent asking "What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?" could reason about which tool to call, execute the search, observe the result, and revise its plan. It was a breakthrough, but the tools were fixed: for question answering, a Wikipedia API with a small set of search and lookup actions.

Since then, the evidence has converged on a single conclusion: agents are bounded by the action space they are given. Wang et al. (CodeAct, ICML 2024) demonstrated that replacing constrained JSON tool calls with executable Python code in a sandbox raises agent success rates by up to 20%. The reason is straightforward: a sandboxed interpreter gives an agent general compute — loops, conditionals, libraries, custom logic — rather than a closed menu of pre-approved operations.
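The contrast can be sketched in a few lines. This is a toy illustration, not CodeAct's implementation: the tools (`search`, `count`), the document list, and the restricted `exec` namespace are all assumptions made for the example, and a restricted namespace is not a real security sandbox.

```python
# Toy contrast between a closed tool menu and a code action.
# The tools and the "sandbox" are hypothetical stand-ins.

DOCS = ["fraud ring A", "fraud ring B", "benign purchase", "fraud ring A"]


def search(term):
    """Return every document containing the term."""
    return [d for d in DOCS if term in d]


def count(items):
    """Return the number of items."""
    return len(items)


TOOLS = {"search": search, "count": count}


def run_json_action(action):
    """Closed menu: one named tool call per step, no composition."""
    return TOOLS[action["tool"]](*action["args"])


def run_code_action(source):
    """Code action: the agent composes tools with loops and conditionals.

    Only the tools are exposed in the namespace. This limits what the
    snippet can name, but it is NOT a security boundary.
    """
    namespace = {"__builtins__": {}, **TOOLS}
    exec(source, namespace)
    return namespace.get("result")


# One step of the closed menu: a single search.
hits = run_json_action({"tool": "search", "args": ["fraud"]})

# One code action: search and tally duplicates in a single step.
snippet = """
tally = {}
for doc in search("fraud"):
    tally[doc] = tally.get(doc, 0) + 1
result = tally
"""
families = run_code_action(snippet)
```

The menu version would need a new tool, or several round trips, to produce the tally. The code action expresses it directly.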

An agent that can write and execute code is no longer a chatbot with attachments. It is a computational actor.

The problem is that most vector infrastructure still treats agents as retrievers. An agent asks a question, the system returns nearest neighbors, and the agent is expected to do the rest. But at trillion scale, the rest is the hard part. Clustering a billion vectors, detecting anomalies across a continent, trending a concept over time — these are not actions you can express in a JSON tool call. They require long-running, stateful, multi-step computation.

That is the gap EigenRun closes.

What agents need at trillion scale

The research on agent capabilities at scale points to three non-negotiable requirements:

First, interface design determines performance. Yang et al. (SWE-agent, 2024) showed that a custom agent-computer interface (ACI), with tailored editing commands, repository navigation, and test execution, achieved state-of-the-art results on SWE-bench at the time of publication, far exceeding non-interactive LLMs. The implication: the environment an agent operates within matters as much as the model that reasons within it. A retrieval-only vector interface limits the agent before it starts.

Second, planning scales with test-time compute. Koh et al. (Tree Search for LM Agents, 2024) applied best-first tree search to web automation tasks and achieved a 39.7% relative improvement over the same GPT-4o baseline without search. Performance scaled with compute budget. The lesson: agents need the ability to explore, backtrack, and execute long-horizon plans — not single-shot queries.

Third, compositional skills compound. Wang et al. (Voyager, 2023) built a Minecraft agent that writes executable code to a persistent skill library. Each new skill is temporally extended, interpretable, and compositional. The result: 3.3x more unique items discovered, 15.3x faster tech-tree progression. An agent that can store, retrieve, and compose its own compute routines outlearns one that cannot.
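The second requirement, search under a compute budget, is compact enough to sketch. The version below is a generic best-first search over a toy state space, not the authors' implementation: `expand`, `score`, `is_goal`, and the integer states are hypothetical, and `budget` stands in for the test-time compute budget.

```python
import heapq

TARGET = 13


def expand(n):
    """Two possible moves from n: add one or double."""
    return [n + 1, n * 2]


def score(n):
    """Higher is better: negative distance to the target."""
    return -abs(TARGET - n)


def is_goal(n):
    return n == TARGET


def best_first_search(start, budget):
    """Expand the highest-scoring frontier state until a goal or budget end.

    Returns (goal_state_or_None, number_of_expansions).
    """
    # heapq is a min-heap, so scores are negated to pop the best state first.
    frontier = [(-score(start), start)]
    seen = {start}
    expansions = 0
    while frontier and expansions < budget:
        _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state, expansions
        expansions += 1
        for child in expand(state):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), child))
    return None, expansions


# The same problem with a small and a large budget.
found_small, used_small = best_first_search(1, budget=2)
found_large, used_large = best_first_search(1, budget=50)
```

With a budget of two expansions the search gives up. With a larger budget it overshoots to 16, backtracks, and reaches 13. That is the behavior a single-shot query cannot reproduce.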

Sumers et al. (CoALA, 2023) formalized this into a cognitive architecture: a language agent needs modular memory, a structured action space, and a decision-making process to choose actions. In this framework, EigenRun is the external environment — the action space where agents execute, observe, and iterate.

The vector space is that environment. And at trillion scale, it cannot only be searched. It must be computed over.

The EigenRun architecture

EigenRun is the vector compute layer inside EigenLake. It exposes six analytical workloads — Search, Clustering, Anomaly Detection, Topic Modeling, Time Series, and Agent Queries — as first-class operations that run where the data lives. But the critical design decision is not the workload list. It is the dual execution model.

Two paths, one substrate

Path 1: Natural language agent queries. An agent (or human) submits a goal in plain language. EigenRun's planner decomposes the goal into a directed graph of workloads, executes them in a sandboxed environment, and returns a structured result artifact — not raw vectors, but a conclusion with evidence and recommended action.

Path 2: Direct SDK programming. An agent uses the same Python surface as a developer. It retrieves results, chains workloads, applies custom logic, and triggers downstream operations. The agent owns the orchestration. EigenLake owns the primitives.

Both paths share the same substrate: EigenStore for durable, model-ready data and EigenRun for execution. There is no ETL pipeline to build, no Spark job to schedule, no IAM policy to reconcile between systems. The data and the compute live in the same place.
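The "directed graph of workloads" in Path 1 can be pictured as a dependency map executed in topological order. The sketch below uses Python's standard `graphlib`. The plan, the workload names, and `run_workload` are hypothetical stand-ins, and EigenRun's actual planner and executor may work differently.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each workload maps to the set of workloads it depends on.
plan = {
    "search": set(),
    "cluster": {"search"},
    "detect_anomalies": {"search"},
    "time_series": {"cluster"},
    "summarize": {"detect_anomalies", "time_series"},
}


def run_workload(name, inputs):
    """Stand-in for a real workload: records which upstream results it saw."""
    return {"workload": name, "inputs": sorted(inputs)}


def execute(plan):
    """Run every workload after its dependencies and collect the results."""
    results = {}
    order = list(TopologicalSorter(plan).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in plan[name]}
        results[name] = run_workload(name, upstream)
    return order, results


order, results = execute(plan)
```

In this plan, `cluster` and `detect_anomalies` both depend only on `search`, so an executor could run them in parallel. The sketch runs them sequentially for simplicity.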

SDK in practice: the agent.query path

Here is what an agentic query looks like in practice. A biology research agent is investigating a new receptor target. It does not need "proteins similar to this one." It needs a structured investigation.

import eigenlake as el

client = el.connect(
    url="https://api.eigenlake.dev",
    api_key="<sk_sbx_your_api_key_here>",
)
idx = client.indexes.open(
    namespace="biology",
    index="alphafold-proteome",
)

# The agent submits a goal in natural language
result = idx.agent.query(
    "Find proteins with similar folding patterns to receptor target 7TM_2847, "
    "cluster them by structural family, detect any anomalous folds that don't "
    "match known topology templates, and tell me which families have grown "
    "significantly in the last 90 days as new structures were deposited"
)

# EigenRun plans and executes in the sandbox:
# Step 1: Search — retrieve 50,000 nearest neighbors to 7TM_2847
# Step 2: Cluster — HDBSCAN on structural embeddings, min_cluster_size=100
# Step 3: Detect — isolation forest on outlier folds vs. CATH topology templates
# Step 4: Time Series — count new depositions per family, week-over-week
# Step 5: Python post-processing — correlate family growth with PubMed citations

print(result.get("summary", result.get("action")))
# Output:
# Emerging structural family: Beta-propeller fold variant (BPV-III)
# Members: 4,217 proteins (up from 3,104 ninety days ago)
# Anomalous members: 12 folds with topology scores < 0.3 vs. CATH templates
# Trend: +36% in 90 days; highest growth in GPCR-related superfamilies
# PubMed correlation: 8 new papers link BPV-III variants to allosteric modulation

print(result.get("recommended_action", "Review the returned action, filter, and clusters."))
# Output:
# Prioritize 12 anomalous BPV-III members for cryo-EM validation.
# Cross-reference with ChEMBL bioactivity data for allosteric binding evidence.

The agent did not retrieve a neighbor. It commissioned a multi-step analytical job. The sandboxed environment handled the orchestration, the intermediate data, and the custom Python correlation step. The agent received a structured artifact it could act on — or feed into its next reasoning cycle.
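Because the artifact is structured, the agent can verify it before acting on it. The trend figure in the example summary is simple arithmetic on the two member counts. The sketch below assumes a hypothetical artifact shape (`members_before`, `members_now`, `anomalies`) and an arbitrary review threshold. The real `result` fields may differ.

```python
def growth_pct(before, now):
    """Percentage change from `before` to `now`."""
    return (now - before) / before * 100


# Hypothetical artifact mirroring the example summary.
artifact = {
    "family": "BPV-III",
    "members_before": 3_104,
    "members_now": 4_217,
    "anomalies": 12,
}

trend = growth_pct(artifact["members_before"], artifact["members_now"])
# (4217 - 3104) / 3104 = 0.3586, which rounds to the +36% in the summary.

# Hypothetical policy: escalate fast-growing families that contain anomalies.
needs_review = trend > 25 and artifact["anomalies"] > 0
```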

Compare this to the RAG pattern. In RAG, the agent asks a question, retrieves documents, stuffs them into context, and generates an answer. The retrieval is the end of the infrastructure's job. In the EigenRun pattern, the agent states a goal, the infrastructure plans and executes a compute graph, and returns a structured conclusion.

Retrieval is the beginning. Computation is the product.

SDK in practice: the direct programming path

Not every agent wants to delegate planning. Some agents — especially those with domain-specific reasoning — prefer to own the orchestration themselves. EigenLake's SDK supports this natively.

# A fraud-detection agent with custom logic.
# Assumes `idx` was opened on a transaction-embedding index (not the biology
# index above), `suspicious_txn_embedding` is the flagged transaction's vector,
# and `alert_ops` / `my_agent_logic` are the agent's own code, not EigenLake APIs.
fraud_filter = {
    "region": {"$eq": "southeast_asia"},
    "timestamp": {"$gte": "2025-01-01T00:00:00Z"},
}

neighbors = idx.search.nearest(
    vector=suspicious_txn_embedding,
    limit=10_000,
    filter=fraud_filter,
)

# The agent decides which workload to run next based on its own policy
if len(neighbors["vectors"]) > 5_000:
    clusters = idx.search.cluster(
        filter=fraud_filter,
        limit=10_000,
        algorithm="dbscan",
        dbscan_min_samples=20,
        representatives_per_cluster=3,
    )
    anomalies = idx.search.anomalies(
        filter=fraud_filter,
        limit=10_000,
        n_neighbors=20,
        top_n=100,
        text_fields=["merchant_description", "dispute_reason"],
    )
    
    # Custom Python: the agent applies its own fraud-family classifier
    from my_agent_logic import classify_fraud_family
    families = classify_fraud_family(clusters, anomalies)
    
    # Store intermediate findings for downstream agents
    idx.records.add_many(
        [
            {
                "id": family.id,
                "properties": {
                    "stage": "agent_classified",
                    "family": family.name,
                    "confidence": family.confidence,
                    "timestamp": family.detected_at,
                },
                "vector": family.centroid,
            }
            for family in families
        ],
        on_error="continue",
    )
    
    # Trigger topic modeling only on high-confidence clusters
    topics = idx.search.topics(
        filter={"confidence": {"$gte": 0.85}},
        limit=10_000,
        min_topics=5,
        max_topics=15,
        text_fields=["family", "merchant_description", "dispute_reason"],
    )
    
    # Time-series trend on the top topic
    top_topic = max(topics["topics"], key=lambda topic: topic["count"])
    shifts = idx.search.temporal_shift(
        baseline={"start": "2025-01-01T00:00:00Z", "end": "2025-03-31T23:59:59Z"},
        current={"start": "2025-04-01T00:00:00Z", "end": "2025-06-30T23:59:59Z"},
        timestamp_field="timestamp",
        filter={"topic_id": {"$eq": top_topic["topic_id"]}},
        limit_per_window=10_000,
        text_fields=["family", "merchant_description", "dispute_reason"],
    )
    
    top_shift = max(shifts["shifts"], key=lambda shift: shift["score"])
    if top_shift["kind"] in ("emerging", "growing") and top_shift["score"] > 0.8:
        alert_ops(top_shift["explanation"])

The key difference: the agent is not a client calling an API. It is a programmer using a library. It makes flow-control decisions, applies custom logic, stores intermediate state, and triggers alerts. EigenLake provides the vector-native primitives. The agent provides the intelligence.

This is the same programming model a human data scientist would use. That is intentional. CoALA (Sumers et al., 2023) describes an agent's interaction with the outside world as a structured action space, and the SDK is one: the agent works through the same structured interface a human developer uses. A unified SDK eliminates the translation layer between agent reasoning and infrastructure execution.

Workloads across modalities: the evidence

Each EigenRun workload is not a text-only operation. The same mathematical primitive applies across every modality that has been collapsed into an embedding space. Here is the research evidence.

Clustering: from proteins to molecules

AlphaFold (Jumper et al., Nature 2021) introduced highly accurate protein structure prediction, and the AlphaFold Protein Structure Database has since grown to over 200 million predicted structures. When those structures are embedded by sequence and fold geometry, the resulting space tends to group by structural family (TIM barrels, immunoglobulins, Rossmann folds) without human labeling. In drug discovery, Axelrod & Gomez-Bombarelli (2022) released GEOM, a dataset of energy-annotated molecular conformations for property prediction and geometry generation, which supplies the kind of conformer data that learned latent spaces are trained on. A drug-discovery agent using EigenRun's clustering primitive over a molecular embedding index can shortlist candidate families before running a single simulation.
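The fraud example earlier requests `algorithm="dbscan"`, so it is worth seeing what density-based clustering does. The following is a minimal pure-Python DBSCAN over toy 2-D points, written for readability. It is not EigenRun's implementation, and its brute-force neighbor queries would not scale past small inputs.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

No cluster count is supplied. The two families fall out of the density structure, and the isolated point is labeled noise instead of being forced into a family.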

Anomaly detection: from geospatial to industrial audio

Chen et al. (2021) demonstrated that maritime and aircraft trajectory embeddings, derived from AIS and ADS-B signals respectively, expose anomalous routing patterns indicative of smuggling or mechanical failure. The anomalies are not outliers in a single dimension; they are geometric distortions in the latent space. The MIMII dataset (Purohit et al., 2019) supports the same approach in industrial audio: it contains recordings of valves, pumps, fans, and slide rails under normal and faulty conditions, where faults show up as departures from the manifold of normal sound and can be flagged by detectors such as autoencoders or isolation forests. An agent monitoring a global logistics fleet or a factory floor can use EigenRun's anomaly detection over geospatial or audio embeddings to surface failures before human operators notice symptoms.
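One simple way to score "geometric distortion" is distance to nearest neighbors. The sketch below uses mean k-nearest-neighbor distance, a deliberately simpler technique than the isolation forests mentioned above. The parameter names `n_neighbors` and `top_n` echo the SDK example, but the scoring shown here is an assumption, not EigenRun's method.

```python
import math


def knn_anomaly_scores(points, n_neighbors):
    """Score each point by its mean distance to its n nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


def top_anomalies(points, n_neighbors, top_n):
    """Indices of the top_n highest-scoring points, most anomalous first."""
    scores = knn_anomaly_scores(points, n_neighbors)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]


# Toy trajectory embeddings: a dense group plus one point far from it.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flagged = top_anomalies(embeddings, n_neighbors=2, top_n=1)
```

The far point is flagged because of its position relative to the whole group, not because any single coordinate crosses a threshold.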

Time series: from concept drift to structural epidemiology

Goh et al. (2021) analyzed CLIP's multimodal neurons and showed that individual neurons respond to the same concept whether it appears as a photograph, a drawing, or rendered text. That study is static, but it implies that a concept occupies an identifiable region of the embedding space, and the population of that region can be tracked over time: as pizza styles evolve on the internet, the embeddings of newly published pizza images drift with them. The same principle applies to protein structures: as AlphaFold DB grows, new families emerge and old families expand. An agent tracking these trajectories in EigenRun's time-series workload is not monitoring a metric. It is watching the evolution of knowledge itself.
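A minimal way to quantify such a trajectory is to compare the centroid of a baseline window with the centroid of a current window. The sketch below does this with cosine distance on toy 2-D vectors. The record shape and window format are hypothetical, and EigenRun's time-series workload may compute shifts differently.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def window_shift(records, baseline, current):
    """Cosine distance between centroids of two [start, end) time windows.

    Timestamps are ISO date strings, which compare correctly as text.
    Both windows must contain at least one record.
    """
    def in_window(ts, window):
        return window[0] <= ts < window[1]

    base = [r["vector"] for r in records if in_window(r["timestamp"], baseline)]
    curr = [r["vector"] for r in records if in_window(r["timestamp"], current)]
    return cosine_distance(centroid(base), centroid(curr))


records = [
    {"timestamp": "2025-01-15", "vector": [1.0, 0.0]},
    {"timestamp": "2025-02-20", "vector": [0.9, 0.1]},
    {"timestamp": "2025-04-10", "vector": [0.1, 0.9]},
    {"timestamp": "2025-05-05", "vector": [0.0, 1.0]},
]
q1 = ("2025-01-01", "2025-04-01")
q2 = ("2025-04-01", "2025-07-01")
shift = window_shift(records, q1, q2)
```

A shift near 0 means the concept has stayed put. Here the two quarters point in nearly orthogonal directions, so the shift is close to 0.9.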

Topic modeling: from audio to cross-modal alignment

Radford et al. (Whisper, 2022) trained on 680,000 hours of multilingual audio. Embeddings from speech encoders of this kind can be clustered without labels into topics: conversational themes in customer support calls, acoustic signatures in medical recordings, linguistic drift in broadcast media. In biology, Bepler & Berger (2021) describe protein language models whose embeddings capture structural and functional information, the basis for aligning structural motifs with functional descriptions. A trust-and-safety agent can topic-model speech embeddings to detect emergent harmful audio trends. A biology agent can topic-model protein embeddings to map uncharacterized structures to known functional descriptions.
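Once embeddings are grouped, a topic still needs a name. One common heuristic is to treat each cluster's text as a single document and rank terms by a TF-IDF-style score. The sketch below assumes the clustering has already happened and that each record carries a transcript. It is an illustrative heuristic, not EigenRun's topic model.

```python
import math
from collections import Counter


def top_terms(texts_by_cluster, k=2):
    """Label each cluster with its k most distinctive terms.

    Score = term frequency inside the cluster * log(N / clusters containing
    the term), a TF-IDF variant that treats each cluster as one document.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = {
        cluster: Counter(" ".join(texts).lower().split())
        for cluster, texts in texts_by_cluster.items()
    }
    n_clusters = len(counts)
    # Iterating a Counter yields each distinct term once per cluster.
    clusters_with = Counter(term for c in counts.values() for term in c)
    labels = {}
    for cluster, tf in counts.items():
        scored = {
            term: freq * math.log(n_clusters / clusters_with[term])
            for term, freq in tf.items()
        }
        ranked = sorted(scored, key=lambda term: (-scored[term], term))
        labels[cluster] = ranked[:k]
    return labels


# Hypothetical transcripts, already grouped by an upstream clustering step.
transcripts = {
    0: ["refund request for late delivery", "late delivery refund"],
    1: ["password reset request", "reset account password"],
}
topic_labels = top_terms(transcripts, k=2)
```

The word "request" appears in both clusters, so it scores zero and never becomes a label. Terms that separate the clusters rise to the top.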

The pattern is universal: once a modality is embedded, the same analytical workloads apply (search, clustering, anomaly detection, topic modeling, and time series). The agent does not need modality-specific infrastructure. It needs a compute layer that treats the vector space as a first-class environment.

The relevance: new senses for agents

There is a useful analogy to coding agents. GitHub Copilot, SWE-agent, and Devin-style systems scaled because code search gave them relevant context. Before retrieval-augmented code generation, LLMs hallucinated APIs. After, they grounded their suggestions in real repositories. Search was the sense that made coding agents viable.

Vector intelligence is the next set of senses.

Clustering gives an agent pattern recognition: "These 12,000 fraud transactions are not random. They form three distinct behavioral families."
Anomaly detection gives an agent attention: "This protein fold does not match any known topology template. It is worth investigating."
Time series gives an agent foresight: "This topic cluster has grown 340% in Southeast Asia over 14 days. It will hit the global fleet next week."
Topic modeling gives an agent naming: "The emergent pattern in your unlabeled media corpus is 'coordinated inauthentic behavior via AI-generated crisis imagery.'"

Each workload is a perceptual modality. A fraud agent with only search finds transactions. A fraud agent with time-series finds emerging tactics before they peak. A biology agent with only search finds proteins. A biology agent with clustering finds structural families that predict function. The difference is not incremental. It is the difference between an agent that reads and an agent that understands.

At trillion scale, these senses are not optional. No human can scroll through a billion search results to find the three fraud families. No human can trend a concept across a billion unlabeled videos. The agent must perceive the geometry directly — and act on it.

The invisible compute layer

The future of AI is not agents with better search. It is agents with general-purpose analytical compute over the compressed model of the world.

This requires an infrastructure layer that four years of research has converged on: an environment where agents can reason, plan, execute long-running workloads, observe results, and iterate. An environment where the action space is not a fixed menu of JSON tools, but a sandboxed compute runtime with vector-native primitives. An environment where the data and the compute live in the same place, so no pipeline stitching is required.

That is EigenRun.

In our launch post, we showed the SDK and the workloads. In our category post, we argued that trillion-scale embeddings demand computation, not retrieval. Here, we have shown what that computation looks like when the consumer is an agent: natural-language goals decomposed into multi-step analytical jobs, executed in a sandbox, returning structured artifacts with evidence and recommended actions.

The vector space is full of signal. Most of it is still invisible — not because we lack the embeddings, but because we lack the compute layer to expose it. EigenRun is that layer.

Launch sandbox →

Or read the docs and schedule a call for production scale and deployment questions.

References

Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
Wang, X., et al. (2024). Executable Code Actions Elicit Better LLM Agents. ICML. arXiv:2402.01030.
Yang, J., et al. (2024). SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. arXiv:2405.15793.
Koh, J. Y., et al. (2024). Tree Search for Language Model Agents. arXiv:2407.01476.
Wang, G., et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291.
Sumers, T. R., et al. (2023). Cognitive Architectures for Language Agents. arXiv:2309.02427.
Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020.
Goh, G., et al. (2021). Multimodal Neurons in Artificial Neural Networks. Distill.
Radford, A., et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356.
Bepler, T., & Berger, B. (2021). Learning the protein language: evolution, structure, and function. Cell Systems, 12(6), 654–669.
Axelrod, S., & Gomez-Bombarelli, R. (2022). GEOM: Energy-annotated molecular conformations for property prediction and geometry generation. arXiv:2006.05531.
Chen, Y., et al. (2021). Trajectory Anomaly Detection via Variational Autoencoder. IEEE Transactions on Intelligent Transportation Systems.
Purohit, H., et al. (2019). MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection. arXiv:1909.09347.