
Launching EigenLake: Vector Workloads Where Your AI Data Lives

EigenLake is an agentic compute layer for vector intelligence. One SDK to store, search, and run ML workloads on the same indexed data.

Tags: vector-intelligence, agentic-compute, workloads, sdk, eigenstore, eigenrun

By the EigenLake Team

Today we're launching EigenLake.

EigenLake is an agentic compute layer for vector intelligence. It gives developers and AI agents one SDK to store vector records, search them, and run ML-style workloads — clustering, anomaly detection, topic modeling, time-series analysis, and agent queries — on the same indexed data. No pipelines. No glue. No copying embeddings between systems.

Try it free in the live sandbox →


Vector search is only the beginning

Vector search on code unlocked Copilot, Cursor, and Cody. An agent that could find the right function signature across millions of embeddings suddenly became useful.

But here is the ceiling: that agent could only search. It could find the neighbor. It could not cluster the bugs, detect the drift, or trend the patterns across the entire codebase.

Now apply that same ceiling to every other domain. Modern AI teams routinely accumulate 100M–1B+ embeddings — support tickets, user sessions, sensor readings, product reviews, operational logs. The intelligence is already there. But the agents accessing it are still stuck on search.

When a product manager asks "Which warranty patterns changed this week?" or an agent asks "Where is fraud behavior emerging globally?", a vector database alone cannot answer. It can retrieve similar vectors. It cannot cluster, detect, model, or trend across the corpus.

So teams do what they have always done: they build pipelines.


The execution gap

Today, answering a question like "find the warranty patterns changing this week" requires stitching together six separate systems:

Component      | What it does               | What breaks
Vector DB      | Nearest-neighbor search    | Stops at neighbors. Cannot cluster or detect.
Metadata store | Joins vector IDs to labels | Schema drift. Stale joins.
Spark / GPU    | Batch compute over vectors | One-off jobs. Slow iteration.
Lambda / Glue  | Orchestrates the pipeline  | Timeouts. Retries. Observability gaps.
IAM            | Access control             | Manual gates between systems.
Result store   | Catches the output         | Temporary. Ephemeral. Not reusable.

No shared runtime owns the full query path. Every question becomes a bespoke data pipeline. Every pipeline brings its own failure modes. And slowly, inevitably, analysis work moves from product logic into infrastructure glue.

This is the execution gap. It is where vector projects go to stall — and where AI agents hit a wall.


The insight

Agent workloads should run where vector data already lives.

If an agent needs to answer "Which fraud patterns are emerging?", it should not have to orchestrate five API calls, handle retries, and stitch metadata together. It should ask one question, and one runtime should own the answer.

You should store once. And run workloads where the data lives.

That is the bet behind EigenLake.


What we built: EigenStore + EigenRun

EigenLake is two primitives and one SDK.

EigenStore — the vector data layer

EigenStore is a durable home for embeddings, records, metadata, documents, and events. Create indexes. Add records. Attach metadata. Keep model-ready vector data in one place.

import eigenlake as el
from eigenlake import schema as s

schema, index_options = (
    s.SchemaBuilder(additional_properties=False)
    .add("ticket_id", s.string(required=True, filterable=True))
    .add("created_at", s.datetime(filterable=True))
    .add("text", s.string(filterable=False))
    .build()
)

with el.connect(
    url="https://api.eigenlake.dev",
    api_key="<sk_sbx_your_api_key_here>",
) as client:
    idx = client.indexes.create_or_get(
        namespace="support",
        index="tickets",
        dimensions=768,
        schema=schema,
        index_options=index_options,
    )
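    # `records` is assumed to be prepared elsewhere: 768-dim vectors plus the
    # schema fields above. on_error="continue" presumably skips failing
    # records instead of aborting the whole batch.
    idx.records.add_many(records, on_error="continue")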
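The schema above sets `required=True` on `ticket_id` and `additional_properties=False` on the record, but the post does not show what a record looks like or what gets rejected. The sketch below is a local, hypothetical validator that shows what those flags plausibly imply. The record shape (`{"vector": [...], "properties": {...}}`), the error messages, and the helper `add_many_locally`, which mimics `on_error="continue"`, are all our assumptions. None of them is EigenLake's documented format.

```python
from datetime import datetime

DIMENSIONS = 768
# Hypothetical local mirror of the schema above: name -> (type, required).
SCHEMA = {
    "ticket_id": (str, True),
    "created_at": (datetime, False),
    "text": (str, False),
}


def validate_record(record: dict) -> list[str]:
    """Return the problems found; an empty list means the record passes.

    Assumes a record shape of {"vector": [...], "properties": {...}}.
    """
    errors = []
    vector = record.get("vector")
    if not isinstance(vector, list) or len(vector) != DIMENSIONS:
        errors.append(f"vector must have {DIMENSIONS} dimensions")
    props = record.get("properties", {})
    for name, (typ, required) in SCHEMA.items():
        if name not in props:
            if required:
                errors.append(f"missing required property: {name}")
        elif not isinstance(props[name], typ):
            errors.append(f"{name} must be {typ.__name__}")
    # additional_properties=False: undeclared properties are rejected.
    for name in props:
        if name not in SCHEMA:
            errors.append(f"undeclared property: {name}")
    return errors


def add_many_locally(records):
    """Hypothetical stand-in for on_error="continue": keep going past bad records."""
    accepted, rejected = [], []
    for position, record in enumerate(records):
        errors = validate_record(record)
        if errors:
            rejected.append((position, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

A record with a 3-dim vector, no `ticket_id`, and an extra `priority` field would collect three errors. The valid records in the same batch would still be accepted.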

EigenRun — the vector execution layer

EigenRun runs workloads against the records already stored in EigenStore. No ETL. No copies. No one-off batch jobs.

# Assumes `idx` from the EigenStore example above and a 768-dim query_vector.

# Search
idx.search.nearest(vector=query_vector, limit=100)

# Cluster
idx.search.cluster(
    algorithm="kmeans",
    auto_tune=True,
    min_clusters=20,
    max_clusters=50,
)

# Anomaly detection
idx.search.anomalies(n_neighbors=20, top_n=100)

# Topic modeling
idx.search.topics(min_topics=5, max_topics=15)

# Time series
idx.search.temporal_shift(
    baseline={"start": "2026-06-01T00:00:00Z", "end": "2026-06-08T00:00:00Z"},
    current={"start": "2026-06-08T00:00:00Z", "end": "2026-06-15T00:00:00Z"},
    timestamp_field="created_at",
)

# Agent query
idx.agent.query("find the warranty patterns changing this week")
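The post does not say which distance metric or index structure backs `idx.search.nearest(vector=query_vector, limit=100)`. As a reference point for what "nearest" means, here is an exact, brute-force cosine-similarity top-k in plain Python. Cosine similarity is our assumption, and the function name `nearest` is illustrative. A billion-vector index would use an approximate structure rather than a full scan like this one.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def nearest(records, query, limit=100):
    """Exact top-k by cosine similarity over (record_id, vector) pairs."""
    scored = ((cosine(vec, query), rid) for rid, vec in records)
    # nlargest keeps only `limit` candidates in memory while scanning.
    return [(rid, score) for score, rid in heapq.nlargest(limit, scored)]
```

Every query touches every vector, which is exactly the cost an approximate index is built to avoid.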

Built for agents

EigenLake is not just a library developers import. It is a runtime agents can call.

When an agent receives a goal like "find the warranty patterns changing this week", it does not need to know whether the answer requires search, clustering, or anomaly detection. It sends an agentic query to EigenLake. EigenLake decides which workloads to run, executes them in a sandboxed environment, and returns a structured result artifact the agent can act on.
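The post does not describe how EigenLake chooses workloads for a goal. To make "decides which workloads to run" concrete, here is a deliberately naive, hypothetical keyword router that turns a goal string into an ordered plan of workload names. The cue table, the workload names, and the fixed ordering (search first, trend last) are our own illustration. A production router would more plausibly use a model than substring matching.

```python
# Hypothetical cue table: workload name -> phrases that suggest it.
CUES = {
    "search": ("find", "which", "where", "related"),
    "cluster": ("patterns", "group", "families"),
    "anomalies": ("emerging", "spike", "unusual", "fraud"),
    "topics": ("themes", "topics", "issue"),
    "temporal_shift": ("changing", "changed", "this week", "trend", "growth"),
}
# Fixed execution order: retrieve, group, detect, name, then trend.
ORDER = ["search", "cluster", "anomalies", "topics", "temporal_shift"]


def plan(goal: str) -> list[str]:
    """Return the workloads a goal seems to call for, in execution order."""
    text = goal.lower()
    chosen = {name for name, cues in CUES.items() if any(c in text for c in cues)}
    # Every plan starts from retrieval so later steps have records to work on.
    chosen.add("search")
    return [name for name in ORDER if name in chosen]
```

For "find the warranty patterns changing this week", this toy router plans search, cluster, and temporal shift. The caller never names a workload.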

The agent can also request a specific workload directly, such as "cluster these records" or "trend this signal", as a natural-language query. Either way, the agent is not orchestrating infrastructure. It is reasoning over results.

This is the difference between an agent that retrieves and an agent that understands.

One surface for humans and agents

Developers and AI agents use the same programmable interface. The same query path. The same runtime. Whether a developer is writing a Python script or an autonomous agent is reasoning over your vector space, the surface does not change.

Store once. Run workloads where the data lives.


A 30-second example

Here is what one agentic query looks like end to end.

Agent goal: "Which warranty issue is emerging, where, and what should ops do?"

The agent sends this as a natural-language query to EigenLake. EigenLake determines the path: search the corpus, cluster the results, detect anomalies, label themes, and trend over time. All inside the sandbox. All on the same stored data.

Corpus: 9.8M events spanning tickets, reviews, repairs, logs, returns, and other operational events.

Step       | Workload                | Result
1. Search  | Find related complaints | 9.8M events scanned
2. Cluster | Group issue families    | 27 distinct patterns
3. Detect  | Spot abnormal spikes    | 1 spike above baseline
4. Name    | Label the theme         | Battery swelling
5. Trend   | Locate growth           | +38% week-over-week in EU · Model X
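The Cluster step groups records into issue families without predefined labels. The EigenRun example earlier calls `idx.search.cluster(algorithm="kmeans", auto_tune=True, min_clusters=20, max_clusters=50)`, but the post does not say how auto-tuning picks the number of clusters. The sketch below is plain Lloyd's k-means at a fixed `k`, seeded deterministically with farthest-point initialization. The seeding choice is ours, the search over `k` is left out, and none of this is EigenLake's implementation.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, max_iters=50):
    """Lloyd's k-means. Returns (centroids, labels)."""
    # Farthest-point seeding (deterministic): start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return centroids, labels
```

An `auto_tune` option would plausibly wrap a loop like this, scoring each `k` between the minimum and maximum with a quality metric such as silhouette. The post does not confirm that.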

Result artifact returned to the agent: Emerging warranty pattern — Battery swelling after firmware update. 18.4K records. 4.1x baseline. EU · Model X · fw 4.8.2.

Recommended action: Pause rollout. Open recall investigation.
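The post calls this output a structured result artifact but does not publish its schema. One hypothetical shape, with field names of our own choosing, carries the pattern, the evidence counts, the affected segments, and the recommended action. It derives the baseline multiple (the "4.1x baseline" above) from the counts rather than storing it as a separate number.

```python
from dataclasses import dataclass, field


@dataclass
class ResultArtifact:
    """Hypothetical result artifact; not EigenLake's published format."""

    pattern: str
    record_count: int    # matching records in the current window
    baseline_count: int  # matching records in the baseline window
    segments: list[str] = field(default_factory=list)
    action: str = ""

    @property
    def baseline_multiple(self) -> float:
        # e.g. 4.1 means the current window holds 4.1x the baseline volume
        if self.baseline_count == 0:
            return float("inf")
        return self.record_count / self.baseline_count

    def summary(self) -> str:
        where = " · ".join(self.segments)
        return (f"{self.pattern}. {self.record_count:,} records. "
                f"{self.baseline_multiple:.1f}x baseline. {where}.")
```

Deriving the multiple from raw counts keeps the artifact auditable: an agent or a human can check the evidence behind the headline number.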

The agent does not orchestrate six systems. It asks one question. EigenLake runs the full query path. The agent receives evidence it can act on.


The workloads

These are not just workloads. They are capabilities you can hand to an agent.

  • Search — An agent can find relevant records across a billion-vector corpus.
  • Clustering — An agent can group incidents into families without predefined labels.
  • Anomaly Detection — An agent can surface unusual behavior as it emerges.
  • Topic Modeling — An agent can extract themes from unstructured feedback.
  • Time Series — An agent can track how signals change and project what happens next.
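The SDK call `idx.search.anomalies(n_neighbors=20, top_n=100)` suggests a neighbor-based detector, but the post does not name the method. One common choice, sketched here as an assumption, scores each vector by its mean distance to its `n_neighbors` nearest neighbors and returns the `top_n` highest scores. Points far from any dense neighborhood score highest. The brute-force loop is O(n²) and is meant only to show the idea.

```python
import math


def knn_anomalies(vectors, n_neighbors=20, top_n=100):
    """Return [(index, score)] for the top_n most isolated vectors.

    Score = mean Euclidean distance to the n_neighbors nearest other vectors,
    so points far from any dense neighborhood score highest.
    """
    if len(vectors) < 2:
        return []
    scored = []
    for i, v in enumerate(vectors):
        # Brute force: distances to every other vector, O(n^2) overall.
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        nearest = dists[:n_neighbors]
        scored.append((sum(nearest) / len(nearest), i))
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:top_n]]
```

Averaging over several neighbors, rather than using only the single nearest one, keeps a small tight group of outliers from hiding each other.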

Search, clustering, anomaly detection, topic modeling, and temporal shift are available as direct SDK workloads today. Agent queries sit above those primitives as the natural-language routing layer, so an agent can ask for an outcome without hand-selecting every workload up front.
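Topic modeling is one of those direct workloads, and the Name step in the warranty example turns a cluster into a label such as "Battery swelling". The post does not say how topics are modeled. As a stand-in, the sketch below labels each cluster with the terms most overrepresented in it relative to other clusters. It is a simplified variant of class-based TF-IDF that uses cluster counts in the IDF term. The stopword list and `top_terms` default are illustrative.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "after", "of", "to", "is", "on", "my", "it", "was"}


def tokens(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def label_clusters(texts, labels, top_terms=2):
    """Label each cluster with its most distinctive terms."""
    per_cluster = {}
    for text, lab in zip(texts, labels):
        per_cluster.setdefault(lab, Counter()).update(tokens(text))
    n_clusters = len(per_cluster)
    # How many clusters does each term appear in at all?
    cluster_freq = Counter()
    for counts in per_cluster.values():
        cluster_freq.update(counts.keys())
    result = {}
    for lab, counts in per_cluster.items():
        total = sum(counts.values())
        # Term share within the cluster, boosted when few clusters use the term.
        scored = {
            term: (c / total) * math.log(1 + n_clusters / cluster_freq[term])
            for term, c in counts.items()
        }
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        result[lab] = ranked[:top_terms]
    return result
```

A term that shows up in every cluster gets the smallest boost, so the labels favor what makes a family different rather than what is merely common.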

The agent does not need to know which capability to use. It states the goal. EigenLake matches the goal to the workload.


Who EigenLake is for

  • Agent builders — You are building autonomous systems that need to reason over vector data at scale. You need a runtime where agents can query, cluster, and detect without hand-rolling infrastructure or managing sandboxes.
  • RAG developers — You have built retrieval. You are ready to move beyond naive nearest-neighbor into structured understanding.
  • Platform teams — You are tired of maintaining one-off pipelines for every new analytical question.
  • Ops / product teams — You need to turn vector data into evidence, trends, and recommended actions.

How it differs from a vector database

A vector database retrieves neighbors. EigenLake runs the full query path.

Pinecone, Weaviate, and Qdrant are excellent at what they do: fast, scalable nearest-neighbor search. If your problem is "find me documents like this one", they are the right tool.

EigenLake is for when your problem becomes "find me the pattern, the anomaly, the trend, and the drift, and tell me what to do about it."

EigenLake keeps storage and compute unified. You do not copy embeddings into Spark. You do not maintain glue code. You store once, and you run analytical workloads where the data already lives.


What EigenLake is not

  • Not just a vector DB. Pinecone, Weaviate, and Qdrant solve retrieval. EigenLake solves the execution layer above it.
  • Not an ETL tool. You do not extract, transform, and load into EigenLake. You store once and compute there.
  • Not a general-purpose data warehouse. It is purpose-built for vector-scale data and the workloads that run on it.
  • Not a model provider. We do not host embedding models. We run workloads on the embeddings you already have.

From millions to trillions

Vector search on code unlocked Copilot. A single agentic capability, "find the relevant function", transformed how developers write software.

But that was one workload on one domain at million-vector scale.

Now imagine an agent with access to all EigenLake workloads — search, cluster, detect, model, trend — running over trillions of vectors across your entire business.

Picture a global payment network with 1.2 trillion transaction vectors: user profiles, device fingerprints, merchant records, chargeback logs, and geolocation events.

One agent query: "Which fraud patterns are emerging globally, where are they concentrated, and what should risk ops do?"

EigenLake determines the execution path. The agent does not manage pipelines. It receives a result artifact:

  • Search flagged 847M suspicious events.
  • Cluster organized them into 41 fraud families.
  • Detect surfaced 3 spikes above historical baseline.
  • Topic model extracted emerging tactics: synthetic identity via BNPL, card testing on micro-merchants, account takeover with instant payout.
  • Time series tracked +340% week-over-week growth in Southeast Asia, correlated with a new merchant category code introduced 14 days prior.
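The temporal shift workload compares a baseline window with a current window. The SDK call earlier uses two consecutive weeks and a `created_at` timestamp field. The post does not specify the statistic behind a figure like "+340% week-over-week". A minimal assumption is to count records per segment in each window and report relative growth, as sketched below. The `region` segment field, the end-exclusive windows, and the `None` result for segments with no baseline records are our choices.

```python
from collections import Counter
from datetime import datetime

Window = tuple[datetime, datetime]  # (start, end), end exclusive


def in_window(ts: datetime, window: Window) -> bool:
    start, end = window
    return start <= ts < end


def temporal_shift(records, baseline: Window, current: Window,
                   timestamp_field="created_at", segment_field="region"):
    """Percent change in record count per segment, baseline -> current."""
    before, after = Counter(), Counter()
    for r in records:
        ts = r[timestamp_field]
        if in_window(ts, baseline):
            before[r[segment_field]] += 1
        elif in_window(ts, current):
            after[r[segment_field]] += 1
    shift = {}
    for segment in before.keys() | after.keys():
        base = before[segment]
        # A segment with no baseline records has no defined growth rate.
        shift[segment] = None if base == 0 else 100.0 * (after[segment] - base) / base
    return shift
```

Under this reading, a segment that goes from 5 records in the baseline week to 22 in the current week reports +340%.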

Result: Synthetic identity fraud exploiting buy-now-pay-later onboarding. Action: Pause instant-approval pipeline. Deploy enhanced KYC. Notify partner risk teams.

One agent. One query. Trillion-scale corpus. The agent did not retrieve a neighbor. It computed over the space — and decided what to do next.


Try it

Vector DBs give you nearest neighbors. EigenLake gives agents a complete vector intelligence workspace.

Launch sandbox →

Or read the docs and schedule a call for production scale and deployment questions.

Related reading


Why We Call It Vector Intelligence, Not Vector Search

A name is a promise. Vector database describes the floor, not the ceiling. At trillion scale, the space itself becomes the signal — and that requires a new category: Vector Intelligence.
