Launching EigenLake: Vector Workloads Where Your AI Data Lives
EigenLake is an agentic compute layer for vector intelligence. One SDK to store, search, and run ML workloads on the same indexed data.
By the EigenLake Team
Today we're launching EigenLake.
EigenLake is an agentic compute layer for vector intelligence. It gives developers and AI agents one SDK to store vector records, search them, and run ML-style workloads — clustering, anomaly detection, topic modeling, time-series analysis, and agent queries — on the same indexed data. No pipelines. No glue. No copying embeddings between systems.
Try it free in the live sandbox →
Vector search is only the beginning
Vector search on code unlocked Copilot, Cursor, and Cody. An agent that could find the right function signature across millions of embeddings suddenly became useful.
But here is the ceiling: that agent could only search. It could find the neighbor. It could not cluster the bugs, detect the drift, or trend the patterns across the entire codebase.
Now apply that same ceiling to every other domain. Modern AI teams routinely accumulate 100M–1B+ embeddings — support tickets, user sessions, sensor readings, product reviews, operational logs. The intelligence is already there. But the agents accessing it are still stuck on search.
When a product manager asks, "Which warranty patterns changed this week?" or an agent asks, "Where is fraud behavior emerging globally?", a vector database alone cannot answer. It can retrieve similar vectors. It cannot cluster, detect, model, or trend across the corpus.
So teams do what they have always done: they build pipelines.
The execution gap
Today, answering a question like "find the warranty patterns changing this week" requires stitching together six separate systems:
| Component | What it does | What breaks |
|---|---|---|
| Vector DB | Nearest-neighbor search | Stops at neighbors. Cannot cluster or detect. |
| Metadata store | Joins vector IDs to labels | Schema drift. Stale joins. |
| Spark / GPU | Batch compute over vectors | One-off jobs. Slow iteration. |
| Lambda / Glue | Orchestrates the pipeline | Timeouts. Retries. Observability gaps. |
| IAM | Access control | Manual gates between systems. |
| Result store | Catches the output | Ephemeral. Not reusable. |
No shared runtime owns the full query path. Every question becomes a bespoke data pipeline. Every pipeline brings its own failure modes. And slowly, inevitably, analysis work moves from product logic into infrastructure glue.
This is the execution gap. It is where vector projects go to stall — and where AI agents hit a wall.
The insight
Agent workloads should run where vector data already lives.
If an agent needs to answer "Which fraud patterns are emerging?", it should not have to orchestrate five API calls, handle retries, and stitch metadata together. It should ask one question. One runtime should own the answer.
You should store once. And run workloads where the data lives.
That is the bet behind EigenLake.
What we built: EigenStore + EigenRun
EigenLake is two primitives and one SDK.
EigenStore — the vector data layer
EigenStore is a durable home for embeddings, records, metadata, documents, and events. Create indexes. Add records. Attach metadata. Keep model-ready vector data in one place.
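```python
import eigenlake as el
from eigenlake import schema as s

# Declare the record schema; build() returns a (schema, index_options) pair.
schema, index_options = (
    s.SchemaBuilder(additional_properties=False)
    .add("ticket_id", s.string(required=True, filterable=True))
    .add("created_at", s.datetime(filterable=True))
    .add("text", s.string(filterable=False))
    .build()
)

# `records` is your list of vector records (not shown here).
with el.connect(
    url="https://api.eigenlake.dev",
    api_key="<sk_sbx_your_api_key_here>",
) as client:
    # Create the index on first run, or reuse it if it already exists.
    idx = client.indexes.create_or_get(
        namespace="support",
        index="tickets",
        dimensions=768,  # must match your embedding model's output size
        schema=schema,
        index_options=index_options,
    )
    # Bulk insert; on_error="continue" keeps going past individual bad records.
    idx.records.add_many(records, on_error="continue")
```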
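Before calling `add_many`, it can help to check records locally. The sketch below assumes each record is a plain dict with a `vector` key plus metadata fields, and that `additional_properties=False` rejects unknown fields while `required=True` rejects missing ones. Those semantics are inferred from the parameter names; the record shape, `validate_record`, and `split_valid` are hypothetical helpers, not part of the EigenLake SDK. Collecting failures instead of stopping at the first one mirrors what `on_error="continue"` appears to do.

```python
from typing import Any

# Hypothetical mirror of the schema above: field name -> required?
SCHEMA_FIELDS = {"ticket_id": True, "created_at": False, "text": False}
DIMENSIONS = 768  # matches dimensions=768 in the create_or_get call


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passes.

    Assumes a record is a dict with a "vector" key plus metadata fields.
    """
    problems = []
    vector = record.get("vector")
    if not isinstance(vector, list) or len(vector) != DIMENSIONS:
        problems.append(f"vector must be a list of {DIMENSIONS} floats")
    for name, required in SCHEMA_FIELDS.items():
        if required and name not in record:
            problems.append(f"missing required field {name!r}")
    # Assumed meaning of additional_properties=False: unknown fields are errors.
    for name in sorted(set(record) - set(SCHEMA_FIELDS) - {"vector"}):
        problems.append(f"unexpected field {name!r}")
    return problems


def split_valid(records):
    """Keep valid records and collect (index, problems) for the rest."""
    valid, errors = [], []
    for i, rec in enumerate(records):
        problems = validate_record(rec)
        if problems:
            errors.append((i, problems))
        else:
            valid.append(rec)
    return valid, errors


good = {"vector": [0.0] * 768, "ticket_id": "T-1", "text": "Battery swelled"}
bad = {"vector": [0.0] * 3, "priority": "high"}
valid, errors = split_valid([good, bad])
```

Only `good` survives; `bad` is reported with a wrong-length vector, a missing `ticket_id`, and an unexpected `priority` field.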
EigenRun — the vector execution layer
EigenRun runs workloads against the records already stored in EigenStore. No ETL. No copies. No one-off batch jobs.
```python
# `idx` is the index handle from the EigenStore example above;
# `query_vector` is a 768-dimensional query embedding.

# Search: top 100 nearest neighbors
idx.search.nearest(vector=query_vector, limit=100)

# Cluster: k-means, auto-tuning the cluster count between 20 and 50
idx.search.cluster(
    algorithm="kmeans",
    auto_tune=True,
    min_clusters=20,
    max_clusters=50,
)

# Anomaly detection
idx.search.anomalies(n_neighbors=20, top_n=100)

# Topic modeling
idx.search.topics(min_topics=5, max_topics=15)

# Time series: compare this week (current) against last week (baseline)
idx.search.temporal_shift(
    baseline={"start": "2026-06-01T00:00:00Z", "end": "2026-06-08T00:00:00Z"},
    current={"start": "2026-06-08T00:00:00Z", "end": "2026-06-15T00:00:00Z"},
    timestamp_field="created_at",
)

# Agent query
idx.agent.query("find the warranty patterns changing this week")
```
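What `temporal_shift` returns is not specified here. One plausible reading of its inputs is: bucket records into the baseline and current windows by `created_at`, then compare per-group counts. The sketch below does exactly that in plain Python, assuming each record already carries a group label (the `label` field is hypothetical, as is the relative-change metric); it illustrates the idea, not EigenLake's implementation.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onward,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def in_window(ts: datetime, window: dict) -> bool:
    # Half-open [start, end): the shared boundary 2026-06-08T00:00:00Z
    # counts toward the current window only, never both.
    return parse_ts(window["start"]) <= ts < parse_ts(window["end"])


def sketch_temporal_shift(records, baseline, current,
                          timestamp_field="created_at", label_field="label"):
    """Relative change in per-label counts from baseline to current window.

    Returns {label: (current - baseline) / baseline}, or None for labels
    that appear only in the current window.
    """
    base, cur = Counter(), Counter()
    for rec in records:
        ts = parse_ts(rec[timestamp_field])
        if in_window(ts, baseline):
            base[rec[label_field]] += 1
        elif in_window(ts, current):
            cur[rec[label_field]] += 1
    return {
        label: (cur[label] - base[label]) / base[label] if base[label] else None
        for label in base.keys() | cur.keys()
    }
```

A label with 2 records last week and 3 this week scores 0.5 (+50%); a label seen only this week scores `None`, since growth from zero is undefined.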
Built for agents
EigenLake is not just a library developers import. It is a runtime agents can call.
When an agent receives a goal like "find the warranty patterns changing this week," it does not need to know whether the answer requires search, clustering, or anomaly detection. It sends an agentic query to EigenLake. EigenLake decides which workloads to run, executes them in a sandboxed environment, and returns a structured result artifact the agent can act on.
The agent can also request a specific workload directly, such as "cluster these records" or "trend this signal," as a natural-language query. Either way, the agent is not orchestrating infrastructure. It is reasoning over results.
This is the difference between an agent that retrieves and an agent that understands.
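EigenLake's routing logic is not documented in this post. To show the shape of the problem (a goal goes in, an ordered plan of workloads comes out), here is a deliberately naive keyword planner. The workload names mirror the SDK methods shown earlier; the trigger words, the fixed ordering, and `plan_workloads` itself are hypothetical, and a real router is likely far more sophisticated.

```python
# Hypothetical rules, listed in execution order. A workload joins the plan
# if any trigger appears in the goal (crude substring matching).
RULES = [
    ("nearest", ("find", "similar", "related")),
    ("cluster", ("pattern", "group", "families")),
    ("anomalies", ("emerging", "spike", "unusual", "anomal")),
    ("topics", ("theme", "topic")),
    ("temporal_shift", ("changing", "changed", "trend", "week", "growth")),
]


def plan_workloads(goal: str) -> list[str]:
    """Return the workloads to run, in rule order, for a natural-language goal."""
    text = goal.lower()
    plan = [name for name, triggers in RULES if any(t in text for t in triggers)]
    # Fall back to plain retrieval when no rule matches.
    return plan or ["nearest"]


print(plan_workloads("find the warranty patterns changing this week"))
# ['nearest', 'cluster', 'temporal_shift']
```

Keeping the plan in a fixed order means later workloads (trend) can run on the output of earlier ones (search, cluster).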
One surface for humans and agents
Developers and AI agents use the same programmable interface. The same query path. The same runtime. Whether a developer is writing a Python script or an autonomous agent is reasoning over your vector space, the surface does not change.
Store once. Run workloads where the data lives.
A 30-second example
Here is what one agentic query looks like end to end.
Agent goal: Which warranty issue is emerging, where, and what should ops do?
The agent sends this as a natural query to EigenLake. EigenLake determines the path: search the corpus, cluster the results, detect anomalies, label themes, and trend over time. All inside the sandbox. All on the same stored data.
Corpus: 9.8M events across tickets, reviews, repairs, logs, returns, and other event streams.
| Step | Workload | Result |
|---|---|---|
| 1. Search | Find related complaints | 9.8M events scanned |
| 2. Cluster | Group issue families | 27 distinct patterns |
| 3. Detect | Spot abnormal spikes | 1 spike above baseline |
| 4. Name | Label the theme | Battery swelling |
| 5. Trend | Locate growth | +38% week over week in EU · Model X |
Result artifact returned to the agent: Emerging warranty pattern — Battery swelling after firmware update. 18.4K records. 4.1x baseline. EU · Model X · fw 4.8.2.
Recommended action: Pause rollout. Open recall investigation.
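The figures in the result artifact are simple ratios. Assuming "4.1x baseline" means the flagged count divided by the expected (baseline) count for the same period, and "+38% week over week" means (this week − last week) / last week, they can be checked as below. The 1,000 and 1,380 weekly counts are hypothetical, chosen only to illustrate the percentage; the artifact does not report them.

```python
def implied_baseline(observed: float, multiple: float) -> float:
    """Expected count implied by 'observed is <multiple>x baseline'."""
    return observed / multiple


def week_over_week_pct(current: float, previous: float) -> float:
    """Percentage change from last week's count to this week's."""
    return (current - previous) / previous * 100


# 18.4K records at 4.1x baseline implies roughly 4,488 expected records.
print(round(implied_baseline(18_400, 4.1)))  # 4488
# Hypothetical counts: 1,000 records last week and 1,380 this week is +38%.
print(round(week_over_week_pct(1_380, 1_000), 1))  # 38.0
```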
The agent does not orchestrate six systems. It asks one question. EigenLake runs the full query path. The agent receives evidence it can act on.
The workloads
These are not just workloads. They are capabilities you can hand to an agent.
- Search — An agent can find relevant records across a billion-vector corpus.
- Clustering — An agent can group incidents into families without predefined labels.
- Anomaly Detection — An agent can surface unusual behavior as it emerges.
- Topic Modeling — An agent can extract themes from unstructured feedback.
- Time Series — An agent can track how signals change and project what happens next.
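Clustering groups records into families without predefined labels. The SDK example requests `algorithm="kmeans"` with `auto_tune=True` between 20 and 50 clusters; how EigenLake picks the cluster count is not stated (a common approach is to score each candidate k, for example with a silhouette score, and keep the best). Below is a minimal Lloyd's k-means for a fixed k in plain Python, a generic illustration of the algorithm rather than EigenLake's implementation.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length float lists.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points (simple random init).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
```

On these two well-separated groups, the first three points end up sharing one label and the last three the other.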
Search, clustering, anomaly detection, topic modeling, and temporal shift are available as direct SDK workloads today. Agent queries sit above those primitives as the natural-language routing layer, so an agent can ask for an outcome without hand-selecting every workload up front.
The agent does not need to know which capability to use. It states the goal. EigenLake matches the goal to the workload.
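Anomaly detection is one of the direct SDK workloads, and the earlier call passes `n_neighbors=20, top_n=100`. The scoring method behind it is not stated; `n_neighbors` suggests a neighbor-based detector such as k-nearest-neighbor distance or local outlier factor. The sketch below uses the simpler of the two: score each point by its mean distance to its `n_neighbors` nearest points and return the `top_n` highest. The function names are hypothetical.

```python
import math


def knn_anomaly_scores(points, n_neighbors):
    """Mean distance from each point to its n_neighbors nearest other points.

    Brute force, O(n^2): fine for illustration, far too slow for a large corpus.
    """
    if n_neighbors < 1 or len(points) < 2:
        raise ValueError("need n_neighbors >= 1 and at least two points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


def top_anomalies(points, n_neighbors=20, top_n=100):
    """Indices of the top_n most isolated points, most anomalous first."""
    scores = knn_anomaly_scores(points, n_neighbors)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]


points = [[0, 0], [0, 1], [1, 0], [1, 1], [50, 50]]
print(top_anomalies(points, n_neighbors=2, top_n=1))  # [4]
```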
Who EigenLake is for
- Agent builders — You are building autonomous systems that need to reason over vector data at scale. You need a runtime where agents can query, cluster, and detect without hand-rolling infrastructure or managing sandboxes.
- RAG developers — You have built retrieval. You are ready to move beyond naive nearest-neighbor into structured understanding.
- Platform teams — You are tired of maintaining one-off pipelines for every new analytical question.
- Ops / product teams — You need to turn vector data into evidence, trends, and recommended actions.
How it differs from a vector database
A vector database retrieves neighbors. EigenLake runs the full query path.
Pinecone, Weaviate, and Qdrant are excellent at what they do: fast, scalable nearest-neighbor search. If your problem is "find me documents like this one," they are the right tool.
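For reference, the core retrieval primitive is top-k nearest-neighbor search. An exact brute-force version in plain Python is below; it is a generic illustration, not any vendor's implementation. Cosine similarity is an assumption here, since the post does not say which metric `idx.search.nearest` uses. Vector databases replace this linear scan with approximate indexes (HNSW is a common choice) to stay fast at scale.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, query, limit=10):
    """Exact top-`limit` search: (index, similarity) pairs, most similar first."""
    scored = ((cosine(v, query), i) for i, v in enumerate(vectors))
    return [(i, s) for s, i in heapq.nlargest(limit, scored)]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print([i for i, _ in nearest(vectors, [1.0, 0.1], limit=2)])  # [0, 2]
```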
EigenLake is for when your problem becomes "find me the pattern, the anomaly, the trend, and the drift, and tell me what to do about it."
EigenLake keeps storage and compute unified. You do not copy embeddings into Spark. You do not maintain glue code. You store once, and you run analytical workloads where the data already lives.
What EigenLake is not
- Not just a vector DB. Pinecone, Weaviate, and Qdrant solve retrieval. EigenLake solves the execution layer above it.
- Not an ETL tool. You do not extract, transform, and load into EigenLake. You store once and compute there.
- Not a general-purpose data warehouse. It is purpose-built for vector-scale data and the workloads that run on it.
- Not a model provider. We do not host embedding models. We run workloads on the embeddings you already have.
From millions to trillions
Vector search on code unlocked Copilot. A single agentic capability, "find the relevant function," transformed how developers write software.
But that was one workload on one domain at million-vector scale.
Now imagine an agent with access to all EigenLake workloads — search, cluster, detect, model, trend — running over trillions of vectors across your entire business.
Consider a global payment network with 1.2 trillion transaction vectors: user profiles, device fingerprints, merchant records, chargeback logs, and geolocation events.
One agent query: Which fraud patterns are emerging globally, where are they concentrated, and what should risk ops do?
EigenLake determines the execution path. The agent does not manage pipelines. It receives a result artifact:
- Search flagged 847M suspicious events.
- Cluster organized them into 41 fraud families.
- Detect surfaced 3 spikes above historical baseline.
- Topic model extracted emerging tactics: synthetic identity via BNPL, card testing on micro-merchants, account takeover with instant payout.
- Time series tracked +340% week-over-week growth in Southeast Asia, correlated with a new merchant category code introduced 14 days prior.
Result: Synthetic identity fraud exploiting buy-now-pay-later onboarding. Action: Pause instant-approval pipeline. Deploy enhanced KYC. Notify partner risk teams.
One agent. One query. Trillion-scale corpus. The agent did not retrieve a neighbor. It computed over the space — and decided what to do next.
Try it
Vector DBs give you nearest neighbors. EigenLake gives agents a complete vector intelligence workspace.
You can also read the docs, or schedule a call for production-scale and deployment questions.