The shape of intelligence.

Foundation models that understand the 3D world: perception, reasoning, simulation, and generation, in one representation. This is Spatial AI.

Today we are introducing Speridlabs. We are coming out of stealth.

We are a research lab building Spatial AI — single foundation models that understand the 3D world and can perceive, reason, simulate, and generate within it. Most of what we still call “computer vision” is, foundationally, a three-dimensional problem. We think the field is ready for a foundation model that treats it that way.

Fig. 01.A: Founding presentation

Computer vision is still a 2D field.

For the last decade-plus, computer vision has been getting better at narrow tasks — detection, segmentation, tracking, reconstruction. Each became its own model, its own pipeline, its own benchmark. The field works. It ships products. It has not solved the underlying problem.

The underlying problem is that most of these tasks are 3D in disguise. Reconstruction is 3D. Generation is 3D. Reasoning about what is behind a wall, where an object goes when it leaves the frame, what stays the same when you move — all 3D. The symptom of the gap, in one number: a modern autonomous-driving stack runs more than 40 separate AI models, stitched together by heuristics and a human in the loop. A humanoid robot looks roughly the same. That is not the architecture of a system that understands the world. It is the architecture of a system that approximates pieces of it. Today, when the viewpoint changes, the world does not hold.

Why no foundation model has emerged yet.

If spatial foundation models were straightforward, the largest labs would already have them. They do not. Four reasons, which all interact.

Data. Text foundation models train on trillions of tokens. Image and video models train on billions of samples. The total set of 3D scenes across public datasets is around 100,000, with roughly 5,000 added per day. Measured against billions of image and video samples, that is a gap of at least four orders of magnitude. No amount of architectural work closes it on its own.

Representation. Language has tokens. Images have pixels. 3D has no universally accepted primitive that is compact, learnable, editable, and good for reasoning at the same time. The representation is not an implementation detail — it decides whether the model can think in space or only imitate it.

Evaluation. The benchmarks the field grew up with measure 2D tasks. Spatial consistency across viewpoints, long-horizon reasoning, partial observation, occlusion — those are not yet in the standard sets. You cannot optimise what you cannot measure.

Systems. 3D is heavy. Training, serving, tooling, viewers, formats, pipelines — none of it is optional, and most of it is research in its own right.

What is different now is that, for the first time, the ingredients all exist at once. There is a real foundation-model training culture. Generative models are strong enough to be used as priors. Compute is expensive but accessible enough that serious work can happen outside the largest labs. And the demand is no longer niche: every industry trying to automate physical workflows hits the same wall the moment it needs consistency across viewpoints and time. None of those industries is built around 3D. All of them have a 3D problem.

What we are building.

One foundation model. It holds a coherent 3D representation of what it sees, and stays consistent across viewpoint, occlusion, and time. Reconstruction, understanding, editing, navigation, planning, generation — different queries to the same model. Not one model per task. One shared intelligence layer.

A spatial foundation model has to do two things at once. Imagination — the ability to generate plausible worlds and scenarios for simulation, design, and creation. Grounding — the ability to lock onto real-world structure with verifiable geometry and semantics when precision matters. In the physical world, plausible is not enough. You also need true.

The path forward is staged. We do not think a true Spatial AI foundation model arrives as a single leap. We start by building stable spatial priors from real-world capture — reconstruction as the first foundation layer. We then build a controllable 3D generative model that doubles as a data engine, because the only honest way to attack the 3D data gap is to generate the data we need at scale. On those two, we build the world model itself — first static, then dynamic. Every stage is a useful model on its own and a step closer to the end.

Open by default. And the lab.

We publish. We benchmark. We release weights. Spatial AI is barely a public scientific field yet — there is almost no shared knowledge about how 3D generative world models work internally, and a field cannot accelerate on private knowledge. We intend to be one of the labs that puts that knowledge into the world. Not because openness is a strategy. Because progress in this space is a shared problem.

We are backed by Pear VC and Base10, who are investing in spatial intelligence because they understand the same thing we do — that this is one of the most consequential and hardest unsolved problems in AI right now.

This is spatial intelligence.

— Team Speridlabs
