We are a Spatial AI lab.
We build foundation models that understand the 3D world — perceive, reason, simulate, and generate within it. One representation, many tasks.
For the last decade, computer vision mostly meant getting better at reading pixels. Detect, segment, track, classify. It worked, it shipped, and it stayed flat.
Spatial AI is the idea that the model should carry a coherent 3D representation of what it sees — the way humans do without thinking. Objects exist in space. Surfaces have geometry. Occlusion is real. The world stays consistent when you move.
Reconstruction, understanding, editing, and generation become different ways of interrogating the same underlying world model. That is a platform shift for the whole field.
Image models produce images. Video models produce motion. Spatial models produce scenes you can walk into.

A 3D world model.
Foundation model · spatial reconstruction & generation
Build 3D scenes and objects from a text prompt, or from a handful of images. One foundation, many ways to interrogate it.
- 01 text and/or sparse views in
- 02 3D coherent scene out
- 03 queryable, navigable, persistent
The world is not flat. AI shouldn't be either.
CV is fragmented.
Computer vision — how AI perceives the world — is still many narrow models for many narrow tasks. CV today sits where NLP did seven years ago: lots of pipelines, no shared foundation.
3D is foundational.
3D is not a feature you bolt on. It is the substrate — under perception, under planning, under interaction.
The 3D gap.
Text, images, and video each have foundation models. 3D does not — yet. Closing that gap is the open problem we are building for.