
AMI Labs vs. World Labs: Two Billion-Dollar Visions for World Models

artifocial · March 29, 2026 · 13 min read

World Models landscape and engineering challenges in 2026-2027


Figure: illustration of World Labs' Marble

W13 Basic Tutorial 2 · Intermediate · March 2026

Research Area: World Models

Companion Notebooks

| # | Notebook | Focus | Compute |
|---|----------|-------|---------|
| 00 | 00_lewm_toy_world_model.ipynb | Toy JEPA world model from scratch — encoder, predictor, SIGReg regularization | CPU only |
| 01 | 01_jepa_latent_dynamics_planning.ipynb | Latent dynamics and planning — CEM, MPC, speed benchmarks | CPU only |

Overview

World models in 2025–2026 have split into two major approaches backed by unprecedented funding. This tutorial maps the full landscape — with deep comparisons of AMI Labs (energy-based) vs. World Labs (generative), updated with this week's LeWorldModel release and Sora's shutdown.

We first covered the AMI–World Labs rivalry in our W11 blog post. This tutorial deepens that analysis with technical specifics and practical implications.


1. The $2 Billion Rivalry: AMI Labs vs. World Labs

AMI Labs: Predict, Don't Generate

AMI Labs was co-founded by Yann LeCun — creator of LeNet, Turing Award winner (2018), and former VP & Chief AI Scientist at Meta — after decades of arguing that autoregressive generation is the wrong paradigm for understanding the physical world.

Funding: $1.03 billion seed (March 2026), $3.5B pre-money valuation. Backed by Bezos Expeditions, Eric Schmidt, Mark Cuban, Xavier Niel, Tim Berners-Lee.

Team: Saining Xie (Chief Science Officer, co-creator of DiT — the architecture behind Sora), Pascale Fung (Chief Research & Innovation Officer), Michael Rabbat (VP of World Models), Laurent Solly (COO, formerly Meta VP Europe).

Technical approach: JEPA (Joint Embedding Predictive Architecture) — learn abstract representations of the world by predicting in latent space, never generating pixels. Train on video, audio, sensor data, and lidar.

Target applications: Industrial automation, robotics, healthcare — domains where hallucinating physics has real consequences.

The irony: AMI's Chief Science Officer co-created the architecture that powered Sora, the model OpenAI just killed for being too expensive. Xie's move from generative to energy-based approaches is itself a signal about where the field is heading.
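The latent-prediction idea at the heart of JEPA — encode observations, predict the *next latent* rather than the next frame, and measure error entirely in latent space — is the pattern NB 00 builds out. A minimal NumPy sketch (the linear maps and shapes here are illustrative stand-ins, not AMI's actual models):

```python
import numpy as np

# Toy JEPA step: encoder -> latent, predictor -> next latent, loss in latent
# space. No pixels are ever generated or reconstructed.
rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACT_DIM = 32, 8, 2

W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))                # encoder
W_pred = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACT_DIM))  # predictor

def encode(obs):
    return W_enc @ obs

def predict(z, action):
    return W_pred @ np.concatenate([z, action])

obs_t, obs_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
action = rng.normal(size=ACT_DIM)

z_hat = predict(encode(obs_t), action)           # predicted next latent
loss = np.mean((z_hat - encode(obs_next)) ** 2)  # error measured in latent space
print(z_hat.shape, round(float(loss), 3))
```

The loss compares an 8-dimensional latent against an 8-dimensional prediction, never a 32-dimensional observation against a rendered one — that asymmetry is the entire point of the approach.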

World Labs: Generate the World

World Labs was founded by Fei-Fei Li — the Stanford professor who created ImageNet, the dataset that catalyzed the deep learning revolution.

Funding: $1 billion total including $200M strategic investment from Autodesk (February 2026), $5B valuation.

Product: Marble — generally available since November 2025, a generative world model that creates persistent, editable 3D environments from any input modality.

Technical approach: Generative synthesis via diffusion + 3D Gaussian splatting. Build visual worlds you can navigate, edit, and export.

Target applications: Creative tools, gaming, film, architecture, robotics simulation. Autodesk integration puts it directly into professional 3D workflows.

Head-to-Head Comparison

| Dimension | AMI Labs (LeCun) | World Labs (Fei-Fei Li) |
|-----------|------------------|-------------------------|
| Philosophy | Understanding first — representations enable reasoning | Generation first — visual worlds enable everything |
| Architecture | JEPA — joint embedding predictive | Generative — diffusion + 3D Gaussian splatting |
| Output | Latent representations (invisible — for planning/control) | 3D visual worlds (visible — for navigation/editing) |
| Stage | Research-first (no product yet) | Product-first (Marble shipping since Nov 2025) |
| Revenue model | TBD — long-term scientific project | Freemium SaaS + enterprise (Autodesk integration) |
| Compute footprint | LeWM: 15M params, 1 GPU, hours | Marble: large-scale generation infrastructure |
| Open research | V-JEPA 2, VL-JEPA, LeWM — all open-source | No full architecture paper; commercial product |
| Robotics play | Direct control via latent planning (V-JEPA 2-AC) | Indirect — generate sim environments via NVIDIA Isaac |
| Creative play | None — can't render visuals | Strong — Chisel editor, 3D world creation |
| Biggest strength | Compute efficiency + physical understanding | Immediate utility + visual quality |
| Biggest weakness | Can reason about worlds it can't show you | Can show you worlds it may not understand |

The Complementarity Thesis

These companies may be building different layers of the same stack. A complete world model system needs both capabilities: understanding (what will happen if the robot pushes this object?) and rendering (what does this room look like from the robot's perspective?). AMI's representations could feed World Labs' renderer, or vice versa.


2. World Labs Marble: Deep Dive

Architecture and Capabilities

Marble generates persistent, editable 3D environments from multimodal inputs. While World Labs hasn't published a full architectural paper, the technical approach is clear from their product.

Input modalities: Text prompts, photographs, videos, panoramas, or coarse 3D layouts. Marble is a truly multimodal world model — any of these inputs can seed a 3D world.

3D Representation: Gaussian Splats

Marble's primary output format is 3D Gaussian splatting. Instead of traditional polygon meshes, scenes are represented as millions of semitransparent 3D Gaussians, each defined by position, scale, color (with spherical harmonics for view-dependent appearance), and opacity.

The rendering pipeline projects these Gaussians onto the camera plane and alpha-composites them, enabling real-time, photorealistic rendering from any viewpoint. This approach was pioneered by 3D Gaussian Splatting for Real-Time Radiance Field Rendering (Kerbl et al., 2023) and Marble scales it to AI-generated worlds.
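The front-to-back alpha blend at the core of this pipeline is simple enough to show for a single pixel. In the sketch below the splats are assumed to be already projected, depth-sorted, and reduced to one color and opacity each — no actual Gaussian falloff or spherical harmonics:

```python
import numpy as np

def composite(colors, alphas):
    """colors: (N, 3) splat colors, front-to-back; alphas: (N,) opacities in [0, 1]."""
    out = np.zeros(3)
    transmittance = 1.0                 # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        out += transmittance * a * c    # this splat's contribution to the pixel
        transmittance *= (1.0 - a)      # it occludes everything behind it
    return out

colors = np.array([[1.0, 0.0, 0.0],     # half-transparent red splat in front
                   [0.0, 1.0, 0.0]])    # opaque green splat behind
alphas = np.array([0.5, 1.0])
pixel = composite(colors, alphas)
print(pixel)  # red and green blend equally: [0.5, 0.5, 0.0]
```

Because the blend is a weighted sum, the whole operation is differentiable — which is what lets the Gaussians' positions, colors, and opacities be optimized (or generated) end-to-end.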

The Chisel Editor

Marble includes a hybrid editing interface called Chisel. Users draw rough spatial layouts (walls, floors, objects as blocks), add natural language descriptions for each element, and Marble fills in photorealistic visual detail. This coarse-to-fine workflow — human intent at the structural level, AI generation at the detail level — is a practical solution to the control problem in generative AI.

Export Formats: Gaussian splats (highest fidelity), triangle meshes (including collider meshes for physics simulation), or video (rendered walkthroughs from specified camera paths).

Marble for Robotics: NVIDIA Isaac Integration

Perhaps the most significant capability for our field: Marble worlds can be imported into NVIDIA Isaac Sim for robotics training:

  1. Describe an environment in text or provide a photo
  2. Marble generates a 3D world with physics-compatible meshes
  3. Import into Isaac Sim as a simulation environment
  4. Train robotic agents in this generated world

This pipeline dramatically reduces the cost of creating diverse simulation environments — a major bottleneck in sim-to-real robotics research.


3. LeWorldModel: This Week's Breakthrough

Why LeWM Changes the Landscape

LeWorldModel (March 23, 2026) is the first JEPA that trains stably end-to-end from raw pixels — no stop-gradients, no EMA, no pretrained encoders, no multi-term losses.

For the intermediate reader, this matters because previous JEPA world models (like DINO-WM) required pretrained DINOv2 features — meaning they weren't truly learning from scratch. LeWM uses a ViT-Tiny encoder (~5M params) and a transformer predictor (~10M params), totaling ~15M parameters, trained from pixel input with just two loss terms.

The engineering angle: LeWM achieves 96% success rate on Push-T (block manipulation) while planning in <1 second. Foundation-model-based alternatives take 48× longer. The code is open source, building on stable-worldmodel for environment management and evaluation.

What it validates: AMI's thesis that energy-based world models can work efficiently from raw sensory input — not just from pretrained features. This is the missing engineering proof that JEPA scales down as well as it scales up (V-JEPA 2 at 1.2B parameters).


4. The Post-Sora Generative Landscape

Sora's Shutdown and What It Means

On March 24, 2026, OpenAI discontinued Sora — shutting down the video generation platform and unwinding a $1B Disney deal. The reason: generative video consumed too much compute relative to its revenue potential.

For the world models field, this is a watershed. The most well-resourced generative video model in the world couldn't sustain itself commercially. This doesn't invalidate all generative world models (Marble generates 3D environments, not video — a different problem with different economics), but it confirms that pixel-level video simulation faces a fundamental cost problem.

What Remains: Veo and Open Source

Veo 3.1 (Google, January 2026) pushes to 4K with reference-image conditioning. Google has deeper pockets and more infrastructure patience than OpenAI, so Veo may survive where Sora couldn't.

Open-source video generation is thriving: LTX 2.3 (Lightricks, 22B params, 4K, open-source desktop editor), Helios (Peking U / ByteDance / Canva, 14B, real-time at 19.5 FPS on H100), and HunyuanVideo WorldPlay (Tencent, with RL post-training code for interactive world models at 24 FPS).

Genie: The Interactive Frontier

Genie (Google DeepMind, 2024) demonstrated early interactive capabilities — generating playable 2D game environments from images. Scaling this to 3D, physically consistent interactive worlds remains the open frontier for generative approaches.


5. Robotics-Focused World Models

DayDreamer: Real-World Robot Learning

DayDreamer (Wu et al., 2022) showed that Dreamer-style world models can train robots in the real world within hours. The robot alternates between brief real-world interactions (collecting data) and extended imagination episodes (training the policy in the world model).

TD-MPC2: Scaling Model-Based Control

TD-MPC2 (Hansen et al., 2024) scales model-based RL to hundreds of millions of parameters across 80+ continuous control tasks. The key design choice: keep everything in latent space — no pixel reconstruction, just latent dynamics and trajectory optimization.

TD-MPC2 demonstrates that model-based RL follows scaling laws similar to language models: bigger models, more diverse training tasks, better zero-shot transfer.
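The trajectory-optimization loop these planners run — and the one NB 01 implements — is the Cross-Entropy Method: sample action sequences, roll them through the dynamics model, keep the elites, refit the sampling distribution. A toy version with hand-written stand-in dynamics (a real planner would roll out a *learned* latent model instead):

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, POP, N_ELITE, ITERS = 5, 64, 8, 10
goal = np.array([1.0, 1.0])

def rollout_return(actions):
    """Roll toy latent dynamics z' = z + a; score = negative final distance to goal."""
    z = np.zeros(2)
    for a in actions:
        z = z + a
    return -np.linalg.norm(z - goal)

mu = np.zeros((HORIZON, 2))      # mean action sequence
sigma = np.ones((HORIZON, 2))    # per-step exploration noise
for _ in range(ITERS):
    cand = rng.normal(mu, sigma, size=(POP, HORIZON, 2))   # sample candidates
    scores = np.array([rollout_return(c) for c in cand])
    elite = cand[np.argsort(scores)[-N_ELITE:]]            # keep the best few
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

best = rollout_return(mu)
print(round(float(best), 3))
```

In a Model Predictive Control loop, only the first action of `mu` is executed before replanning — which is why planning latency (LeWM's sub-second figure) matters so much in practice.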

Self-Improving World Models (ASIM)

From the ICLR 2026 RSI Workshop (our W12 coverage): ASIM (OpenReview) pairs forward world models with inverse models for self-improvement through cycle-consistency, enabling self-supervised improvement with 50%+ less data.

R2-Dreamer: Decoder-Free World Models

R2-Dreamer (March 18, 2026) proposes a decoder-free MBRL framework using a Barlow Twins-inspired redundancy-reduction objective. On DeepMind Control Suite and Meta-World, R2-Dreamer matches DreamerV3 and TD-MPC2 while training 1.59× faster. Notably, it excels on tasks with tiny task-relevant objects — exactly where decoder-based methods waste capacity.


6. Emerging Techniques: Flow-Matching and Hybrids

Flow-Matching / Rectified Flow

A key technical trend making generative world models faster: flow matching (Lipman et al., ICLR 2023) learns a direct transport path between noise and data distributions, avoiding the many small steps required by diffusion models. Rectified flow (Liu et al., ICLR 2023) straightens these paths further, enabling few-step generation. The combination proved its commercial viability in Stable Diffusion 3 (Esser et al., 2024), which uses rectified flow transformers to achieve high-resolution image synthesis with fewer sampling steps than traditional diffusion.

For world models, this means: real-time or near-real-time generation of new environmental states — critical for interactive applications where latency matters.
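The mechanics fit in a few lines of NumPy. The sketch below uses a deliberately trivial coupling (each "data" point is its noise sample shifted by +3) so the optimal transport paths are already straight and the target velocity is constant — real systems pair independent noise/data samples and fit large networks, not a least-squares line:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
x0 = rng.normal(size=N)        # noise samples
x1 = x0 + 3.0                  # toy "data": noise shifted by +3
t = rng.uniform(size=N)
xt = (1 - t) * x0 + t * x1     # point on the straight noise->data path at time t
target_v = x1 - x0             # flow-matching regression target (here: constant 3)

# Fit v(x, t) = w0 + w1*x + w2*t by least squares (stand-in for a network).
A = np.stack([np.ones(N), xt, t], axis=1)
w, *_ = np.linalg.lstsq(A, target_v, rcond=None)

# Few-step sampling: 4 Euler steps of dx/dt = v(x, t), carrying noise to data.
x = rng.normal(size=4096)
for k in range(4):
    x = x + 0.25 * (w[0] + w[1] * x + w[2] * (k / 4))
print(round(float(x.mean()), 1))   # samples land near the data mean of 3
```

Four integration steps instead of hundreds of denoising steps is the latency win the text describes; straightening the paths (rectified flow) is what makes so few steps accurate.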

Hybrid Architectures

The most promising direction may be combining approaches:

  • Energy-based encoders (JEPA/LeWM-style) for perception and state estimation
  • Flow-based or diffusion decoders for visual rendering when needed
  • Latent dynamics models (RSSM as in DreamerV3, TD-MPC style) for temporal prediction
  • Planning modules that operate in the shared latent space

No single system currently implements this full stack, but the components exist. Integration is the engineering challenge for 2026–2027.
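As an interface sketch, the four components above compose cleanly — perception feeds latent dynamics, planning happens in latent space, and the generative decoder is invoked only when a visual is actually needed. Every component below is a stub (random linear maps); the point is the composition, not any real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_LAT, D_ACT = 16, 4, 2

encoder = rng.normal(scale=0.1, size=(D_LAT, D_OBS))      # energy-based perception
dynamics = rng.normal(scale=0.1, size=(D_LAT, D_LAT + D_ACT))  # latent dynamics
decoder = rng.normal(scale=0.1, size=(D_OBS, D_LAT))      # optional render path

def perceive(obs):            # JEPA-style: observation -> latent state
    return encoder @ obs

def imagine(z, actions):      # roll the latent dynamics forward
    for a in actions:
        z = dynamics @ np.concatenate([z, a])
    return z

def render(z):                # generative decoder, used only on demand
    return decoder @ z

z = perceive(rng.normal(size=D_OBS))
z_future = imagine(z, [rng.normal(size=D_ACT) for _ in range(3)])
frame = render(z_future)      # rendered only when a human needs to see it
print(frame.shape)
```

Note the asymmetry: planning never touches `render`. That is the efficiency argument for the hybrid stack — pay for generation only at the interface with humans, not inside the control loop.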


Landscape Summary

| Approach | Key Players | Strength | Limitation | This Week |
|----------|-------------|----------|------------|-----------|
| Generative 3D | World Labs (Marble) | Persistent, editable, exportable 3D worlds | No physical dynamics — static environments | Strongest generative play after Sora's exit |
| Video foundation | Google (Veo 3.1), open-source (LTX, Helios) | Photorealistic, implicit physics | Passive, physically inconsistent, expensive | Sora killed — field loses its flagship |
| Energy-based | AMI Labs, Meta (V-JEPA 2), LeWM | Data-efficient, planning-ready, compute-light | No visual generation — representations only | LeWM proves end-to-end training from pixels |
| Robotics-focused | DayDreamer, TD-MPC2, R2-Dreamer | Real-world deployment, imagination-based training | Task-specific, limited generalization | R2-Dreamer: decoder-free, 1.59× faster than DreamerV3 |
| Hybrids / flow-based | Emerging research | Speed + quality, real-time interaction potential | Early stage, no dominant architecture | — |

Connecting the Threads: RSI Meets World Models

Readers following our W10–W12 series on recursive self-improvement will recognize familiar failure modes in the world models landscape. Representation collapse in JEPA (the encoder mapping everything to a constant) is structurally identical to reward hacking in self-play (the agent exploiting a shortcut instead of genuinely improving). Both are cases where the learning objective is technically satisfied while the system learns nothing useful.

The solutions rhyme too. LeWM's SIGReg regularizer forces distributional structure on the latent space — preventing the encoder from collapsing. Verification-based self-training (STaR, ReST) forces correctness structure on generated solutions — preventing the model from reward-hacking. R2-Dreamer's Barlow Twins objective reduces redundancy in learned representations, echoing how diversity-promoting mechanisms prevent mode collapse in self-play.
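Collapse can be made concrete in a few lines: a collapsed encoder maps every input to the same latent, so per-dimension variance vanishes. A variance hinge in the style of VICReg (a published analog of this idea — not LeWM's actual SIGReg objective) turns that failure mode into a penalty the optimizer can push against:

```python
import numpy as np

def variance_penalty(z, target_std=1.0, eps=1e-4):
    """z: (batch, dim) latent batch; hinge loss on each dimension's std."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(256, 8))    # spread-out latent batch: penalty near 0
collapsed = np.full((256, 8), 0.3)     # every input -> identical latent: penalty near 1

print(variance_penalty(healthy), variance_penalty(collapsed))
```

The analogy to self-play holds exactly here: without the penalty, "predict the next latent" is trivially satisfied by a constant encoder, just as an unverified reward is trivially satisfied by a shortcut policy.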

ASIM (from our W12 RSI Workshop coverage) makes the connection explicit: it applies RSI principles — cycle-consistency between forward and inverse models — directly to world model self-improvement. This is the intersection of our W10–W12 and W13–W15 arcs: self-improving world models that bootstrap their own understanding of physical reality.

The practical implication for our notebooks: the regularization techniques we implement in NB 00 (SIGReg-style collapse prevention) are the world model analog of how verification filtering prevents reward hacking in W11's STaR notebook. Same principle, new domain, same engineering instinct. NB 01 then extends this with latent-space planning via CEM and Model Predictive Control.


Key Takeaways

  1. AMI Labs and World Labs represent two sides of the same coin — understanding vs. generation — each backed by ~$1B and legendary founders
  2. World Labs Marble is the most commercially advanced generative world model, with real products and Autodesk integration
  3. LeWorldModel proves that JEPA works from raw pixels on a single GPU — removing the biggest engineering barrier to energy-based world models
  4. Sora's shutdown shows that pixel-level video world simulation faces a fundamental cost problem, strengthening the case for latent-space approaches
  5. Robotics world models (DayDreamer, TD-MPC2, R2-Dreamer) are converging on decoder-free, latent-space architectures
  6. The hybrid thesis — energy-based encoders + generative decoders — is gaining traction as the most complete architecture
  7. The RSI throughline: Collapse prevention in world models mirrors reward-hack prevention in self-play — same challenge, same engineering principles, connecting our entire W10–W15 content arc
