DSG-JIT Architecture
A unified, differentiable 3D Dynamic Scene Graph and SLAM engine — built on JAX, manifolds, and factor graphs.
Overview
DSG-JIT is built around one central idea:
A 3D world model should be fully differentiable, jointly optimized, and JIT-compiled for real-time performance.
To achieve this, the system is organized into five coordinated layers, each built on top of a general-purpose factor graph and SE3 manifold engine.

Each layer is optional, modular, and composable — enabling applications in classical SLAM, neural fields, robotics, and world-modeling.
1. Core Layer — Differentiable Factor Graph Engine
The core provides the primitive mathematical and structural building blocks:
Core Responsibilities
- Variable management (poses, voxels, arbitrary vectors), via a backend‑agnostic data structure.
- Factor definitions (priors, odometry, smoothness, observations).
- Differentiable residual registry (now owned by the WorldModel).
- Backend‑agnostic factor graph interface (supports pluggable backends).
- JIT‑compiled residual builders (vmap‑optimized for performance).
- Unified flat‑state storage and index maps for optimization.
- Automatic Jacobian computation (via JAX autodiff).
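The flat-state storage, index maps, and autodiff Jacobians listed above can be sketched as follows. This is a minimal illustration, not the DSG-JIT implementation: the variable names, the `build_index_map` helper, and the signatures given to `pack_state`/`unpack_state` here are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

# Hypothetical variables: two 1D poses and one 2D landmark (names are illustrative).
variables = {
    "x0": jnp.array([0.0]),
    "x1": jnp.array([0.9]),
    "l0": jnp.array([2.0, 1.0]),
}

def build_index_map(variables):
    """Assign each variable a (start, stop) slice in the flat state vector."""
    index_map, offset = {}, 0
    for name, value in variables.items():
        index_map[name] = (offset, offset + value.shape[0])
        offset += value.shape[0]
    return index_map

def pack_state(variables):
    """Concatenate all variables into one flat vector (insertion order)."""
    return jnp.concatenate([variables[name] for name in variables])

def unpack_state(x, index_map):
    """Recover named variables from the flat vector using static slices."""
    return {name: x[a:b] for name, (a, b) in index_map.items()}

index_map = build_index_map(variables)
x = pack_state(variables)

def residuals(x):
    v = unpack_state(x, index_map)
    prior = v["x0"] - 0.0               # anchor x0 at the origin
    odom = (v["x1"] - v["x0"]) - 1.0    # measured displacement of 1.0
    return jnp.concatenate([prior, odom])

# Jacobian of the stacked residuals with respect to the flat state, shape (2, 4).
J = jax.jacfwd(residuals)(x)
```

Because the slices in the index map are plain Python integers, `unpack_state` stays traceable under `jax.jit`, which is one reason a flat state plus static index map is a common layout for this kind of solver.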
Key Modules
| Module | Purpose |
|---|---|
| `world.model` | Central owner of variables, factors, residuals, and state packing |
| `core.factor_graph` | Lightweight backend factor graph used by WorldModel (pluggable) |
| `core.types` | Data structures (Variable, Factor, NodeId, FactorId) |
| `core.math3d` | Vector/rotation utilities, SE3 helpers |
This layer is geometry-agnostic — it does not know about voxels, rooms, or trajectories.
Every higher layer is built on these abstractions.
2. Optimization Layer — JIT Gauss–Newton on Manifolds
The optimization layer transforms the factor graph into a high-performance nonlinear solver.
Features
- Gauss–Newton with line-search, fully JIT‑compiled.
- Manifold retractors for SE3 and Euclidean variables.
- Pure JAX solver loop with vmap‑accelerated residual evaluation.
- JIT‑compiled objective + residuals with caching of compiled solvers in WorldModel.
- Ultra‑low Python overhead — solver runs nearly entirely in XLA.
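As a rough sketch of what a JIT-compiled Gauss–Newton loop can look like in JAX, the example below solves a toy 1D pose chain. It is an illustration under simplifying assumptions: all variables are Euclidean, the line search mentioned above is omitted, a small damping term is added for numerical safety, and the function names are made up for this example rather than taken from `optimization.solvers`.

```python
import jax
import jax.numpy as jnp

def residuals(x):
    # Toy 1D pose chain: a prior on x[0], two odometry steps of 1.0, and a
    # nonlinear term so that more than one iteration is needed.
    return jnp.array([
        x[0],
        x[1] - x[0] - 1.0,
        x[2] - x[1] - 1.0,
        x[2] ** 2 - 4.0,
    ])

def gauss_newton_step(x, damping=1e-6):
    r = residuals(x)
    J = jax.jacfwd(residuals)(x)
    # Normal equations: (J^T J + damping * I) delta = -J^T r
    H = J.T @ J + damping * jnp.eye(x.shape[0])
    delta = jnp.linalg.solve(H, -J.T @ r)
    return x + delta

@jax.jit
def solve(x0):
    # The whole loop is traced once and compiled, so no Python runs per iteration.
    return jax.lax.fori_loop(0, 10, lambda _, x: gauss_newton_step(x), x0)

x_opt = solve(jnp.array([0.1, 0.8, 2.3]))
```

For manifold variables such as SE3 poses, the additive update `x + delta` would be replaced by a retraction.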
Modules
| Module | Purpose |
|---|---|
| `optimization.solvers` | Gauss–Newton, manifold-aware logic |
| `optimization.jit_wrappers` | One-line JIT interfaces using the new WorldModel residual API |
This layer accounts for the 50–1000× speedup over naïve Python solvers reported in the benchmarks.
3. SLAM Layer — Residual Functions & Manifolds
This layer provides the differentiable measurements registered inside the WorldModel. These residuals implement the geometric logic of SE3 constraints, landmarks, voxel consistency, and smoothness. The factor graph simply stores variable/factor connectivity; the WorldModel owns the residual functions and packing logic.
Supported Manifolds
- SE3 poses
- 3D Euclidean points
- Voxel nodes (R³)
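To make the manifold handling concrete, here is a minimal sketch of a rotation exponential map and a pose retraction in JAX. It assumes a pose stored as a rotation matrix `R` plus a translation `t`, and a 6-vector update ordered as rotation then translation. These conventions and the function names are assumptions for illustration and may not match `slam.manifold`. The retraction shown is a simple first-order one, not the full SE3 exponential.

```python
import jax
import jax.numpy as jnp

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return jnp.array([
        [0.0, -w[2], w[1]],
        [w[2], 0.0, -w[0]],
        [-w[1], w[0], 0.0],
    ])

def so3_exp(w):
    """Rodrigues' formula with a small-angle branch that keeps gradients finite."""
    theta2 = jnp.dot(w, w)
    small = theta2 < 1e-8
    # Use a dummy value in the unused branch so sqrt and division stay well defined.
    safe_theta2 = jnp.where(small, 1.0, theta2)
    theta = jnp.sqrt(safe_theta2)
    a = jnp.where(small, 1.0 - theta2 / 6.0, jnp.sin(theta) / theta)
    b = jnp.where(small, 0.5 - theta2 / 24.0, (1.0 - jnp.cos(theta)) / safe_theta2)
    W = hat(w)
    return jnp.eye(3) + a * W + b * (W @ W)

def retract_se3(R, t, delta):
    """Apply a 6-vector tangent update (rotation, then translation) to a pose."""
    R_new = R @ so3_exp(delta[:3])
    t_new = t + R @ delta[3:]
    return R_new, t_new

# A quarter turn about z maps the x axis onto the y axis.
R_quarter = so3_exp(jnp.array([0.0, 0.0, jnp.pi / 2]))
y_axis = R_quarter @ jnp.array([1.0, 0.0, 0.0])
```

The `jnp.where` guard on `theta2` is the kind of "Jacobian-safe" operation that matters in practice: without it, differentiating at a zero update produces NaNs from the square root and the division.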
Provided Residuals
| Residual Type | Description |
|---|---|
| `se3_geodesic` | Pose-to-pose geodesic constraint |
| `odom_se3` | Learnable or fixed SE3 odometry factor |
| `pose_landmark_relative` | Relative landmark measurement |
| `pose_voxel_point` | Voxel observation constraint |
| `voxel_smoothness` | Grid regularization term |
| `prior` | Generic variable prior |
All residuals are now registered with WorldModel.register_residual. The FactorGraph stores only structure — not residual logic.
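The split between structure and residual logic can be illustrated with a toy registry. The `ToyWorldModel` class below is a stand-in written for this example. It is not the DSG-JIT `WorldModel`, and the signature it gives `register_residual` is an assumption. Variables are plain Euclidean vectors for brevity.

```python
import jax.numpy as jnp

class ToyWorldModel:
    """Illustrative stand-in; not the DSG-JIT WorldModel class."""

    def __init__(self):
        self._residuals = {}   # factor type -> residual function
        self.factors = []      # structure only: (factor type, variable names, params)

    def register_residual(self, factor_type, fn):
        self._residuals[factor_type] = fn

    def add_factor(self, factor_type, var_names, params):
        self.factors.append((factor_type, var_names, params))

    def evaluate(self, values):
        """Stack the residuals of every factor for the given variable values."""
        out = []
        for factor_type, var_names, params in self.factors:
            fn = self._residuals[factor_type]
            out.append(fn(*[values[n] for n in var_names], params))
        return jnp.concatenate(out)

def prior_residual(x, target):
    return x - target

def odom_residual(xi, xj, measurement):
    return (xj - xi) - measurement

wm = ToyWorldModel()
wm.register_residual("prior", prior_residual)
wm.register_residual("odom", odom_residual)
wm.add_factor("prior", ["x0"], jnp.zeros(3))
wm.add_factor("odom", ["x0", "x1"], jnp.array([1.0, 0.0, 0.0]))

values = {"x0": jnp.zeros(3), "x1": jnp.array([1.0, 0.5, 0.0])}
r = wm.evaluate(values)
```

Keeping the factor list free of callables means the same connectivity data could, in principle, live in a non-Python backend while the residual functions remain ordinary JAX code.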
Modules
| Module | Purpose |
|---|---|
| `slam.measurements` | All differentiable SLAM residuals |
| `slam.manifold` | Manifold utilities (exp/log maps, Jacobian-safe ops) |
This layer enables standard SLAM, learnable SLAM, and hybrid SLAM + voxel systems.
4. Scene Graph Layer — 3D Dynamic Scene Graph
This layer introduces the structural and semantic relationships that turn raw geometry into a world model.
Entities
- Poses
- Places
- Rooms
- Agents
- Voxels
- Attachments
- Trajectories
Relations
- Pose → Place membership
- Agent → Pose trajectory
- Room → Place grouping
- Voxel → Place attachment
Modules
| Module | Purpose |
|---|---|
| `scene_graph.entities` | Node classes (PoseNode, PlaceNode, RoomNode, VoxelNode, etc.) |
| `scene_graph.relations` | Relations & constraints that form the DSG |
Features
- Geometry + semantics in one structure
- Differentiable constraints between DSG entities
- Real-time graph growth (planned)
Planned extensions will allow the scene graph to store multi-resolution geometric layers (voxels, meshes, raw points, NeRF latent fields). DSG-JIT is designed so that future NeRF modules can attach object-specific neural fields directly to DSG nodes while still participating in global optimization.
The DSG operates as a high-level structural layer over the SLAM + voxel system.
5. World Layer — Unified World Model
The world layer combines SLAM, voxels, and the scene graph into one coherent system.
SceneGraphWorld
SceneGraphWorld is the highest‑level API and the primary entry point for users. It manages:
- Variables and factors (via the underlying factor graph backend)
- The residual registry
- JIT‑compiled residual and solver construction
- Optimization of full DSG + voxel + SLAM systems
- Integration with differentiable learning pipelines
- Multi‑resolution storage formats (planned)
- Planned NeRF attachments for objects and rooms
WorldModel Responsibilities
- Owns variables, factors, manifold types, and residual functions.
- Provides unified `pack_state`/`unpack_state` logic.
- Builds JIT-optimized residuals via `build_residual` and specialized builders.
- Groups factors by type and applies `vmap` for high-throughput evaluation.
- Caches compiled solvers for real-time re-optimization.
- Supports backend‑pluggable FactorGraph implementations.
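Grouping factors by type and batching them with `vmap` might look roughly like the sketch below. It assumes Euclidean 2D positions and represents one factor type as index arrays plus a stacked measurement array. This layout is an assumption chosen for the example, not necessarily how `build_residual` organizes its data.

```python
import jax
import jax.numpy as jnp

def odom_residual(xi, xj, measurement):
    """Residual of a single odometry factor between two positions."""
    return (xj - xi) - measurement

# Flat state holding four 2D positions, viewed as (num_vars, dim).
x = jnp.array([0.0, 0.0, 1.0, 0.0, 2.0, 0.1, 3.0, 0.0])
poses = x.reshape(-1, 2)

# All factors of type "odom", stored as index arrays plus stacked measurements.
i_idx = jnp.array([0, 1, 2])
j_idx = jnp.array([1, 2, 3])
meas = jnp.tile(jnp.array([1.0, 0.0]), (3, 1))

@jax.jit
def odom_residuals(poses, i_idx, j_idx, meas):
    # One vectorized call replaces a Python loop over individual factors.
    batched = jax.vmap(odom_residual)
    return batched(poses[i_idx], poses[j_idx], meas).reshape(-1)

r = odom_residuals(poses, i_idx, j_idx, meas)
```

Each factor type gets its own batched call, and the per-type outputs can then be concatenated into the full residual vector.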
Training & Learning
The world layer exposes the learnable parameters used in the experiments:
- Odom SE3 measurements
- Voxel observation points
- Factor-type weights
- Joint hybrid SE3 + voxel models
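One way to treat a measurement as a learnable parameter is to differentiate a loss through an unrolled solver. The sketch below does this for a single scalar odometry measurement on a two-variable 1D problem. The problem setup, names, target value, and learning rate are all invented for illustration and do not come from `world.training`.

```python
import jax
import jax.numpy as jnp

def residuals(x, odom_meas):
    return jnp.array([
        x[0],                       # prior anchoring x0 at 0
        x[1] - x[0] - odom_meas,    # odometry with a learnable measurement
    ])

def solve(odom_meas, num_iters=3):
    x = jnp.zeros(2)
    for _ in range(num_iters):      # unrolled so reverse-mode autodiff applies
        r = residuals(x, odom_meas)
        J = jax.jacfwd(residuals)(x, odom_meas)   # Jacobian w.r.t. x only
        x = x - jnp.linalg.solve(J.T @ J, J.T @ r)
    return x

def loss(odom_meas, target_x1=2.0):
    # Supervise the optimized state, not the measurement itself.
    x_opt = solve(odom_meas)
    return (x_opt[1] - target_x1) ** 2

grad_fn = jax.jit(jax.grad(loss))
initial_grad = grad_fn(jnp.array(1.0))

odom_meas = jnp.array(1.0)
for _ in range(50):
    odom_meas = odom_meas - 0.2 * grad_fn(odom_meas)   # plain gradient descent
```

Here the optimized state is `x = [0, odom_meas]`, so the loss reduces to `(odom_meas - 2)^2` and gradient descent drives the measurement toward the target.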
Modules
| Module | Purpose |
|---|---|
| `world.model` | SceneGraphWorld implementation |
| `world.voxel_grid` | Voxel layers + smoothness |
| `world.training` | Trainer-style learning loop |
Information Flow Between Layers
Sensors → SLAM Residuals → WorldModel (variables + residuals + factor graph)
→ JIT Solver (vmap‑optimized) → SceneGraphWorld → Applications
- Sensors provide point clouds, images, IMU.
- SLAM residuals convert observations to geometric factors.
- WorldModel manages variables, residuals, and factor graph structure.
- JIT solver (vmap‑optimized) optimizes the entire state.
- SceneGraphWorld structures it into a semantic world.
- Applications consume the optimized graph (robotics, NeRFs, planning, etc.).
Why This Design?
1. Performance
- JIT residuals + JIT Gauss–Newton
- SE3 on manifolds
- Minimal Python overhead
- 50–1000× faster than naïve Python solvers
- vmap grouping of factor types for large batching speedups
2. Differentiability
- JAX grad through:
  - Odom parameters
  - Voxel observations
  - Factor weights
  - The entire scene graph
3. Modularity
- SLAM-only or NeRF-only systems
- DSG-only semantic reasoning
- Hybrid world models
- Incremental or batch optimization
- Supports pluggable factor‑graph backends (Python, Rust, C++) as long as they implement the FactorGraph Standard.
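A pluggable backend contract can be expressed in Python as a structural interface. The sketch below is purely hypothetical: the contents of the FactorGraph Standard are not given here, so the `FactorGraphBackend` protocol, its method names, and the `InMemoryFactorGraph` class are assumptions that only illustrate the idea of structure-only backends.

```python
from typing import Protocol, Sequence, runtime_checkable

@runtime_checkable
class FactorGraphBackend(Protocol):
    """Hypothetical minimal interface; the actual FactorGraph Standard may differ."""

    def add_variable(self, name: str, dim: int) -> None: ...
    def add_factor(self, factor_type: str, var_names: Sequence[str]) -> int: ...
    def factors_of_type(self, factor_type: str) -> list: ...

class InMemoryFactorGraph:
    """Pure-Python backend that stores connectivity only, with no residual logic."""

    def __init__(self):
        self.variables = {}   # variable name -> dimension
        self.factors = []     # (factor type, variable names)

    def add_variable(self, name, dim):
        self.variables[name] = dim

    def add_factor(self, factor_type, var_names):
        self.factors.append((factor_type, tuple(var_names)))
        return len(self.factors) - 1   # factor id is its position in the list

    def factors_of_type(self, factor_type):
        return [f for f in self.factors if f[0] == factor_type]

graph = InMemoryFactorGraph()
graph.add_variable("x0", 6)
graph.add_variable("x1", 6)
prior_id = graph.add_factor("prior", ["x0"])
odom_id = graph.add_factor("odom", ["x0", "x1"])
odom_factors = graph.factors_of_type("odom")
```

A Rust or C++ backend exposed through bindings would satisfy the same protocol as long as it provides methods with these names, since the check is structural rather than based on inheritance.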
4. Extensibility
This architecture supports future additions:
- Photometric residuals
- Neural field appearance models
- Real-world robotics datasets
- Differentiable planning & policies
- Joint SLAM + segmentation
- NeRF‑augmented objects and rooms (future)
- Multi‑resolution geometric storage (voxels, meshes, point clouds)
The Modern DSG‑JIT Architecture (2025+)
DSG‑JIT has evolved into a WorldModel‑centric system:
- The FactorGraph is a backend implementation detail.
- All differentiability, residuals, and packing logic live in the WorldModel.
- Solvers operate on JIT‑compiled, vmap‑batched functions.
- SceneGraphWorld provides a rich, extensible API that will support NeRFs, hierarchical geometry, and large‑scale world‑modeling.
This architecture enables DSG‑JIT to function as a real‑time differentiable world model suitable for robotics, simulation, mapping, and neural field research.