
Datasets

DSG-JIT includes lightweight loaders for common SLAM / VO datasets so you can quickly hook real sequences into the sensor stack, factor graph, and dynamic scene graph.

The goals are:

  • No heavy dependencies (no OpenCV required just to list frames).
  • Simple dataclasses with timestamps + file paths.
  • Easy integration with sensors.* streams and world.* components.

Currently supported:

  • datasets.tum_rgbd: TUM RGB-D sequences (RGB, optional depth, optional ground truth).
  • datasets.kitti_odometry: KITTI Odometry sequences (left/right images, optional Velodyne scans, optional ground-truth poses).


datasets.tum_rgbd

TumRgbdFrame(t, rgb_path, depth_path=None, pose_quat=None) dataclass

Single RGB-D frame from a TUM RGB-D sequence.

This dataclass stores only lightweight metadata: a timestamp and file paths. Consumers are responsible for actually loading images/depth (e.g., via OpenCV or Pillow) if desired; a minimal loading sketch follows the field list below.

t (float): Timestamp in seconds (as parsed from the TUM text files).

rgb_path (str): Relative or absolute path to the RGB image file corresponding to this frame. Typically something like "rgb/1341847980.722988.png".

depth_path (Optional[str]): Optional path to the depth image associated with this frame. May be None if depth is not available or use_depth=False was passed to the loader.

pose_quat (Optional[Tuple[float, float, float, float, float, float, float]]): Optional ground-truth pose as a 7-tuple (tx, ty, tz, qx, qy, qz, qw) in TUM convention. May be None if ground truth is unavailable or alignment was disabled.
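
The frame itself carries only paths; a minimal loading sketch using Pillow and NumPy (load_frame_arrays is an illustrative helper, and the depth scaling assumes the standard TUM convention of 16-bit depth PNGs where 5000 raw units correspond to one metre):

from typing import Optional, Tuple

import numpy as np
from PIL import Image


def load_frame_arrays(frame) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Load RGB as (H, W, 3) uint8 and depth as (H, W) float32 metres for a TumRgbdFrame."""
    rgb = np.asarray(Image.open(frame.rgb_path).convert("RGB"))

    depth_m: Optional[np.ndarray] = None
    if frame.depth_path is not None:
        # TUM depth maps are 16-bit PNGs; 5000 raw units == 1 metre, 0 == no reading.
        depth_raw = np.asarray(Image.open(frame.depth_path), dtype=np.float32)
        depth_m = depth_raw / 5000.0

    return rgb, depth_m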

load_tum_rgbd_sequence(root, use_depth=True, use_groundtruth=False, max_frames=None, max_time_diff=0.02)

Load a TUM RGB-D sequence directory into a list of frames.

The directory is expected to contain standard TUM files such as rgb.txt, depth.txt, and optionally groundtruth.txt. This loader parses metadata and returns a list of TumRgbdFrame instances, but does not actually load images or depth maps.

root (Union[str, os.PathLike]): Path to the TUM sequence root directory.

use_depth (bool): Whether to attempt to associate depth frames from depth.txt with each RGB frame.

use_groundtruth (bool): Whether to attempt to associate ground-truth poses from groundtruth.txt with each RGB frame.

max_frames (Optional[int]): Optional maximum number of frames to return. If None, the full sequence is loaded.

max_time_diff (float): Maximum allowed absolute difference in timestamps (seconds) when associating RGB with depth and ground truth.

Returns (List[TumRgbdFrame]): A list of TUM RGB-D frames with timestamps, file paths, and optionally ground-truth poses.
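
The association step controlled by max_time_diff is performed by a private helper that is not shown on this page. A minimal sketch of nearest-timestamp matching under the same contract (illustrative code, not the actual _associate_by_timestamp implementation):

from typing import Dict, List, Optional, Tuple


def associate_by_timestamp(
    ref: List[Tuple[float, str]],
    other: List[Tuple[float, str]],
    max_diff: float = 0.02,
) -> Dict[int, Optional[int]]:
    """For each reference entry, return the index of the closest entry in `other`,
    or None if no candidate lies within `max_diff` seconds."""
    assoc: Dict[int, Optional[int]] = {}
    for i, (t_ref, _) in enumerate(ref):
        best_j: Optional[int] = None
        best_dt = max_diff
        for j, (t_other, _) in enumerate(other):
            dt = abs(t_other - t_ref)
            if dt <= best_dt:
                best_dt = dt
                best_j = j
        assoc[i] = best_j
    return assoc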

Source code in dsg-jit/dsg_jit/datasets/tum_rgbd.py
def load_tum_rgbd_sequence(
    root: str | os.PathLike,
    use_depth: bool = True,
    use_groundtruth: bool = False,
    max_frames: Optional[int] = None,
    max_time_diff: float = 0.02,
) -> List[TumRgbdFrame]:
    """
    Load a TUM RGB-D sequence directory into a list of frames.

    The directory is expected to contain standard TUM files such as
    ``rgb.txt``, ``depth.txt``, and optionally ``groundtruth.txt``. This
    loader parses metadata and returns a list of :class:`TumRgbdFrame`
    instances, but does not actually load images or depth maps.

    :param root:
        Path to the TUM sequence root directory.
    :type root: Union[str, os.PathLike]

    :param use_depth:
        Whether to attempt to associate depth frames from ``depth.txt`` with
        each RGB frame.
    :type use_depth: bool

    :param use_groundtruth:
        Whether to attempt to associate ground-truth poses from
        ``groundtruth.txt`` with each RGB frame.
    :type use_groundtruth: bool

    :param max_frames:
        Optional maximum number of frames to return. If ``None``, the full
        sequence is loaded.
    :type max_frames: Optional[int]

    :param max_time_diff:
        Maximum allowed absolute difference in timestamps (seconds) when
        associating RGB with depth and ground truth.
    :type max_time_diff: float

    :return:
        A list of TUM RGB-D frames with timestamps, file paths, and optionally
        ground-truth poses.
    :rtype: List[TumRgbdFrame]
    """
    root_path = Path(root)

    rgb_entries = _parse_tum_list_file(root_path / "rgb.txt")

    depth_entries: List[Tuple[float, str]] = []
    if use_depth and (root_path / "depth.txt").exists():
        depth_entries = _parse_tum_list_file(root_path / "depth.txt")

    gt_entries: List[Tuple[float, Tuple[float, float, float, float, float, float, float]]] = []
    if use_groundtruth and (root_path / "groundtruth.txt").exists():
        gt_entries = _parse_tum_groundtruth(root_path / "groundtruth.txt")

    depth_assoc: Dict[int, Optional[int]] = {}
    gt_assoc: Dict[int, Optional[int]] = {}

    if depth_entries:
        depth_assoc = _associate_by_timestamp(rgb_entries, depth_entries, max_diff=max_time_diff)

    if gt_entries:
        # Convert to (t, dummy_path) to reuse association logic
        gt_ts_paths = [(t, "") for (t, _pose) in gt_entries]
        gt_assoc = _associate_by_timestamp(rgb_entries, gt_ts_paths, max_diff=max_time_diff)

    frames: List[TumRgbdFrame] = []

    for i, (t_rgb, rgb_rel) in enumerate(rgb_entries):
        depth_rel: Optional[str] = None
        pose_quat: Optional[Tuple[float, float, float, float, float, float, float]] = None

        if depth_entries and i in depth_assoc and depth_assoc[i] is not None:
            depth_rel = depth_entries[depth_assoc[i]][1]

        if gt_entries and i in gt_assoc and gt_assoc[i] is not None:
            pose_quat = gt_entries[gt_assoc[i]][1]

        frames.append(
            TumRgbdFrame(
                t=t_rgb,
                rgb_path=str(root_path / rgb_rel),
                depth_path=str(root_path / depth_rel) if depth_rel is not None else None,
                pose_quat=pose_quat,
            )
        )

        if max_frames is not None and len(frames) >= max_frames:
            break

    return frames
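
A typical call, assuming the package is importable as dsg_jit.datasets.tum_rgbd and that a sequence such as rgbd_dataset_freiburg1_xyz has been downloaded locally (the path is illustrative):

from dsg_jit.datasets.tum_rgbd import load_tum_rgbd_sequence

# Parse metadata only; no images are read here.
frames = load_tum_rgbd_sequence(
    "data/rgbd_dataset_freiburg1_xyz",
    use_depth=True,
    use_groundtruth=True,
    max_frames=100,
)

for frame in frames:
    print(f"{frame.t:.6f}  rgb={frame.rgb_path}  depth={frame.depth_path}  "
          f"gt={'yes' if frame.pose_quat is not None else 'no'}")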

datasets.kitti_odometry

KittiOdomFrame(seq, idx, t, left_path, right_path=None, velo_path=None, T_w_cam0=None) dataclass

Single frame from the KITTI Odometry dataset.

This dataclass provides file paths and optional ground-truth pose for a particular frame index in a given sequence.

seq (str): KITTI odometry sequence ID (e.g. "00", "05").

idx (int): Integer frame index within the sequence.

t (float): Approximate timestamp in seconds. Many downstream pipelines assume KITTI odometry runs at 10 Hz, so a common convention is t = idx / 10.0.

left_path (str): Path to the left camera image (image_0) for this frame.

right_path (Optional[str]): Optional path to the right camera image (image_1). May be None if not available or desired.

velo_path (Optional[str]): Optional path to the LiDAR point cloud (velodyne). May be None if not available or desired.

T_w_cam0 (Optional[Tuple[float, ...]]): Optional 4x4 homogeneous transform from camera-0 to world frame as a flattened 16-element tuple in row-major order. This is derived from the official poses/<seq>.txt file if available.
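
Two small consumer-side sketches (illustrative helpers, assuming NumPy and the standard KITTI Velodyne binary format of float32 (x, y, z, reflectance) records):

import numpy as np


def pose_matrix(frame) -> np.ndarray | None:
    """Reshape the flattened row-major 16-tuple into a 4x4 homogeneous transform."""
    if frame.T_w_cam0 is None:
        return None
    return np.asarray(frame.T_w_cam0, dtype=np.float64).reshape(4, 4)


def load_velodyne(frame) -> np.ndarray | None:
    """Read a KITTI Velodyne scan as an (N, 4) array of x, y, z, reflectance."""
    if frame.velo_path is None:
        return None
    return np.fromfile(frame.velo_path, dtype=np.float32).reshape(-1, 4)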

load_kitti_odometry_sequence(root, seq, load_right=True, load_velodyne=False, with_poses=True, max_frames=None)

Load a KITTI Odometry sequence into a list of frames.

This helper assumes the standard KITTI directory structure:

root/
  sequences/
    00/
      image_0/
      image_1/
      velodyne/
      calib.txt
  poses/
    00.txt

Only metadata (paths and ground-truth transforms) is loaded; images and point clouds are not read into memory.

root (Union[str, os.PathLike]): Path to the KITTI odometry dataset root directory.

seq (str): Sequence ID string (e.g. "00", "01").

load_right (bool): Whether to populate right_path pointing to image_1. If False, the field will always be None.

load_velodyne (bool): Whether to populate velo_path pointing to velodyne scans. If False, the field will always be None.

with_poses (bool): Whether to attempt to load ground-truth poses from poses/<seq>.txt.

max_frames (Optional[int]): Optional maximum number of frames to return. If None, all available frames in the left camera directory are used.

Returns (List[KittiOdomFrame]): List of KittiOdomFrame entries with timestamps, paths, and optionally ground-truth transforms.
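
Pose parsing is handled by a private helper that is not shown on this page. A minimal sketch of the standard format (twelve floats per line forming a row-major 3x4 matrix, padded with the homogeneous bottom row); illustrative code, not the actual _load_kitti_poses implementation:

from pathlib import Path
from typing import List, Tuple


def load_kitti_poses(path: Path) -> List[Tuple[float, ...]]:
    """Parse poses/<seq>.txt into flattened row-major 4x4 transforms (16-tuples)."""
    poses: List[Tuple[float, ...]] = []
    if not path.exists():
        return poses
    for line in path.read_text().splitlines():
        vals = [float(x) for x in line.split()]
        if len(vals) != 12:
            continue  # skip blank or malformed lines
        poses.append(tuple(vals + [0.0, 0.0, 0.0, 1.0]))
    return poses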

Source code in dsg-jit/dsg_jit/datasets/kitti_odometry.py
def load_kitti_odometry_sequence(
    root: str | os.PathLike,
    seq: str,
    load_right: bool = True,
    load_velodyne: bool = False,
    with_poses: bool = True,
    max_frames: Optional[int] = None,
) -> List[KittiOdomFrame]:
    """
    Load a KITTI Odometry sequence into a list of frames.

    This helper assumes the standard KITTI directory structure:

    .. code-block::

        root/
          sequences/
            00/
              image_0/
              image_1/
              velodyne/
              calib.txt
          poses/
            00.txt

    Only metadata (paths and ground-truth transforms) is loaded; images and
    point clouds are not read into memory.

    :param root:
        Path to the KITTI odometry dataset root directory.
    :type root: Union[str, os.PathLike]

    :param seq:
        Sequence ID string (e.g. ``"00"``, ``"01"``).
    :type seq: str

    :param load_right:
        Whether to populate ``right_path`` pointing to ``image_1``. If ``False``,
        the field will always be ``None``.
    :type load_right: bool

    :param load_velodyne:
        Whether to populate ``velo_path`` pointing to ``velodyne`` scans. If
        ``False``, the field will always be ``None``.
    :type load_velodyne: bool

    :param with_poses:
        Whether to attempt to load ground-truth poses from ``poses/<seq>.txt``.
    :type with_poses: bool

    :param max_frames:
        Optional maximum number of frames to return. If ``None``, all available
        frames in the left camera directory are used.
    :type max_frames: Optional[int]

    :return:
        List of :class:`KittiOdomFrame` entries with timestamps, paths, and
        optionally ground-truth transforms.
    :rtype: List[KittiOdomFrame]
    """
    root_path = Path(root)
    seq_str = f"{int(seq):02d}"  # normalize "0" -> "00"

    seq_dir = root_path / "sequences" / seq_str
    left_dir = seq_dir / "image_0"
    right_dir = seq_dir / "image_1"
    velo_dir = seq_dir / "velodyne"

    if not left_dir.exists():
        raise FileNotFoundError(f"Left image directory not found: {left_dir}")

    # Determine frame indices from left camera images
    left_files = sorted(left_dir.glob("*.png"))
    if not left_files:
        raise FileNotFoundError(f"No left images found in {left_dir}")

    # Optional poses
    poses: List[Tuple[float, ...]] = []
    if with_poses:
        poses_path = root_path / "poses" / f"{seq_str}.txt"
        poses = _load_kitti_poses(poses_path)

    frames: List[KittiOdomFrame] = []

    for idx, left_path in enumerate(left_files):
        t = idx / 10.0  # KITTI odometry is ~10 Hz; good enough for indexing.

        right_path: Optional[str] = None
        velo_path: Optional[str] = None
        T_w_cam0: Optional[Tuple[float, ...]] = None

        if load_right:
            candidate = right_dir / left_path.name
            if candidate.exists():
                right_path = str(candidate)

        if load_velodyne:
            # KITTI velodyne uses "*.bin" with same numeric frame index
            stem = left_path.stem  # e.g. "000000"
            candidate = velo_dir / f"{stem}.bin"
            if candidate.exists():
                velo_path = str(candidate)

        if with_poses and idx < len(poses):
            T_w_cam0 = poses[idx]

        frames.append(
            KittiOdomFrame(
                seq=seq_str,
                idx=idx,
                t=t,
                left_path=str(left_path),
                right_path=right_path,
                velo_path=velo_path,
                T_w_cam0=T_w_cam0,
            )
        )

        if max_frames is not None and len(frames) >= max_frames:
            break

    return frames
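
A typical call, assuming the package is importable as dsg_jit.datasets.kitti_odometry and the dataset root follows the layout shown above (the path is illustrative):

from dsg_jit.datasets.kitti_odometry import load_kitti_odometry_sequence

# Parse metadata only; images and point clouds are not read here.
frames = load_kitti_odometry_sequence(
    "data/kitti_odometry",
    seq="00",
    load_right=True,
    load_velodyne=False,
    with_poses=True,
    max_frames=200,
)

print(f"loaded {len(frames)} frames")
first = frames[0]
print(first.seq, first.idx, first.t, first.left_path, first.T_w_cam0 is not None)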