
Datasets

DSG-JIT includes lightweight loaders for common SLAM / VO datasets so you can quickly hook real sequences into the sensor stack, factor graph, and dynamic scene graph.

The goals are:

  • No heavy dependencies (no OpenCV required just to list frames).
  • Simple dataclasses with timestamps + file paths.
  • Easy integration with sensors.* streams and world.* components.

Currently supported:

  • datasets.tum_rgbd: TUM RGB-D sequences (RGB, optional depth, optional ground truth).
  • datasets.kitti_odometry: KITTI Odometry sequences (left/right images, optional Velodyne scans, optional ground-truth poses).


datasets.tum_rgbd

TumRgbdFrame(t, rgb_path, depth_path=None, pose_quat=None) dataclass

Single RGB-D frame from a TUM RGB-D sequence.

This dataclass stores only lightweight metadata: a timestamp and file paths. Consumers are responsible for actually loading images/depth (e.g., via OpenCV or Pillow) if desired; a minimal loading sketch follows the field list below.

t (float): Timestamp in seconds (as parsed from the TUM text files).

rgb_path (str): Relative or absolute path to the RGB image file corresponding to this frame. Typically something like "rgb/1341847980.722988.png".

depth_path (Optional[str]): Optional path to the depth image associated with this frame. May be None if depth is not available or use_depth=False was passed to the loader.

pose_quat (Optional[Tuple[float, float, float, float, float, float, float]]): Optional ground-truth pose as a 7-tuple (tx, ty, tz, qx, qy, qz, qw) in TUM convention. May be None if ground truth is unavailable or alignment was disabled.
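
The frame itself carries only paths; a minimal loading sketch using Pillow and NumPy (load_frame_arrays is an illustrative helper, and the depth scaling assumes the standard TUM convention of 16-bit depth PNGs where 5000 raw units correspond to one metre):

from typing import Optional, Tuple

import numpy as np
from PIL import Image


def load_frame_arrays(frame) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Load RGB as (H, W, 3) uint8 and depth as (H, W) float32 metres for a TumRgbdFrame."""
    rgb = np.asarray(Image.open(frame.rgb_path).convert("RGB"))

    depth_m: Optional[np.ndarray] = None
    if frame.depth_path is not None:
        # TUM depth maps are 16-bit PNGs; 5000 raw units == 1 metre, 0 == no reading.
        depth_raw = np.asarray(Image.open(frame.depth_path), dtype=np.float32)
        depth_m = depth_raw / 5000.0

    return rgb, depth_m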

load_tum_rgbd_sequence(root, use_depth=True, use_groundtruth=False, max_frames=None, max_time_diff=0.02)

Load a TUM RGB-D sequence directory into a list of frames.

The directory is expected to contain standard TUM files such as rgb.txt, depth.txt, and optionally groundtruth.txt. This loader parses metadata and returns a list of TumRgbdFrame instances, but does not actually load images or depth maps.

root (Union[str, os.PathLike]): Path to the TUM sequence root directory.

use_depth (bool): Whether to attempt to associate depth frames from depth.txt with each RGB frame.

use_groundtruth (bool): Whether to attempt to associate ground-truth poses from groundtruth.txt with each RGB frame.

max_frames (Optional[int]): Optional maximum number of frames to return. If None, the full sequence is loaded.

max_time_diff (float): Maximum allowed absolute difference in timestamps (seconds) when associating RGB with depth and ground truth.

Returns (List[TumRgbdFrame]): A list of TUM RGB-D frames with timestamps, file paths, and optionally ground-truth poses.
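
The association step controlled by max_time_diff is performed by a private helper that is not shown on this page. A minimal sketch of nearest-timestamp matching under the same contract (illustrative code, not the actual _associate_by_timestamp implementation):

from typing import Dict, List, Optional, Tuple


def associate_by_timestamp(
    ref: List[Tuple[float, str]],
    other: List[Tuple[float, str]],
    max_diff: float = 0.02,
) -> Dict[int, Optional[int]]:
    """For each reference entry, return the index of the closest entry in `other`,
    or None if no candidate lies within `max_diff` seconds."""
    assoc: Dict[int, Optional[int]] = {}
    for i, (t_ref, _) in enumerate(ref):
        best_j: Optional[int] = None
        best_dt = max_diff
        for j, (t_other, _) in enumerate(other):
            dt = abs(t_other - t_ref)
            if dt <= best_dt:
                best_dt = dt
                best_j = j
        assoc[i] = best_j
    return assoc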

Source code in dsg-jit/dsg_jit/datasets/tum_rgbd.py
def load_tum_rgbd_sequence(
    root: str | os.PathLike,
    use_depth: bool = True,
    use_groundtruth: bool = False,
    max_frames: Optional[int] = None,
    max_time_diff: float = 0.02,
) -> List[TumRgbdFrame]:
    """
    Load a TUM RGB-D sequence directory into a list of frames.

    The directory is expected to contain standard TUM files such as
    ``rgb.txt``, ``depth.txt``, and optionally ``groundtruth.txt``. This
    loader parses metadata and returns a list of :class:`TumRgbdFrame`
    instances, but does not actually load images or depth maps.

    :param root:
        Path to the TUM sequence root directory.
    :type root: Union[str, os.PathLike]

    :param use_depth:
        Whether to attempt to associate depth frames from ``depth.txt`` with
        each RGB frame.
    :type use_depth: bool

    :param use_groundtruth:
        Whether to attempt to associate ground-truth poses from
        ``groundtruth.txt`` with each RGB frame.
    :type use_groundtruth: bool

    :param max_frames:
        Optional maximum number of frames to return. If ``None``, the full
        sequence is loaded.
    :type max_frames: Optional[int]

    :param max_time_diff:
        Maximum allowed absolute difference in timestamps (seconds) when
        associating RGB with depth and ground truth.
    :type max_time_diff: float

    :return:
        A list of TUM RGB-D frames with timestamps, file paths, and optionally
        ground-truth poses.
    :rtype: List[TumRgbdFrame]
    """
    root_path = Path(root)

    rgb_entries = _parse_tum_list_file(root_path / "rgb.txt")

    depth_entries: List[Tuple[float, str]] = []
    if use_depth and (root_path / "depth.txt").exists():
        depth_entries = _parse_tum_list_file(root_path / "depth.txt")

    gt_entries: List[Tuple[float, Tuple[float, float, float, float, float, float, float]]] = []
    if use_groundtruth and (root_path / "groundtruth.txt").exists():
        gt_entries = _parse_tum_groundtruth(root_path / "groundtruth.txt")

    depth_assoc: Dict[int, Optional[int]] = {}
    gt_assoc: Dict[int, Optional[int]] = {}

    if depth_entries:
        depth_assoc = _associate_by_timestamp(rgb_entries, depth_entries, max_diff=max_time_diff)

    if gt_entries:
        # Convert to (t, dummy_path) to reuse association logic
        gt_ts_paths = [(t, "") for (t, _pose) in gt_entries]
        gt_assoc = _associate_by_timestamp(rgb_entries, gt_ts_paths, max_diff=max_time_diff)

    frames: List[TumRgbdFrame] = []

    for i, (t_rgb, rgb_rel) in enumerate(rgb_entries):
        depth_rel: Optional[str] = None
        pose_quat: Optional[Tuple[float, float, float, float, float, float, float]] = None

        if depth_entries and i in depth_assoc and depth_assoc[i] is not None:
            depth_rel = depth_entries[depth_assoc[i]][1]

        if gt_entries and i in gt_assoc and gt_assoc[i] is not None:
            pose_quat = gt_entries[gt_assoc[i]][1]

        frames.append(
            TumRgbdFrame(
                t=t_rgb,
                rgb_path=str(root_path / rgb_rel),
                depth_path=str(root_path / depth_rel) if depth_rel is not None else None,
                pose_quat=pose_quat,
            )
        )

        if max_frames is not None and len(frames) >= max_frames:
            break

    return frames
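
A typical call, assuming the package is importable as dsg_jit.datasets.tum_rgbd and that a sequence such as rgbd_dataset_freiburg1_xyz has been downloaded locally (the path is illustrative):

from dsg_jit.datasets.tum_rgbd import load_tum_rgbd_sequence

# Parse metadata only; no images are read here.
frames = load_tum_rgbd_sequence(
    "data/rgbd_dataset_freiburg1_xyz",
    use_depth=True,
    use_groundtruth=True,
    max_frames=100,
)

for frame in frames:
    print(f"{frame.t:.6f}  rgb={frame.rgb_path}  depth={frame.depth_path}  "
          f"gt={'yes' if frame.pose_quat is not None else 'no'}")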

datasets.kitti_odometry

KittiOdomFrame(seq, idx, t, left_path, right_path=None, velo_path=None, T_w_cam0=None) dataclass

Single frame from the KITTI Odometry dataset.

This dataclass provides file paths and optional ground-truth pose for a particular frame index in a given sequence.

seq (str): KITTI odometry sequence ID (e.g. "00", "05").

idx (int): Integer frame index within the sequence.

t (float): Approximate timestamp in seconds. Many downstream pipelines assume KITTI odometry runs at 10 Hz, so a common convention is t = idx / 10.0.

left_path (str): Path to the left camera image (image_0) for this frame.

right_path (Optional[str]): Optional path to the right camera image (image_1). May be None if not available or desired.

velo_path (Optional[str]): Optional path to the LiDAR point cloud (velodyne). May be None if not available or desired.

T_w_cam0 (Optional[Tuple[float, ...]]): Optional 4x4 homogeneous transform from camera-0 to world frame as a flattened 16-element tuple in row-major order. This is derived from the official poses/<seq>.txt file if available.
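
Two small consumer-side sketches (illustrative helpers, assuming NumPy and the standard KITTI Velodyne binary format of float32 (x, y, z, reflectance) records):

import numpy as np


def pose_matrix(frame) -> np.ndarray | None:
    """Reshape the flattened row-major 16-tuple into a 4x4 homogeneous transform."""
    if frame.T_w_cam0 is None:
        return None
    return np.asarray(frame.T_w_cam0, dtype=np.float64).reshape(4, 4)


def load_velodyne(frame) -> np.ndarray | None:
    """Read a KITTI Velodyne scan as an (N, 4) array of x, y, z, reflectance."""
    if frame.velo_path is None:
        return None
    return np.fromfile(frame.velo_path, dtype=np.float32).reshape(-1, 4)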

load_kitti_odometry_sequence(root, seq, load_right=True, load_velodyne=False, with_poses=True, max_frames=None)

Load a KITTI Odometry sequence into a list of frames.

This helper assumes the standard KITTI directory structure:

root/
  sequences/
    00/
      image_0/
      image_1/
      velodyne/
      calib.txt
  poses/
    00.txt

Only metadata (paths and ground-truth transforms) is loaded; images and point clouds are not read into memory.

root (Union[str, os.PathLike]): Path to the KITTI odometry dataset root directory.

seq (str): Sequence ID string (e.g. "00", "01").

load_right (bool): Whether to populate right_path pointing to image_1. If False, the field will always be None.

load_velodyne (bool): Whether to populate velo_path pointing to velodyne scans. If False, the field will always be None.

with_poses (bool): Whether to attempt to load ground-truth poses from poses/<seq>.txt.

max_frames (Optional[int]): Optional maximum number of frames to return. If None, all available frames in the left camera directory are used.

Returns (List[KittiOdomFrame]): List of KittiOdomFrame entries with timestamps, paths, and optionally ground-truth transforms.
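
Pose parsing is handled by a private helper that is not shown on this page. A minimal sketch of the standard format (twelve floats per line forming a row-major 3x4 matrix, padded with the homogeneous bottom row); illustrative code, not the actual _load_kitti_poses implementation:

from pathlib import Path
from typing import List, Tuple


def load_kitti_poses(path: Path) -> List[Tuple[float, ...]]:
    """Parse poses/<seq>.txt into flattened row-major 4x4 transforms (16-tuples)."""
    poses: List[Tuple[float, ...]] = []
    if not path.exists():
        return poses
    for line in path.read_text().splitlines():
        vals = [float(x) for x in line.split()]
        if len(vals) != 12:
            continue  # skip blank or malformed lines
        poses.append(tuple(vals + [0.0, 0.0, 0.0, 1.0]))
    return poses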

Source code in dsg-jit/dsg_jit/datasets/kitti_odometry.py
def load_kitti_odometry_sequence(
    root: str | os.PathLike,
    seq: str,
    load_right: bool = True,
    load_velodyne: bool = False,
    with_poses: bool = True,
    max_frames: Optional[int] = None,
) -> List[KittiOdomFrame]:
    """
    Load a KITTI Odometry sequence into a list of frames.

    This helper assumes the standard KITTI directory structure:

    .. code-block::

        root/
          sequences/
            00/
              image_0/
              image_1/
              velodyne/
              calib.txt
          poses/
            00.txt

    Only metadata (paths and ground-truth transforms) is loaded; images and
    point clouds are not read into memory.

    :param root:
        Path to the KITTI odometry dataset root directory.
    :type root: Union[str, os.PathLike]

    :param seq:
        Sequence ID string (e.g. ``"00"``, ``"01"``).
    :type seq: str

    :param load_right:
        Whether to populate ``right_path`` pointing to ``image_1``. If ``False``,
        the field will always be ``None``.
    :type load_right: bool

    :param load_velodyne:
        Whether to populate ``velo_path`` pointing to ``velodyne`` scans. If
        ``False``, the field will always be ``None``.
    :type load_velodyne: bool

    :param with_poses:
        Whether to attempt to load ground-truth poses from ``poses/<seq>.txt``.
    :type with_poses: bool

    :param max_frames:
        Optional maximum number of frames to return. If ``None``, all available
        frames in the left camera directory are used.
    :type max_frames: Optional[int]

    :return:
        List of :class:`KittiOdomFrame` entries with timestamps, paths, and
        optionally ground-truth transforms.
    :rtype: List[KittiOdomFrame]
    """
    root_path = Path(root)
    seq_str = f"{int(seq):02d}"  # normalize "0" -> "00"

    seq_dir = root_path / "sequences" / seq_str
    left_dir = seq_dir / "image_0"
    right_dir = seq_dir / "image_1"
    velo_dir = seq_dir / "velodyne"

    if not left_dir.exists():
        raise FileNotFoundError(f"Left image directory not found: {left_dir}")

    # Determine frame indices from left camera images
    left_files = sorted(left_dir.glob("*.png"))
    if not left_files:
        raise FileNotFoundError(f"No left images found in {left_dir}")

    # Optional poses
    poses: List[Tuple[float, ...]] = []
    if with_poses:
        poses_path = root_path / "poses" / f"{seq_str}.txt"
        poses = _load_kitti_poses(poses_path)

    frames: List[KittiOdomFrame] = []

    for idx, left_path in enumerate(left_files):
        t = idx / 10.0  # KITTI odometry is ~10 Hz; good enough for indexing.

        right_path: Optional[str] = None
        velo_path: Optional[str] = None
        T_w_cam0: Optional[Tuple[float, ...]] = None

        if load_right:
            candidate = right_dir / left_path.name
            if candidate.exists():
                right_path = str(candidate)

        if load_velodyne:
            # KITTI velodyne uses "*.bin" with same numeric frame index
            stem = left_path.stem  # e.g. "000000"
            candidate = velo_dir / f"{stem}.bin"
            if candidate.exists():
                velo_path = str(candidate)

        if with_poses and idx < len(poses):
            T_w_cam0 = poses[idx]

        frames.append(
            KittiOdomFrame(
                seq=seq_str,
                idx=idx,
                t=t,
                left_path=str(left_path),
                right_path=right_path,
                velo_path=velo_path,
                T_w_cam0=T_w_cam0,
            )
        )

        if max_frames is not None and len(frames) >= max_frames:
            break

    return frames
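
A typical call, assuming the package is importable as dsg_jit.datasets.kitti_odometry and the dataset root follows the layout shown above (the path is illustrative):

from dsg_jit.datasets.kitti_odometry import load_kitti_odometry_sequence

# Parse metadata only; images and point clouds are not read here.
frames = load_kitti_odometry_sequence(
    "data/kitti_odometry",
    seq="00",
    load_right=True,
    load_velodyne=False,
    with_poses=True,
    max_frames=200,
)

print(f"loaded {len(frames)} frames")
first = frames[0]
print(first.seq, first.idx, first.t, first.left_path, first.T_w_cam0 is not None)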