DreamLake

S3 Path Convention

All binary data in DreamLake lives in a single S3 bucket managed by BSS. Paths follow a strict convention.

Staging Pool

Raw uploaded files land in a flat staging area, content-addressed by hash:

{owner}/{project}/staging/{hash}
  • owner = namespace slug (e.g. alice)
  • project = project slug (e.g. robotics)
  • hash = SHA256[:16], the first 16 hex characters of the SHA-256 of the file content

The staging pool is shared across all file types (video, audio, labels, text tracks). Files are never moved out — they remain as the source of truth for re-processing.
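As a minimal sketch of the convention above, a staging key could be derived like this. The function name staging_key is hypothetical, and the code assumes SHA256[:16] means the first 16 characters of the hex digest.

```python
import hashlib


def staging_key(owner: str, project: str, content: bytes) -> str:
    """Build the staging key for an uploaded file from its raw bytes."""
    # Assumption: SHA256[:16] is the first 16 hex characters of the digest.
    digest = hashlib.sha256(content).hexdigest()[:16]
    return f"{owner}/{project}/staging/{digest}"


key = staging_key("alice", "robotics", b"example file bytes")
print(key)  # alice/robotics/staging/<16 hex characters>
```

Because the key depends only on owner, project, and content, uploading the same bytes twice to the same project yields the same key.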

Video

{owner}/{project}/videos/{videoId}/
  meta.json                    # Video metadata (length, fps, streams, thumbnail, sprite)
  stream/{streamHash}.m3u8     # HLS playlist (bare hex hashes, rewritten by BSS)
  thumb.jpg                    # Thumbnail (640px wide, optional)
  sprite.jpg                   # Sprite sheet (160x90 tiles, optional)

Chunks (Global Dedup Pool)

chunks/{hash}.ts               # HLS TS segments, content-addressed

Chunks are shared globally across all videos. Each filename is a 16-character hex hash (SHA256[:16]) of the chunk content. Before uploading, Lambda checks whether the chunk already exists and skips duplicates.
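The check-then-upload step can be sketched as follows. This is an illustration only: a plain dict stands in for the bucket, and chunk_key and upload_chunk are hypothetical names. How the real Lambda tests for existence is not specified here; against S3 it would typically be a HeadObject request.

```python
import hashlib


def chunk_key(segment: bytes) -> str:
    """Content-addressed key for an HLS TS segment in the global pool."""
    return f"chunks/{hashlib.sha256(segment).hexdigest()[:16]}.ts"


def upload_chunk(store: dict, segment: bytes) -> tuple:
    """Store a segment unless its key already exists.

    Returns (key, uploaded), where uploaded is False for a duplicate.
    """
    key = chunk_key(segment)
    if key in store:  # stand-in for an existence check against the bucket
        return key, False
    store[key] = segment
    return key, True


store = {}  # in-memory stand-in for the S3 bucket
first = upload_chunk(store, b"segment-bytes")
second = upload_chunk(store, b"segment-bytes")
print(first[1], second[1])  # True False
```

Two videos that contain an identical segment therefore resolve to the same key, and only the first upload writes any data.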

Track Metadata

{owner}/{project}/tracks/
  labels/{labelId}/meta.json
  text/{textTrackId}/meta.json
  audio/{audioId}/meta.json
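A small helper can keep track keys consistent with the layout above. This is a sketch: track_meta_key and TRACK_KINDS are hypothetical names, and the example ID is made up.

```python
# The three track folders listed in the layout above.
TRACK_KINDS = ("labels", "text", "audio")


def track_meta_key(owner: str, project: str, kind: str, track_id: str) -> str:
    """Build the meta.json key for a track of the given kind."""
    if kind not in TRACK_KINDS:
        raise ValueError(f"unknown track kind: {kind!r}")
    return f"{owner}/{project}/tracks/{kind}/{track_id}/meta.json"


print(track_meta_key("alice", "robotics", "labels", "lbl-001"))
# alice/robotics/tracks/labels/lbl-001/meta.json
```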

Full Layout

bucket/
├── chunks/
│   ├── a1b2c3d4e5f67890.ts        # global dedup pool
│   └── ...
│
├── alice/robotics/                  # {owner}/{project}
│   ├── staging/
│   │   ├── abc123def4567890        # raw video upload
│   │   ├── 789abc012def3456        # raw audio upload
│   │   └── ...
│   │
│   ├── videos/
│   │   └── {videoId}/
│   │       ├── meta.json
│   │       ├── stream/{hash}.m3u8
│   │       ├── thumb.jpg
│   │       └── sprite.jpg
│   │
│   └── tracks/
│       ├── labels/{labelId}/meta.json
│       ├── text/{textTrackId}/meta.json
│       └── audio/{audioId}/meta.json
│
└── bob/demo/                        # another owner/project
    └── ...
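Going the other way, a key can be classified by matching it against the layout. The sketch below is illustrative: classify_key is a hypothetical name, and the patterns assume that slugs contain no slashes and that chunk hashes are lowercase hex.

```python
import re

# chunks/{hash}.ts at the bucket root
CHUNK_RE = re.compile(r"^chunks/(?P<hash>[0-9a-f]{16})\.ts$")
# {owner}/{project}/{staging|videos|tracks}/...
SCOPED_RE = re.compile(
    r"^(?P<owner>[^/]+)/(?P<project>[^/]+)/"
    r"(?P<area>staging|videos|tracks)/(?P<rest>.+)$"
)


def classify_key(key: str):
    """Return a dict describing the key, or None if it fits no known layout."""
    m = CHUNK_RE.match(key)
    if m:
        return {"kind": "chunk", "hash": m.group("hash")}
    m = SCOPED_RE.match(key)
    if m:
        return {
            "kind": m.group("area"),
            "owner": m.group("owner"),
            "project": m.group("project"),
            "rest": m.group("rest"),
        }
    return None


print(classify_key("chunks/a1b2c3d4e5f67890.ts"))
print(classify_key("alice/robotics/videos/v1/meta.json"))
```

Because the chunk pool sits at the bucket root, this sketch checks the chunk pattern first, so a key under chunks/ is never read as an owner prefix.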

Key Design Principles

| Principle | Implementation |
| --- | --- |
| Content addressing | Staging files and chunks use SHA256[:16] hash as filename |
| Global dedup | Chunks pool is flat at bucket root, shared across all videos |
| Immutability | Chunks are uploaded with Cache-Control: immutable |
| No nesting by type | Staging pool is flat; file type is tracked in DB, not S3 path |
| Owner isolation | Each owner/project has its own prefix, so there is no cross-contamination |

URL Rewriting

HLS playlists store bare hex hashes as chunk references. BSS's rewriteM3u8() replaces each hash with a full URL before sending to the client:

# Stored in S3:
a1b2c3d4e5f67890
 
# Rewritten for client:
https://cdn.example.com/chunks/a1b2c3d4e5f67890.ts?X-Amz-...

This keeps playlists portable — the same playlist works with any CDN or presigned URL scheme.
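The rewrite step might look roughly like the sketch below. It is not BSS's actual rewriteM3u8(); the Python function rewrite_m3u8, its parameters, and the sample playlist are assumptions for illustration, and presigning is reduced to appending a caller-supplied query string.

```python
import re

# A chunk reference is a line holding nothing but a 16-char lowercase hex hash.
BARE_HASH = re.compile(r"[0-9a-f]{16}")


def rewrite_m3u8(playlist: str, base_url: str, query: str = "") -> str:
    """Replace bare chunk hashes with full chunk URLs; keep other lines as-is."""
    out = []
    for line in playlist.splitlines():
        ref = line.strip()
        if BARE_HASH.fullmatch(ref):
            url = f"{base_url}/chunks/{ref}.ts"
            if query:
                url += f"?{query}"
            out.append(url)
        else:
            out.append(line)  # tags such as #EXTINF pass through untouched
    return "\n".join(out)


stored = "\n".join([
    "#EXTM3U",
    "#EXT-X-TARGETDURATION:6",
    "#EXTINF:6.0,",
    "a1b2c3d4e5f67890",
    "#EXT-X-ENDLIST",
])
print(rewrite_m3u8(stored, "https://cdn.example.com"))
```

Since the stored playlist carries no host or signature, switching CDNs only changes the base_url and query arguments, not the object in S3.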