Overview
S3 Path Convention
All binary data in DreamLake lives in a single S3 bucket managed by BSS. Paths follow a strict convention.
Staging Pool
Raw uploaded files land in a flat staging area, content-addressed by hash:
{owner}/{project}/staging/{hash}
- owner = namespace slug (e.g. alice)
- project = project slug (e.g. robotics)
- hash = SHA256[:16] of the file content
The staging pool is shared across all file types (video, audio, labels, text tracks). Files are never moved out — they remain as the source of truth for re-processing.
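As a minimal sketch of the content-addressed naming above, the following computes a staging key from file bytes. It assumes that SHA256[:16] means the first 16 characters of the hex digest; the function names `content_hash` and `staging_key` are hypothetical and not part of DreamLake.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the first 16 hex characters of the SHA-256 digest (assumed meaning of SHA256[:16])."""
    return hashlib.sha256(data).hexdigest()[:16]


def staging_key(owner: str, project: str, data: bytes) -> str:
    """Build the flat staging key {owner}/{project}/staging/{hash}."""
    return f"{owner}/{project}/staging/{content_hash(data)}"


# Identical content always maps to the same key, regardless of file type.
key = staging_key("alice", "robotics", b"example file bytes")
print(key)
```

Because the key depends only on the owner, the project, and the bytes, re-uploading the same file to the same project resolves to the same staging object.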
Video
{owner}/{project}/videos/{videoId}/
  meta.json                  # Video metadata (length, fps, streams, thumbnail, sprite)
  stream/{streamHash}.m3u8   # HLS playlist (bare hex hashes, rewritten by BSS)
  thumb.jpg                  # Thumbnail (640px wide, optional)
  sprite.jpg                 # Sprite sheet (160x90 tiles, optional)
Chunks (Global Dedup Pool)
chunks/{hash}.ts # HLS TS segments, content-addressed
Chunks are shared globally across all videos. The filename is a 16-char hex hash (SHA256[:16]). Before uploading, Lambda checks whether the chunk already exists; duplicates are skipped.
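The check-then-upload step can be sketched as follows. This is an illustration under stated assumptions, not the actual Lambda code: `ObjectStore`, `InMemoryStore`, and `upload_chunk` are hypothetical names, and the in-memory store stands in for S3, where the existence check would typically be a HEAD-style request.

```python
from __future__ import annotations

import hashlib
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface standing in for the S3 bucket."""

    def exists(self, key: str) -> bool: ...
    def put(self, key: str, body: bytes) -> None: ...


class InMemoryStore:
    """Dict-backed stand-in for S3, used only for illustration."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


def upload_chunk(store: ObjectStore, segment: bytes) -> tuple[str, bool]:
    """Store a TS segment under chunks/{hash}.ts; return (key, uploaded)."""
    digest = hashlib.sha256(segment).hexdigest()[:16]
    key = f"chunks/{digest}.ts"
    if store.exists(key):
        return key, False  # duplicate content: skip the upload
    store.put(key, segment)
    return key, True


store = InMemoryStore()
first = upload_chunk(store, b"segment bytes")
second = upload_chunk(store, b"segment bytes")
print(first, second)
```

The second call returns the same key with `uploaded` set to `False`, so two videos containing an identical segment share one object in the pool.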
Track Metadata
{owner}/{project}/tracks/
  labels/{labelId}/meta.json
  text/{textTrackId}/meta.json
  audio/{audioId}/meta.json
Full Layout
bucket/
├── chunks/
│   ├── a1b2c3d4e5f67890.ts      # global dedup pool
│   └── ...
│
├── alice/robotics/              # {owner}/{project}
│   ├── staging/
│   │   ├── abc123def456.mp4     # raw video upload
│   │   ├── 789abc012def.wav     # raw audio upload
│   │   └── ...
│   │
│   ├── videos/
│   │   └── {videoId}/
│   │       ├── meta.json
│   │       ├── stream/{hash}.m3u8
│   │       ├── thumb.jpg
│   │       └── sprite.jpg
│   │
│   └── tracks/
│       ├── labels/{labelId}/meta.json
│       ├── text/{textTrackId}/meta.json
│       └── audio/{audioId}/meta.json
│
└── bob/demo/                    # another owner/project
    └── ...
Key Design Principles
| Principle | Implementation |
|---|---|
| Content addressing | Staging files and chunks use SHA256[:16] hash as filename |
| Global dedup | Chunks pool is flat at bucket root, shared across all videos |
| Immutability | Chunks are uploaded with Cache-Control: immutable |
| No nesting by type | Staging pool is flat — file type is tracked in DB, not S3 path |
| Owner isolation | Each owner/project has its own prefix — no cross-contamination |
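The path rules in this section can be collected into a few key-builder helpers. This is a hedged sketch: the function names and the `TRACK_KINDS` set are illustrative assumptions, derived only from the path templates shown in this section.

```python
# Track subdirectories listed under {owner}/{project}/tracks/.
TRACK_KINDS = {"labels", "text", "audio"}


def chunk_key(chunk_hash: str) -> str:
    """Chunks live at the bucket root, outside any owner/project prefix."""
    return f"chunks/{chunk_hash}.ts"


def video_prefix(owner: str, project: str, video_id: str) -> str:
    """Prefix holding meta.json, stream/, thumb.jpg and sprite.jpg for one video."""
    return f"{owner}/{project}/videos/{video_id}/"


def playlist_key(owner: str, project: str, video_id: str, stream_hash: str) -> str:
    """Key of one HLS playlist under a video's stream/ directory."""
    return f"{video_prefix(owner, project, video_id)}stream/{stream_hash}.m3u8"


def track_meta_key(owner: str, project: str, kind: str, track_id: str) -> str:
    """Key of a track's meta.json; kind must be one of TRACK_KINDS."""
    if kind not in TRACK_KINDS:
        raise ValueError(f"unknown track kind: {kind!r}")
    return f"{owner}/{project}/tracks/{kind}/{track_id}/meta.json"


print(playlist_key("alice", "robotics", "vid1", "a1b2c3d4e5f67890"))
print(track_meta_key("alice", "robotics", "labels", "lbl1"))
```

Only `chunk_key` omits the owner and project, which reflects the global dedup principle; every other key stays inside its `{owner}/{project}` prefix.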
URL Rewriting
HLS playlists store bare hex hashes as chunk references. BSS's rewriteM3u8() replaces each hash with a full URL before sending to the client:
# Stored in S3:
a1b2c3d4e5f67890
# Rewritten for client:
https://cdn.example.com/chunks/a1b2c3d4e5f67890.ts?X-Amz-...
This keeps playlists portable: the same playlist works with any CDN or presigned URL scheme.
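The rewriting step can be illustrated with a short sketch. This is not the BSS implementation of rewriteM3u8(): the Python function `rewrite_m3u8` and its `to_url` callback are hypothetical. It assumes that a chunk reference is any line consisting of exactly 16 lowercase hex characters, and that all other lines (HLS tags such as `#EXTINF`) pass through unchanged.

```python
import re
from typing import Callable

# Assumption: a chunk reference is a line of exactly 16 lowercase hex characters.
_BARE_HASH = re.compile(r"[0-9a-f]{16}")


def rewrite_m3u8(playlist: str, to_url: Callable[[str], str]) -> str:
    """Replace each bare chunk hash with the URL produced by to_url."""
    out = []
    for line in playlist.splitlines():
        stripped = line.strip()
        if _BARE_HASH.fullmatch(stripped):
            out.append(to_url(stripped))
        else:
            out.append(line)  # tags and blank lines are left untouched
    result = "\n".join(out)
    return result + "\n" if playlist.endswith("\n") else result


stored = "#EXTM3U\n#EXTINF:4.0,\na1b2c3d4e5f67890\n#EXT-X-ENDLIST\n"
rewritten = rewrite_m3u8(
    stored,
    lambda h: f"https://cdn.example.com/chunks/{h}.ts",
)
print(rewritten)
```

Passing the URL scheme in as a callback mirrors the portability point: the stored playlist never changes, and a CDN base URL or a presigning function can be swapped in at request time.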