DreamLake

Key Design

Chunked Upload

The CLI uploads files to BSS using S3 multipart upload with resumable state and parallel workers.

Constants

Setting                 Value
Chunk size              10 MB (S3 minimum is 5 MB, except for the last part)
Max parallel workers    4
Hash algorithm          SHA256, first 16 hex chars
S3 PUT timeout          300s per part
State directory         ~/.dreamlake/uploads/
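
In Python these could live in a small constants module; the names below are illustrative, not the CLI's actual source:

from pathlib import Path

CHUNK_SIZE = 10 * 1024 * 1024            # 10 MB per part (S3 minimum: 5 MB)
MAX_WORKERS = 4                          # parallel upload workers
HASH_HEX_CHARS = 16                      # SHA256, first 16 hex chars kept
S3_PUT_TIMEOUT = 300                     # seconds per part PUT
STATE_DIR = Path.home() / ".dreamlake" / "uploads"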

Full Upload Flow

Example: uploading a video. All 4 types follow the same pattern — only the BSS URL prefix differs.

dreamlake upload ./run01.mp4 --episode alice@robotics:run-042 --to /camera/front
 │
 ├─ 1. Read file, compute hash
 │     hash = SHA256(content)[:16]
 │     parts = ceil(fileSize / 10MB)
 │
 ├─ 2. Check for resume state
 │     ~/.dreamlake/uploads/{hash}.json
 │     If exists → verify with BSS parts-done endpoint
 │     If valid → skip already-uploaded parts
 │
 ├─ 3. Init multipart (if no resume)
 │     POST /videos/upload/multipart/init
 │     → { uploadId, key }
 │     Save state to disk
 │
 ├─ 4. Get presigned S3 URLs for remaining parts
 │     POST /videos/upload/multipart/parts
 │     → { "1": "https://s3...", "2": "https://s3...", ... }
 │
 ├─ 5. Upload parts in parallel (4 workers)
 │     PUT chunk bytes directly to S3 presigned URLs
 │     Extract ETag from response
 │     Save state after each part completes
 │
 ├─ 6. Complete multipart
 │     POST /videos/upload/multipart/complete
 │     → { success: true }
 │     Clear state file
 │
 ├─ 7. Register in BSS
 │     POST /videos { owner, project, stagingHash, ... }
 │     → { id: bssVideoId }
 │
 ├─ 8. Register in DreamLake Server
 │     POST /assets/video { namespace, space, episodeName, name, bssVideoId }
 │     → { id, lambdaUrl }
 │     Server creates: episode node → folder hierarchy → asset leaf node
 │
 └─ 9. Trigger Lambda via presigned URL
       POST {lambdaUrl}   (no auth header — URL is HMAC-signed)
       → 202 dispatched
       Lambda splits file into HLS segments in the background
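
A minimal Python sketch of step 1 (the function name is hypothetical; the real CLI may structure this differently):

import hashlib
import math

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB

def hash_and_part_count(path: str) -> tuple[str, int]:
    """Stream the file once; return (first 16 hex chars of SHA256, part count)."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # 1 MB read blocks keep memory flat even for multi-GB videos.
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
            size += len(block)
    return digest.hexdigest()[:16], math.ceil(size / CHUNK_SIZE)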

Resume & Retry

Uploads are resumable across CLI invocations. State is persisted to ~/.dreamlake/uploads/{hash}.json:

{
  "uploadId": "s3-multipart-upload-id",
  "key": "alice/robotics/staging/abc123def456",
  "totalParts": 12,
  "completedParts": [
    { "partNumber": 1, "etag": "\"abc...\"" },
    { "partNumber": 2, "etag": "\"def...\"" }
  ]
}

On resume:

  1. Load state file by hash
  2. Call GET /{type}/upload/multipart/parts-done?uploadId=...&key=...
  3. If not expired → skip completed parts, upload only remaining
  4. If expired → start fresh
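
A sketch of that check, assuming the requests library; the helper name, base-URL plumbing, and expiry handling are illustrative:

import json
from pathlib import Path

import requests

STATE_DIR = Path.home() / ".dreamlake" / "uploads"

def remaining_parts(file_hash: str, bss_base: str, prefix: str) -> set[int] | None:
    """Return part numbers still to upload, or None if there is no valid state."""
    state_path = STATE_DIR / f"{file_hash}.json"
    if not state_path.exists():
        return None                          # no resume state: start fresh
    state = json.loads(state_path.read_text())
    resp = requests.get(
        f"{bss_base}{prefix}parts-done",
        params={"uploadId": state["uploadId"], "key": state["key"]},
    )
    if not resp.ok:                          # assumption: non-2xx means expired
        state_path.unlink()
        return None
    done = {p["partNumber"] for p in state["completedParts"]}
    return set(range(1, state["totalParts"] + 1)) - done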

On failure mid-upload, the CLI prints "upload paused — re-run to resume" and exits. The CLI never calls the BSS abort endpoint; the incomplete S3 multipart upload expires naturally via the bucket's lifecycle policy (typically 7 days).

Parallel Upload

Parts are uploaded concurrently using ThreadPoolExecutor:

Worker 1: ──── part 1 ──── part 5 ──── part 9  ────
Worker 2: ──── part 2 ──── part 6 ──── part 10 ────
Worker 3: ──── part 3 ──── part 7 ──── part 11 ────
Worker 4: ──── part 4 ──── part 8 ──── part 12 ────
  • State is saved (with a thread lock) after each part completes
  • On first failure, all remaining futures are cancelled
  • ETag from S3 response is stored per part for the complete call
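
Putting those rules together, a minimal sketch of the worker pool (requests assumed; upload_part, save_state, and the in-memory chunks dict are illustrative, and a real CLI would read chunks from disk lazily):

import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

state_lock = threading.Lock()  # serializes state-file writes across workers

def upload_part(url: str, data: bytes) -> str:
    """PUT one chunk to its presigned URL; S3 returns the part's ETag header."""
    resp = requests.put(url, data=data, timeout=300)
    resp.raise_for_status()
    return resp.headers["ETag"]

def upload_parts(urls: dict[int, str], chunks: dict[int, bytes],
                 state: dict, save_state) -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(upload_part, urls[n], chunks[n]): n for n in urls}
        try:
            for fut in as_completed(futures):
                n = futures[fut]
                etag = fut.result()          # re-raises a worker's exception
                with state_lock:             # parts finish concurrently
                    state["completedParts"].append({"partNumber": n, "etag": etag})
                    save_state(state)
        except Exception:
            for f in futures:                # first failure: cancel queued parts
                f.cancel()
            raise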

BSS Endpoints (per type)

Each file type uses its own set of multipart endpoints. The protocol is identical — only the URL prefix differs.

Type            Prefix
Video           /videos/upload/multipart/
Audio           /audio/upload/multipart/
Labels          /labels/upload/multipart/
Text Tracks     /text-tracks/upload/multipart/

Each prefix supports: POST .../init, POST .../parts, GET .../parts-done, POST .../complete, POST .../abort
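
A hypothetical helper that makes the convention concrete:

PREFIXES = {
    "video": "/videos/upload/multipart/",
    "audio": "/audio/upload/multipart/",
    "labels": "/labels/upload/multipart/",
    "text-tracks": "/text-tracks/upload/multipart/",
}

def multipart_url(bss_base: str, file_type: str, action: str) -> str:
    """e.g. multipart_url(base, "video", "init") → {base}/videos/upload/multipart/init"""
    return f"{bss_base}{PREFIXES[file_type]}{action}"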

S3 Staging Pool

All types land in the same flat staging pool, content-addressed by hash:

{owner}/{project}/staging/{hash}

Re-uploading the same file produces the same S3 key — the file is overwritten, not duplicated.
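
Key construction is then a one-liner (illustrative):

def staging_key(owner: str, project: str, file_hash: str) -> str:
    # Content-addressed: identical bytes always map to the same key.
    return f"{owner}/{project}/staging/{file_hash}"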

Lambda Trigger

After registration, DreamLake Server returns a lambdaUrl — a presigned BSS URL with HMAC signature. The CLI POSTs to this URL without any auth header. BSS verifies the signature and dispatches the Lambda.

This avoids the CLI needing a BSS auth token. See the Presigned URL Auth design doc for details.
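
A minimal sketch of the trigger call, assuming requests:

import requests

def trigger_lambda(lambda_url: str) -> None:
    """POST to the presigned URL; the HMAC signature in the URL is the auth."""
    resp = requests.post(lambda_url)   # deliberately no Authorization header
    resp.raise_for_status()            # expect "202 dispatched"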