DreamLake

Key Design

Label Track Splitting

Label tracks are JSONL annotation files; JSONL is the only supported label format. They are split into time-windowed chunks, following the same pattern as text track splitting.

Input Format

Each line is a JSON object. Required fields for time windowing: ts (timestamp) and dt (duration). Optional frame fields: fs (frame start) and fe (frame end).

{"ts": 0.0, "dt": 0.5, "label": "person", "confidence": 0.95, "bbox": [100, 50, 200, 150]}
{"ts": 0.5, "dt": 0.033, "fs": 15, "fe": 15, "label": "car", "bbox": [300, 200, 450, 350]}
{"ts": 1.0, "dt": 0.033, "fs": 30, "fe": 30, "label": "person", "bbox": [110, 55, 210, 155]}

Field Reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| ts | number | Yes | Timestamp in seconds (used for time windowing) |
| dt | number | Yes | Duration in seconds |
| fs | number | No | Frame start number (for frame-accurate indexing) |
| fe | number | No | Frame end number |
| Other fields | any | No | Free-form: label, bbox, confidence, etc. |

Frame fields (fs/fe) are optional and backward compatible: older labels without them continue to work. When they are present, the Lambda records the track's overall startFrame/endFrame in meta.json.
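A minimal sketch of validating one input line against the field reference, written in Python (the source does not say which language the Lambda uses). The function name parse_label_line and its error messages are hypothetical, not part of DreamLake.

```python
import json
from typing import Any

REQUIRED_FIELDS = ("ts", "dt")        # needed for time windowing
OPTIONAL_FRAME_FIELDS = ("fs", "fe")  # frame-accurate indexing, may be absent


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_label_line(line: str) -> dict[str, Any]:
    """Parse one JSONL line and check the fields used for splitting."""
    entry = json.loads(line)
    if not isinstance(entry, dict):
        raise ValueError("each line must be a JSON object")
    for name in REQUIRED_FIELDS:
        if not _is_number(entry.get(name)):
            raise ValueError(f"required field {name!r} is missing or not a number")
    for name in OPTIONAL_FRAME_FIELDS:
        if name in entry and not _is_number(entry[name]):
            raise ValueError(f"optional field {name!r} must be a number when present")
    return entry


entry = parse_label_line('{"ts": 0.5, "dt": 0.033, "fs": 15, "fe": 15, "label": "car"}')
print(entry["ts"], entry["dt"], entry.get("fs"))
```

Any other keys (label, bbox, confidence, and so on) pass through untouched, which matches the "Other fields" row.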

Pipeline

Raw JSONL in staging
 │
 ├─ 1. Download from S3
 ├─ 2. Parse each line, extract ts/dt for time range
 ├─ 3. Group into 30-second windows
 │     Entry belongs to window if ts < windowEnd AND ts+dt > windowStart
 ├─ 4. Write each window as .jsonl chunk
 ├─ 5. Hash + upload to chunks/{hash}.jsonl
 ├─ 6. Build m3u8 playlist
 ├─ 7. Upload to tracks/labels/{id}/stream/{streamHash}.m3u8
 ├─ 8. Update meta.json with streams[], entryCount, startTime, endTime
 └─ 9. Callback to BSS
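Steps 3 to 5 can be sketched roughly as follows. This is an illustration in Python, not the Lambda's actual code: the helper names are hypothetical, and the hash function (SHA-256 truncated to 16 hex characters) is an assumption, since the source does not name one. The index arithmetic is equivalent to the rule ts < windowEnd AND ts+dt > windowStart for windows of the form [i*30, (i+1)*30).

```python
import hashlib
import json
import math
from collections import defaultdict

WINDOW_SECONDS = 30.0


def window_indices(ts: float, dt: float, window: float = WINDOW_SECONDS) -> range:
    """Indices i whose window [i*window, (i+1)*window) overlaps the entry."""
    first = math.floor(ts / window)            # smallest i with ts < windowEnd
    last = math.ceil((ts + dt) / window) - 1   # largest i with ts+dt > windowStart
    # Assumption: a zero-duration entry sitting exactly on a boundary matches
    # no window under the strict rule, so it is kept in window `first`.
    return range(first, max(first, last) + 1)


def group_into_windows(entries: list[dict]) -> dict[int, list[dict]]:
    """Step 3: map window index -> entries overlapping that window."""
    windows: dict[int, list[dict]] = defaultdict(list)
    for entry in entries:
        for index in window_indices(entry["ts"], entry["dt"]):
            windows[index].append(entry)
    return dict(sorted(windows.items()))


def serialize_chunk(entries: list[dict]) -> bytes:
    """Step 4: one JSON object per line."""
    return "".join(json.dumps(e) + "\n" for e in entries).encode("utf-8")


def chunk_hash(payload: bytes) -> str:
    """Step 5: content hash used in the chunk name (algorithm is assumed)."""
    return hashlib.sha256(payload).hexdigest()[:16]


entries = [
    {"ts": 0.0, "dt": 0.5, "label": "person"},
    {"ts": 29.5, "dt": 1.0, "label": "car"},
    {"ts": 45.0, "dt": 0.033, "label": "person"},
]
for index, chunk in group_into_windows(entries).items():
    payload = serialize_chunk(chunk)
    print(index, len(chunk), f"chunks/{chunk_hash(payload)}.jsonl")
```

Naming chunks by content hash means identical chunk payloads map to the same key, so a re-run over unchanged input would overwrite rather than duplicate.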

Difference from Text Tracks

| Aspect | Text Track | Label Track |
| --- | --- | --- |
| Input formats | VTT, SRT, JSONL | JSONL only |
| Chunk format | .vtt or .jsonl | .jsonl only |
| Time parsing | VTT timestamps or JSON ts/dt | JSON ts/dt |
| Typical content | Captions, transcripts | Bounding boxes, detections, annotations |
| Lambda action | text-process | label-process |

Overlap Handling

An entry with ts=29.5, dt=1.0 covers 29.5–30.5s, so it spans the boundary between two 30-second windows (0–30s and 30–60s). The entry is written to both chunks; the client deduplicates at render time if needed.
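One way a client might deduplicate when merging adjacent chunks is sketched below. It assumes entries carry no unique id, so the canonical JSON form of each entry is used as its identity; dedupe_entries is a hypothetical name, not a DreamLake client API.

```python
import json


def dedupe_entries(chunks: list[list[dict]]) -> list[dict]:
    """Merge chunks in order, keeping the first copy of each entry."""
    seen: set[str] = set()
    unique: list[dict] = []
    for chunk in chunks:
        for entry in chunk:
            # Canonical JSON as the identity key (assumption: no id field).
            key = json.dumps(entry, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(entry)
    return unique


boundary = {"ts": 29.5, "dt": 1.0, "label": "car"}
chunk_0_30 = [{"ts": 0.0, "dt": 0.5, "label": "person"}, boundary]
chunk_30_60 = [dict(boundary), {"ts": 45.0, "dt": 0.033, "label": "person"}]

merged = dedupe_entries([chunk_0_30, chunk_30_60])
print(len(merged))  # 3: the boundary entry is kept once
```

Note that this approach also collapses entries that were genuinely identical in the source file; whether that is acceptable depends on the annotation data.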

Meta.json After Splitting

{
  "labelId": "69df3cba...",
  "owner": "testuser",
  "project": "test-project",
  "name": "detections",
  "entryCount": 2400,
  "startTime": 0,
  "endTime": 120,
  "startFrame": 0,
  "endFrame": 3600,
  "fields": ["bbox", "confidence", "label"],
  "streams": ["da2321b20fbcf81e"],
  "updatedAt": "2026-04-15T..."
}

startFrame/endFrame are null if no entries have fs/fe fields.
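The summary fields in meta.json could be derived from the parsed entries along these lines. This is a sketch under assumptions the source does not confirm: startTime is taken as the smallest ts, endTime as the largest ts+dt, entryCount as the number of source entries (boundary duplicates not counted), and fields as the sorted set of keys other than ts, dt, fs, and fe. summarize_entries is a hypothetical name.

```python
from typing import Any

RESERVED_FIELDS = {"ts", "dt", "fs", "fe"}  # assumed to be excluded from "fields"


def summarize_entries(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute the meta.json summary fields from parsed label entries."""
    frame_starts = [e["fs"] for e in entries if "fs" in e]
    frame_ends = [e["fe"] for e in entries if "fe" in e]
    return {
        "entryCount": len(entries),
        "startTime": min((e["ts"] for e in entries), default=None),
        "endTime": max((e["ts"] + e["dt"] for e in entries), default=None),
        # None serializes to JSON null when no entry has fs/fe.
        "startFrame": min(frame_starts, default=None),
        "endFrame": max(frame_ends, default=None),
        "fields": sorted({key for e in entries for key in e} - RESERVED_FIELDS),
    }


entries = [
    {"ts": 0.0, "dt": 0.5, "label": "person", "confidence": 0.95, "bbox": [100, 50, 200, 150]},
    {"ts": 0.5, "dt": 0.033, "fs": 15, "fe": 15, "label": "car", "bbox": [300, 200, 450, 350]},
    {"ts": 1.0, "dt": 0.033, "fs": 30, "fe": 30, "label": "person", "bbox": [110, 55, 210, 155]},
]
print(summarize_entries(entries))
```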

S3 Layout

{owner}/{project}/
  staging/{hash}                                # original JSONL file
  tracks/labels/{labelId}/
    meta.json                                   # updated with streams[]
    stream/{streamHash}.m3u8                    # HLS playlist
chunks/
  {hash}.jsonl                                  # 30-second JSONL chunks
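The layout above translates into key-building helpers such as the following, together with a minimal HLS media playlist that lists the chunks. The helper names are hypothetical. Two points are assumptions rather than documented behavior: chunks/ is treated as sitting outside the {owner}/{project}/ prefix, as the layout is drawn, and the playlist uses only standard HLS tags with chunk keys as URIs; the exact tags and URI form DreamLake writes are not given in the source.

```python
import math


def staging_key(owner: str, project: str, file_hash: str) -> str:
    return f"{owner}/{project}/staging/{file_hash}"


def meta_key(owner: str, project: str, label_id: str) -> str:
    return f"{owner}/{project}/tracks/labels/{label_id}/meta.json"


def stream_key(owner: str, project: str, label_id: str, stream_hash: str) -> str:
    return f"{owner}/{project}/tracks/labels/{label_id}/stream/{stream_hash}.m3u8"


def chunk_key(chunk_hash: str) -> str:
    # Assumption: chunks/ is not nested under {owner}/{project}/.
    return f"chunks/{chunk_hash}.jsonl"


def build_playlist(chunks: list[tuple[str, float]]) -> str:
    """Build a minimal HLS media playlist from (uri, duration_seconds) pairs."""
    target = math.ceil(max((duration for _, duration in chunks), default=0))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, duration in chunks:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Placeholder hashes for illustration only.
playlist = build_playlist([(chunk_key("aaaa"), 30.0), (chunk_key("bbbb"), 30.0)])
print(stream_key("testuser", "test-project", "label-id", "da2321b20fbcf81e"))
print(playlist)
```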