DreamLake

Key Design

Semantic Video Search

Search video content using natural language. Upload any video format, and the system automatically transcodes, segments, embeds, and indexes it for semantic search.

Architecture

Upload (.mp4, any codec)
   ↓
BSS → S3 staging
   ↓
Lambda: detect codec → transcode if needed → 2s HLS chunks
   ↓  chunks/{hash}.ts in S3
dreamlake vectorize
   ↓
CLI dispatches chunk jobs → Zaku queue (Redis)
   ↓
GPU Worker(s):
  download chunk → ffmpeg frame → CLIP ViT-L/14 (768d)
  LLaVA 13B → caption → CLIP text embed
   ↓
Worker writes directly to Qdrant

GET /semantic-search?q=robot+arm+cup
   ↓
dreamlake-server: CLIP text embed → Qdrant nearest neighbor
   ↓
Return matched 2s clips with playback URLs

Pipeline Steps

1. Upload

Any video format (H.264, AV1, VP9, etc.) is accepted. The CLI auto-detects the type and uploads via multipart to S3.

dreamlake upload ./video.mp4 --episode robotics@alice:run-042 --to /camera/front

2. HLS Splitting (Lambda)

The Lambda function automatically:

  • Probes the video codec
  • If MPEG-TS compatible (H.264, HEVC): stream-copies into 2s .ts chunks
  • If not (AV1, VP9, etc.): transcodes to H.264 with keyframes every 2s, then splits
  • Uploads chunks to S3 at chunks/{hash}.ts (content-addressed, deduplicated)
  • Creates m3u8 playlist

Each 2s chunk = one atomic unit for vectorization and search.
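The split logic above can be sketched as an ffmpeg argument builder. This is an illustrative sketch, not the Lambda's actual code: the function name, the codec set, and the output naming are assumptions; only the ffmpeg flags themselves are real.

```python
# Codecs MPEG-TS can carry without re-encoding (see the Codec Support table).
STREAM_COPY_CODECS = {"h264", "hevc", "mpeg2video"}

def hls_split_args(src: str, codec: str, out_pattern: str = "chunk_%05d.ts") -> list:
    """Build an ffmpeg command that splits `src` into 2s MPEG-TS chunks.

    Stream-copies when the codec is TS-compatible; otherwise re-encodes
    to H.264 with a forced keyframe every 2s so chunk boundaries are clean.
    """
    if codec.lower() in STREAM_COPY_CODECS:
        video_args = ["-c:v", "copy"]
    else:
        video_args = ["-c:v", "libx264",
                      "-force_key_frames", "expr:gte(t,n_forced*2)"]
    return ["ffmpeg", "-i", src, *video_args,
            "-f", "segment", "-segment_time", "2",
            "-segment_list", "playlist.m3u8", out_pattern]
```

Stream copy avoids a decode/encode round trip, which is why TS-compatible inputs split in seconds rather than minutes.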

3. Vectorize

Two modes:

Direct mode — sequential, no infrastructure needed:

dreamlake vectorize --episode robotics@alice:run-042

Distributed mode — parallel via Zaku task queue:

dreamlake vectorize --episode robotics@alice:run-042 --zaku-url http://localhost:9000

Scoping:

Flag           Scope
--episode      All videos in one episode
--collection   All episodes in a collection
--dataset      All collections in a dataset

Per chunk, the worker:

  1. Downloads the 2s .ts chunk from S3
  2. Extracts the middle frame via ffmpeg
  3. Runs CLIP ViT-L/14 → 768-dimensional image embedding
  4. Runs LLaVA 13B → natural language caption → CLIP text embedding
  5. Writes the point directly to Qdrant
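Because every chunk covers a fixed 2s window, a chunk's time range and metadata payload follow directly from its playlist position. A minimal sketch (the function name and field set mirror the Vector Storage section; the helper itself is hypothetical):

```python
CHUNK_SECONDS = 2  # each HLS chunk is a fixed 2s window

def chunk_payload(chunk_index: int, chunk_hash: str, video_id: str,
                  episode_name: str, caption: str) -> dict:
    """Metadata payload for one chunk's vector point.

    The time range is derived from the chunk's position in the playlist.
    """
    return {
        "videoId": video_id,
        "episodeName": episode_name,
        "chunkHash": chunk_hash,
        "chunkIndex": chunk_index,
        "timeStart": chunk_index * CHUNK_SECONDS,
        "timeEnd": (chunk_index + 1) * CHUNK_SECONDS,
        "caption": caption,
    }
```

For example, chunk index 26 maps to the 52–54s window, matching the sample search response below.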

4. Search

Natural language queries against the Qdrant vector database:

GET /namespaces/:ns/projects/:space/semantic-search?q=robot+picking+up+cup

Query parameters:

Param        Description
q            Natural language search text (required)
episode      Scope to episode name
collection   Scope to collection name
dataset      Scope to dataset name
limit        Max results (default 10, max 50)
using        Vector type: image (default) or caption
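The parameter rules above (required `q`, `limit` defaulting to 10 and capped at 50, `using` restricted to two values) can be sketched as a small validator. This is a hypothetical helper, not the server's actual code:

```python
def normalize_search_params(q, limit=None, using="image"):
    """Validate and normalize semantic-search query parameters."""
    if not q:
        raise ValueError("q is required")
    if using not in ("image", "caption"):
        raise ValueError("using must be 'image' or 'caption'")
    # Default 10, clamp to the documented maximum of 50.
    limit = 10 if limit is None else max(1, min(int(limit), 50))
    return {"q": q, "limit": limit, "using": using}
```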

Response:

{
  "query": "robot picking up cup",
  "results": [
    {
      "score": 0.211,
      "videoId": "69e5e727...",
      "episodeName": "run-042",
      "chunkHash": "a3f8b2c1...",
      "chunkIndex": 26,
      "timeStart": 52,
      "timeEnd": 54,
      "caption": "A robotic arm reaches toward a red cup...",
      "playUrl": "https://s3.../chunks/a3f8b2c1.ts"
    }
  ],
  "total": 10,
  "using": "image"
}

5. Playback

Each result includes a playUrl pointing to the 2s .ts chunk in S3. In browsers that support MPEG-TS playback (others need an HLS player such as hls.js), it can be played directly:

<video src="https://s3.../chunks/a3f8b2c1.ts" controls></video>

Or seek to the matched time in the full video using videoId + timeStart.

Vector Storage

All vectors and metadata are stored in Qdrant (not MongoDB):

Field                  Description
image vector (768d)    CLIP ViT-L/14 image embedding
caption vector (768d)  CLIP text embedding of the LLaVA caption
videoId                BSS video ID
episodeId              DreamLake episode ID
projectId              Namespace/space slug
chunkHash              S3 chunk key
chunkIndex             Position in the m3u8 playlist
timeStart / timeEnd    Time range in seconds
caption                LLaVA-generated description
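Each chunk becomes one point carrying two named 768d vectors plus the payload above. A sketch of that point structure as a plain dict (the deterministic-ID scheme derived from the content-addressed chunk hash is an assumption, not necessarily what the worker does):

```python
import hashlib

DIM = 768  # CLIP ViT-L/14 embedding size

def make_point(chunk_hash: str, image_vec: list, caption_vec: list,
               payload: dict) -> dict:
    """One vector-store point: two named 768d vectors plus metadata.

    The point ID is derived from the content-addressed chunk hash, so
    re-vectorizing the same chunk overwrites rather than duplicates
    (an illustrative choice, not confirmed behavior).
    """
    assert len(image_vec) == DIM and len(caption_vec) == DIM
    point_id = int(hashlib.sha256(chunk_hash.encode()).hexdigest()[:16], 16)
    return {
        "id": point_id,
        "vector": {"image": image_vec, "caption": caption_vec},
        "payload": payload,
    }
```

Named vectors are what let the search endpoint switch between `using=image` and `using=caption` without separate collections.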

Distributed Processing (Zaku)

When --zaku-url is provided, the CLI dispatches all chunk jobs to a Zaku task queue instead of processing sequentially.

CLI (dispatcher)              GPU Server
  add N jobs ──────────→  Zaku (Redis)
  poll count ←─────────     ↓ pop
                          Worker 1: process + write to Qdrant
                          Worker 2: process + write to Qdrant
  count == 0 → done       ...

Benefits:

  • Parallel: multiple workers process chunks concurrently
  • Resilient: failed jobs auto-retry (Zaku resets on exception)
  • Detached: Ctrl+C stops the CLI, workers keep processing
  • Scalable: add workers without changing the CLI
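The retry semantics ("Zaku resets on exception") can be illustrated with an in-memory stand-in. This toy queue is not Zaku's API; it only demonstrates the pop-and-reset-on-failure behavior the dispatcher relies on:

```python
from collections import deque

class ToyQueue:
    """In-memory stand-in for the queue semantics (illustration only):
    a worker pops a job, and a job that raises is put back for retry."""
    def __init__(self):
        self.jobs = deque()
    def add(self, job):
        self.jobs.append(job)
    def count(self):
        return len(self.jobs)
    def work(self, handler):
        # Pop one job; on exception, reset it onto the queue for retry.
        job = self.jobs.popleft()
        try:
            handler(job)
        except Exception:
            self.jobs.append(job)

q = ToyQueue()
for i in range(3):
    q.add(i)

seen, failed_once = [], set()
def handler(job):
    # Simulate one transient failure on job 1; it succeeds on retry.
    if job == 1 and job not in failed_once:
        failed_once.add(job)
        raise RuntimeError("transient GPU error")
    seen.append(job)

while q.count():          # the CLI's "poll count until 0" loop
    q.work(handler)
```

The dispatcher never tracks individual job state; it only polls the count, which is why Ctrl+C on the CLI leaves workers unaffected.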

Codec Support

The Lambda auto-transcodes any input codec to H.264 for MPEG-TS compatibility:

Input Codec    Action
H.264          Stream copy (fast)
HEVC / H.265   Stream copy
MPEG-2         Stream copy
AV1            Transcode to H.264
VP9            Transcode to H.264
Other          Transcode to H.264

Performance

Step                                 Time     Notes
Upload (3MB video)                   ~2s      Multipart to S3
Lambda split (60s video)             ~3s      AV1→H.264, 30 chunks
Vectorize per chunk                  ~14.5s   CLIP + LLaVA 13B
Vectorize per chunk (with caption)   ~14s     CLIP + LLaVA 13B
Search query                         ~50ms    CLIP text embed + Qdrant

Storage: ~6KB per chunk (768d × 4 bytes × 2 vectors + payload). For 1 hour of video at 2s chunks: 1,800 points, ~11MB in Qdrant.
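That estimate can be reproduced with a few lines; the ~300B payload allowance is an assumption used here to round the per-chunk cost up to ~6KB:

```python
def qdrant_storage_estimate(video_seconds, chunk_seconds=2, dim=768,
                            vectors_per_point=2, payload_bytes=300):
    """Rough vector-store footprint: float32 vectors plus an assumed
    ~300B metadata payload per point. Returns (points, approx MB)."""
    points = video_seconds // chunk_seconds
    per_point = dim * 4 * vectors_per_point + payload_bytes  # float32 = 4 bytes
    return points, points * per_point / 1e6
```

For one hour of video this gives 1,800 points and roughly 11–12MB, consistent with the figure above.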