# Semantic Video Search
Search video content using natural language. Upload any video format, and the system automatically transcodes, segments, embeds, and indexes it for semantic search.
## Architecture
```
Upload (.mp4, any codec)
        ↓
BSS → S3 staging
        ↓
Lambda: detect codec → transcode if needed → 2s HLS chunks
        ↓  chunks/{hash}.ts in S3
dreamlake vectorize
        ↓
CLI dispatches chunk jobs → Zaku queue (Redis)
        ↓
GPU Worker(s):
    download chunk → ffmpeg frame → CLIP ViT-L/14 (768d)
    LLaVA 13B → caption → CLIP text embed
        ↓
Worker writes directly to Qdrant
        ↓
GET /semantic-search?q=robot+arm+cup
        ↓
dreamlake-server: CLIP text embed → Qdrant nearest neighbor
        ↓
Return matched 2s clips with playback URLs
```

## Pipeline Steps
### 1. Upload
Any video format (H.264, AV1, VP9, etc.) is accepted. The CLI auto-detects the type and uploads via multipart to S3.
```
dreamlake upload ./video.mp4 --episode robotics@alice:run-042 --to /camera/front
```
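Conceptually, the multipart staging step is a plain S3 transfer. A minimal sketch using boto3 (the bucket and key here are hypothetical; the CLI derives them from `--episode` and `--to`):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# upload_file switches to multipart automatically above the threshold
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)

# Hypothetical bucket/key, shown only to illustrate the S3 staging step
s3.upload_file("video.mp4", "dreamlake-staging",
               "robotics/alice/run-042/camera/front/video.mp4",
               Config=config)
```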
### 2. HLS Splitting (Lambda)
The Lambda function automatically:
- Probes the video codec
- If MPEG-TS compatible (H.264, HEVC): stream-copies into 2s `.ts` chunks
- If not (AV1, VP9, etc.): transcodes to H.264 with keyframes every 2s, then splits
- Uploads chunks to S3 at `chunks/{hash}.ts` (content-addressed, deduplicated)
- Creates an m3u8 playlist
Each 2s chunk = one atomic unit for vectorization and search.
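The core of that logic is two ffmpeg paths. A minimal sketch of what the Lambda runs, assuming `ffmpeg`/`ffprobe` are on the PATH (the actual handler adds S3 I/O and the content-hash renaming):

```python
import json
import subprocess

def split_to_hls(src: str, out_dir: str) -> None:
    # Probe the codec of the first video stream
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name", "-of", "json", src],
        capture_output=True, text=True, check=True)
    codec = json.loads(probe.stdout)["streams"][0]["codec_name"]

    if codec in ("h264", "hevc", "mpeg2video"):
        # MPEG-TS compatible: stream copy, no re-encode
        video_args = ["-c", "copy"]
    else:
        # AV1, VP9, etc.: transcode to H.264 with a keyframe every 2s
        video_args = ["-c:v", "libx264", "-c:a", "aac",
                      "-force_key_frames", "expr:gte(t,n_forced*2)"]

    subprocess.run(
        ["ffmpeg", "-i", src, *video_args,
         "-f", "hls", "-hls_time", "2", "-hls_list_size", "0",
         "-hls_segment_filename", f"{out_dir}/chunk_%05d.ts",
         f"{out_dir}/playlist.m3u8"],
        check=True)
```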
### 3. Vectorize
Two modes:
**Direct mode** — sequential, no infrastructure needed:
```
dreamlake vectorize --episode robotics@alice:run-042
```
**Distributed mode** — parallel via Zaku task queue:
```
dreamlake vectorize --episode robotics@alice:run-042 --zaku-url http://localhost:9000
```
Scoping:
| Flag | Scope |
|---|---|
| `--episode` | All videos in one episode |
| `--collection` | All episodes in a collection |
| `--dataset` | All collections in a dataset |
Per chunk, the worker (see the sketch after this list):
- Downloads the 2s `.ts` chunk from S3
- Extracts the middle frame via ffmpeg
- Runs CLIP ViT-L/14 → 768-dimensional image embedding
- Runs LLaVA 13B → natural language caption → CLIP text embedding
- Writes the point directly to Qdrant
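A minimal sketch of that loop, assuming the HuggingFace CLIP checkpoint and an already-created Qdrant collection. `video_chunks`, the Qdrant URL, and `run_llava` are placeholders, not the worker's actual names:

```python
import subprocess
import uuid

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # 768-d
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
qdrant = QdrantClient(url="http://localhost:6333")  # assumed endpoint

def process_chunk(ts_path: str, payload: dict) -> None:
    # Extract the middle frame of the 2s chunk (t = 1s)
    subprocess.run(["ffmpeg", "-y", "-ss", "1", "-i", ts_path,
                    "-frames:v", "1", "/tmp/frame.jpg"], check=True)
    frame = Image.open("/tmp/frame.jpg")

    with torch.no_grad():
        img_vec = clip.get_image_features(
            **proc(images=frame, return_tensors="pt"))[0]
        caption = run_llava(frame)  # placeholder for the LLaVA 13B captioner
        txt_vec = clip.get_text_features(
            **proc(text=caption, return_tensors="pt", truncation=True))[0]

    # One point per chunk, with both named vectors and the metadata payload
    qdrant.upsert(collection_name="video_chunks", points=[PointStruct(
        id=str(uuid.uuid4()),
        vector={"image": img_vec.tolist(), "caption": txt_vec.tolist()},
        payload={**payload, "caption": caption},
    )])
```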
### 4. Search
Natural language queries against the Qdrant vector database:
```
GET /namespaces/:ns/projects/:space/semantic-search?q=robot+picking+up+cup
```
Query parameters:
| Param | Description |
|---|---|
| `q` | Natural language search text (required) |
| `episode` | Scope to episode name |
| `collection` | Scope to collection name |
| `dataset` | Scope to dataset name |
| `limit` | Max results (default 10, max 50) |
| `using` | Vector type: `image` (default) or `caption` |
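For example, hitting the endpoint with Python `requests` (the base URL, namespace, and space here are made up):

```python
import requests

resp = requests.get(
    "http://localhost:8080/namespaces/robotics/projects/alice/semantic-search",
    params={"q": "robot picking up cup", "limit": 5, "using": "caption"},
)
for hit in resp.json()["results"]:
    print(f"{hit['score']:.3f}  {hit['timeStart']}-{hit['timeEnd']}s  {hit['caption']}")
```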
Response:
```json
{
"query": "robot picking up cup",
"results": [
{
"score": 0.211,
"videoId": "69e5e727...",
"episodeName": "run-042",
"chunkHash": "a3f8b2c1...",
"chunkIndex": 26,
"timeStart": 52,
"timeEnd": 54,
"caption": "A robotic arm reaches toward a red cup...",
"playUrl": "https://s3.../chunks/a3f8b2c1.ts"
}
],
"total": 10,
"using": "image"
}
```
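Server-side, the endpoint embeds the query text with CLIP and runs a nearest-neighbor search over the chosen named vector in Qdrant. A rough sketch of the lookup, where the collection name and payload key are assumptions:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

qdrant = QdrantClient(url="http://localhost:6333")  # assumed endpoint

def semantic_search(query_vec: list[float], episode: str | None = None,
                    limit: int = 10):
    # Optional payload filter to scope results to one episode
    flt = None
    if episode:
        flt = Filter(must=[FieldCondition(key="episodeName",
                                          match=MatchValue(value=episode))])
    return qdrant.search(
        collection_name="video_chunks",     # assumed name
        query_vector=("image", query_vec),  # named vector: "image" or "caption"
        query_filter=flt,
        limit=limit,
    )
```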
### 5. Playback
Each result includes a `playUrl` pointing to the 2s `.ts` chunk in S3. Playable directly in a browser:
```html
<video src="https://s3.../chunks/a3f8b2c1.ts" controls></video>
```
Or seek to the matched time in the full video using `videoId` + `timeStart`.
## Vector Storage
All vectors and metadata are stored in Qdrant (not MongoDB):
| Field | Description |
|---|---|
| `image` vector (768d) | CLIP ViT-L/14 image embedding |
| `caption` vector (768d) | CLIP text embedding of LLaVA caption |
| `videoId` | BSS video ID |
| `episodeId` | DreamLake episode ID |
| `projectId` | Namespace/space slug |
| `chunkHash` | S3 chunk key |
| `chunkIndex` | Position in m3u8 playlist |
| `timeStart` / `timeEnd` | Time range in seconds |
| `caption` | LLaVA-generated description |
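That layout maps onto a Qdrant collection with two named 768-d vectors per point. Creating it would look roughly like this (collection name assumed):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(url="http://localhost:6333")  # assumed endpoint

# Two named vectors per point; the payload carries the metadata fields above
qdrant.create_collection(
    collection_name="video_chunks",  # assumed name
    vectors_config={
        "image": VectorParams(size=768, distance=Distance.COSINE),
        "caption": VectorParams(size=768, distance=Distance.COSINE),
    },
)
```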
## Distributed Processing (Zaku)
When --zaku-url is provided, the CLI dispatches all chunk jobs to a Zaku task queue instead of processing sequentially.
```
CLI (dispatcher)              GPU Server
  add N jobs ──────────→ Zaku (Redis)
  poll count ←─────────     ↓ pop
                         Worker 1: process + write to Qdrant
                         Worker 2: process + write to Qdrant
  count == 0 → done         ...
```
Benefits:
- Parallel: multiple workers process chunks concurrently
- Resilient: failed jobs auto-retry (Zaku resets on exception)
- Detached: Ctrl+C stops the CLI, workers keep processing
- Scalable: add workers without changing the CLI
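Zaku's own API is documented separately; the reliable-queue pattern it implements over Redis looks roughly like the following sketch (illustrative only, not Zaku's actual interface; `process` stands in for the per-chunk worker from step 3):

```python
import json
import redis

r = redis.Redis()

def dispatch(jobs):                        # CLI side
    for job in jobs:
        r.lpush("chunks:pending", json.dumps(job))

def drained() -> bool:                     # CLI polls until the queue empties
    return r.llen("chunks:pending") == 0 and r.llen("chunks:inflight") == 0

def work():                                # GPU worker side
    while True:
        # Atomically move a job to an in-flight list so a crash can't lose it
        raw = r.brpoplpush("chunks:pending", "chunks:inflight", timeout=5)
        if raw is None:
            continue
        try:
            process(json.loads(raw))              # embed + write to Qdrant
            r.lrem("chunks:inflight", 1, raw)     # ack: drop from in-flight
        except Exception:
            # Reset on failure: push the job back for another worker to retry
            r.lrem("chunks:inflight", 1, raw)
            r.lpush("chunks:pending", raw)
```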
## Codec Support
The Lambda auto-transcodes any input codec to H.264 for MPEG-TS compatibility:
| Input Codec | Action |
|---|---|
| H.264 | Stream copy (fast) |
| HEVC / H.265 | Stream copy |
| MPEG-2 | Stream copy |
| AV1 | Transcode to H.264 |
| VP9 | Transcode to H.264 |
| Other | Transcode to H.264 |
## Performance
| Step | Time | Notes |
|---|---|---|
| Upload (3MB video) | ~2s | Multipart to S3 |
| Lambda split (60s video) | ~3s | AV1→H.264, 30 chunks |
| Vectorize per chunk | ~14s | CLIP + LLaVA 13B |
| Search query | ~50ms | CLIP text embed + Qdrant |
Storage: ~6KB per chunk (768d × 4 bytes × 2 vectors + payload). For 1 hour of video at 2s chunks: 1,800 points, ~11MB in Qdrant.
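The arithmetic behind those figures:

```python
dim, float_bytes, n_vectors = 768, 4, 2       # image + caption embeddings
per_chunk = dim * float_bytes * n_vectors     # 6,144 bytes ≈ 6KB, plus payload
points_per_hour = 3600 // 2                   # one point per 2s chunk = 1,800
print(per_chunk * points_per_hour / 1e6)      # ≈ 11 MB per hour of video
```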