DreamLake

CLI Reference

vectorize

Run CLIP + LLaVA captioning on video chunks for semantic search.

How It Works

  1. Resolves scope → finds all videos
  2. Reads each video's HLS playlist → collects 2s chunk references
  3. Sends each chunk to the vectorize service (GPU server)
  4. Vectorize service extracts a frame, runs CLIP embedding + LLaVA captioning
  5. Results are stored in Qdrant for semantic search
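Step 2 above can be sketched as a small playlist parser. This is an illustrative sketch, not DreamLake's actual implementation: it assumes a standard HLS media playlist in which each #EXTINF line carries the chunk duration and the following non-comment line names the chunk file.

```python
# Illustrative sketch: collect 2s chunk references from an HLS media playlist.
# Assumes a standard #EXTINF-based playlist; not DreamLake's actual code.

def collect_chunks(playlist_text):
    """Return (index, timeStart, timeEnd, uri) for each chunk in order."""
    chunks = []
    t = 0.0
    duration = None
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:2.0," -> 2.0
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            chunks.append((len(chunks), t, t + duration, line))
            t += duration
            duration = None
    return chunks

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:2
#EXTINF:2.0,
chunk_000.ts
#EXTINF:2.0,
chunk_001.ts
#EXT-X-ENDLIST"""

print(collect_chunks(playlist))
```

Each resulting tuple maps one playlist entry to the time window that later shows up as timeStart/timeEnd in the stored payload.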

Usage

dreamlake vectorize --episode space[@namespace][:episode]
dreamlake vectorize --collection <name> --project space[@namespace]
dreamlake vectorize --dataset <name> --project space[@namespace]

Flags

Flag          Type    Required  Description
--episode     string  *         Target episode: space[@namespace][:episode]
--collection  string  *         Collection name (requires --project)
--dataset     string  *         Dataset name (requires --project)
--project     string  No        Space target: space[@namespace]
--zaku-url    string  No        Zaku task queue URL (enables distributed mode)

* At least one scope is required.

Scoping

Scope           What gets vectorized
--episode       All videos in that episode
--collection    All videos in all episodes belonging to the collection
--dataset       All videos across all collections in the dataset
--project only  All videos in the entire project
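The scope rules above (at least one scope flag; --collection and --dataset require --project) can be summarized as a validation step. This is a hedged sketch with hypothetical names, not the CLI's actual code:

```python
# Illustrative sketch of the scope-validation rules from the table above.
# Function and argument names are hypothetical, not DreamLake's actual code.

def validate_scope(episode=None, collection=None, dataset=None, project=None):
    """Return the resolved scope kind, or raise on an invalid flag combination."""
    if episode:
        return "episode"          # --episode carries its own space target
    if collection or dataset:
        if not project:
            raise ValueError("--collection/--dataset require --project")
        return "collection" if collection else "dataset"
    if project:
        return "project"          # whole project
    raise ValueError("at least one scope flag is required")
```

Note that --episode needs no --project because the space[@namespace] target is embedded in the episode argument itself.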

Examples

# Vectorize a single episode
dreamlake vectorize --episode robotics@alice:run-042
 
# Vectorize all episodes in a collection
dreamlake vectorize --collection "front-camera" --project robotics@alice
 
# Vectorize an entire dataset
dreamlake vectorize --dataset "training-v1" --project robotics@alice
 
# Distributed mode via Zaku task queue
dreamlake vectorize --episode robotics@alice:run-042 --zaku-url http://localhost:9000

Vector Storage

Vectors are stored in Qdrant (not MongoDB) with named vectors:

Vector   Dim  Source
image    768  CLIP ViT-L/14 frame embedding
caption  768  CLIP ViT-L/14 embedding of the LLaVA caption text

Each point's payload contains:

{
  "videoId": "abc123",
  "episodeId": "ep456",
  "episodeName": "run-042",
  "projectId": "space789",
  "chunkHash": "ff3a2b...",
  "chunkIndex": 42,
  "timeStart": 84.0,
  "timeEnd": 86.0,
  "caption": "A robot arm picks up a red cup from the table"
}
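Given the fixed 2-second chunk length, the time fields follow directly from the chunk index. A minimal sketch of assembling this payload (field names as shown above; the hash and caption values passed in are placeholders):

```python
# Illustrative sketch: build the Qdrant point payload for one 2s chunk.
# Field names match the payload shown above; this is not DreamLake's actual code.

CHUNK_SECONDS = 2.0  # fixed HLS chunk length

def build_payload(video_id, episode_id, episode_name, project_id,
                  chunk_hash, chunk_index, caption):
    """Assemble the payload dict stored alongside the named vectors."""
    return {
        "videoId": video_id,
        "episodeId": episode_id,
        "episodeName": episode_name,
        "projectId": project_id,
        "chunkHash": chunk_hash,
        "chunkIndex": chunk_index,
        "timeStart": chunk_index * CHUNK_SECONDS,
        "timeEnd": (chunk_index + 1) * CHUNK_SECONDS,
        "caption": caption,
    }
```

For example, chunkIndex 42 yields timeStart 84.0 and timeEnd 86.0, matching the payload above.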

Searching

After vectorizing, use the semantic search API:

GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup

Or with scope filters:

GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&episode=run-042
GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&collection=front-camera

Results include playUrl for direct 2s clip playback.
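A request to the endpoint above can be assembled with standard URL encoding. This sketch only builds the URL; the base host and the namespace/project slugs are placeholders:

```python
# Illustrative sketch: build a semantic-search URL with optional scope filters.
# Base URL and slugs are placeholders, not DreamLake defaults.
from urllib.parse import urlencode

def search_url(base, namespace, project, query, **filters):
    """Build a semantic-search URL; empty filters are dropped."""
    params = {"q": query, **{k: v for k, v in filters.items() if v}}
    return (f"{base}/namespaces/{namespace}/projects/{project}"
            f"/semantic-search?{urlencode(params)}")

print(search_url("http://localhost:3000", "alice", "robotics",
                 "robot picks up cup", episode="run-042"))
```

urlencode turns the spaces in the query into "+", producing the q=robot+picks+up+cup form shown above.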

Performance

Step              Time per chunk
Download chunk    ~1.3s
Frame extraction  ~0.1s
CLIP image embed  ~0.2s
LLaVA caption     ~13s
CLIP text embed   ~0.05s
Total             ~14.5s

For a 1-hour video (~1,800 chunks at 2s each), a single worker needs roughly 7.2 hours. Scale out by running multiple Zaku workers.
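The estimate above can be reproduced, and scaled by worker count, with simple arithmetic, assuming work divides evenly across Zaku workers:

```python
# Illustrative back-of-envelope estimate from the performance table above.
CHUNK_SECONDS = 2          # HLS chunk length
SECONDS_PER_CHUNK = 14.5   # approximate processing time per chunk

def vectorize_hours(video_hours, workers=1):
    """Estimated wall-clock hours, assuming chunks divide evenly across workers."""
    chunks = video_hours * 3600 / CHUNK_SECONDS
    return chunks * SECONDS_PER_CHUNK / workers / 3600

print(round(vectorize_hours(1), 2))      # one worker: ~7.25 hours
print(round(vectorize_hours(1, 8), 2))   # eight workers: ~0.91 hours
```

Because LLaVA captioning dominates (~13s of the ~14.5s), throughput scales nearly linearly with the number of GPU workers.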