DreamLake

CLI Reference

vectorize

Run CLIP + LLaVA captioning on video chunks for semantic search.

How It Works

  1. Resolves scope → finds all videos
  2. Reads each video's HLS playlist → collects 2s chunk references
  3. Sends each chunk to the vectorize service (GPU server)
  4. Vectorize service extracts a frame, runs CLIP embedding + LLaVA captioning
  5. Results are stored in Qdrant for semantic search
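Step 2 above can be sketched as a small playlist parser. This is an illustrative sketch, not DreamLake's actual implementation: it assumes a standard HLS media playlist in which each #EXTINF line carries the chunk duration and the following non-comment line names the chunk file.

```python
# Illustrative sketch: collect 2s chunk references from an HLS media playlist.
# Assumes a standard #EXTINF-based playlist; not DreamLake's actual code.

def collect_chunks(playlist_text):
    """Return (index, timeStart, timeEnd, uri) for each chunk in order."""
    chunks = []
    t = 0.0
    duration = None
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:2.0," -> 2.0
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            chunks.append((len(chunks), t, t + duration, line))
            t += duration
            duration = None
    return chunks

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:2
#EXTINF:2.0,
chunk_000.ts
#EXTINF:2.0,
chunk_001.ts
#EXT-X-ENDLIST"""

print(collect_chunks(playlist))
```

Each resulting tuple maps one playlist entry to the time window that later shows up as timeStart/timeEnd in the stored payload.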

Usage

dreamlake vectorize --episode space[@namespace][:episode]
dreamlake vectorize --collection <name> --project space[@namespace]
dreamlake vectorize --dataset <name> --project space[@namespace]

Flags

Flag          Type    Required  Description
--episode     string  *         Target episode: space[@namespace][:episode]
--collection  string  *         Collection name (requires --project)
--dataset     string  *         Dataset name (requires --project)
--project     string  No        Space target: space[@namespace]
--zaku-url    string  No        Zaku task queue URL (enables distributed mode)

* At least one scope is required.

Scoping

Scope           What gets vectorized
--episode       All videos in that episode
--collection    All videos in all episodes belonging to the collection
--dataset       All videos across all collections in the dataset
--project only  All videos in the entire project
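The scope rules above (at least one scope flag; --collection and --dataset require --project) can be summarized as a validation step. This is a hedged sketch with hypothetical names, not the CLI's actual code:

```python
# Illustrative sketch of the scope-validation rules from the table above.
# Function and argument names are hypothetical, not DreamLake's actual code.

def validate_scope(episode=None, collection=None, dataset=None, project=None):
    """Return the resolved scope kind, or raise on an invalid flag combination."""
    if episode:
        return "episode"          # --episode carries its own space target
    if collection or dataset:
        if not project:
            raise ValueError("--collection/--dataset require --project")
        return "collection" if collection else "dataset"
    if project:
        return "project"          # whole project
    raise ValueError("at least one scope flag is required")
```

Note that --episode needs no --project because the space[@namespace] target is embedded in the episode argument itself.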

Examples

# Vectorize a single episode
dreamlake vectorize --episode robotics@alice:run-042
 
# Vectorize all episodes in a collection
dreamlake vectorize --collection "front-camera" --project robotics@alice
 
# Vectorize an entire dataset
dreamlake vectorize --dataset "training-v1" --project robotics@alice
 
# Distributed mode via Zaku task queue
dreamlake vectorize --episode robotics@alice:run-042 --zaku-url http://localhost:9000

Vector Storage

Vectors are stored in Qdrant (not MongoDB) with named vectors:

Vector   Dim  Source
image    768  CLIP ViT-L/14 frame embedding
caption  768  CLIP ViT-L/14 embedding of the LLaVA caption text

Each point's payload contains:

{
  "videoId": "abc123",
  "episodeId": "ep456",
  "episodeName": "run-042",
  "projectId": "space789",
  "chunkHash": "ff3a2b...",
  "chunkIndex": 42,
  "timeStart": 84.0,
  "timeEnd": 86.0,
  "caption": "A robot arm picks up a red cup from the table"
}
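Given the fixed 2-second chunk length, the time fields follow directly from the chunk index. A minimal sketch of assembling this payload (field names as shown above; the hash and caption values passed in are placeholders):

```python
# Illustrative sketch: build the Qdrant point payload for one 2s chunk.
# Field names match the payload shown above; this is not DreamLake's actual code.

CHUNK_SECONDS = 2.0  # fixed HLS chunk length

def build_payload(video_id, episode_id, episode_name, project_id,
                  chunk_hash, chunk_index, caption):
    """Assemble the payload dict stored alongside the named vectors."""
    return {
        "videoId": video_id,
        "episodeId": episode_id,
        "episodeName": episode_name,
        "projectId": project_id,
        "chunkHash": chunk_hash,
        "chunkIndex": chunk_index,
        "timeStart": chunk_index * CHUNK_SECONDS,
        "timeEnd": (chunk_index + 1) * CHUNK_SECONDS,
        "caption": caption,
    }
```

For example, chunkIndex 42 yields timeStart 84.0 and timeEnd 86.0, matching the payload above.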

Searching

After vectorizing, use the semantic search API:

GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup

Or with scope filters:

GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&episode=run-042
GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&collection=front-camera

Results include playUrl for direct 2s clip playback.
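A request to the endpoint above can be assembled with standard URL encoding. This sketch only builds the URL; the base host and the namespace/project slugs are placeholders:

```python
# Illustrative sketch: build a semantic-search URL with optional scope filters.
# Base URL and slugs are placeholders, not DreamLake defaults.
from urllib.parse import urlencode

def search_url(base, namespace, project, query, **filters):
    """Build a semantic-search URL; empty filters are dropped."""
    params = {"q": query, **{k: v for k, v in filters.items() if v}}
    return (f"{base}/namespaces/{namespace}/projects/{project}"
            f"/semantic-search?{urlencode(params)}")

print(search_url("http://localhost:3000", "alice", "robotics",
                 "robot picks up cup", episode="run-042"))
```

urlencode turns the spaces in the query into "+", producing the q=robot+picks+up+cup form shown above.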

Performance

Step              Time per chunk
Download chunk    ~1.3s
Frame extraction  ~0.1s
CLIP image embed  ~0.2s
LLaVA caption     ~13s
CLIP text embed   ~0.05s
Total             ~14.5s

For a 1-hour video (~1,800 chunks at 2s each), a single worker needs roughly 7.2 hours. Scale out by running multiple Zaku workers.
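The estimate above can be reproduced, and scaled by worker count, with simple arithmetic, assuming work divides evenly across Zaku workers:

```python
# Illustrative back-of-envelope estimate from the performance table above.
CHUNK_SECONDS = 2          # HLS chunk length
SECONDS_PER_CHUNK = 14.5   # approximate processing time per chunk

def vectorize_hours(video_hours, workers=1):
    """Estimated wall-clock hours, assuming chunks divide evenly across workers."""
    chunks = video_hours * 3600 / CHUNK_SECONDS
    return chunks * SECONDS_PER_CHUNK / workers / 3600

print(round(vectorize_hours(1), 2))      # one worker: ~7.25 hours
print(round(vectorize_hours(1, 8), 2))   # eight workers: ~0.91 hours
```

Because LLaVA captioning dominates (~13s of the ~14.5s), throughput scales nearly linearly with the number of GPU workers.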