# CLI Reference

## vectorize
Runs CLIP embedding and LLaVA captioning on video chunks to enable semantic search.
### How It Works
- Resolves scope → finds all videos
- Reads each video's HLS playlist → collects 2s chunk references
- Sends each chunk to the vectorize service (GPU server)
- Vectorize service extracts a frame, runs CLIP embedding + LLaVA captioning
- Results are stored in Qdrant for semantic search
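The steps above can be sketched as a simple loop. All the function names here (`resolve_scope`, `vectorize_videos`, the `send_chunk` callback) are hypothetical placeholders, not the real dreamlake client API:

```python
# Minimal sketch of the vectorize pipeline, assuming hypothetical helper names.

def resolve_scope(scope):
    """Return the list of videos covered by the given scope (placeholder)."""
    return scope["videos"]

def vectorize_videos(scope, send_chunk):
    """Walk each video's HLS playlist and submit every 2s chunk reference."""
    for video in resolve_scope(scope):
        for chunk in video["playlist"]:   # 2s chunk references from the HLS playlist
            send_chunk(chunk)             # GPU service: frame -> CLIP + LLaVA -> Qdrant

# Tiny usage example with an in-memory scope and a collecting callback:
sent = []
scope = {"videos": [{"playlist": ["chunk-0.ts", "chunk-1.ts"]}]}
vectorize_videos(scope, sent.append)
```

In the real CLI the `send_chunk` step is an HTTP call to the vectorize service, and distributed mode replaces it with a Zaku task enqueue.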
### Usage

```
dreamlake vectorize --episode space[@namespace][:episode]
dreamlake vectorize --collection <name> --project space[@namespace]
dreamlake vectorize --dataset <name> --project space[@namespace]
```

### Flags
| Flag | Type | Required | Description |
|---|---|---|---|
| `--episode` | string | * | Target episode: `space[@namespace][:episode]` |
| `--collection` | string | * | Collection name (requires `--project`) |
| `--dataset` | string | * | Dataset name (requires `--project`) |
| `--project` | string | No | Space target: `space[@namespace]` |
| `--zaku-url` | string | No | Zaku task queue URL (enables distributed mode) |

\* At least one scope is required.
### Scoping

| Scope | What gets vectorized |
|---|---|
| `--episode` | All videos in that episode |
| `--collection` | All videos in all episodes belonging to the collection |
| `--dataset` | All videos across all collections in the dataset |
| `--project` only | All videos in the entire project |
Examples
# Vectorize a single episode
dreamlake vectorize --episode robotics@alice:run-042
# Vectorize all episodes in a collection
dreamlake vectorize --collection "front-camera" --project robotics@alice
# Vectorize an entire dataset
dreamlake vectorize --dataset "training-v1" --project robotics@alice
# Distributed mode via Zaku task queue
dreamlake vectorize --episode robotics@alice:run-042 --zaku-url http://localhost:9000Vector Storage
Vectors are stored in Qdrant (not MongoDB) with named vectors:

| Vector | Dim | Source |
|---|---|---|
| `image` | 768 | CLIP ViT-L/14 frame embedding |
| `caption` | 768 | CLIP ViT-L/14 embedding of the LLaVA caption text |
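Because both named vectors come from the same CLIP model, a single 768-d text embedding of a query can be scored against either one. A stdlib-only cosine-similarity sketch of that idea, using toy 3-d vectors as stand-ins for the real 768-d embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [1.0, 0.0, 0.0]   # stand-in for a 768-d CLIP embedding of the query text
point = {                  # toy named vectors for one stored chunk
    "image": [0.9, 0.1, 0.0],
    "caption": [0.0, 1.0, 0.0],
}

scores = {name: cosine(query, vec) for name, vec in point.items()}
best = max(scores, key=scores.get)   # which named vector matches the query best
```

In practice Qdrant computes these scores server-side; this only illustrates why one query embedding works against both vector spaces.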
Each point's payload contains:

```json
{
  "videoId": "abc123",
  "episodeId": "ep456",
  "episodeName": "run-042",
  "projectId": "space789",
  "chunkHash": "ff3a2b...",
  "chunkIndex": 42,
  "timeStart": 84.0,
  "timeEnd": 86.0,
  "caption": "A robot arm picks up a red cup from the table"
}
```

### Searching
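The `timeStart`/`timeEnd` fields follow directly from the chunk index, assuming chunks are indexed from zero at fixed 2s boundaries as the payload example suggests. A quick sketch (`chunk_window` is an illustrative helper, not part of the CLI):

```python
CHUNK_SECONDS = 2.0

def chunk_window(chunk_index, chunk_seconds=CHUNK_SECONDS):
    """Map a chunk index to its (timeStart, timeEnd) window in seconds."""
    start = chunk_index * chunk_seconds
    return start, start + chunk_seconds

# chunkIndex 42 from the payload above maps to the 84.0-86.0s window.
print(chunk_window(42))
```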
After vectorizing, use the semantic search API:

```
GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup
```

Or with scope filters:

```
GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&episode=run-042
GET /namespaces/:slug/projects/:projectSlug/semantic-search?q=robot+picks+up+cup&collection=front-camera
```

Results include a `playUrl` for direct playback of the matching 2s clip.
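Query strings like the ones above can be built with the standard library; `search_url` here is an illustrative client-side helper, not part of dreamlake:

```python
from urllib.parse import urlencode

def search_url(namespace, project, query, **filters):
    """Build a semantic-search URL in the shape of the GET examples above."""
    params = {"q": query, **filters}
    return f"/namespaces/{namespace}/projects/{project}/semantic-search?{urlencode(params)}"

# urlencode handles the plus-encoding of spaces in the query text.
url = search_url("alice", "robotics", "robot picks up cup", episode="run-042")
print(url)
```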
### Performance
| Step | Time per chunk |
|---|---|
| Download chunk | ~1.3s |
| Frame extraction | ~0.1s |
| CLIP image embed | ~0.2s |
| LLaVA caption | ~13s |
| CLIP text embed | ~0.05s |
| Total | ~14.7s per chunk |
For a 1-hour video (1,800 chunks at 2s each), expect roughly 7.3 hours of wall time with a single worker. Scale out by running multiple Zaku workers.
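The per-chunk total is just the sum of the step timings, and the 1-hour estimate follows from the chunk count; a quick stdlib check (`wall_hours` is an illustrative helper, not a CLI feature):

```python
# Back-of-envelope check of the throughput figures in the table above.
STEP_SECONDS = {
    "download": 1.3,
    "frame_extraction": 0.1,
    "clip_image_embed": 0.2,
    "llava_caption": 13.0,   # LLaVA dominates the per-chunk cost
    "clip_text_embed": 0.05,
}

per_chunk = sum(STEP_SECONDS.values())   # ~14.65 s per chunk

def wall_hours(video_seconds, workers=1, chunk_seconds=2):
    """Estimated wall-clock hours to vectorize one video with N workers."""
    n_chunks = video_seconds // chunk_seconds   # 1-hour video -> 1800 chunks
    return n_chunks * per_chunk / workers / 3600

one_worker = wall_hours(3600)                  # ~7.3 h
eight_workers = wall_hours(3600, workers=8)    # scales roughly linearly
```

Since chunks are processed independently, throughput scales close to linearly with worker count until the GPU or download bandwidth saturates.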