Semantic Search
Search video content with natural language. Videos are split into 2s HLS chunks, embedded with CLIP, and indexed in Qdrant.
Pipeline
Vectorize
Add --zaku-url http://host:9000 for distributed processing via Zaku task queue.
Per chunk: CLIP ViT-L/14 (768d image embedding) + LLaVA 13B (caption → CLIP text embedding).
Search
| Param | Description |
|---|---|
q | Natural language query |
episode | Scope to episode |
bindr | Scope to bindr |
limit | Max results (default 10) |
using | image (default) or caption |
Returns matched 2s clips with playback URLs and scores.
Performance
| Step | Time |
|---|---|
| Vectorize per chunk | ~14.5s (CLIP + LLaVA) |
| Search query | ~50ms |
| Storage per chunk | ~6KB |
| 1 hour video | 1,800 points, ~11MB in Qdrant |