DreamLake

DB Schema

Embedding

Vector embeddings for semantic search across track content. Uses MongoDB Atlas Vector Search.

| Field | Type | Description |
|---|---|---|
| id | ObjectId | Primary key |
| videoId | String | → Video.id |
| trackHash | String | chunks/{hash}.jsonl in S3 |
| lineNum | Int | Line number within the chunk file |
| text | String | Cached text for display (avoids re-fetching S3) |
| vector | Float[] | Embedding vector (MongoDB Atlas Vector Search) |
| startMs | Int? | Start time in milliseconds |
| endMs | Int? | End time in milliseconds |
| createdAt | DateTime | Created timestamp |

Unique: [videoId, trackHash, lineNum]

Indexes: videoId
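
As a rough illustration of the schema above, the record can be mirrored in application code as a small dataclass. This is only a sketch: the class name `EmbeddingRecord`, the snake_case attribute names, and the helpers `unique_key` and `s3_key` are hypothetical and not part of DreamLake itself. The S3 key pattern is taken from the trackHash description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class EmbeddingRecord:
    """Hypothetical in-memory mirror of one Embedding document."""

    video_id: str                   # videoId, references Video.id
    track_hash: str                 # trackHash, names chunks/{hash}.jsonl in S3
    line_num: int                   # lineNum, line within the chunk file
    text: str                       # cached text for display
    vector: List[float]             # embedding vector
    start_ms: Optional[int] = None  # startMs, nullable (Int?)
    end_ms: Optional[int] = None    # endMs, nullable (Int?)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def unique_key(self) -> Tuple[str, str, int]:
        # Matches the unique constraint [videoId, trackHash, lineNum].
        return (self.video_id, self.track_hash, self.line_num)

    def s3_key(self) -> str:
        # Key of the source chunk file, following the documented pattern.
        return f"chunks/{self.track_hash}.jsonl"
```

Two records with the same `unique_key()` would collide under the unique constraint, so the tuple is a convenient deduplication key before writing.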

How it works

  1. Track content (JSONL chunks) is stored in S3
  2. Each line is embedded by a model suited to its content type (text, audio, or frame)
  3. The vector + a cached copy of the text are stored here
  4. trackHash + lineNum uniquely identify the source line in S3
  5. Queries use MongoDB Atlas Vector Search for nearest-neighbor lookup
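
The steps above can be sketched in Python as follows. Several details here are assumptions rather than documented behavior: each JSONL line is assumed to be an object with a `text` field and optional `startMs` and `endMs` fields, line numbers are assumed to start at 0, the `embed` callable stands in for whichever model is used, and the index name `embedding_vector_index` is made up. The `$vectorSearch` stage and the `vectorSearchScore` metadata field are the standard Atlas Vector Search aggregation syntax.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


def build_embedding_docs(
    video_id: str,
    track_hash: str,
    jsonl_text: str,
    embed: Callable[[str], List[float]],
) -> List[Dict[str, Any]]:
    """Turn one chunk file's contents into Embedding documents (steps 2 to 4)."""
    docs = []
    # Assumption: lineNum is the 0-based position of the line in the file.
    for line_num, raw in enumerate(jsonl_text.splitlines()):
        if not raw.strip():
            continue  # skip blank lines but keep file positions intact
        entry = json.loads(raw)
        text = entry["text"]  # assumed field name in the chunk line
        docs.append({
            "videoId": video_id,
            "trackHash": track_hash,
            "lineNum": line_num,
            "text": text,            # cached copy, avoids re-fetching S3
            "vector": embed(text),
            "startMs": entry.get("startMs"),
            "endMs": entry.get("endMs"),
            "createdAt": datetime.now(timezone.utc),
        })
    return docs


def build_vector_search_pipeline(
    query_vector: List[float],
    limit: int = 10,
    video_id: Optional[str] = None,
    index_name: str = "embedding_vector_index",  # hypothetical index name
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline for nearest-neighbor lookup (step 5)."""
    stage: Dict[str, Any] = {
        "index": index_name,
        "path": "vector",
        "queryVector": query_vector,
        # Oversample candidates; Atlas caps numCandidates at 10000.
        "numCandidates": min(limit * 10, 10000),
        "limit": limit,
    }
    if video_id is not None:
        # Requires videoId to be declared as a filter field in the index.
        stage["filter"] = {"videoId": {"$eq": video_id}}
    return [
        {"$vectorSearch": stage},
        {"$project": {
            "videoId": 1, "trackHash": 1, "lineNum": 1, "text": 1,
            "startMs": 1, "endMs": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]
```

With a driver such as PyMongo, the returned list could be passed to `collection.aggregate(...)`. Each hit carries `trackHash` and `lineNum`, which is enough to locate the source line in S3 if the cached `text` is not sufficient.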