Size Strategy

How <FilePreview> decides what to fetch and what to refuse. The short version: images get hard limits, text-shaped formats get soft limits, and binary formats with footers get keyhole reads.

Per-kind table

| Kind | Strategy | Default limit | What happens at the limit |
|---|---|---|---|
| Image | Hard | 20 MB | No fetch. Render "too large" placeholder + download. |
| Video | Native streaming | (none) | Browser handles via <video preload="metadata">. |
| Audio | Native streaming | (none) | Browser handles via <audio>. |
| Text | Soft (Range) | 5 MB | Fetch first 5 MB; show truncation banner. |
| Code | Soft (Range) | 5 MB | Same as text; partial highlight at the cut. |
| Markdown | Soft (Range) | 5 MB | Fetch first 5 MB; render what parses. |
| CSV / TSV | Soft (Range) | 10 MB | Fetch first 10 MB; drop incomplete final row. |
| JSONL | Soft (Range) | 10 MB | Fetch first 10 MB; drop incomplete final line. |
| NPY | Bounded | 1024 elements | Header read first; fetch data only if it fits the cap. |
| MCAP | Footer + summary | (none) | Two small reads, never the full file. |

Hard vs soft

Hard limit: applies to images. The browser's <img src> cannot stream or partially load a file, so once the request is in flight, a 200 MB PNG will pin the tab. When meta.size > imageMaxBytes, the previewer therefore renders a placeholder without fetching. Size is the only criterion: if the size is unknown, the image is loaded.
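The hard-limit rule reduces to a single comparison made before any request is issued. A minimal sketch, assuming a hypothetical helper name (decideImage) and the 20 MB default from the table; the component's real internals may differ:

```typescript
// Hypothetical helper, not part of the documented API.
type ImageDecision = "fetch" | "placeholder";

const DEFAULT_IMAGE_MAX_BYTES = 20 * 1024 * 1024; // 20 MB default from the table

function decideImage(
  size: number | undefined,
  imageMaxBytes: number = DEFAULT_IMAGE_MAX_BYTES,
): ImageDecision {
  // Unknown size: there is nothing to compare against, so the image is loaded.
  if (size === undefined) return "fetch";
  // Strictly larger than the limit: no request is issued at all.
  return size > imageMaxBytes ? "placeholder" : "fetch";
}
```

An image of exactly imageMaxBytes is still fetched here, because the rule in the text is meta.size > imageMaxBytes.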

Soft limit: applies to text-shaped formats. The fetch sends a Range: bytes=0-N header, where N is the configured limit minus one. The server can either:

  • Return 206 Partial Content with the prefix → previewer shows truncation banner.
  • Return 200 OK with the whole file (server doesn't support Range) → the read is still capped client-side at the configured limit (N + 1 bytes).
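Because the Range end is inclusive, the off-by-one is easy to get wrong. The sketch below shows one way to build the header and interpret the response; all function names are hypothetical, and only the status codes and header syntax come from HTTP itself:

```typescript
// Hypothetical helpers illustrating the soft-limit request.
function rangeHeader(limitBytes: number): string {
  // The end offset is inclusive, so a 10 MB limit ends at byte limit - 1.
  return `bytes=0-${limitBytes - 1}`;
}

type RangeOutcome = "partial" | "full-body-cap-client-side" | "error";

function classifyStatus(status: number): RangeOutcome {
  if (status === 206) return "partial"; // server honoured the Range
  if (status === 200) return "full-body-cap-client-side"; // Range ignored
  return "error";
}

// Extract the total size from a header value like "bytes 0-10485759/47000000".
// Returns undefined when the total is "*" or the value does not match.
function parseContentRangeTotal(value: string): number | undefined {
  const match = /^bytes \d+-\d+\/(\d+)$/.exec(value.trim());
  return match ? Number(match[1]) : undefined;
}
```

The parsed total is what would let a truncation banner report the full file size.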

Range request flow (soft limit)

client                                  server
  │                                       │
  │  GET /file.csv                        │
  │  Range: bytes=0-10485759   ─────────► │
  │                                       │
  │                              206 Partial Content
  │                              Content-Range: bytes 0-10485759/47000000
  │  ◄──────────────────────────  Content-Length: 10485760
  │                                       │

parser receives 10 MB prefix

  ├─ scan back from end to last newline
  ├─ drop incomplete final row/line
  └─ render rows + truncation banner:
      "Showing first 10.0 MB of 44.8 MB"

The "scan back to last newline" step is what makes truncation safe for CSV / JSONL / text. The parser never sees a half-row; the previewer never renders garbage.
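The scan-back step and the banner text can be sketched as follows. The function names are hypothetical, and the banner format is inferred from the example string in the flow above:

```typescript
const NEWLINE = 0x0a; // "\n"

// Drop any bytes after the last newline so the parser never sees a half row.
// Only applied when the prefix was actually cut short.
function trimToLastNewline(prefix: Uint8Array, truncated: boolean): Uint8Array {
  if (!truncated) return prefix;
  const lastNewline = prefix.lastIndexOf(NEWLINE);
  // No newline at all: no complete row exists in the prefix.
  if (lastNewline === -1) return prefix.subarray(0, 0);
  return prefix.subarray(0, lastNewline + 1);
}

// Format sizes in binary megabytes with one decimal place.
function truncationBanner(shownBytes: number, totalBytes: number): string {
  const mb = (n: number) => (n / (1024 * 1024)).toFixed(1);
  return `Showing first ${mb(shownBytes)} MB of ${mb(totalBytes)} MB`;
}
```

With the numbers from the flow above, truncationBanner(10485760, 47000000) yields "Showing first 10.0 MB of 44.8 MB".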

NPY — bounded, two-stage

NumPy .npy files start with a short fixed-layout preamble (magic string, format version, header length), followed by a text header dict and then the raw data. The previewer reads them in two stages:

stage 1: GET .npy  Range: bytes=0-4095
  ├─ parse magic + version
  ├─ read header dict length
  └─ parse header → { dtype, shape, fortran_order }
 
stage 2 (only if shape product ≤ npyPreviewElements):
  GET .npy  Range: bytes=<headerEnd>-<headerEnd + dataBytes - 1>
  └─ decode into typed array

If the shape product exceeds the preview cap (default 1024 elements), stage 2 is skipped. The previewer shows shape, dtype, and a "preview disabled (N elements > 1024)" notice plus a download link. As a result, previewing a 10 GB tensor file costs about 4 KB of network traffic.
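Stage 1 can be sketched as a small parser over the first 4 KB. This is an illustrative sketch with hypothetical names, not the previewer's actual code. It assumes the published NPY layout (6-byte magic, 2 version bytes, then a 2-byte header length in version 1.x or a 4-byte one in later versions) and handles only a plain string descr, not structured dtypes:

```typescript
interface NpyHeader {
  descr: string;
  fortranOrder: boolean;
  shape: number[];
  dataOffset: number; // byte offset where the raw data starts
}

const NPY_MAGIC = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]; // "\x93NUMPY"

function parseNpyHeader(buf: Uint8Array): NpyHeader {
  if (buf.length < 12 || !NPY_MAGIC.every((b, i) => buf[i] === b)) {
    throw new Error("not an NPY file");
  }
  const major = buf[6];
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  // Version 1.x stores the header length as uint16, later versions as uint32.
  const lenBytes = major === 1 ? 2 : 4;
  const headerLen = major === 1 ? view.getUint16(8, true) : view.getUint32(8, true);
  const dictStart = 8 + lenBytes;
  const dataOffset = dictStart + headerLen;
  if (dataOffset > buf.length) throw new Error("header larger than the stage-1 read");

  // The header dict is ASCII text; decode it byte by byte.
  let text = "";
  for (let i = dictStart; i < dataOffset; i++) text += String.fromCharCode(buf[i]);

  const descr = /'descr':\s*'([^']+)'/.exec(text);
  const order = /'fortran_order':\s*(True|False)/.exec(text);
  const shape = /'shape':\s*\(([^)]*)\)/.exec(text);
  if (!descr || !order || !shape) throw new Error("unsupported NPY header");

  return {
    descr: descr[1],
    fortranOrder: order[1] === "True",
    shape: shape[1].split(",").map((s) => s.trim()).filter((s) => s !== "").map(Number),
    dataOffset,
  };
}

// An empty shape () is a scalar, so the product starts at 1.
function elementCount(shape: number[]): number {
  return shape.reduce((a, b) => a * b, 1);
}

// Stage 2 runs only when the element count fits the cap (default 1024).
function shouldFetchData(header: NpyHeader, npyPreviewElements = 1024): boolean {
  return elementCount(header.shape) <= npyPreviewElements;
}
```

The stage-2 Range would then start at dataOffset and span elementCount times the item size implied by descr.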

MCAP — footer + summary only

MCAP files are designed to be indexed: the last bytes of the file describe the rest. The previewer never fetches message bodies. See MCAP kind for the IReadable adapter pattern.

GET sample.mcap  Range: bytes=<size-8192>-<size-1>     # footer + tail
GET sample.mcap  Range: bytes=<summaryStart>-<summaryEnd>   # summary section

Two reads, regardless of file size. A 50 GB MCAP previews in two ~KB-scale requests.
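The first read can be sketched as below. The names are hypothetical, and the footer layout is an assumption taken from the MCAP specification as commonly documented: a footer record (opcode 0x02, 8-byte record length, then summary_start, summary_offset_start, and a 4-byte CRC) followed by the 8-byte closing magic. Offsets are read as two 32-bit halves, which stays exact for files below 2^53 bytes:

```typescript
const MCAP_MAGIC = [0x89, 0x4d, 0x43, 0x41, 0x50, 0x30, 0x0d, 0x0a]; // "\x89MCAP0\r\n"
const FOOTER_OPCODE = 0x02;
const FOOTER_RECORD_BYTES = 1 + 8 + 20; // opcode + record length + content

// Range header for the tail read; clamps to 0 for files smaller than the tail.
function tailRange(size: number, tailBytes = 8192): string {
  const start = Math.max(0, size - tailBytes);
  return `bytes=${start}-${size - 1}`;
}

// Read a little-endian uint64 as a JS number.
function readUint64(view: DataView, offset: number): number {
  const lo = view.getUint32(offset, true);
  const hi = view.getUint32(offset + 4, true);
  return hi * 4294967296 + lo;
}

function parseMcapFooter(tail: Uint8Array): { summaryStart: number; summaryOffsetStart: number } {
  const magicAt = tail.length - MCAP_MAGIC.length;
  const recordAt = magicAt - FOOTER_RECORD_BYTES;
  if (recordAt < 0) throw new Error("tail too short for an MCAP footer");
  if (!MCAP_MAGIC.every((b, i) => tail[magicAt + i] === b)) throw new Error("missing closing magic");
  if (tail[recordAt] !== FOOTER_OPCODE) throw new Error("footer record not found");

  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  const content = recordAt + 9; // skip opcode (1 byte) and record length (8 bytes)
  return {
    // A summaryStart of 0 is taken to mean the file has no summary section.
    summaryStart: readUint64(view, content),
    summaryOffsetStart: readUint64(view, content + 8),
  };
}
```

The returned summaryStart is what the second Range request would begin at.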

Tuning limits

All defaults are overridable via the limits prop. See API for the full shape.

<FilePreview
  url={url}
  limits={{
    imageMaxBytes: 100 * 1024 * 1024,  // 100 MB images allowed
    csvMaxBytes:   50  * 1024 * 1024,  // 50 MB CSV
    npyPreviewElements: 4096,          // larger tensor previews
  }}
/>
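One plausible way such overrides combine with the defaults is a shallow merge that ignores undefined values. The sketch below is an assumption about behaviour, not the documented implementation. Only the three limit names shown in the example are taken from this page, with defaults from the per-kind table; resolveLimits and PreviewLimits are hypothetical names:

```typescript
// Hypothetical shape covering only the limits named in the example above.
interface PreviewLimits {
  imageMaxBytes: number;
  csvMaxBytes: number;
  npyPreviewElements: number;
}

const MB = 1024 * 1024;

// Defaults taken from the per-kind table.
const DEFAULT_LIMITS: PreviewLimits = {
  imageMaxBytes: 20 * MB,
  csvMaxBytes: 10 * MB,
  npyPreviewElements: 1024,
};

function resolveLimits(overrides: Partial<PreviewLimits> = {}): PreviewLimits {
  const resolved: PreviewLimits = { ...DEFAULT_LIMITS };
  for (const key of Object.keys(overrides) as (keyof PreviewLimits)[]) {
    const value = overrides[key];
    // Skip explicit undefined so it cannot erase a default.
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}
```

Limits that are not overridden keep their defaults, so passing only csvMaxBytes leaves the image and NPY caps unchanged.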

When the server doesn't support Range

If a server returns 200 OK instead of 206 Partial Content for a Range request, the previewer still aborts the fetch at the configured limit using a streaming reader. The truncation banner appears the same way. The only cost is bandwidth: the server may have started sending the whole file.
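A capped streaming read could look like the sketch below. The ChunkReader interface and readCapped are hypothetical names; the interface is shaped so that the reader returned by response.body.getReader() should satisfy it, but that pairing is an assumption rather than the previewer's actual code:

```typescript
// Minimal reader shape, structurally compatible with a stream reader.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
  cancel(): Promise<void>;
}

async function readCapped(
  reader: ChunkReader,
  limitBytes: number,
): Promise<{ bytes: Uint8Array; truncated: boolean }> {
  const chunks: Uint8Array[] = [];
  let received = 0;
  let truncated = false;

  while (true) {
    const { done, value } = await reader.read();
    if (done || !value) break; // body ended within the limit
    const room = limitBytes - received;
    if (value.length > room) {
      // Keep only what fits, then stop the transfer.
      if (room > 0) chunks.push(value.subarray(0, room));
      received += room;
      truncated = true;
      await reader.cancel();
      break;
    }
    chunks.push(value);
    received += value.length;
  }

  // Concatenate the collected chunks into one buffer.
  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  return { bytes, truncated };
}
```

Cancelling the reader stops the client from consuming more data, though bytes already in flight from the server are still paid for, which matches the bandwidth cost described above.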