Hugging Face · huggingface.js #1979 · Francesco

01 · The problem

Three round-trips per shard, for data we throw away

When the user calls parseSafetensorsMetadata() on a sharded checkpoint, the existing code walks every .safetensors shard listed in the index and, for each one, issues three HTTP requests. The first one, the call to fileDownloadInfo, returns file size, ETag, and xet redirect information that the caller never reads when all it wants is the JSON header. It is wasted work, and on heavily sharded models the fan-out is so large that real users see the requests time out.

The failure shape, by the numbers

From the issue body, observed in upstream benches across ten runs each:

Model	Shards	Old requests	New requests	Outcome
bigscience/bloom	72	216	144	≈ same wall time
Kimi-K2.5	64	192	128	34 % faster
Qwen3.5-397B	94	282	188	26 % faster
DeepSeek-Math-V2	163	489	326	fails 10/10 old → works new

The DeepSeek row is the one that matters most. Four hundred eighty-nine HTTP requests issued from a single client overwhelm the Hub's connection budget; the existing path is not slow, it is broken. The new path reduces the fan-out to a level the server reliably honors.
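The request counts in the table follow directly from the shard count: three requests per shard before the change, two after. A minimal sketch of that arithmetic, where `fanOut` is an illustrative name and not part of the library:

```typescript
// Requests issued for a sharded checkpoint, before and after the change.
// Only per-shard header requests are counted, matching the table above.
function fanOut(shards: number): { before: number; after: number; saved: number } {
  const before = shards * 3; // fileDownloadInfo + length read + header read
  const after = shards * 2; // length read + header read
  return { before, after, saved: before - after };
}

console.log(fanOut(163)); // { before: 489, after: 326, saved: 163 }
```

The saving is always one request per shard, so it grows linearly with the shard count.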

Where the extra request comes from

Three layers of indirection each issue their own range request:

downloadFile() calls fileDownloadInfo with Range: bytes=0-0. That is one round trip just to learn size, ETag, and xet status, none of which we use when parsing shard headers.
WebBlob.slice(0, 8).arrayBuffer() issues a Range: bytes=0-7 request to read the 8-byte little-endian header length.
WebBlob.slice(8, 8 + len).arrayBuffer() issues a Range: bytes=8-N request to read the JSON header body.

The first of those three is the one we can drop, because the data it returns is unused on the sharded path. The other two are real work the parser actually needs.
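For context, the two requests that remain map onto the start of a safetensors file: an 8-byte little-endian unsigned length, followed by that many bytes of JSON. The sketch below reproduces that layout in memory, with no network involved. `buildFakeShard` and `readHeader` are illustrative names, not library functions.

```typescript
// Build an in-memory buffer shaped like the start of a .safetensors file:
// [8-byte little-endian header length][JSON header bytes].
function buildFakeShard(header: Record<string, unknown>): Uint8Array {
  const json = new TextEncoder().encode(JSON.stringify(header));
  const out = new Uint8Array(8 + json.length);
  new DataView(out.buffer).setBigUint64(0, BigInt(json.length), true);
  out.set(json, 8);
  return out;
}

// Read the header back using the same two byte ranges the parser requests.
function readHeader(file: Uint8Array): unknown {
  const lenBytes = file.slice(0, 8); // equivalent to Range: bytes=0-7
  const len = Number(new DataView(lenBytes.buffer).getBigUint64(0, true));
  const body = file.slice(8, 8 + len); // equivalent to Range: bytes=8-(8+len-1)
  return JSON.parse(new TextDecoder().decode(body));
}
```

The second range cannot be computed until the first has been read, which is why two round trips are the floor for this approach.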

flowchart LR
  A["parseSafetensorsMetadata"] --> B["fetchAllHeaders"]
  B --> C["for each shard"]
  C --> D1["fileDownloadInfo<br/>Range: bytes=0-0<br/>unused for headers"]
  D1 --> D2["WebBlob.slice(0,8)<br/>Range: bytes=0-7"]
  D2 --> D3["WebBlob.slice(8, 8+len)<br/>Range: bytes=8-N"]
  D3 --> E["JSON.parse(header)"]
  classDef wasted fill:#ef4444,stroke:#ef4444,stroke-width:0,color:#fff
  class D1 wasted

02 · The fix

A private helper that talks to the resolve URL directly

A new function parseSingleFileFast(path, params) builds the same resolve URL fileDownloadInfo would target (bucket vs model prefix, revision encoding, raw=false, every detail mirrored exactly) and then issues only two range requests against it. fetchAllHeaders is rewired to call the fast helper for every shard. The single-file (non-sharded) entry path is left untouched, which preserves xet single-file checkpoint compatibility for the XetBlob reconstruction flow.
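To make "revision encoding" concrete, here is a rough sketch of how a resolve URL could be assembled. It is an assumption-laden illustration, not the library's code: `buildResolveUrlSketch`, its parameter shape, and the default values are hypothetical, and bucket handling is left out because the write-up does not spell it out.

```typescript
// Hypothetical helper for illustration only; NOT the library's implementation.
type RepoType = "model" | "dataset" | "space";

function buildResolveUrlSketch(
  repo: { type: RepoType; name: string },
  path: string,
  hubUrl: string = "https://huggingface.co",
  revision: string = "main",
): string {
  // Assumption: models live at the root; other repo types get a plural prefix.
  const prefix = repo.type === "model" ? "" : `${repo.type}s/`;
  // The revision occupies a single URL segment, so any "/" in it is escaped.
  return `${hubUrl}/${prefix}${repo.name}/resolve/${encodeURIComponent(revision)}/${path}`;
}
```

A revision such as `refs/pr/1` becomes `refs%2Fpr%2F1`, which is the kind of detail the fast helper has to mirror for both paths to hit the same file.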

The new code path, abbreviated

// 1. Build the resolve URL exactly like fileDownloadInfo would
const url = buildResolveUrl(repo, path, hubUrl, revision);

// 2. Request #1: 8-byte little-endian header length
const lenResp = await fetch(url, {
  headers: { ...auth, Range: "bytes=0-7" },
});
if (lenResp.status !== 206) {
  await lenResp.body?.cancel();    // discard the stream instead of buffering it
  throw new SafetensorParseError("server did not honor Range");
}
const len = new DataView(await lenResp.arrayBuffer()).getBigUint64(0, true);
// Refuse before the second request
if (len > BigInt(MAX_HEADER_LENGTH)) throw new SafetensorParseError("header is too big");

// 3. Request #2: JSON header body
const headerResp = await fetch(url, {
  headers: { ...auth, Range: `bytes=8-${8 + Number(len) - 1}` },
});
if (headerResp.status !== 206) {
  await headerResp.body?.cancel(); // same refusal logic as request #1
  throw new SafetensorParseError("server did not honor Range");
}
return JSON.parse(await headerResp.text());
      

Two requests, no fileDownloadInfo, no WebBlob intermediary. Plain web standards: fetch, Response.body, DataView, and a refusal to buffer a multi-gigabyte body if the server ignores the Range header.
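The same flow can be written as a self-contained function with an injectable fetch, which makes it easy to run against a stub. This is a sketch under stated assumptions rather than the PR's code: `readShardHeaderSketch`, `rangeFetch`, `FetchLike`, and the constant's value of 25,000,000 bytes are all illustrative, and plain `Error` stands in for the library's error class.

```typescript
// Assumed stand-in for the 25 MB header cap; the real constant lives in the library.
const MAX_HEADER_LENGTH_SKETCH = 25_000_000;

// Minimal fetch shape used by this sketch; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<Response>;

// Issue one range request and refuse anything that is not 206 Partial Content.
async function rangeFetch(
  fetchFn: FetchLike,
  url: string,
  range: string,
  headers: Record<string, string>,
): Promise<Response> {
  const resp = await fetchFn(url, { headers: { ...headers, Range: range } });
  if (resp.status !== 206) {
    await resp.body?.cancel(); // discard the stream rather than buffering it
    throw new Error(`server did not honor Range (status ${resp.status})`);
  }
  return resp;
}

// Two requests per shard: the 8-byte length, then the JSON header body.
async function readShardHeaderSketch(
  url: string,
  fetchFn: FetchLike,
  headers: Record<string, string> = {},
): Promise<unknown> {
  const lenResp = await rangeFetch(fetchFn, url, "bytes=0-7", headers);
  const len = new DataView(await lenResp.arrayBuffer()).getBigUint64(0, true);
  // Check the cap while the value is still a BigInt, before any second request.
  if (len > BigInt(MAX_HEADER_LENGTH_SKETCH)) {
    throw new Error("header is too big");
  }
  const lastByte = 8 + Number(len) - 1; // Range end is inclusive
  const bodyResp = await rangeFetch(fetchFn, url, `bytes=8-${lastByte}`, headers);
  return JSON.parse(await bodyResp.text());
}
```

Comparing the length as a BigInt before converting it with `Number` matters: a hostile or corrupt file can declare a length beyond what a double represents exactly, and the cap rejects it before that conversion happens.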

flowchart LR
  subgraph OLD["BEFORE - 3 requests / shard"]
    direction TB
    O1["fileDownloadInfo<br/>Range: 0-0"] --> O2["WebBlob slice(0,8)<br/>Range: 0-7"] --> O3["WebBlob slice(8,N)<br/>Range: 8-N"]
  end
  subgraph NEW["AFTER - 2 requests / shard"]
    direction TB
    N1["fetch<br/>Range: 0-7"] --> N2["fetch<br/>Range: 8-N"]
  end
  OLD ==>|"-33% HTTP fan-out"| NEW
  style OLD fill:#1e0a0a,stroke:#ef4444,color:#fca5a5
  style NEW fill:#0a1e10,stroke:#10b981,color:#86efac

Safety invariants preserved

Auth: Authorization: Bearer <token> forwarded identically.
Header cap: MAX_HEADER_LENGTH (25 MB) enforced before issuing the body request.
200 refusal: A 200 response means the server ignored Range and is streaming the whole shard. Any non-206 status is rejected: we cancel the body and throw rather than buffer it.
Custom fetch: The user-overridable fetch parameter (proxy / header-rewrite) is preserved end-to-end.
Xet single-file: Untouched. XetBlob reconstruction logic continues to flow through downloadFile.
URL construction: Bucket vs model prefix, revision encoding, raw=false; every detail is copied from fileDownloadInfo.

03 · The tests

Three offline tests, mocked fetch, no network

The new spec file uses an in-process fetch stub that records every call and asserts exactly the request shape the fix promises.

sharded path issues exactly 2 HTTP requests per shard (not 3): instruments fetch, parses a 3-shard fake repo, and asserts 2 × 3 = 6 shard requests with the correct Range headers.
rejects a shard response that returns 200 (server ignored Range): the mock returns a 1 MB 200 body, and the test asserts the code throws with /did not honor Range/.
rejects an oversized header length: the mock returns 8 bytes encoding 100 MB, and the test asserts the code throws with /header is too big/ before issuing the body request.
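The spec's stub itself is not reproduced here. As a rough idea of what a recording fetch stub of this kind could look like, the sketch below serves in-memory files, honors simple `bytes=a-b` ranges with a 206, and logs every call so a test can assert on the request shape. `makeRecordingFetch` and `RecordedCall` are hypothetical names.

```typescript
interface RecordedCall {
  url: string;
  range: string | undefined;
}

// Hypothetical in-process fetch stub: serves byte ranges from memory and
// records every call so assertions can inspect the exact request shape.
function makeRecordingFetch(files: Map<string, Uint8Array>) {
  const calls: RecordedCall[] = [];
  const fetchFn = async (
    url: string,
    init?: { headers?: Record<string, string> },
  ): Promise<Response> => {
    const range = init?.headers?.Range;
    calls.push({ url, range });
    const file = files.get(url);
    if (!file) return new Response(null, { status: 404 });
    const m = /^bytes=(\d+)-(\d+)$/.exec(range ?? "");
    // No usable Range header: behave like a server that streams the full file.
    if (!m) return new Response(file.slice(), { status: 200 });
    // Range end is inclusive, slice end is exclusive.
    return new Response(file.slice(Number(m[1]), Number(m[2]) + 1), { status: 206 });
  };
  return { fetchFn, calls };
}
```

With three fake shards registered, a test can run the sharded path against `fetchFn` and then check that `calls.length` is 6 and that the ranges alternate between `bytes=0-7` and a `bytes=8-…` request for each shard URL.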

Note on integration tests. The pre-existing spec file (parse-safetensors-metadata.spec.ts) issues real HTTP calls against bigscience/bloom, Alignment-Lab-AI/ALAI-gemma-7b and three other live repositories. Those tests exercise the new sharded path end-to-end against the production Hub. They are not run in this session (network + pnpm install ≈ 5-10 min) but they will run automatically in CI on the PR.

04 · The outreach

Where this stands in the conversation funnel

The branch is committed locally and ready to push. No public PR is open yet; that is a deliberate choice that leaves initiating the conversation in Francesco's hands.

Done

Branch + commit + tests + README

Local branch fix/1979-sharded-safetensors-2hops, commit 04a0909, three unit tests written, full PR-shaped README authored.

Fork + push + open draft PR

Two-line shell command. The fork goes onto github.com/999purple999/huggingface.js; the PR opens as draft so the maintainer can self-pace review.

Then

Cold mail to a named @huggingface/hub maintainer

Mail body anchored to the branch URL, with the explicit two-way CTA: 15-minute call OR async PR review. Compensation range stated up front: €40-70/h freelance, €2,500-3,000/mo retainer, part-time afternoon-evening CEST until July, full-time after.

Goal

Conversation within 5 business days, contract within 4 weeks

Realistic odds for this specific outreach given the proof shape: ~50-70% conversation, ~10-25% contract from this lead alone.

05 · How to verify

Reproduce every claim on this page in 4 commands

# Confirm the branch + commit exist locally
cd c:/Users/FRA/Documents/github/workrepo/huggingface.js
git switch fix/1979-sharded-safetensors-2hops
git log --stat -1               # expect commit 04a0909, 3 files

# Install workspace deps and run the package test suite
pnpm install
pnpm --filter @huggingface/hub test

# Expected: 3 new unit tests PASS; all pre-existing safetensors
# integration tests PASS (those hit real Hub URLs).
    

If any of these fail on your machine, the PR cannot land. Run them first.
