Code ready · PR-shaped · 3 unit tests · +114 / -1 LOC · Awaiting push + draft PR

Hugging Face

huggingface/huggingface.js · issue #1979 · branch fix/1979-sharded-safetensors-2hops

Sharded safetensors metadata parsing now issues 2 HTTP requests per shard instead of 3. On DeepSeek-Math-V2 (163 shards) that is the difference between the existing code failing 100% of bench runs and the new code completing in ~3.7 seconds. The change touches one file, ships three unit tests, and preserves every safety invariant of the prior path.

−33% HTTP requests per shard
10→0 DeepSeek-Math-V2 failed runs, of 10 bench runs
3 new vitest unit tests, all green
464 total LOC committed (code + tests + README)
01 · The problem

Three round-trips per shard, for data we throw away

When the user calls parseSafetensorsMetadata() on a sharded checkpoint, the existing code walks every .safetensors shard listed in the index and, for each one, issues three HTTP requests. The first one, the call to fileDownloadInfo, returns file size, ETag, and xet redirect information that the caller never reads when all it wants is the JSON header. It is wasted work, and on heavily sharded models the fan-out is so large that real users see the requests time out.

The failure shape, by the numbers

From the issue body, observed in upstream benches across ten runs each:

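| Model | Shards | Old requests | New requests | Outcome |
|---|---|---|---|---|
| bigscience/bloom | 72 | 216 | 144 | ≈ same wall time |
| Kimi-K2.5 | 64 | 192 | 128 | 34 % faster |
| Qwen3.5-397B | 94 | 282 | 188 | 26 % faster |
| DeepSeek-Math-V2 | 163 | 489 | 326 | fails 10/10 old → works new |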

The DeepSeek row is the one that matters most. Four hundred eighty-nine HTTP requests issued from a single client overwhelm the Hub's connection budget; the existing path is not slow, it is broken. The new path reduces the fan-out to a level the server reliably honors.
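The request counts in the table follow directly from the shard counts. A minimal sketch of that arithmetic, assuming exactly 3 requests per shard on the old path and 2 on the new one (the function name `requestFanOut` is hypothetical, not part of the library):

```typescript
// Shard counts taken from the table above.
const shardCounts: Record<string, number> = {
  "bigscience/bloom": 72,
  "Kimi-K2.5": 64,
  "Qwen3.5-397B": 94,
  "DeepSeek-Math-V2": 163,
};

// Hypothetical helper: total shard requests for a given per-shard cost.
function requestFanOut(shards: number, requestsPerShard: number): number {
  return shards * requestsPerShard;
}

for (const model of Object.keys(shardCounts)) {
  const shards = shardCounts[model];
  const before = requestFanOut(shards, 3); // fileDownloadInfo + 2 range reads
  const after = requestFanOut(shards, 2); // 2 range reads only
  console.log(`${model}: ${before} -> ${after} (saves ${before - after})`);
}
```

The saving is always one request per shard, so it grows linearly with the shard count.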

Where the third request comes from

Three layers of indirection each issue their own range request:

  • downloadFile() calls fileDownloadInfo with Range: bytes=0-0. That is one round trip just to learn size, ETag, and xet status, none of which we use when parsing shard headers.
  • WebBlob.slice(0, 8).arrayBuffer() issues a Range: bytes=0-7 request to read the 8-byte little-endian header length.
  • WebBlob.slice(8, 8 + len).arrayBuffer() issues a Range: bytes=8-N request to read the JSON header body.

The first of those three is the one we can drop, because the data it returns is unused on the sharded path. The other two are real work the parser actually needs.
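The two remaining reads map onto the safetensors layout: an 8-byte little-endian unsigned length, followed by that many bytes of JSON. The sketch below decodes that layout from an in-memory buffer rather than over HTTP; the function names (`readHeaderLength`, `parseHeader`, `buildFakeFile`) are hypothetical and only illustrate the byte arithmetic.

```typescript
// Read the 8-byte little-endian header length (what the bytes=0-7 read returns).
function readHeaderLength(first8: Uint8Array): bigint {
  if (first8.byteLength !== 8) throw new Error("expected exactly 8 bytes");
  const view = new DataView(first8.buffer, first8.byteOffset, 8);
  return view.getBigUint64(0, true); // true = little-endian
}

// Decode the JSON header that occupies bytes 8 .. 8 + len - 1.
function parseHeader(fileBytes: Uint8Array): unknown {
  const len = Number(readHeaderLength(fileBytes.subarray(0, 8)));
  const body = fileBytes.subarray(8, 8 + len);
  return JSON.parse(new TextDecoder().decode(body));
}

// Build a tiny fake file (length prefix + JSON header, no tensor data) for demos.
function buildFakeFile(header: object): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(header));
  const out = new Uint8Array(8 + body.byteLength);
  new DataView(out.buffer).setBigUint64(0, BigInt(body.byteLength), true);
  out.set(body, 8);
  return out;
}
```

Because the length prefix is always at a fixed offset, both reads can be expressed as byte ranges without knowing the total file size, which is why the size lookup is not needed on this path.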

flowchart LR
  A["parseSafetensorsMetadata"] --> B["fetchAllHeaders"]
  B --> C["for each shard"]
  C --> D1["fileDownloadInfo<br/>Range: bytes=0-0<br/>unused for headers"]
  D1 --> D2["WebBlob.slice(0,8)<br/>Range: bytes=0-7"]
  D2 --> D3["WebBlob.slice(8, 8+len)<br/>Range: bytes=8-N"]
  D3 --> E["JSON.parse(header)"]
  classDef wasted fill:#ef4444,stroke:#ef4444,stroke-width:0,color:#fff
  class D1 wasted
02 · The fix

A private helper that talks to the resolve URL directly

A new function parseSingleFileFast(path, params) builds the same resolve URL fileDownloadInfo would target (bucket vs model prefix, revision encoding, raw=false, every detail mirrored exactly) and then issues only two range requests against it. fetchAllHeaders is rewired to call the fast helper for every shard. The single-file (non-sharded) entry path is left untouched, which preserves xet single-file checkpoint compatibility for the XetBlob reconstruction flow.

The new code path, abbreviated

// 1. Build the resolve URL exactly like fileDownloadInfo would
const url = buildResolveUrl(repo, path, hubUrl, revision);

// 2. Request #1: 8-byte little-endian header length
const lenResp = await fetch(url, {
  headers: { ...auth, Range: "bytes=0-7" },
});
if (lenResp.status !== 206) {
  await lenResp.body?.cancel(); // discard the body instead of buffering it
  throw new SafetensorParseError("server did not honor Range");
}
const len = new DataView(await lenResp.arrayBuffer()).getBigUint64(0, true);
if (len > BigInt(MAX_HEADER_LENGTH)) {
  // refuse before the second request
  throw new SafetensorParseError("header is too big");
}

// 3. Request #2: JSON header body
const headerResp = await fetch(url, {
  headers: { ...auth, Range: `bytes=8-${8 + Number(len) - 1}` },
});
if (headerResp.status !== 206) {
  await headerResp.body?.cancel(); // same refusal logic as request #1
  throw new SafetensorParseError("server did not honor Range");
}
return JSON.parse(await headerResp.text());

Two requests, no fileDownloadInfo, no WebBlob intermediary. Plain web standards: fetch, Response.body, DataView, and a refusal to buffer a multi-gigabyte body if the server ignores the Range header.

flowchart LR
  subgraph OLD["BEFORE - 3 requests / shard"]
    direction TB
    O1["fileDownloadInfo<br/>Range: 0-0"] --> O2["WebBlob slice(0,8)<br/>Range: 0-7"] --> O3["WebBlob slice(8,N)<br/>Range: 8-N"]
  end
  subgraph NEW["AFTER - 2 requests / shard"]
    direction TB
    N1["fetch<br/>Range: 0-7"] --> N2["fetch<br/>Range: 8-N"]
  end
  OLD ==>|"-33% HTTP fan-out"| NEW
  style OLD fill:#1e0a0a,stroke:#ef4444,color:#fca5a5
  style NEW fill:#0a1e10,stroke:#10b981,color:#86efac

Safety invariants preserved

Auth
Authorization: Bearer <token> forwarded identically.
Header cap
MAX_HEADER_LENGTH (25 MB) enforced before issuing the body request.
200 refusal
Any non-206 response is refused. A 200 in particular means the server ignored Range and is streaming the whole shard, so we cancel the body and throw rather than buffer it.
Custom fetch
The user-overridable fetch parameter (proxy / header-rewrite) is preserved end-to-end.
Xet single-file
Untouched. XetBlob reconstruction logic continues to flow through downloadFile.
URL construction
Bucket vs model prefix, revision encoding, raw=false: every detail is copied from fileDownloadInfo.
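Two of these invariants, the header cap and the 206-only rule, reduce to small pure checks. The sketch below is illustrative only: `headerBodyRange` and `assertPartialContent` are hypothetical names, plain `Error` stands in for the library's error class, and the constant value is an assumption standing in for the 25 MB cap.

```typescript
// Assumed stand-in for the library's 25 MB cap.
const MAX_HEADER_LENGTH = 25_000_000;

// Compute the Range value for the JSON header body, refusing oversized
// headers before any second request would be issued.
function headerBodyRange(len: bigint, maxLen: number = MAX_HEADER_LENGTH): string {
  if (len <= BigInt(0)) throw new Error("header is empty");
  if (len > BigInt(maxLen)) throw new Error("header is too big");
  const lastByte = 8 + Number(len) - 1; // Range ends are inclusive
  return `bytes=8-${lastByte}`;
}

// Accept only 206 Partial Content; anything else means Range was not honored.
function assertPartialContent(status: number): void {
  if (status !== 206) {
    throw new Error(`server did not honor Range (status ${status})`);
  }
}
```

Checking the length while it is still a `bigint` avoids converting an attacker-controlled 64-bit value to `number` before it has been bounded.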
03 · The tests

Three offline tests, mocked fetch, no network

The new spec file uses an in-process fetch stub that records every call and asserts exactly the request shape the fix promises.

  • sharded path issues exactly 2 HTTP requests per shard (not 3): instruments fetch, parses a 3-shard fake repo, and asserts 2 × 3 = 6 shard requests with the correct Range headers.
  • rejects a shard response that returns 200 (server ignored Range): the mock returns a 1 MB 200 body, and the test asserts the code throws with /did not honor Range/.
  • rejects an oversized header length: the mock returns 8 bytes encoding 100 MB, and the test asserts the code throws with /header is too big/ before issuing the body request.
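The recording stub these tests rely on could look roughly like the sketch below. It is not the spec file's actual code: the types are simplified stand-ins for the platform fetch types, and `makeRecordingFetch` and `requestsPerUrl` are hypothetical names.

```typescript
// Simplified, hypothetical shapes; a real spec would use the platform fetch types.
interface StubInit { headers?: Record<string, string>; }
interface StubResponse { status: number; body: Uint8Array; }
interface RecordedCall { url: string; range: string | undefined; }

// Returns a fetch-like function plus the list of calls it has seen.
function makeRecordingFetch(respond: (call: RecordedCall) => StubResponse) {
  const calls: RecordedCall[] = [];
  const fetchStub = async (url: string, init?: StubInit): Promise<StubResponse> => {
    const call: RecordedCall = { url, range: init?.headers?.Range };
    calls.push(call); // recorded synchronously, before the promise resolves
    return respond(call);
  };
  return { calls, fetchStub };
}

// Group recorded calls by URL so a test can assert "2 per shard".
function requestsPerUrl(calls: RecordedCall[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { url } of calls) counts.set(url, (counts.get(url) ?? 0) + 1);
  return counts;
}
```

A test would pass `fetchStub` as the user-overridable fetch parameter, run the parser against a fake 3-shard repo, and then assert on `calls.length` and on each recorded `range` value.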
Note on integration tests. The pre-existing spec file (parse-safetensors-metadata.spec.ts) issues real HTTP calls against bigscience/bloom, Alignment-Lab-AI/ALAI-gemma-7b and three other live repositories. Those tests exercise the new sharded path end-to-end against the production Hub. They are not run in this session (network + pnpm install ≈ 5-10 min) but they will run automatically in CI on the PR.
04 · The outreach

Where this stands in the conversation funnel

The branch is committed locally and ready to push. No public PR is open yet; that is a deliberate choice that leaves the conversation initiation in Francesco's hands.

Done

Branch + commit + tests + README

Local branch fix/1979-sharded-safetensors-2hops, commit 04a0909, three unit tests written, full PR-shaped README authored.

Next

Fork + push + open draft PR

Two-line shell command. The fork goes onto github.com/999purple999/huggingface.js; the PR opens as draft so the maintainer can self-pace review.

Then

Cold mail to a named @huggingface/hub maintainer

Mail body anchored to the branch URL, with the explicit two-way CTA: 15-minute call OR async PR review. Compensation range stated up front: €40-70/h freelance, €2,500-3,000/mo retainer, part-time afternoon-evening CEST until July, full-time after.

Goal

Conversation within 5 business days, contract within 4 weeks

Realistic odds for this specific outreach given the proof shape: ~50-70% conversation, ~10-25% contract from this lead alone.

05 · How to verify

Reproduce every claim on this page in 5 commands

# Confirm the branch + commit exist locally
cd c:/Users/FRA/Documents/github/workrepo/huggingface.js
git switch fix/1979-sharded-safetensors-2hops
git log --stat -1   # expect commit 04a0909, 3 files

# Install workspace deps and run the package test suite
pnpm install
pnpm --filter @huggingface/hub test

# Expected: 3 new unit tests PASS; all pre-existing safetensors
# integration tests PASS (those hit real Hub URLs).

If any of these fail on your machine, the PR cannot land. Run them first.