huggingface/huggingface.js · issue #1979 · branch fix/1979-sharded-safetensors-2hops
Sharded safetensors metadata parsing now issues 2 HTTP requests per shard instead of 3. On DeepSeek-Math-V2 (163 shards) that is the difference between the existing code failing 100% of bench runs and the new code completing in ~3.7 seconds. The change touches one file, ships three unit tests, and preserves every safety invariant of the prior path.
When the user calls parseSafetensorsMetadata() on a sharded checkpoint, the existing code walks every .safetensors shard listed in the index and, for each one, issues three HTTP requests. The first one, the call to fileDownloadInfo, returns file size, ETag, and xet redirect information that the caller never reads when all it wants is the JSON header. It is wasted work, and on heavily sharded models the fan-out is so large that real users see the requests time out.
From the issue body, observed in upstream benches across ten runs each:
| Model | Shards | Old requests | New requests | Outcome |
|---|---|---|---|---|
| bigscience/bloom | 72 | 216 | 144 | ≈ same wall time |
| Kimi-K2.5 | 64 | 192 | 128 | 34 % faster |
| Qwen3.5-397B | 94 | 282 | 188 | 26 % faster |
| DeepSeek-Math-V2 | 163 | 489 | 326 | fails 10/10 old → works new |
The DeepSeek row is the one that matters most. Four hundred eighty-nine HTTP requests issued from a single client overwhelm the Hub's connection budget; the existing path is not slow, it is broken. The new path reduces the fan-out to a level the server reliably honors.
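The request counts in the table follow directly from the per-shard cost: three requests per shard before, two after. The arithmetic can be reproduced with a short sketch; the function and constant names here are illustrative only and do not come from the repository, and only per-shard header requests are counted.

```typescript
// Per-shard request cost, as described in the text.
const OLD_REQUESTS_PER_SHARD = 3; // fileDownloadInfo + length read + body read
const NEW_REQUESTS_PER_SHARD = 2; // length read + body read

interface FanOut {
  model: string;
  oldRequests: number;
  newRequests: number;
}

// Hypothetical helper: computes the shard-header fan-out for one model.
function shardFanOut(model: string, shards: number): FanOut {
  return {
    model,
    oldRequests: shards * OLD_REQUESTS_PER_SHARD,
    newRequests: shards * NEW_REQUESTS_PER_SHARD,
  };
}

// Shard counts copied from the table above.
const benchModels: Array<[string, number]> = [
  ["bigscience/bloom", 72],
  ["Kimi-K2.5", 64],
  ["Qwen3.5-397B", 94],
  ["DeepSeek-Math-V2", 163],
];

for (const [model, shards] of benchModels) {
  const f = shardFanOut(model, shards);
  console.log(`${f.model}: ${f.oldRequests} -> ${f.newRequests}`);
}
```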
Three layers of indirection each issue their own range request:
1. fileDownloadInfo with Range: bytes=0-0, one round trip just to learn size + ETag + xet status, none of which we use when parsing shard headers.
2. A Range: bytes=0-7 request to read the 8-byte little-endian header length.
3. A Range: bytes=8-N request to read the JSON header body.

The first of those three is the one we can drop, because the data it returns is unused on the sharded path. The other two are real work the parser actually needs.
flowchart LR
A["parseSafetensorsMetadata"] --> B["fetchAllHeaders"]
B --> C["for each shard"]
C --> D1["fileDownloadInfo
Range: bytes=0-0
unused for headers"]
D1 --> D2["WebBlob.slice(0,8)
Range: bytes=0-7"]
D2 --> D3["WebBlob.slice(8, 8+len)
Range: bytes=8-N"]
D3 --> E["JSON.parse(header)"]
classDef wasted fill:#ef4444,stroke:#ef4444,stroke-width:0,color:#fff
class D1 wasted
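The two requests that remain map onto the safetensors file layout: an 8-byte little-endian length, followed by that many bytes of JSON. Because HTTP byte ranges are inclusive, the N in bytes=8-N is 7 plus the header length. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the length is decoded from two 32-bit halves to avoid depending on BigInt support.

```typescript
// Decodes the 8-byte little-endian header length at the start of a safetensors file.
function decodeHeaderLength(firstEightBytes: Uint8Array): number {
  if (firstEightBytes.byteLength < 8) {
    throw new Error("need at least 8 bytes to read the header length");
  }
  const view = new DataView(firstEightBytes.buffer, firstEightBytes.byteOffset, 8);
  const low = view.getUint32(0, true); // least-significant 32 bits
  const high = view.getUint32(4, true); // most-significant 32 bits
  return low + high * 2 ** 32;
}

// Builds the Range header value for the JSON header body.
// HTTP ranges are inclusive, so a body of `length` bytes starting at offset 8 ends at 7 + length.
function headerBodyRange(length: number): string {
  return `bytes=8-${7 + length}`;
}

// Usage: a file whose JSON header is 300 bytes long.
const prefix = new Uint8Array(8);
new DataView(prefix.buffer).setUint32(0, 300, true);
console.log(headerBodyRange(decodeHeaderLength(prefix)));
```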
A new function parseSingleFileFast(path, params) builds the same resolve URL fileDownloadInfo would target (bucket vs model prefix, revision encoding, raw=false, every detail mirrored exactly) and then issues only two range requests against it. fetchAllHeaders is rewired to call the fast helper for every shard. The single-file (non-sharded) entry path is left untouched, which preserves xet single-file checkpoint compatibility for the XetBlob reconstruction flow.
Two requests, no fileDownloadInfo, no WebBlob intermediary. Plain web standards: fetch, Response.body, DataView, and a refusal to buffer a multi-gigabyte body if the server ignores the Range header.
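The two-request flow can be sketched as below. This is an illustration under stated assumptions, not the code in the branch: fetchShardHeaderSketch, fetchRange, and FastFetchParams are hypothetical names, the resolve URL is assumed to be built already, and the 25 MB limit is written as 25,000,000 bytes, which may differ from the library's exact constant.

```typescript
// Sketch only: names are illustrative, not the PR's actual code.
const MAX_HEADER_LENGTH = 25_000_000; // "25 MB" per the text; exact constant is assumed

interface FastFetchParams {
  accessToken?: string;
  fetch?: typeof fetch; // caller-supplied fetch (proxy / header rewrite)
}

async function fetchRange(
  url: string,
  start: number,
  end: number,
  params: FastFetchParams
): Promise<Uint8Array> {
  const headers: Record<string, string> = { Range: `bytes=${start}-${end}` };
  if (params.accessToken) {
    headers["Authorization"] = `Bearer ${params.accessToken}`;
  }
  const resp = await (params.fetch ?? fetch)(url, { headers });
  if (!resp.ok) {
    throw new Error(`HTTP ${resp.status} while fetching ${url}`);
  }
  if (resp.status !== 206) {
    // A 200 means the server is about to send the whole file: refuse to buffer it.
    await resp.body?.cancel();
    throw new Error(`Server did not honor Range for ${url}`);
  }
  return new Uint8Array(await resp.arrayBuffer());
}

async function fetchShardHeaderSketch(
  url: string,
  params: FastFetchParams = {}
): Promise<unknown> {
  // Request 1: the 8-byte little-endian header length.
  const lengthBytes = await fetchRange(url, 0, 7, params);
  if (lengthBytes.byteLength !== 8) {
    throw new Error("expected exactly 8 bytes for the header length");
  }
  const view = new DataView(lengthBytes.buffer, lengthBytes.byteOffset, 8);
  const headerLength = view.getUint32(0, true) + view.getUint32(4, true) * 2 ** 32;
  if (headerLength === 0) {
    throw new Error("safetensors header length is zero");
  }
  // Enforce the size limit before issuing the body request.
  if (headerLength > MAX_HEADER_LENGTH) {
    throw new Error(`safetensors header is too big (${headerLength} bytes)`);
  }
  // Request 2: the JSON header body (inclusive range).
  const body = await fetchRange(url, 8, 7 + headerLength, params);
  if (body.byteLength !== headerLength) {
    throw new Error("header body length does not match the declared length");
  }
  return JSON.parse(new TextDecoder().decode(body));
}
```

Checking the status code before reading the body is what keeps a Range-ignoring server from forcing a multi-gigabyte download; the body stream is cancelled instead of consumed.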
flowchart LR
subgraph OLD["BEFORE - 3 requests / shard"]
direction TB
O1["fileDownloadInfo
Range: 0-0"] --> O2["WebBlob slice(0,8)
Range: 0-7"] --> O3["WebBlob slice(8,N)
Range: 8-N"]
end
subgraph NEW["AFTER - 2 requests / shard"]
direction TB
N1["fetch
Range: 0-7"] --> N2["fetch
Range: 8-N"]
end
OLD ==>|"-33% HTTP fan-out"| NEW
style OLD fill:#1e0a0a,stroke:#ef4444,color:#fca5a5
style NEW fill:#0a1e10,stroke:#10b981,color:#86efac
- Authorization: Bearer <token> forwarded identically.
- MAX_HEADER_LENGTH (25 MB) enforced before issuing the body request.
- The caller-supplied fetch parameter (proxy / header-rewrite) is preserved end-to-end.
- XetBlob reconstruction logic continues to flow through downloadFile and fileDownloadInfo.

The new spec file uses an in-process fetch stub that records every call and asserts exactly the request shape the fix promises.
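A recording stub of the kind described above could look like the following sketch. It is not the spec file's actual helper: makeRecordingFetch, RecordedCall, and the honorRange switch are hypothetical names introduced for illustration. The stub serves byte ranges from an in-memory buffer and logs the headers of every call so a test can assert on them.

```typescript
interface RecordedCall {
  url: string;
  range: string | null;
  authorization: string | null;
}

// Hypothetical helper: an in-process fetch that serves `file` and records each call.
// With honorRange = false it simulates a server that ignores the Range header.
function makeRecordingFetch(
  file: Uint8Array,
  honorRange = true
): { fetch: typeof fetch; calls: RecordedCall[] } {
  const calls: RecordedCall[] = [];

  const stub = async (...args: Parameters<typeof fetch>): Promise<Response> => {
    const [input, init] = args;
    const url = input instanceof Request ? input.url : String(input);
    const headers = new Headers(init?.headers);
    const range = headers.get("Range");
    calls.push({ url, range, authorization: headers.get("Authorization") });

    const match = range ? /^bytes=(\d+)-(\d+)$/.exec(range) : null;
    if (!honorRange || !match) {
      // Full-body response, as a Range-ignoring server would send.
      return new Response(file.slice(), { status: 200 });
    }
    // HTTP ranges are inclusive; slice() takes an exclusive end.
    const start = Number(match[1]);
    const end = Number(match[2]) + 1;
    return new Response(file.slice(start, end), { status: 206 });
  };

  return { fetch: stub as typeof fetch, calls };
}
```

A test can pass the returned fetch wherever a custom fetch is accepted, then assert on calls.length and on each recorded range and authorization value.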
- Exactly 2 × 3 = 6 shard requests are issued, with the correct Range headers.
- A server that ignores the Range header is rejected with an error matching /did not honor Range/.
- An oversized header length is rejected with an error matching /header is too big/ before issuing the body request.

The integration suite (parse-safetensors-metadata.spec.ts) issues real HTTP calls against bigscience/bloom, Alignment-Lab-AI/ALAI-gemma-7b and three other live repositories. Those tests exercise the new sharded path end-to-end against the production Hub. They were not run in this session (network + pnpm install ≈ 5-10 min), but they will run automatically in CI on the PR.

The branch is committed locally and ready to push. No public PR is open yet; that is a deliberate choice that leaves the conversation initiation in Francesco's hands.
Local branch fix/1979-sharded-safetensors-2hops, commit 04a0909, three unit tests written, full PR-shaped README authored.
Two-line shell command. The fork goes onto github.com/999purple999/huggingface.js; the PR opens as draft so the maintainer can self-pace review.
Mail body anchored to the branch URL, with the explicit two-way CTA: 15-minute call OR async PR review. Compensation range stated up front: €40-70/h freelance, €2,500-3,000/mo retainer, part-time afternoon-evening CEST until July, full-time after.
Realistic odds for this specific outreach given the proof shape: ~50-70% conversation, ~10-25% contract from this lead alone.
If any of these fail on your machine, the PR cannot land. Run them first.