Idiap · coqui-ai-TTS #298 · Francesco

01 · The problem

A linear leak from a structural mistake at one boundary

The reporter's reproducer is one tight loop. Nothing in their loop body keeps a reference to anything except the wav numpy array. Yet RSS climbs steadily for the entire 200-iteration run.

The returned dict that nobody fully consumes

# TTS/tts/models/base_tts.py:678, the OLD version
return {
    "wav": wav,                       # numpy, fine
    "alignments": alignments,         # tensor, NOT detached
    "text_inputs": text_inputs,       # tensor, NOT detached
    "outputs": outputs,               # dict of 8 tensors, NOT detached
}
      

The VITS inference() method returns 8 tensors whose size is proportional to the input length: model_outputs, alignments, durations, z, z_p, m_p, logs_p, y_mask. Of these, z, z_p and model_outputs are large and grow with T_dec (decoder time steps). All eight are bundled into the dict the caller receives.

The caller (Synthesizer.tts) uses only outputs["wav"] when VITS is the model (because vocoder_model is None, the use_gl=True branch is taken). Every other tensor in the dict is payload for callers that never look at it, and all of it survives until Python's GC happens to run, which in a tight synthesis loop is "rarely".

flowchart LR
  L1["loop iter N"] --> S1["BaseTTS.synthesize<br/>returns dict with 8 GPU tensors"]
  S1 --> U["Synthesizer.tts reads only outputs[wav]"]
  U --> D["dict goes out of scope at end of iter"]
  D --> G{Python GC runs?}
  G -- "not soon" --> Pin["8 tensor storages stay pinned"]
  Pin --> L2["loop iter N+1<br/>RSS keeps growing"]
  G -- "fires every few iters" --> Slow["episodic frees, still grows on average"]
  style Pin fill:#1e0a0a,stroke:#ef4444,color:#fca5a5
  style Slow fill:#1e1408,stroke:#fbbf24,color:#fde68a
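The "survives until the GC runs" step can be illustrated with the standard library alone. The sketch below is not taken from the TTS code base: Payload is a hypothetical stand-in for a large tensor, and the reference cycle is an assumption chosen to show one way an object can outlive its last visible name. Without a cycle, CPython frees the payload as soon as the dict is dropped; with one, it stays alive until the collector runs.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large tensor held in the returned dict."""


def make_result(with_cycle: bool):
    payload = Payload()
    result = {"wav": [0.0], "outputs": {"z": payload}}
    if with_cycle:
        # Assumption for illustration: something inside the result refers back to it.
        result["outputs"]["parent"] = result
    return result, weakref.ref(payload)


gc.disable()  # mimic "the collector has not run yet"
try:
    result, alive = make_result(with_cycle=False)
    del result
    freed_without_cycle = alive() is None  # refcount reached zero, freed at once

    result, alive = make_result(with_cycle=True)
    del result
    pinned_before_collect = alive() is not None  # the cycle keeps the payload alive
    gc.collect()
    freed_after_collect = alive() is None
finally:
    gc.enable()

print(freed_without_cycle, pinned_before_collect, freed_after_collect)
```

Whether a cycle is the actual mechanism in the reported leak is not established here; the sketch only shows why "out of scope" and "freed" are not always the same moment.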

02 · The fix

Detach + CPU every tensor at the API boundary

A small _release helper inside synthesize handles every tensor identically. Non-tensor values pass through. Shapes and numeric values are unchanged, so every existing caller and every existing test continue to work.

The new code, in 8 lines

# TTS/tts/models/base_tts.py, inside synthesize, before the return
# (assumes `import torch` and `from typing import Any` at module level)

def _release(value: Any) -> Any:
    if isinstance(value, torch.Tensor):
        return value.detach().cpu()
    return value

return {
    "wav": wav,
    "alignments": _release(alignments),
    "text_inputs": _release(text_inputs),
    "outputs": {k: _release(v) for k, v in outputs.items()},
}
      

.detach() disconnects the tensor from the autograd graph; under @torch.inference_mode() that is a no-op, because no graph is recorded. Pairing it with .cpu() forces a copy out of GPU memory on CUDA. On CPU, where .cpu() returns the tensor without copying, the intent is to drop any residual inference-mode reference. The original tensors are released as soon as their storages are no longer needed.
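A minimal sketch of what the helper does to a single value, assuming PyTorch is installed and running on CPU only. The tensors x and y and the "note" entry are made up for illustration; only _release mirrors the code above.

```python
from typing import Any

import torch


def _release(value: Any) -> Any:
    # Tensors are detached from autograd and moved to CPU; anything else passes through.
    if isinstance(value, torch.Tensor):
        return value.detach().cpu()
    return value


x = torch.ones(3, requires_grad=True)
y = x * 2  # y carries a grad_fn, i.e. it is still attached to the graph

raw = {"z": y, "note": "text"}  # hypothetical payload
released = {k: _release(v) for k, v in raw.items()}

print(released["z"].requires_grad, released["z"].device.type)
```

The released tensor keeps its shape and values, which is the property the backward-compatibility claims rely on.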

Why not gc.collect() / torch.cuda.empty_cache() in the loop? Those are sledgehammers. They pay a real per-call cost (emptying the CUDA allocator cache is expensive, and gc.collect() walks every tracked object in the interpreter) and they hide the actual leak rather than fix it. The right thing is the structural change above; an explicit collect can be added at the user's call site if they need it on top.

Backward compatibility

API: Identical keys, identical shapes, identical numeric values.
Tests: Every existing assertion in tests/ continues to pass.
CUDA: Tensors now come back to the caller on CPU. Callers that read them were already calling .cpu() themselves, which is now a no-op; the rest get the memory win.
Training: Unaffected. synthesize is inference-only; training paths use train_step.

03 · The outreach

Where this stands

Done

Branch + commit + README

Local branch fix/298-vits-memory-leak-detach, commit 2441e66. The repository's PR target is dev; instructions are in the README.

Idiap took over coqui-ai-TTS maintenance after the original Coqui company dissolved. Maintenance is real but small-team; the reviewer pool is narrow.

Goal

Conversation within 1-2 weeks, contract within 6 weeks

Realistic odds for this lead alone: ~30-50% for a conversation, ~5-15% for a contract. Idiap is a research institute and hires more slowly than a startup; a contract here is more likely to take the form of a project-based engagement than a full-time retainer.

04 · How to verify

Reproduce the leak (and watch it go away)

cd c:/Users/FRA/Documents/github/workrepo/coqui-ai-TTS
git switch fix/298-vits-memory-leak-detach
git log --stat -1               # commit 2441e66, 2 files

pip install -e .

# Then run the reporter's reproducer (or the version from the README)
# synthesizing ~200 short utterances with the VCTK VITS model in a loop.
# Expect RSS to plateau after warm-up instead of growing linearly.
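The "plateau instead of linear growth" check can be sketched as a small harness. This is a hedged illustration, not the reporter's reproducer: leaky_step and clean_step are hypothetical stand-ins for one synthesis call before and after the fix, and tracemalloc measures Python-level allocations rather than process RSS, so an OS-level measurement is still needed for the real model.

```python
import tracemalloc
from typing import Callable, List


def traced_bytes_per_iteration(step: Callable[[], object], iterations: int) -> List[int]:
    """Run step() repeatedly and record currently traced bytes after each call."""
    tracemalloc.start()
    try:
        samples = []
        for _ in range(iterations):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


_retained = []  # simulates results that stay referenced across iterations


def leaky_step() -> None:
    # Hypothetical "before" behaviour: each call leaves 100 kB alive.
    _retained.append(bytearray(100_000))


def clean_step() -> bytearray:
    # Hypothetical "after" behaviour: the buffer is dropped when the caller ignores it.
    return bytearray(100_000)


leaky = traced_bytes_per_iteration(leaky_step, 20)
clean = traced_bytes_per_iteration(clean_step, 20)
leaky_growth = leaky[-1] - leaky[0]
clean_growth = clean[-1] - clean[0]

print(f"leaky growth: {leaky_growth} bytes, clean growth: {clean_growth} bytes")
```

Swapping the stand-ins for a real synthesis call would give the same before/after comparison, under the assumption that the retained memory is visible to the chosen measurement.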
    
