Outreach · Mattermost · Fix #1143
Code ready · PR-shaped · 3 unit tests · 13 IPs · +334 LOC across 4 files · Awaiting push + draft PR

Mattermost

mattermost/mattermost-plugin-calls · issue #1143 · branch fix/1143-warn-rfc1918-docker-ice

A plugin-layer pre-flight diagnostic that detects the exact failure shape #1143 reports: Mattermost Calls running in Docker, advertising a 172.x bridge IP as an ICE candidate, coturn rejecting that peer with 403, and the call dropping minutes later on allocation timeout. It changes no behavior: it emits a single loud log line at activate time that names the offending IPs and the exact setting the operator should change.

1 line · log emitted at activate, only when truly misconfigured
13 IPs · test cases including RFC1918, RFC6598, ULA, loopback
0 · behavior changes for legitimate LAN-only setups
+100% · operator diagnosability (was invisible)

01 · The problem

A silent ICE leak that only coturn knows about

The reporter ran Mattermost Calls inside Docker with a self-hosted coturn. Calls connected fine, media flowed for minutes, then dropped. The Mattermost plugin logs showed nothing. The clue was buried inside coturn's logs: peer 172.21.0.3 lifetime updated · CREATE_PERMISSION processed, error 403: Forbidden IP. The Docker bridge address was being advertised as an ICE host candidate, coturn refused to relay to a non-allowlisted private IP, and the call eventually died on allocation timeout.

Why the plugin couldn't tell

The plugin only passes ICEHostOverride through to github.com/mattermost/rtcd/service/rtc.NewServer; the actual ICE candidate gathering lives there. When ICEHostOverride is empty and rtcd enumerates local interfaces, it picks up whatever the kernel hands it, including the Docker bridge address.

The operator has no signal that this is happening. The plugin logs say "activated"; the call starts; media flows; then on the first TURN relay refresh, coturn rejects the unreachable peer and the session falls apart. Diagnosing requires SSHing into the coturn host, tailing its log, and matching peer addresses against your Docker network.

That diagnosis loop costs hours every time the misconfiguration ships. The fix below saves those hours by making the plugin itself shout, once, at startup.

sequenceDiagram
  autonumber
  participant Op as Operator
  participant Plg as Calls plugin (in Docker)
  participant Rtcd as rtcd ICE gather
  participant Cli as Client
  participant Cot as coturn
  Op->>Plg: enable Calls, ICEHostOverride=""
  Plg->>Rtcd: start with ICEHostOverride=""
  Rtcd->>Rtcd: enumerate interfaces, see 172.21.0.3
  Rtcd-->>Cli: ICE candidate 172.21.0.3
  Cli-->>Cot: relay to peer 172.21.0.3
  Cot-->>Cli: 403 Forbidden IP
  Note over Cli,Cot: call drops, no log from plugin
      
02 · The fix

One loud line, only when truly misconfigured

A new file, server/ice_diagnostics.go, adds three small primitives that run once, called from activate.go: an RFC1918/RFC6598/ULA predicate, an interface enumerator, and a container detector. The composing function checkICEDockerMisconfiguration emits a single LogError only when four conditions all hold; in any other configuration the operator sees nothing.
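As an illustration of the container detector, the sketch below checks the two signals this report names: the /.dockerenv marker file and a /proc/1/cgroup scan for docker, containerd, or kubepods. The function names are hypothetical, and treating either signal alone as sufficient is an assumption; the code on the branch may combine them differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupMarkers are the substrings the cgroup scan looks for.
var cgroupMarkers = []string{"docker", "containerd", "kubepods"}

// cgroupLooksContainerized reports whether the contents of a cgroup file
// mention a known container runtime. Hypothetical helper name.
func cgroupLooksContainerized(contents string) bool {
	for _, m := range cgroupMarkers {
		if strings.Contains(contents, m) {
			return true
		}
	}
	return false
}

// insideContainer combines both signals.
// ASSUMPTION: either signal alone is treated as sufficient.
func insideContainer() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	data, err := os.ReadFile("/proc/1/cgroup")
	if err != nil {
		// No cgroup file (for example on non-Linux hosts): assume bare metal.
		return false
	}
	return cgroupLooksContainerized(string(data))
}

func main() {
	fmt.Println("inside container:", insideContainer())
}
```

Keeping the string scan separate from the file reads means the matching logic can be unit-tested without a container.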

The composition, in plain English

The warning fires only when all of the following are true:

  • ICEHostOverride is empty (the operator did not opt out of the diagnostic)
  • The plugin process is running inside a container, detected via both /.dockerenv and a /proc/1/cgroup scan for docker, containerd, or kubepods
  • No routable (public) IP is bound to any local interface
  • At least one private IP (RFC1918 / RFC6598 / IPv6 ULA) is bound, meaning we are about to advertise that one
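The four conditions above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the body of checkICEDockerMisconfiguration: the `environment` struct and `shouldWarn` are hypothetical names that stand in for the real inputs.

```go
package main

import "fmt"

// environment is a hypothetical snapshot of everything the check looks at.
type environment struct {
	iceHostOverride string   // value of the ICEHostOverride plugin setting
	inContainer     bool     // result of the container detector
	routableIPs     []string // public IPs bound to local interfaces
	privateIPs      []string // RFC1918 / RFC6598 / ULA IPs bound locally
}

// shouldWarn returns the private IPs to name in the log line, and true,
// only when all four conditions hold. Any other combination stays silent.
func shouldWarn(env environment) ([]string, bool) {
	if env.iceHostOverride != "" {
		return nil, false // operator set an explicit override
	}
	if !env.inContainer {
		return nil, false // bare-metal hosts are never flagged
	}
	if len(env.routableIPs) > 0 {
		return nil, false // a public address is available
	}
	if len(env.privateIPs) == 0 {
		return nil, false // nothing private to leak
	}
	return env.privateIPs, true
}

func main() {
	env := environment{inContainer: true, privateIPs: []string{"172.21.0.3"}}
	if ips, warn := shouldWarn(env); warn {
		fmt.Println("would log once, naming:", ips)
	}
}
```

Because the function only reads its input and returns a verdict, it cannot alter which candidates are advertised, which is the property the diagnostic relies on.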

The log message names the offending IPs and tells the operator exactly which setting to set:

Log line: "Calls is running inside a container with only private (RFC1918/RFC6598) IP addresses available and ICEHostOverride is empty. ICE host candidates advertised to clients will be unreachable from outside the container, which typically manifests as the call dropping after connecting (issue #1143). Set the ICEHostOverride plugin setting to the public IP (or DNS name) that clients use to reach this host."

What is intentionally NOT done

A LAN-only deployment where every participant sits on the same Docker network is a legitimate setup. Silently dropping the RFC1918 candidate would break that. The diagnostic is purely additive: it does not change which candidates are advertised. It just makes the most common misconfiguration visible.

The real candidate filter, the ability to exclude RFC1918 host candidates from the ICE gather rather than warn about them, belongs in github.com/mattermost/rtcd/service/rtc.NewServer, not in the plugin. The README on the branch offers to send that second PR as a follow-up after this one merges.

flowchart TD
  A[activate.go entered] --> B{ICEHostOverride empty?}
  B -- no --> Z[return silently]
  B -- yes --> C{Inside container?}
  C -- no --> Z
  C -- yes --> D[Enumerate interfaces]
  D --> E{Any routable IP?}
  E -- yes --> Z
  E -- no --> F{Any private IP?}
  F -- no --> Z
  F -- yes --> G["LogError with named IPs
and the setting to change"]
  G --> H[continue plugin start - no behaviour change]
  style G fill:#1e0a0a,stroke:#ef4444,color:#fca5a5
  style Z fill:#0a1e10,stroke:#10b981,color:#86efac

Safety invariants

Behavior
Zero change. The function is purely a logger. The plugin starts identically.
Cadence
One log line, once per activation. Configuration is static, so there is no need to repeat it per call.
Coverage
Runs before both the RTCD-client and embedded-RTC branches, so it applies to both deployment modes.
False positives
Refuses to flag 172.15.0.0/16 (just below 172.16.0.0/12) and 192.167.0.0/16 (just below 192.168.0.0/16). Tested explicitly.
IPv6
Recognises fc00::/7 ULA addresses as private. Public IPv6 (2606:4700:...) is correctly treated as routable.
LAN-only setups
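Behaviour unchanged. The check only fires inside a container, so on bare-metal hosts it never triggers; a containerized LAN-only deployment may see the log line, but the advertised candidates stay the same.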
03 · The tests

Thirteen representative IPs, including the boundary cases

The test file exercises the predicate across RFC1918, RFC6598, IPv6 ULA, loopback, and link-local addresses, plus public addresses that include the two boundary cases just outside the RFC1918 ranges.

Address                Expected  Why this case matters
172.21.0.3             private   The exact address in the bug report
10.0.0.5               private   RFC1918 10.0.0.0/8
192.168.1.1            private   RFC1918 192.168.0.0/16
100.64.1.2             private   RFC6598 carrier-grade NAT; cloud NAT gateways leak this
fd00::1                private   IPv6 ULA fc00::/7
127.0.0.1 / ::1        private   Loopback, always refuse-to-advertise
169.254.1.1            private   Link-local, should not leak
172.15.0.1             public    Must NOT be flagged, just outside 172.16/12
192.167.255.1          public    Must NOT be flagged, just outside 192.168/16
8.8.8.8 / 1.1.1.1      public    Public DNS resolvers, sanity
2606:4700:4700::1111   public    Public IPv6 (Cloudflare), IPv6 sanity

Two additional tests pin the documented behaviour: nil is treated as private (refuse-to-advertise default), and the package-level CIDR slice parses all five entries correctly.

04 · The outreach

Where this stands in the conversation funnel

The branch is committed locally and ready to push. The mail framing is honest: this is the plugin-visible half of the bug; the candidate filter itself belongs in mattermost/rtcd, offered as a second PR.

Done

Branch + commit + tests + README

Local branch fix/1143-warn-rfc1918-docker-ice, commit 7a1ea51. Three Go unit tests covering isPrivateIP and the CIDR list, plus a full PR-shaped README.

Next

Fork + push + open draft PR

Mattermost's review cadence is slower than HF's (corporate, multi-reviewer). Drafting the PR is right; expect 1-2 weeks of review traffic.

Then

Cold mail to a named Calls plugin maintainer

Mail body anchors on the branch URL and on HALCYON (the public proof of WebRTC mesh expertise). Compensation range stated: €40-70/h, €2,500-3,000/mo, part-time afternoon-evening CEST until July.

Goal

Conversation within 1-2 weeks, contract within 6 weeks

Realistic odds for this lead alone: ~30-50% conversation, ~5-15% contract. Mattermost's hiring cycle is longer than a startup's; this is more likely to produce a referral or a contract-with-procurement than an immediate retainer.

05 · How to verify

Reproduce every claim in five commands

# Confirm the branch + commit exist locally
cd c:/Users/FRA/Documents/github/workrepo/mattermost-plugin-calls
git switch fix/1143-warn-rfc1918-docker-ice
git log --stat -1   # expect commit 7a1ea51, 4 files

# Requires Go >= 1.22 on PATH
go test ./server -run 'TestIsPrivateIP|TestIsPrivateIPNil|TestRFC1918NetworksParsed' -v
# Expected: 3 tests PASS.

# Then build the plugin and deploy in a Docker Mattermost host:
make deploy
# Watch the plugin logs at activate. With ICEHostOverride empty and
# only 172.x interfaces, the new LogError line appears once.