How I fixed a Web Audio echo problem with a 5-second delay
if you want to do anything with audio in the browser, you’ll have to deal with the Web Audio API. it’s what powers everything — from music apps to AI voice chats.
recently I was building CrystaCode.ai, a real-time user ↔ AI voice conversation app.
everything worked fine except one weird thing:
👉 during the first few seconds of each conversation, the AI’s voice got picked up by the mic. after that? no echo at all.
I spent days trying everything: different routing, gain nodes, worklets, custom DSP, RNNoise. nothing fixed it. then, out of frustration, I added one line:
```js
await new Promise(res => setTimeout(res, 5000));
```
between starting the mic and sending audio to the AI. and boom: echo gone forever.
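in context, the whole startup path looked roughly like this. a minimal sketch: `startSendingToAI` is a placeholder for whatever pipes mic audio to the model, not a real API.

```js
// minimal sketch of the fix: start the mic, wait, then start streaming.
async function startConversation() {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true }
  });

  // give the browser's echo canceller time to converge before transmitting
  await new Promise(res => setTimeout(res, 5000));

  startSendingToAI(stream); // placeholder: wire the stream to your transport here
}
```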
wait... why did this work?
when you call:
```js
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true }
});
```
the browser gives you a mic stream with built-in acoustic echo cancellation (AEC).
that AEC runs natively inside Chrome, Firefox, Safari, and Edge, usually the same code used in WebRTC (like `webrtc::EchoCanceller3` in Chrome).
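you can check whether the browser actually honored the constraint with the standard `getSettings()` on the track. a quick sanity check, assuming `stream` is the result of the call above (run it inside an async function):

```js
// getSettings() reports the constraints actually in effect on the live track
const track = stream.getAudioTracks()[0];
console.log(track.getSettings().echoCancellation); // true if AEC is active
```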
the browser AEC works like this:
- it watches what’s playing from your speakers
- it listens to your microphone at the same time
- it learns the relationship between the two — basically: “when I play this, the mic hears that”
- it builds a model of your room acoustics and device path
- after a few seconds, it knows how to subtract the speaker sound from the mic signal
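to make the "learning" part concrete, here's a toy NLMS-style adaptive filter, the basic building block behind echo cancellers. this isn't the browser's actual code, just a sketch of the idea: learn weights that predict the echo from the speaker signal, then subtract the prediction.

```js
// toy NLMS adaptive filter: the core idea behind an echo canceller.
// the weights only converge after many samples, which is the "training" time.
const N = 128;                    // filter length (taps)
const w = new Float32Array(N);    // echo-path model, starts at zero
const mu = 0.5;                   // adaptation step size

function cancelEcho(farEnd, micSample) {
  // farEnd: Float32Array of the last N speaker samples (newest first)
  let predictedEcho = 0;
  let energy = 1e-8;              // small epsilon avoids divide-by-zero
  for (let i = 0; i < N; i++) {
    predictedEcho += w[i] * farEnd[i];
    energy += farEnd[i] * farEnd[i];
  }
  const error = micSample - predictedEcho; // what's left after subtraction
  // NLMS update: nudge the weights toward a better echo prediction
  const step = (mu * error) / energy;
  for (let i = 0; i < N; i++) w[i] += step * farEnd[i];
  return error; // "clean" sample: mic minus estimated echo
}
```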
so it needs time to adapt, normally 2–5 seconds. during that time, the AEC hasn't finished learning yet, so a little bit of echo leaks through.
the bug wasn't a bug
my web audio code was fine. the browser just needed time to adapt.
by delaying the start of sending mic audio to the AI for 5 seconds, I let the AEC finish adapting. after that, no more echo.
why browsers behave like this
browser AEC uses adaptive algorithms. they constantly adjust to changing acoustics: speaker position, device, user movement, etc.
so, every time you start the mic, the algorithm resets and starts learning again. the first few seconds are a "training phase". if you stream mic audio too early, it still contains echo. after it converges, everything becomes clean.
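if you don't want a hard 5-second pause before the conversation, one variant is to start capturing immediately (so the AEC begins training right away) but keep the outgoing signal muted with a GainNode. this is a sketch assuming you route mic audio through Web Audio before transmitting; the AEC runs on the raw capture inside the browser, so a downstream mute doesn't interfere with its training.

```js
// sketch: capture immediately so the built-in AEC starts adapting right away,
// but transmit silence for the first few seconds (inside an async function)
const ctx = new AudioContext();
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true }
});

const source = ctx.createMediaStreamSource(stream);
const gate = ctx.createGain();
const sink = ctx.createMediaStreamDestination(); // send sink.stream to the AI

gate.gain.value = 0;                 // fully muted during the warm-up
source.connect(gate).connect(sink);

// open the gate once the AEC has had ~5s to converge
gate.gain.setValueAtTime(0, ctx.currentTime + 5);
gate.gain.linearRampToValueAtTime(1, ctx.currentTime + 5.2);
```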
TL;DR
- problem: AI voice was picked up by the mic during the first seconds
- reason: browser AEC needs time to adapt
- fix: delay sending mic data for ~5s before starting the conversation
Top comments (2)
Nice write-up, thanks for sharing! We’d just add that in real-world WebAudio/WebRTC apps, echo or delay problems often aren’t bugs in the app code, but a consequence of how browser-embedded echo cancellation works: it needs a few seconds of “training” on speaker + mic audio before it can cancel echoes reliably.
In one of our recent projects, we saw exactly the same behavior: first 3–5 s of echo, then silence. The workaround that fixed it was to delay transmitting microphone audio for a few seconds after getUserMedia; that gives the built-in AEC time to stabilize and prevents that initial echo.
So if you’re building voice-chat or voice-AI features in browser, it’s often worth explicitly waiting a short period before sending mic audio. That little extra delay can save a lot of headaches.
Thanks a lot for the insight, really appreciate you sharing this. That explanation about browser AEC training time makes a lot of sense and matches what we’re seeing in practice.
If you come across any new findings or updates in this area, I’d be happy to hear about them. Thanks again!