i wanted to answer a WhatsApp call from code
not just see that a call is ringing or reject it, but actually accept it, let the other phone enter the connected state, get relay/media traffic moving, and eventually play audio into the call
this sounds like something a WhatsApp Web library should not be able to do, and that is mostly true; libraries like Baileys and whatsmeow are great for messages, receipts, groups, media, contacts, app state, but calls are another layer completely
the funny thing is WhatsApp Web already ships the call engine to your browser; it is not a public API, and voice calling was not globally enabled on WhatsApp Web while i was doing this, but the code is there, and the call stack is a compiled WASM module with JavaScript around it acting as the browser adapter
this post is about how i made that thing run outside the browser, glued it to Baileys, fixed the signaling errors, got relay packets moving, and used prerecorded audio because testing with a file was easier than testing with a microphone i did not have connected
educational research only
this is not for abusing WhatsApp, bypassing user consent, or building spammy call automation
wa_call_signa handle_incoming_xmpp_offer failed to parse offer
wa_call_signa send Preaccept
core/call_ change_call_state: [ReceivedCall -> AcceptSent]
wa_tp.cc Bind request sent for UDP relay
eventually the call connected and stayed alive until i cut it off
the starting point
i already had a project that fetched WhatsApp Web source files; the project downloaded the current web.whatsapp.com assets and saved interesting files into a files folder so they could be searched, diffed, and read without opening DevTools every time
the first working setup had three folders:
- files - source files fetched from WhatsApp Web
- wasm-loader - the first attempt at loading the WhatsApp Web VoIP WASM
- baileys - glue code between Baileys and the WASM loader
the initial goal was intentionally small: receive one incoming call, not make outgoing calls, not build a phone system, not support every edge case, just make an incoming WhatsApp call reach the accepted state
that small goal still required almost everything: raw call signaling, WAP binary node encoding, the WebAssembly loader, callbacks for native-to-JS signaling, relay packet I/O, audio capture and playback drivers, and enough debugging to understand which layer was broken each time
the call UI was hidden behind ABProps
before reverse engineering the call stack, i had to make WhatsApp Web load it in the browser
for my account, WhatsApp Web did not show calling; the code was present in the bundle, but the feature was hidden behind internal experiment flags, and feature code asks WAWebABProps whether a specific experiment is enabled
so i used a Tampermonkey script i had written earlier:
the script runs at document-start, waits for the WhatsApp module system, finds WAWebABProps, and wraps getABPropConfigValue; the idea is to return enabled values for calling-related props while leaving everything else alone
roughly:
function wrap(original) {
return function (...args) {
const key = args[0]
switch (key) {
case "enable_web_calling":
case "enable_web_group_calling":
case "web_voip_call_tab_new_call":
return true
case "calling_lid_version":
return 1
default:
return original(...args)
}
}
}
this does not implement calls; it just convinces WhatsApp Web that the current account belongs to a calling-enabled experiment bucket
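to see the override in isolation, the same wrap pattern can be exercised against a stand-in abProps object; fakeABProps below is purely illustrative, the real WAWebABProps only exists inside the WhatsApp Web module system:

```javascript
// exercising the wrap pattern against a stand-in abProps object;
// fakeABProps is illustrative only, the real WAWebABProps comes
// from the WhatsApp Web module system
function wrap(original) {
  return function (...args) {
    const key = args[0]
    switch (key) {
      case "enable_web_calling":
      case "enable_web_group_calling":
        return true
      default:
        return original(...args)
    }
  }
}

const fakeABProps = {
  getABPropConfigValue(key) {
    // pretend every prop is disabled by default
    return false
  },
}

fakeABProps.getABPropConfigValue = wrap(
  fakeABProps.getABPropConfigValue.bind(fakeABProps)
)
```

calling-related keys now return enabled values while every other prop falls through to the original lookup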
after that, open DevTools, go to Network -> Wasm, reload web.whatsapp.com, and the browser starts fetching a WASM file that looks like this:
KwJJIha0f3H.wasm
one URL i saw was:
https://static.whatsapp.net/rsrc.php/ys/r/KwJJIha0f3H.wasm
that file is the important part; it is compiled C++ code, and the surrounding JavaScript is mostly a loader plus adapters for browser APIs, so without enabling those AB props, the browser may never fetch the WASM
calls are not messages
the next problem was understanding what actually arrives over the WhatsApp socket
when a call comes in, Baileys emits a friendly call event:
{
"from": "186775343484983:28@lid",
"id": "00EA4DF98484BE6EBC45862BB9E3E1F6",
"status": "offer",
"isVideo": false
}
this is good for logging, but it is not enough for the VoIP engine; the WASM does not want "status: offer", it wants the actual WhatsApp binary node payload, the same thing WhatsApp Web would have passed into the native call stack
the real incoming stanza looks conceptually like this:
{
tag: "call",
attrs: {
from: "186775343484983:28@lid",
id: "35913.11715-64",
platform: "web",
version: "0"
},
content: [
{
tag: "offer",
attrs: {
"call-id": "00D898D54A3A6E16A0B7A51FFE7F9EBA",
"call-creator": "186775343484983:28@lid"
},
content: [...]
}
]
}
and after the offer, more call nodes arrive:
relaylatency
transport
terminate
the first mistake was trying to use only Baileys' parsed call event; the WASM saw an offer with missing content and failed with:
parse_xmpp_offer: empty call-id
handle_incoming_xmpp_offer failed to parse offer
the fix was to listen to the raw call stanza:
sock.ws.on("CB:call", async (node) => {
await voipBridge.handleRawNode(node)
})
that one line changed the whole direction of the project; from that point, the bridge could feed the same kind of node WhatsApp Web itself receives into the WASM
the WAP binary problem
the raw node object is still not what the WASM wants
WhatsApp's native call API expects a base64 string containing a WAP binary node, so the bridge had to take the raw child node, encode it back into WhatsApp's binary XML format, then base64 it
Baileys already had the useful functions:
import { encodeBinaryNode, decodeBinaryNode } from "baileys"
function encodeStanzaBytes(node) {
return Buffer.from(encodeBinaryNode(node))
}
function encodeB64(bytes) {
return Buffer.from(bytes).toString("base64")
}
async function decodeStanza(bytes) {
return decodeBinaryNode(Buffer.from(bytes))
}
the offer path became:
const signalNode = getFirstChildNode(callNode)
const wapBytes = encodeStanzaBytes(signalNode)
const b64Stanza = encodeB64(wapBytes)
voip.handleIncomingSignalingOffer(
b64Stanza,
platform,
version,
String(e || 0),
String(t || 0),
offline ? 1 : 0,
isNotContact ? 1 : 0,
peerJid,
null
)
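getFirstChildNode and findChild are not Baileys exports; they are small helpers this project defines over the { tag, attrs, content } node shape, roughly:

```javascript
// small helpers over Baileys' { tag, attrs, content } node shape;
// these names are this project's own, not Baileys exports
function getFirstChildNode(node) {
  if (!Array.isArray(node.content)) return undefined
  // content can also be a Buffer or string, so only return tagged children
  return node.content.find(
    (child) => child && typeof child === "object" && child.tag
  )
}

function findChild(node, tag) {
  if (!Array.isArray(node.content)) return undefined
  return node.content.find((child) => child && child.tag === tag)
}
```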
the important detail is that we pass the signaling child (offer, transport, relaylatency) to the WASM, not a simplified JSON object
once that was fixed, the logs changed from "empty call-id" to the stack actually parsing the offer
Offer from:4983:28@lid call_id:00E525341CB0E2B50259CE72187924AA
Handle MESSAGE Offer
init_local_state begins
create_p2p_transport start
send Preaccept
that was the first real win
the encrypted call key issue
after raw nodes started flowing, there was another parsing problem:
encryption 00D bad enc size 178
handle_incoming_xmpp_offer failed to parse offer, status=70004
this happened because some call offers contain an enc node. that encrypted payload is not directly the key material the WASM expects. Baileys can decrypt normal message payloads, but i had to use its Signal repository to decrypt the call payload and extract the call key
the simplified flow was:
async function decryptIncomingCallEnc(signalNode, info) {
const encNode = findChild(signalNode, "enc")
if (!encNode) return signalNode
const decrypted = await sock.signalRepository.decryptMessage({
jid: info.peerJid,
type: encNode.attrs.type,
ciphertext: encNode.content,
})
const message = proto.Message.decode(unpadRandomMax16(decrypted))
const callKey = message?.call?.callKey
if (callKey) {
encNode.content = Buffer.from(callKey)
}
return signalNode
}
this is the kind of annoying bug that looks like a WASM parser issue at first, but it is actually a missing protocol step before the parser
after replacing the encrypted payload with the extracted call key, the offer stopped failing with bad enc size
loading the wasm outside the browser
finding WAWebVoipWebWasmLoader.js was the map. it showed how WhatsApp Web creates the Emscripten module, where the .wasm file is located, and which callbacks the native side expects
the first Node loader looked roughly like this:
import { createRequire } from "module"
const require = createRequire(import.meta.url)
const createVoipModule = require("./WAWebVoipWebWasmLoader.js")
export async function loadVoipWasm({ wasmPath, persistentDir, callbacks }) {
const Module = {
locateFile(file) {
if (file.endsWith(".wasm")) return wasmPath
return file
},
print(text) {
console.log("[voip:wasm]", text)
},
printErr(text) {
console.error("[voip:wasm]", text)
},
noInitialRun: true,
thisProgram: "wa-voip",
preRun: [
function () {
Module.FS.mkdirTree(persistentDir)
},
],
...callbacks,
}
return createVoipModule(Module)
}
the real loader had more details, especially around filesystem paths, pthread pool size, and native bridge callbacks. but the principle was simple: make the WhatsApp loader think it is still in the environment it expects, then replace browser pieces with Node implementations one by one
after loading the Emscripten module, i wrapped the exports in a small class because calling embind exports directly everywhere gets messy:
class VoipStack {
constructor(module) {
this.module = module
}
initVoipStack(selfJid, phoneJid, lidJid) {
return this.module.initVoipStack(selfJid, phoneJid, lidJid, false)
}
handleIncomingSignalingOffer(stanza, platform, version, e, t, offline, notContact, peerJid) {
return this.module.handleIncomingSignalingOffer(
stanza,
platform,
version,
String(e || 0),
String(t || 0),
offline ? 1 : 0,
notContact ? 1 : 0,
peerJid,
null
)
}
}
once the wrapper had names like initVoipStack, handleIncomingSignalingOffer, handleIncomingSignalingMessage, and acceptCall, the rest of the code became much easier to debug
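with those names in place, non-offer children can be routed through the generic path; a hedged sketch, where encodeStanza stands in for the encode-plus-base64 helpers from earlier and the handleIncomingSignalingMessage argument list is an assumption, not a documented signature:

```javascript
// hedged sketch: route decoded call children into the wrapper;
// encodeStanza stands in for encodeBinaryNode + base64, and the
// handleIncomingSignalingMessage signature is an assumption
function routeCallChild(voip, encodeStanza, childNode, ctx) {
  const b64 = encodeStanza(childNode)
  if (childNode.tag === "offer") {
    return voip.handleIncomingSignalingOffer(
      b64, ctx.platform, ctx.version, "0", "0", 0, 0, ctx.peerJid
    )
  }
  // relaylatency / transport / terminate take the generic message path
  return voip.handleIncomingSignalingMessage(b64, ctx.peerJid)
}
```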
initializing the call stack
the first clean milestone was loading the module and initializing the VoIP stack
when this worked, the logs looked like this:
[VoipStack] WA Web embind module loaded
[VoipBridge] WASM loaded
VoipInit.cpp:518 initVoipStack called
WasmVoipAVDriverManager.cpp:189 [AV][register_audio_capture_driver_factory] SUCCESS
WasmVoipAVDriverManager.cpp:212 [AV][register_audio_playback_driver_factory] SUCCESS
WasmVoipAVDriverManager.cpp:141 [AV][initialize] SUCCESS
pjlib 2.13 for POSIX initialized
wa_media_api. init_media_endpt_and_codecs Enter
wa_opus.c pjmedia_codec_opus_init success
this told us a few things: WhatsApp's call stack is built on pjlib / pjmedia, Opus is initialized inside the WASM, and the JavaScript side must register audio capture/playback drivers
it also printed:
External audio not supported for this platform - Registering Virtual Audio
that was good news: if the WASM can use a virtual audio device, i can feed it samples myself without worrying about physical audio hardware, drivers, permissions, or sample rates
one issue here was Emscripten threads: when the call state became more active, it printed:
Tried to spawn a new thread, but the thread pool is exhausted.
If you want to increase the pool size, use setting -sPTHREAD_POOL_SIZE=...
the loader could provide a larger pthread pool at startup; increasing the pool size made these warnings stop being a constant worry during call setup
sending preaccept and accept
once the offer parsed, the WASM did what WhatsApp Web normally does: it sent preaccept
change_call_state call id ...: [None -> ReceivedCall]
send Preaccept to 4983:28@lid
EVENT: Call offer received
the WASM does not send this over the socket by itself; it calls back into JavaScript with a binary stanza that JS is supposed to send, so the bridge needed the reverse direction too:
async onSignalingXmpp({ peerJid, callId, xmlPayload }) {
const node = await decodeStanza(xmlPayload)
const callNode = wrapOutgoingSignalingNode(node, peerJid)
await sock.sendNode(callNode)
}
one bug here was sending the wrong shape back to Baileys: sometimes the WASM gives a child node, not a full call wrapper. if you send the child directly, WhatsApp does not treat it as a call stanza. the fix was to wrap it:
function wrapOutgoingSignalingNode(node, peerJid) {
if (node.tag === "call") {
node.attrs ??= {}
node.attrs.to = peerJid
node.attrs.id ??= makeStanzaId()
return node
}
return {
tag: "call",
attrs: {
to: peerJid,
id: makeStanzaId(),
},
content: [node],
}
}
another bug was decoding the outbound payload with the wrong type: Baileys' binary decoder wants a Buffer, and passing the wrong object caused errors like:
TypeError: buffer.readUInt8 is not a function
that was fixed by normalizing every WASM payload to Buffer.from(...) before decoding
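the normalization was a tiny helper; a sketch of the shape-juggling it has to cover:

```javascript
// normalize whatever the WASM hands back (Uint8Array, ArrayBuffer,
// plain byte array, or already a Buffer) before the binary decoder
// sees it
function toBuffer(payload) {
  if (Buffer.isBuffer(payload)) return payload
  if (payload instanceof ArrayBuffer) return Buffer.from(payload)
  if (ArrayBuffer.isView(payload)) {
    // respect the view's offset/length instead of copying the whole buffer
    return Buffer.from(payload.buffer, payload.byteOffset, payload.byteLength)
  }
  return Buffer.from(payload)
}
```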
after preaccept, accepting the call was much smaller:
await voip.acceptCall(true, false)
for testing, auto accept was just:
if (options.autoAccept) {
setTimeout(() => {
acceptCall(options.autoAcceptMic, false)
}, 250)
}
the log finally became:
wa_call_accept_asymmetric_2 begin
ACTION accept call offer
configure_audio_pipeline_sampling_rates
change_call_state: [ReceivedCall -> AcceptSent]
send Accept to peer
wa_call_accept_asymmetric() status 0
at this point the other phone could see the call as accepted, but accepted signaling is not the same thing as media
signaling is not the call
this was the point where the problem split into two parts: signaling was mostly working, but the actual call still needed relay allocation, bind requests, packet routing, audio capture, and playback
the logs made this very clear:
Relay List Update
setting relay info, num_relays: 2
Bind request sent for UDP relay
Ping request sent to UDP relay
Data Tx to peer: Relay:(bytes:0,pkts:0), P2P:(bytes:0,pkts:0)
the call stack was trying to talk to Meta relay servers; if relay packets were not actually being sent and returned to the WASM, the call would sit in a half-connected state and eventually terminate
the SCTP dead end
one thing i tried too early was a WebRTC/SCTP data channel bridge
the assumption was reasonable: WhatsApp Web is a browser app, browsers use WebRTC, so maybe relay traffic needs a data channel path; i tried creating peer connections to relay addresses and negotiating an SCTP data channel
the logs were not encouraging:
[SCTP] 57.144.41.57:3480 ICE state=checking
[SCTP] SDP negotiation done
[SCTP] ICE state=failed
[SCTP] PC state=failed
every relay failed. this told us the immediate missing piece was not "connect a data channel to relay port 3480"; the WASM was already sending UDP bind requests to port 3478, so i focused on the UDP relay callbacks instead
this was a useful wrong turn; it reduced the search space
relay packets and IPv6 weirdness
the WASM calls JavaScript with something like:
sendDataToRelay({ data, len, ip, port })
so the bridge created UDP sockets:
const udp4 = dgram.createSocket("udp4")
const udp6 = dgram.createSocket("udp6")
udp4.on("message", (packet, rinfo) => {
voip.handleIncomingRelayPacket(packet, rinfo.address, rinfo.port)
})
outgoing packets were sent to the relay:
socket.send(packet, port, ip)
once this was wired, debug logs started showing real traffic:
Relay tx 344 bytes to 163.70.145.133:3478
Relay rx 20 bytes from 2a03:2880:f28a:1db:face:b00c:0:6749:3478
that rx line mattered because before this the WASM was just sending bind requests forever and never hearing back
then IPv6 caused another issue: the stack would start on IPv4, ping the alternate address family, then decide IPv6 looked better:
detected relay connection on alt af
Using IPv6 and resetting relay
Bind request sent for UDP relay [ipv6]:3478
the environment did not always have usable IPv6, so the workaround was to keep relay aliases: if the WASM selected an IPv6 relay but i knew the matching IPv4 relay, send the packet through IPv4 and report the address back in the form the WASM expected
it is not elegant, but it moved the state machine forward; the native transport code cares about relay identity and observed packet source, and the bridge can map those if it is careful
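the alias trick itself is just a lookup before send; a hedged sketch, where relayAliases is this project's own bookkeeping filled from the relay list, not something the WASM provides:

```javascript
// hedged sketch of the relay alias trick: if the stack picks an
// IPv6 relay but the host has no usable IPv6 route, send via the
// paired IPv4 address while still reporting the IPv6 identity,
// so the WASM keeps seeing the relay it selected
const relayAliases = new Map() // ipv6 -> ipv4, filled from the relay list

function resolveRelayTarget(ip) {
  const v4 = relayAliases.get(ip)
  return v4
    ? { sendIp: v4, reportIp: ip } // transmit over v4, report v6
    : { sendIp: ip, reportIp: ip }
}
```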
audio devices
after signaling and relay I/O, audio was the next wall. the WASM asks JS to register audio capture and playback drivers; in the browser those map to Web Audio and MediaStream, so in Node i had to provide my own driver layer
the first driver was silence:
capture.readFrame = () => silenceFrame
that sounds useless, but it is a good test: it lets you prove that the call can connect without worrying about microphone permissions, speaker devices, sample rate conversion, or feedback
the negotiated audio format showed up in logs:
audio_device_sampling_rate 16000
audio_device_samples_per_frame 320
conf_bridge_sampling_rate 16000
frame_length_ms 60
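those numbers pin down the capture driver's contract: 320-sample device frames of mono s16le, which works out like this:

```javascript
// frame math implied by the negotiated format (mono s16le)
const sampleRate = 16000      // audio_device_sampling_rate
const samplesPerFrame = 320   // audio_device_samples_per_frame
const bytesPerSample = 2      // s16le = 16-bit signed samples

const frameBytes = samplesPerFrame * bytesPerSample       // bytes per device frame
const frameMs = (samplesPerFrame * 1000) / sampleRate     // duration of one frame
const framesPerPacket = 60 / frameMs                      // frame_length_ms 60
```

so each device frame is 640 bytes covering 20 ms, and the 60 ms packet length means three device frames per media packet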
for microphone input, the quick version was using ffmpeg to output raw PCM:
const mic = spawn("ffmpeg", [
"-f",
"alsa",
"-i",
"default",
"-ac",
"1",
"-ar",
"16000",
"-f",
"s16le",
"pipe:1",
])
mic.stdout.on("data", (chunk) => {
captureBuffer.push(chunk)
})
playback is the reverse: write raw PCM frames into something that can play them
const speaker = spawn("ffplay", [
"-nodisp",
"-autoexit",
"-f",
"s16le",
"-ac",
"1",
"-ar",
"16000",
"pipe:0",
])
speaker.stdin.write(frame)
this is not the only way to do audio, but ffmpeg/ffplay made it easy to prove the media path before making the audio layer nicer
why prerecorded audio was easier
the call did support microphone audio, but prerecorded audio was easier to test because i did not have a good microphone connected to that machine lol
with a microphone, silence can mean many things: the mic is muted, the format is wrong, the capture process failed, the wrong device was selected, or the call media path is broken. with prerecorded audio, the test is deterministic. call the account and listen for the same file every time
the capture driver just reads frames from a file and pretends they came from a microphone:
class PrerecordedAudioCapture {
readFrame(samples) {
return nextPcmChunkFromFile(samples)
}
}
but the file cannot just be an mp3 or any other compressed format; the capture driver wants raw PCM in the format the virtual microphone is pretending to provide
so the test file was converted with ffmpeg:
ffmpeg -i input.mp3 \
-ac 1 \
-ar 16000 \
-f s16le \
greeting.pcm
that gives a mono, 16 kHz, signed 16-bit little-endian PCM file. then the driver can read exact frame sizes:
class PrerecordedAudioCapture {
constructor(filePath, { loop = false } = {}) {
this.audio = fs.readFileSync(filePath)
this.offset = 0
this.loop = loop
}
readFrame(byteLength) {
if (this.offset >= this.audio.length) {
if (!this.loop) return Buffer.alloc(byteLength)
this.offset = 0
}
const end = Math.min(this.offset + byteLength, this.audio.length)
const frame = Buffer.alloc(byteLength)
this.audio.copy(frame, 0, this.offset, end)
this.offset = end
return frame
}
}
this was the most practical way to test audio if the remote phone hears the file, the media path is alive
the final connected call
there was no single beautiful success log; it was more like the absence of failure
before the fixes calls ended with stats like:
call_accept_sent: 0
rx_total_bytes: 0
call_result: setup error
call_term_reason: timeout
after the fixes the call accepted and stayed connected until i ended it; the important behavior was:
- incoming raw offer parsed
- encrypted call key handled
- preaccept sent
- accept sent
- relay packets sent and received
- audio driver initialized
- prerecorded audio could be fed as capture
- call stayed alive
that was the target
what the bridge looked like
by the end the dev bridge had a few main pieces:
- VoipStack - wrapped the Emscripten module and exports
- VoipBridge - connected WhatsApp signaling to the WASM
- AudioDevice - implemented silence, microphone, playback, and prerecorded capture
- WebRtcBridge - held the P2P/SCTP experiments
the data flow looked like this:
Baileys raw call nodes
|
v
WAP binary codec
|
v
WhatsApp Web VoIP WASM
|
+--> signaling callback --> send call node
+--> relay callback --> UDP socket
+--> capture callback --> mic / silence / prerecorded audio
+--> playback callback --> speaker
Baileys was useful, but it was not the call engine; it gave access to raw call stanzas and WhatsApp binary node helpers, while the actual call engine was WhatsApp's own WASM
that distinction matters because the hard part was not inventing a call protocol; the hard part was feeding WhatsApp's own call engine the same environment it expects in the browser
what made it hard
the hardest part was keeping the layers separate; there were at least five protocols stacked on each other:
- WhatsApp websocket session
- WhatsApp binary node format
- call signaling stanzas
- WhatsApp VoIP native API
- relay/media transport
when something failed, the logs rarely told you which layer was wrong
- bad enc size meant the encrypted call key was not transformed correctly
- empty call-id meant the wrong stanza shape was passed
- no active call meant the offer never created call state
- no tx relays available meant relay bind/ping responses were not reaching the transport
- rx_total_bytes: 0 meant media never arrived, but that could be signaling, relay, candidates, or audio
the way through was making one layer boring before moving to the next: raw call node, WAP encoding, offer parse, encrypted call key, preaccept, accept, relay tx, relay rx, audio capture, playback
ending
this started as "can we receive a call?"
the answer was yes. it took enabling the hidden AB props, fetching the WASM, loading the WhatsApp Web VoIP stack outside the browser, passing raw call stanzas into it, sending its signaling output back through Baileys, implementing relay I/O, and feeding audio through virtual devices
after that, the call worked like a real WhatsApp Web call not because i built a new call engine, but because WhatsApp had already shipped one
i just made it run somewhere else