Introduction
Most developers interact with Discord voice through high-level bot libraries such as discord.js or Eris. These libraries abstract away what is actually a fairly complex real-time system.
Behind the scenes, Discord voice consists of:
- A main Gateway WebSocket
- A separate Voice WebSocket
- A UDP-based real-time transport layer
dconnect is a CLI tool that implements this entire pipeline from scratch, without any bot framework, and it authenticates with a user token rather than a bot token.
⚠️ This article is for educational and experimental purposes only. Using user tokens may violate Discord’s Terms of Service.
What Does dconnect Do?
With a single command, dconnect:
- Connects to the Discord Gateway
- Authenticates a user account
- Sends a VOICE_STATE_UPDATE
- Receives voice server information
- Opens a Voice WebSocket
- Performs UDP discovery
- Establishes a real voice session
Result: the account actually joins the target voice channel.
CLI Usage
Options
Example
On success, the following sequence occurs:
- Gateway connection
- READY event
- Voice state & server updates
- UDP discovery
- Voice session becomes ready
Architecture Overview
The project consists of two core components:
- DiscordGateway — Handles the main Discord Gateway connection
- VoiceConnection — Manages Voice WebSocket + UDP
Both components are built on top of Node.js EventEmitter.
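As a rough illustration of this split, here is a minimal sketch of two EventEmitter-based classes. Only the class names come from the project; the method names, the `info` object, and the `canConnect` event are hypothetical and exist purely for this example.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical skeleton: re-emits gateway dispatches (op 0) under their event name.
class DiscordGateway extends EventEmitter {
  handlePayload(payload) {
    if (payload.op === 0) this.emit(payload.t, payload.d);
  }
}

// Hypothetical skeleton: collects what the voice handshake needs from the gateway.
class VoiceConnection extends EventEmitter {
  constructor(gateway) {
    super();
    this.info = {};
    gateway.on("READY", (d) =>
      this.merge({ sessionId: d.session_id, userId: d.user.id }));
    gateway.on("VOICE_SERVER_UPDATE", (d) =>
      this.merge({ token: d.token, endpoint: d.endpoint, guildId: d.guild_id }));
  }

  merge(part) {
    Object.assign(this.info, part);
    // "canConnect" is an invented event name for this sketch.
    if (this.info.sessionId && this.info.endpoint) this.emit("canConnect", this.info);
  }
}
```

Keeping both classes as emitters lets the CLI layer subscribe to progress events without either class knowing about the other's internals.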
Discord Gateway Flow
Fetching the Gateway URL
This endpoint returns the Gateway WebSocket URL.
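A minimal sketch of that request, assuming Node 18+ (global `fetch`) and the documented `GET /gateway` route under API v10. The function name and the injectable `fetchImpl` parameter are inventions of this example, not dconnect's code.

```javascript
// Assumption: REST API version 10, matching the Gateway version used later.
const API_BASE = "https://discord.com/api/v10";

// Returns the Gateway WebSocket base URL, e.g. "wss://gateway.discord.gg".
// `fetchImpl` can be swapped for a stub in tests.
async function fetchGatewayUrl(fetchImpl = fetch) {
  const res = await fetchImpl(`${API_BASE}/gateway`);
  if (!res.ok) throw new Error(`GET /gateway failed with status ${res.status}`);
  const body = await res.json();
  return body.url;
}
```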
Establishing the WebSocket
Gateway v10 with JSON encoding is used.
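The version and encoding are passed as query parameters on the WebSocket URL. A small sketch of how that URL could be assembled (the helper name is hypothetical):

```javascript
// Appends the Gateway version and encoding to the base URL returned by the REST call.
function buildGatewaySocketUrl(baseUrl, version = 10, encoding = "json") {
  const url = new URL(baseUrl);
  url.searchParams.set("v", String(version));
  url.searchParams.set("encoding", encoding);
  return url.toString();
}

// buildGatewaySocketUrl("wss://gateway.discord.gg")
// → "wss://gateway.discord.gg/?v=10&encoding=json"
```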
HELLO & Heartbeat
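After connecting, Discord sends a HELLO payload (opcode 10) whose heartbeat_interval field gives the heartbeat period in milliseconds.
The client must send a heartbeat (opcode 1) at this interval to keep the connection alive.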
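A sketch of the HELLO handling, using the documented opcodes (10 for HELLO, 1 for HEARTBEAT). The `send` and `getSequence` callbacks are assumptions of this example: `send` writes a string to the open WebSocket, and `getSequence` returns the last sequence number (`s`) seen on a dispatch.

```javascript
const OP_HEARTBEAT = 1;
const OP_HELLO = 10;

// A heartbeat carries the last received sequence number, or null if none yet.
function heartbeatPayload(lastSequence) {
  return JSON.stringify({ op: OP_HEARTBEAT, d: lastSequence ?? null });
}

// Starts the heartbeat loop from a HELLO payload and returns the timer handle.
function startHeartbeat(hello, send, getSequence) {
  if (hello.op !== OP_HELLO) throw new Error("expected HELLO (op 10)");
  return setInterval(
    () => send(heartbeatPayload(getSequence())),
    hello.d.heartbeat_interval
  );
}
```

The returned timer handle is what a shutdown routine later passes to `clearInterval`.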
IDENTIFY
This authenticates the user session.
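The IDENTIFY payload uses opcode 2 and carries the token plus a `properties` object. The sketch below follows the documented payload shape; the property values are placeholders, and the exact fields dconnect sends are not shown in this article. Bot sessions additionally require an `intents` field, which is omitted here.

```javascript
const OP_IDENTIFY = 2;

// Builds an IDENTIFY payload. Property values are illustrative placeholders.
function identifyPayload(token) {
  return {
    op: OP_IDENTIFY,
    d: {
      token,
      properties: {
        os: process.platform,
        browser: "example-client", // placeholder, not dconnect's value
        device: "example-client",  // placeholder, not dconnect's value
      },
    },
  };
}
```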
READY Event
The session_id is critical for the voice handshake.
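READY arrives as a dispatch (opcode 0) with `t` set to `"READY"`. A minimal sketch of pulling out the two values the voice handshake needs later (the function name is hypothetical):

```javascript
// Returns the session and user IDs from a READY dispatch, or null for any other payload.
function extractSession(payload) {
  if (payload.op !== 0 || payload.t !== "READY") return null;
  return { sessionId: payload.d.session_id, userId: payload.d.user.id };
}
```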
Joining a Voice Channel
VOICE_STATE_UPDATE
This packet tells Discord that the user wants to join a voice channel.
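The packet is a Gateway payload with opcode 4. A sketch of its documented shape follows; the helper name and the options object are conveniences of this example.

```javascript
const OP_VOICE_STATE_UPDATE = 4;

// Passing channelId = null asks Discord to disconnect from voice in that guild.
function voiceStateUpdate(guildId, channelId, { selfMute = false, selfDeaf = false } = {}) {
  return {
    op: OP_VOICE_STATE_UPDATE,
    d: {
      guild_id: guildId,
      channel_id: channelId,
      self_mute: selfMute,
      self_deaf: selfDeaf,
    },
  };
}
```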
VOICE_SERVER_UPDATE
Discord responds with:
- endpoint
- token
These values are required to open the Voice WebSocket.
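A sketch of reading those values from the VOICE_SERVER_UPDATE dispatch. The function name is hypothetical; the null check reflects the documented case where the endpoint is temporarily unavailable while a voice server is being reallocated.

```javascript
// Returns { token, guildId, endpoint } or null if the payload is not usable yet.
function parseVoiceServerUpdate(payload) {
  if (payload.op !== 0 || payload.t !== "VOICE_SERVER_UPDATE") return null;
  const { token, guild_id: guildId, endpoint } = payload.d;
  if (!endpoint) return null; // server is being reallocated; wait for the next update
  return { token, guildId, endpoint };
}
```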
Voice WebSocket Flow
Opening the Voice WebSocket
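The voice endpoint arrives without a scheme, so the client prefixes `wss://` and appends a voice gateway version. This sketch assumes version 4 as the default; the article does not state which version dconnect requests, so treat that number as a placeholder.

```javascript
// Builds the Voice WebSocket URL from the endpoint in VOICE_SERVER_UPDATE.
// Assumption: voice gateway version 4; adjust to whatever the client targets.
function buildVoiceSocketUrl(endpoint, version = 4) {
  const host = endpoint.replace(/^wss:\/\//, ""); // tolerate an already-prefixed endpoint
  return `wss://${host}/?v=${version}`;
}
```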
Voice IDENTIFY
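On the voice socket, IDENTIFY is opcode 0 and combines values gathered from the main Gateway: the guild ID, the user ID, the session_id from READY, and the token from VOICE_SERVER_UPDATE. A sketch of the documented payload shape (the helper name is hypothetical):

```javascript
const VOICE_OP_IDENTIFY = 0;

// Note that the voice token here is the one from VOICE_SERVER_UPDATE,
// not the account token used on the main Gateway.
function voiceIdentify({ guildId, userId, sessionId, token }) {
  return {
    op: VOICE_OP_IDENTIFY,
    d: { server_id: guildId, user_id: userId, session_id: sessionId, token },
  };
}
```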
READY Event
The voice server responds with:
- ssrc
- ip
- port
At this stage, UDP discovery begins.
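The voice READY payload uses opcode 2. A sketch of extracting the fields listed above, plus the `modes` array of encryption modes the server offers (the function name is hypothetical):

```javascript
const VOICE_OP_READY = 2;

// Returns the UDP target and SSRC from a voice READY payload, or null otherwise.
function parseVoiceReady(payload) {
  if (payload.op !== VOICE_OP_READY) return null;
  const { ssrc, ip, port, modes } = payload.d;
  return { ssrc, ip, port, modes: modes ?? [] };
}
```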
UDP Discovery (The Critical Part)
Discord voice requires UDP discovery to determine the client’s external IP and port.
Once a response is received, the UDP transport is considered established.
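The discovery exchange uses a fixed 74-byte packet: a 2-byte type (1 for request, 2 for response), a 2-byte length of 70, the 4-byte SSRC, a 64-byte null-terminated address, and a 2-byte port, all big-endian. The sketch below builds the request and parses the response following that documented layout; the function names are hypothetical, and dconnect's own implementation may differ.

```javascript
const DISCOVERY_SIZE = 74;

// Builds the request the client sends to the ip/port from voice READY.
function buildDiscoveryRequest(ssrc) {
  const packet = Buffer.alloc(DISCOVERY_SIZE); // address and port stay zeroed
  packet.writeUInt16BE(0x1, 0);  // type: request
  packet.writeUInt16BE(70, 2);   // length of the remaining fields
  packet.writeUInt32BE(ssrc, 4);
  return packet;
}

// Extracts the external address and port from the server's reply.
function parseDiscoveryResponse(packet) {
  if (packet.length < DISCOVERY_SIZE || packet.readUInt16BE(0) !== 0x2) {
    throw new Error("not an IP discovery response");
  }
  const nul = packet.indexOf(0, 8);
  const end = nul === -1 || nul > 72 ? 72 : nul; // address field spans bytes 8..71
  return {
    ssrc: packet.readUInt32BE(4),
    address: packet.toString("ascii", 8, end),
    port: packet.readUInt16BE(72),
  };
}
```

With Node's `dgram` module, the request would be sent from a `udp4` socket to the address from voice READY, and the first `message` event would be passed to `parseDiscoveryResponse`.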
Selecting the Transport Protocol
This selects Discord’s default encryption mode.
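SELECT_PROTOCOL is voice opcode 1 and reports the discovered address and port along with the chosen encryption mode. The article does not name the mode dconnect picks, so this sketch takes it as a parameter; it should be one of the strings offered in the `modes` array of voice READY.

```javascript
const VOICE_OP_SELECT_PROTOCOL = 1;

// `mode` must be one of the modes the server advertised in voice READY.
function selectProtocol(address, port, mode) {
  return {
    op: VOICE_OP_SELECT_PROTOCOL,
    d: { protocol: "udp", data: { address, port, mode } },
  };
}

// Example with one of the documented mode names:
// selectProtocol("203.0.113.7", 50000, "aead_xchacha20_poly1305_rtpsize")
```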
SESSION_DESCRIPTION
The voice server sends the encryption key:
At this point, the voice connection is fully established.
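SESSION_DESCRIPTION is voice opcode 4. Its `secret_key` field is an array of byte values, which is convenient to hold as a Buffer for later packet encryption. A minimal sketch (the function name is hypothetical):

```javascript
const VOICE_OP_SESSION_DESCRIPTION = 4;

// Returns the negotiated mode and the key as a Buffer, or null for other payloads.
function parseSessionDescription(payload) {
  if (payload.op !== VOICE_OP_SESSION_DESCRIPTION) return null;
  return {
    mode: payload.d.mode,
    secretKey: Buffer.from(payload.d.secret_key),
  };
}
```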
Speaking State
This updates the speaking indicator for the user.
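The speaking payload is voice opcode 5. Its `speaking` field is a bitmask (1 for microphone audio), and `ssrc` is the value received in voice READY. A sketch of the documented shape (the helper name is hypothetical):

```javascript
const VOICE_OP_SPEAKING = 5;

// flags: 1 = microphone, 0 = not speaking.
function speakingPayload(ssrc, flags = 1) {
  return { op: VOICE_OP_SPEAKING, d: { speaking: flags, delay: 0, ssrc } };
}
```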
Graceful Shutdown
- Heartbeats stop
- UDP socket closes
- All WebSockets close cleanly
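The three steps above could be sketched as a single teardown function. The argument shape is an assumption of this example: timers created with `setInterval`, a `dgram`-style socket with `close()`, and WebSocket objects whose `close(code)` accepts a status code.

```javascript
// Stops heartbeats first so nothing writes to a socket that is closing.
function shutdown({ heartbeatTimers = [], udpSocket = null, sockets = [] }) {
  for (const timer of heartbeatTimers) clearInterval(timer);
  if (udpSocket) udpSocket.close();
  for (const ws of sockets) ws.close(1000); // 1000 = normal closure
}
```

A CLI would typically call this from a `SIGINT` handler before exiting.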
Conclusion
dconnect is a low-level reference implementation of how Discord voice actually works under the hood.
Without any bot framework abstraction, it demonstrates:
- Gateway protocol
- Voice WebSocket handshake
- UDP discovery
- Encryption setup
If you want to truly understand Discord voice internals, this project exposes the entire pipeline.
If you don’t know what you’re doing, don’t use it. If you do, this project should feel very satisfying.
Source: dconnect source code
