WebRTC: Web real-time communication

In this blog post, we provide an overview of WebRTC and look at its usage in several industry and consumer solutions.

WebRTC allows peer-to-peer communication between two browsers without installing plugins or extra software. Since its introduction in 2011, WebRTC has evolved continuously with new features and improvements. WebRTC capability is built into modern web browsers, such as Chrome and Firefox. The second peer in this interaction doesn’t need to be a browser; it can be any component that understands and communicates through WebRTC, which opens its applicability to a broader set of use cases than just browser-to-browser real-time communication (RTC).

WebRTC components and audiences

WebRTC has the following functional components:

Media capture: The WebRTC media capture API allows access to the camera and microphone of end-user devices after the user consents.
Audio and video (AV) encoding and decoding: The AV data is broken into smaller chunks for transport, and its exchange between the peer endpoints relies on an encoding and decoding component called a codec. Many codecs are available and maintained by different companies. The WebRTC framework, supported by the web browsers, chooses the optimal codec supported by both peers in the interaction.
Transport: The transport layer functions involve ordering the packets, dealing with packet loss, and setting up the connection between the peers. The WebRTC framework raises several noteworthy events throughout the lifecycle of the peer-to-peer connection, including failures.
Session management: This stage involves connection management between the peers (RTCPeerConnection) using information exchange and protocol negotiation through a process called signaling. Along with the AV streams transferred on this connection, data can also be exchanged between the peers through RTCDataChannel.
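As a minimal sketch of the media capture component, the following function requests the camera and microphone and lists the tracks it receives. The function name captureLocalMedia and the injected mediaDevices parameter are our own choices for illustration; in a web page you would pass navigator.mediaDevices.

```javascript
// Sketch: request camera and microphone access, then list the captured tracks.
// `mediaDevices` is injected (an assumption of this example) so the function
// can also be exercised outside a browser; in a page, pass navigator.mediaDevices.
async function captureLocalMedia(mediaDevices, constraints = { audio: true, video: true }) {
  // getUserMedia prompts the user for consent and rejects if consent is denied.
  const stream = await mediaDevices.getUserMedia(constraints);
  const tracks = stream.getTracks().map((t) => ({ kind: t.kind, label: t.label }));
  return { stream, tracks };
}
```

In a browser, the returned stream can be assigned to the srcObject property of a video element to show the local preview.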

The WebRTC API caters to the following audiences:

WebRTC client application developers
Browser vendors
Overridable API for browser vendors

This post focuses on the WebRTC client application developer API to help readers understand the underlying technology involved in WebRTC.

Developer API

The Developer API deals with the following elements provided by all browsers that support WebRTC:

RTCPeerConnection encapsulates the functions for peer connection negotiation, connection lifecycle, and media stream attachments.
MediaStream API allows access to the video and audio stream objects, selection of sources, and security enforcement through consent of the user.
RTCDataChannel provides a channel for data exchange using the peer connection.
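The following sketch shows how these elements might fit together in client code: it creates an RTCPeerConnection, attaches the tracks of a MediaStream, and reports connection lifecycle changes. The function name createPeer, the onState callback, and the injectable PeerConnection parameter are assumptions made for this example.

```javascript
// Sketch: create a peer connection, attach local tracks, and report lifecycle changes.
// `PeerConnection` defaults to the browser's RTCPeerConnection; `onState` is a
// hypothetical callback supplied by the application.
function createPeer(stream, onState, PeerConnection = globalThis.RTCPeerConnection) {
  const pc = new PeerConnection();
  // Each local audio or video track is offered to the remote peer during negotiation.
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }
  // connectionState takes values such as "connecting", "connected", and "failed".
  pc.addEventListener("connectionstatechange", () => onState(pc.connectionState));
  return pc;
}
```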

The following diagram captures the components that constitute the WebRTC functionality:

The key concept for WebRTC is traversal of the network address translation (NAT) path. Setting up RTCPeerConnection involves exchange of information and protocol negotiation between the peers before the network packets can traverse the NAT path.

The process aiding the connection negotiation is called signaling, and the component that facilitates this negotiation is called the signaling server. Depending on the security level of the NAT infrastructure involved at the peer endpoints, you might need a proxy between the peers to establish a connection.

NAT paths are of the following types:

Static NAT: One-to-one mapping of a private IP address to a public IP address
Dynamic NAT: One-to-one mapping of a private IP address to a public IP from a pool of IP addresses
Port address translation (PAT): Mapping of multiple private IP addresses to a single public IP address using different ports

NAT maintains a table of entries to enable translation of transport addresses between the NAT device’s external and internal interfaces. An entry is created in the NAT table when the first outbound packet arrives at the routing device, which becomes the traversal rule for any subsequent inbound packets.
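To make the table behavior concrete, here is a small simulation of a port address translation table. It is a simplified model, not how any particular router is implemented; the class name NatTable, the starting port 40000, and the addresses in the usage note are illustrative assumptions.

```javascript
// Sketch of a port address translation (PAT) table. Addresses and ports are illustrative.
class NatTable {
  constructor(publicIp, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.byInternal = new Map(); // "privateIp:port" -> public port
    this.byPublicPort = new Map(); // public port -> { ip, port }
  }

  // The first outbound packet from an internal address creates the table entry.
  outbound(privateIp, privatePort) {
    const key = `${privateIp}:${privatePort}`;
    if (!this.byInternal.has(key)) {
      const publicPort = this.nextPort++;
      this.byInternal.set(key, publicPort);
      this.byPublicPort.set(publicPort, { ip: privateIp, port: privatePort });
    }
    return { ip: this.publicIp, port: this.byInternal.get(key) };
  }

  // Inbound packets are forwarded only if an entry already exists for the port.
  inbound(publicPort) {
    return this.byPublicPort.get(publicPort) ?? null;
  }
}
```

For example, after new NatTable("203.0.113.5").outbound("192.168.1.10", 5000), an inbound packet to port 40000 maps back to 192.168.1.10:5000, while a packet to a port with no entry is dropped.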

The following NAT translation methods are implemented in various routing appliances:

One-to-one NAT
Address restricted NAT
Port restricted NAT
Symmetric NAT

Because of the stringent requirements of symmetric NAT, a direct peer-to-peer connection between two peer devices usually isn’t feasible. In this case, the connection needs to be relayed through a proxy: each peer connects to an intermediary, called a TURN server, which handles the marshaling of traffic between the two peers. The other NAT translation methods support direct peer-to-peer connection and don’t need an intermediary.
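The difference between the four methods can be illustrated with a deliberately simplified model of a single internal host behind a NAT. The class Nat, the helper directPathWorks, and all addresses are hypothetical, and real devices vary in behavior; the sketch only shows why an address learned through one destination is not reusable by another peer under symmetric NAT.

```javascript
const NAT_TYPES = ["one-to-one", "address-restricted", "port-restricted", "symmetric"];

// Simplified model of one internal host behind a NAT of the given type.
class Nat {
  constructor(type) {
    if (!NAT_TYPES.includes(type)) throw new Error(`unknown NAT type: ${type}`);
    this.type = type;
    this.nextPort = 40000;
    this.mappings = []; // { publicPort, dests: [{ ip, port }] }
  }

  // The internal host sends a packet to `dest`; returns the public port the NAT uses.
  send(dest) {
    const sameDest = (d) => d.ip === dest.ip && d.port === dest.port;
    // Symmetric NAT creates a separate mapping per destination; the others reuse one.
    let mapping = this.mappings.find((m) => this.type !== "symmetric" || m.dests.some(sameDest));
    if (!mapping) {
      mapping = { publicPort: this.nextPort++, dests: [] };
      this.mappings.push(mapping);
    }
    mapping.dests.push(dest);
    return mapping.publicPort;
  }

  // Would an inbound packet from `from` to `publicPort` reach the internal host?
  accepts(publicPort, from) {
    const mapping = this.mappings.find((m) => m.publicPort === publicPort);
    if (!mapping) return false;
    if (this.type === "one-to-one") return true;
    if (this.type === "address-restricted") return mapping.dests.some((d) => d.ip === from.ip);
    // Port-restricted and symmetric NAT both require an exact address and port match.
    return mapping.dests.some((d) => d.ip === from.ip && d.port === from.port);
  }
}

// The host learns its public port from one server, sends a packet toward the peer,
// and the peer then replies to the port that the server reported.
function directPathWorks(type) {
  const nat = new Nat(type);
  const server = { ip: "198.51.100.1", port: 3478 };
  const peer = { ip: "203.0.113.9", port: 50000 };
  const advertisedPort = nat.send(server);
  nat.send(peer);
  return nat.accepts(advertisedPort, peer);
}
```

In this model, directPathWorks returns true for the first three types and false for "symmetric", because the packet sent toward the peer leaves through a different public port than the one advertised.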

Signaling

Setting up RTCPeerConnection involves signaling and connection negotiation to establish a direct peer-to-peer connection between browsers. The role of a signaling server is to facilitate this negotiation process through the following steps, using events and event handlers:

The initiator of the peer connection creates an offer and sends it to the signaling server.
The signaling server relays the offer to the peer that wants to participate in the connection.
The receiving peer accepts the offer, which arrives as the payload of an event. The corresponding event handler creates an answer and sends it back to the signaling server, which relays it to the connection initiator.
An interactive connectivity establishment (ICE) process determines some candidate IP addresses that provide a clear route between the peers.
The peers pick the best candidate from the list of candidate IP addresses to establish a connection.
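The relay role of the signaling server in these steps can be sketched independently of the transport. The function createSignalingRelay, the peer IDs, and the message fields to and from are assumptions for illustration; the send callback stands for whatever delivers a message to a peer, such as a WebSocket connection.

```javascript
// Sketch of the relay role of a signaling server, independent of transport.
// `send` is whatever delivers a string to that peer (for example, a WebSocket).
function createSignalingRelay() {
  const peers = new Map(); // peer id -> send function
  return {
    join(id, send) {
      peers.set(id, send);
    },
    leave(id) {
      peers.delete(id);
    },
    // Forward an offer, answer, or ICE candidate to its addressee, tagging the sender.
    relay(from, message) {
      const send = peers.get(message.to);
      if (!send) return false;
      send(JSON.stringify({ ...message, from }));
      return true;
    },
  };
}
```

The server never interprets the offer or answer; it only passes messages between the peers that have joined.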

The offer and answer consist of session description protocol (SDP) data, which carries the media description and constraints of each peer device participating in the connection. The peers exchange SDP data with the help of the signaling channel to establish a connection. When the peers know each other’s capabilities based on the SDP exchange, they figure out a clear route through their networking infrastructure and its constraints.
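On the client side, the SDP exchange might look like the following sketch. The three function names and the shape of the signaling messages are our own; signal is a hypothetical function that sends a message to the other peer through the signaling server, and pc is an RTCPeerConnection.

```javascript
// Caller side: create an offer, apply it locally, and hand it to the signaling channel.
async function sendOffer(pc, signal) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signal({ type: "offer", sdp: pc.localDescription.sdp });
}

// Callee side: apply the remote offer, then create and send an answer.
async function answerOffer(pc, offer, signal) {
  await pc.setRemoteDescription({ type: "offer", sdp: offer.sdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signal({ type: "answer", sdp: pc.localDescription.sdp });
}

// Caller side again: apply the answer to complete the SDP exchange.
async function acceptAnswer(pc, answer) {
  await pc.setRemoteDescription({ type: "answer", sdp: answer.sdp });
}
```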

Figuring out a route between the two peers is aided by the following tools:

STUN: Session traversal utilities for NAT
TURN: Traversal using relays around NAT
ICE: Interactive connectivity establishment

The following diagram captures how different tools and components fit into WebRTC implementations:

The first step to finding a clear route is for each peer to find its own public IP address, aided by a STUN-enabled server. After receiving a STUN request, the STUN server responds with the public-facing IP address of the requestor. If the requestor sits behind a router and has a private IP, the router creates a NAT record and translates the private IP to a public IP. That public IP is what the STUN server sees, and it returns that address in its response to the STUN request.

The following diagram captures a common scenario where the two peers can establish a direct peer-to-peer connection.

For more secure implementations of NAT, typically in enterprises, firewalls implement symmetric NAT that doesn’t allow direct connection through STUN-based traffic. In these cases, you need a server enabling traversal using relays around NAT (TURN) to act as a proxy relay between the two peers.

The choreography required to bring together the interactions between STUN, TURN, and the peers is handled by interactive connectivity establishment (ICE). It uses STUN and TURN servers to provide a successful peer connection either directly between the peers or through TURN as a proxy or relay.
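In client code, the STUN and TURN servers are handed to ICE through the RTCPeerConnection configuration. The helper below is a sketch: the name buildIceConfig and its parameter names are assumptions, and any URLs or credentials you pass in are your own deployment's values.

```javascript
// Sketch: build an RTCPeerConnection configuration from STUN and TURN details.
// All URLs and credentials are supplied by the caller; none are real defaults.
function buildIceConfig({ stunUrl, turnUrl, turnUsername, turnCredential, relayOnly = false }) {
  const iceServers = [];
  if (stunUrl) iceServers.push({ urls: stunUrl });
  if (turnUrl) {
    iceServers.push({ urls: turnUrl, username: turnUsername, credential: turnCredential });
  }
  return {
    iceServers,
    // "relay" restricts ICE to TURN candidates; "all" lets ICE consider every candidate type.
    iceTransportPolicy: relayOnly ? "relay" : "all",
  };
}

// Usage in a browser, with placeholder host names:
// const pc = new RTCPeerConnection(buildIceConfig({
//   stunUrl: "stun:stun.example.org:3478",
//   turnUrl: "turn:turn.example.org:3478",
//   turnUsername: "user",
//   turnCredential: "secret",
// }));
```

Setting relayOnly to true can be useful when testing that the TURN path works, because it prevents ICE from selecting a direct route.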

To start, ICE doesn’t have any knowledge of the network capabilities of the peers. It runs a series of tests to discover the capabilities of each peer’s network and incrementally arrives at a list of IP addresses, called ICE candidates, that work for both peers to connect with each other. When one peer’s browser finds an ICE candidate, it sends a notification to the other peer through the signaling server, which the other peer can accept or reject based on its capabilities. After enough ICE candidates are discovered and tested, a connection is established.
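The candidate exchange described above can be sketched as follows. The function names and the message shape are illustrative assumptions; signal is again a hypothetical function that sends a message through the signaling server. The last helper reads the type field of a candidate line, where host, srflx, and relay correspond to a local address, an address discovered through STUN, and a TURN relay address.

```javascript
// Send each locally discovered candidate to the other peer as soon as it appears.
function trickleLocalCandidates(pc, signal) {
  pc.addEventListener("icecandidate", (event) => {
    // The browser fires a final event with a null candidate when gathering ends.
    if (event.candidate) {
      signal({ type: "candidate", candidate: event.candidate });
    }
  });
}

// Hand a candidate received over signaling to the local ICE agent.
async function addRemoteCandidate(pc, message) {
  await pc.addIceCandidate(message.candidate);
}

// Extract the candidate type ("host", "srflx", "relay", ...) from a candidate line.
function candidateType(candidateLine) {
  const fields = candidateLine.trim().split(/\s+/);
  const i = fields.indexOf("typ");
  return i >= 0 && i + 1 < fields.length ? fields[i + 1] : null;
}
```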

After the direct peer-to-peer connection is established, the peers can exchange AV data streams quickly, providing near-real-time communication between them. The reliability of AV transmission isn’t paramount because the consumers of the data are typically humans, who don’t notice a few missing frames of AV. So, the underlying protocol used by WebRTC is user datagram protocol (UDP), which provides low-latency, loss-tolerant transport, allowing near-real-time communication. WebRTC uses datagram transport layer security (DTLS) on top of UDP to prevent eavesdropping, tampering, and message forgery of UDP data.

RTCDataChannel

RTCDataChannel uses stream control transmission protocol (SCTP) for data exchange over an existing RTCPeerConnection. SCTP runs on top of DTLS to allow data blob transfer using RTCDataChannel on the same RTCPeerConnection established between the peers. SCTP provides reliability similar to TCP’s but is more efficient and allows multiple streams to be sent simultaneously. RTCDataChannel provides an alternative channel for data transfer that can be customized as needed, including for use cases that traditionally rely on TCP-based data transfer mechanisms.
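As a sketch of this customization, the following function opens two data channels on an existing RTCPeerConnection with different delivery options. The function name openChannels and the channel labels chat and telemetry are hypothetical.

```javascript
// Sketch: open data channels and choose their delivery guarantees.
function openChannels(pc) {
  // With no options, a data channel is reliable and ordered, similar to TCP.
  const chat = pc.createDataChannel("chat");
  // Unordered delivery with no retransmissions favors latency over reliability.
  const telemetry = pc.createDataChannel("telemetry", { ordered: false, maxRetransmits: 0 });
  // Data can be sent only after the channel reports that it is open.
  chat.addEventListener("open", () => chat.send("hello"));
  return { chat, telemetry };
}
```

The remote peer receives these channels through the datachannel event on its own RTCPeerConnection.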

SCTP can use multiple streams between two endpoints. It breaks a message into multiple chunks for transmission and reassembles them at the receiving endpoint.
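The chunk-and-reassemble idea can be illustrated at the application level. This is not SCTP itself, which handles fragmentation internally and is not exposed to JavaScript; the function names and the seq field are assumptions made for this sketch.

```javascript
// Illustration only: split a message into numbered chunks of at most `chunkSize` characters.
function splitIntoChunks(message, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0, seq = 0; offset < message.length; offset += chunkSize, seq++) {
    chunks.push({ seq, data: message.slice(offset, offset + chunkSize) });
  }
  return chunks;
}

// Put the chunks back in sequence order and join them, whatever order they arrived in.
function reassemble(chunks) {
  return [...chunks]
    .sort((a, b) => a.seq - b.seq)
    .map((c) => c.data)
    .join("");
}
```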

Oracle Cloud Infrastructure

Oracle Cloud Infrastructure (OCI) is the next-generation cloud designed to run any application faster and more securely, for less money. For enterprises building applications and services based on WebRTC, OCI provides an efficient, performant, and cost-effective infrastructure platform to host STUN and TURN servers, signaling servers, and the WebRTC client applications. Developers can use our Free Tier with no time limits on a selection of Always Free services, such as Autonomous Database, Compute, and Storage, along with free credits to try more cloud services.

Sample WebRTC application

The following is a sample single-page web application that uses WebRTC to demonstrate a simple video chat over a peer-to-peer connection. It illustrates all the component interactions, APIs, signaling, and the protocol negotiation process.

All the components were coded in Node.js, and the peers use WebSockets to connect to the signaling server. The application is served through an HTTPS-enabled web server to allow access to the web cameras and microphones of the peer devices for AV streams. Although several sophisticated JavaScript wrapper libraries hide the complexities of RTCPeerConnection and the signaling process and reduce the client code footprint to a few lines, this example uses the raw API to show all the underlying interactions required for a successful peer-to-peer connection.

The following screenshots show the sample WebRTC application hosted on an OCI Free Tier virtual machine (VM).

Sample App Login Screen

Goofy's local view

Goofy's View of Cruise

Cruise's View of Goofy

Use cases for WebRTC

With broadband internet access of less than 500-millisecond latency becoming widely available, WebRTC has the following applications:

One-way conversational devices: Conversational digital assistants like Amazon Alexa and Amazon Chime, Google Duplex, and Dialogflow.
Internet of Things (IoT): DroneSense, a software platform for drones, uses WebRTC for video conferencing.
Metaverse: With recent interest in the metaverse, WebRTC is a technology likely to play a key role in enabling immersive experiences.
Click-to-chat video conferencing: Examples include Slack, Google Hangouts, Facebook Messenger, and Houseparty.
Surveillance: Advanced surveillance systems, such as Ring and Amaryllo, use WebRTC for its secure nature that addresses some of the privacy concerns.
Self-driving automobiles: Self-driving technology companies that integrate Waymo’s technology are implementing real-time communications based on WebRTC in self- and assisted-driving automobiles.
Auctions: Auction houses and automobile, art, rural livestock, and other auctions are made available remotely in near-real time to digitally connected audiences.
Gambling and betting: Racing, casinos, and sportsbooks are now accessible virtually. Similarly, sports betting has gone mainstream in virtual worlds in regulated markets. Examples include Twitch, FanDuel, and BetMGM.
Telehealth and remote monitoring: Telemedicine has become commonplace, using WebRTC for its browser-based, no-plugin conferencing capabilities. Other examples include streaming-enabled IoT medical devices for remote monitoring and health assessments.
Emergency response and communication: WebRTC-enabled body cams, video-enabled emergency calls, bomb disposal, and disaster relief robots
Connected fitness and health: Connected fitness equipment and wearables like Mirror, Fitbit, and Peloton.
Live commerce and shoppable video: Jewelry Television (JTV), Facebook Live shopping events, live auctions, and influencer streaming.
Gaming and esports: Virtual reality and multiplayer gaming, Pixel Streaming for game developers and designers. Cubeslam and AirConsole also use WebRTC.
Online education: Remote and online learning and tutoring

Conclusion

With sub-second glass-to-glass latency and no requirement for client-side plugins, WebRTC holds a lot of promise for a myriad of applications and solutions in peer-to-peer communications and live streaming media distribution.


Praveen Coca

Master Principal Cloud Architect
