In this article, we’re going to examine the details of how WebRTC architecture actually works so that a layperson can understand it.
WebRTC is an open-source project that connects devices through peer-to-peer, interactive web apps. If you’ve had an in-browser video call or played a real-time game in a web browser, WebRTC was probably the technology behind it.
At the core of every video conferencing solution sits the architecture of sending and receiving the participants’ video/audio streams. For example, if there are N participants in a video conference each of them needs to see/hear the video/audio of all other N-1 participants.
This can be implemented in different ways, but three main architectures are used in practice: peer-to-peer (P2P), Selective Forwarding Unit (SFU), and Multipoint Control Unit (MCU).
A hybrid approach is also possible - to use different kinds of architecture depending on the number of participants in the conference. That is more of an optimization and will be covered at the end of the article.
Peer-to-peer (P2P), occasionally also called mesh architecture, is the simplest of the three designs and is straightforward to conceptualise. In a conference, each participant is a peer that sends their video and audio to every other peer over a direct peer connection.
Below is a peer-to-peer architecture diagram illustrating P2P with four participants:
In the absence of intermediate media servers, privacy, facilitated through end-to-end encryption, is inherently present. However, while this seems advantageous, there is a significant limitation with P2P: it does not utilise upload bandwidth efficiently.
For example, if there are N participants in the call, each participant needs to establish N-1 peer connections and send their video/audio N-1 times. Across the whole call, that adds up to N*(N-1) outgoing streams carried over N*(N-1)/2 distinct peer-to-peer links.
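To make the growth concrete, here is a minimal sketch that counts connections and streams in a full-mesh call. The function name `mesh_counts` and the dictionary keys are illustrative choices, not part of any WebRTC API. It counts each link once, since a peer connection is shared by its two endpoints.

```python
def mesh_counts(n: int) -> dict:
    """Count connections and streams in a full-mesh (P2P) call with n participants."""
    if n < 1:
        raise ValueError("a call needs at least one participant")
    per_peer = n - 1  # connections each participant has to maintain
    return {
        "connections_per_participant": per_peer,
        "uploads_per_participant": per_peer,     # one copy of the media per remote peer
        "total_links": n * per_peer // 2,        # each link is shared by two peers
        "total_streams_sent": n * per_peer,      # every peer sends to every other peer
    }


for n in (2, 4, 10):
    print(n, mesh_counts(n))
```

With four participants this gives 6 links and 12 outgoing streams; with ten it is already 45 links and 90 streams, which is why the mesh approach stops scaling so quickly.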
This matters because many homes have asymmetrical internet connections - e.g. ADSL (Asymmetric Digital Subscriber Line), where the upload speed is severely limited compared to the download speed. And even if you have a good upload speed, there will still be an issue in an office setting where many people are sharing the same internet connection.
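The effect of a limited uplink can be estimated with simple arithmetic. The sketch below is only a rough model: the bitrate of 1,000 kbps per stream and the 3,000 kbps uplink are assumed numbers chosen for illustration, and the function names are hypothetical.

```python
def mesh_upload_kbps(participants: int, stream_kbps: int) -> int:
    """Upload bandwidth one participant needs in a P2P mesh call."""
    return (participants - 1) * stream_kbps


def max_mesh_participants(uplink_kbps: int, stream_kbps: int) -> int:
    """Largest mesh call (sender included) that a given uplink can sustain."""
    return uplink_kbps // stream_kbps + 1


STREAM_KBPS = 1_000  # assumed bitrate of one audio+video stream
UPLINK_KBPS = 3_000  # assumed upload capacity of an asymmetric home connection

print(mesh_upload_kbps(6, STREAM_KBPS))                 # 5000
print(max_mesh_participants(UPLINK_KBPS, STREAM_KBPS))  # 4
```

Under these assumptions a six-person mesh call would need 5,000 kbps of upload per participant, while the example uplink tops out at a four-person call.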
In reality, P2P (peer-to-peer) architecture makes sense mostly for 1-1 calls where 2 people participate in the conference. In that scenario, P2P is still optimal because each of the 2 participants sends their video/audio only once.
Advantages:
Disadvantages:
CPU (Central Processing Unit) usage will be significantly higher on the client side because the browser needs to encode the video N-1 times to send it to N-1 other participants. Unless you have a really powerful machine, the performance will be easily affected.
The above disadvantages make the P2P architecture suitable mostly for 1-1 calls and not scalable. In practice, while P2P architecture works well for small-scale sessions, a server-based architecture such as a WebRTC SFU or MCU is preferred when more participants are involved, offering centralised management of streams.
The SFU (Selective Forwarding Unit) architecture has become the preferred option in contemporary video conferencing solutions. Central SFU media servers act as intermediaries, receiving the incoming streams and then distributing them unaltered to the other participants.
Although this approach introduces additional complexity to the server side, it is a significant enhancement over P2P architecture. It addresses the issue of limited upload bandwidth and improves scalability, which are notable challenges with P2P.
The technique of simulcast is frequently employed in SFU video conferencing. Each participant transmits multiple streams at varying qualities to the SFU. The SFU then selects the appropriate stream quality to forward; for instance, it may send streams of lower quality to participants with weaker internet connections. Conversely, it can route the high-definition version of a stream to those who are displaying it prominently on their local system.
That way a large amount of downlink bandwidth can be saved and many participants can be displayed in the same grid even if participants have an average internet connection. The WebRTC SFU server also keeps its own load low, as it only forwards streams without decoding them.
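The selection step can be sketched as follows. This is a simplified illustration, not how any particular SFU implements it: the `Layer` type, the three-step quality ladder, and its bitrates are all assumptions, and `available_kbps` stands for the bandwidth budget the server has estimated for this one stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # vertical resolution in pixels
    kbps: int    # approximate bitrate of this encoding


# Hypothetical simulcast ladder, ordered from lowest to highest quality.
LAYERS = [
    Layer("low", 180, 150),
    Layer("medium", 360, 500),
    Layer("high", 720, 1500),
]


def pick_layer(available_kbps: int, tile_height: int, layers=LAYERS) -> Layer:
    """Pick the best layer that fits the receiver's bandwidth and display size."""
    chosen = layers[0]  # always forward at least the lowest layer
    for layer in layers[1:]:
        fits_bandwidth = layer.kbps <= available_kbps
        # Only step up while the current choice is smaller than the video tile.
        worth_upgrading = chosen.height < tile_height
        if fits_bandwidth and worth_upgrading:
            chosen = layer
    return chosen


print(pick_layer(2000, 720).name)  # high: a large tile on a good connection
print(pick_layer(400, 720).name)   # low: the connection cannot carry more
print(pick_layer(2000, 150).name)  # low: a small thumbnail needs no more
```

A real server would also react to changing network conditions over time; the sketch only shows the basic trade-off between available bandwidth and how prominently a stream is displayed.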
In SFU video conferencing, as illustrated in the above diagram, each participant sends their stream to the SFU media server a single time and, in turn, receives the streams of all other participants.
Advantages:
Disadvantages:
SFU (Selective Forwarding Unit) is the most popular architecture deployed today in video conferencing.
SFU uses upload bandwidth far more efficiently than P2P and scales much better.
Also, while users still need to download and decode each of the other participants’ streams, the simulcast technique can be applied to allow a display of up to circa 50 participants in a grid on an average connection and machine.
In the MCU (Multipoint Control Unit) architecture, every participant publishes their stream only once, to a central server. But unlike SFU, the MCU central server has the role of a mixer - combining all received streams into one stream.
Then all participants consume this one mixed stream instead of subscribing individually to the stream of every other participant.
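The difference between the three architectures is easiest to see from the point of view of a single participant. The sketch below follows the descriptions in this article; the function name `client_streams` and the short architecture labels are illustrative.

```python
def client_streams(architecture: str, n: int) -> tuple[int, int]:
    """Return (streams sent, streams received) by one participant in a call of n."""
    others = n - 1
    if architecture == "p2p":
        return others, others  # one copy sent to, and one received from, each peer
    if architecture == "sfu":
        return 1, others       # publish once, receive every other stream separately
    if architecture == "mcu":
        return 1, 1            # publish once, receive a single mixed stream
    raise ValueError(f"unknown architecture: {architecture}")


for arch in ("p2p", "sfu", "mcu"):
    sent, received = client_streams(arch, 10)
    print(f"{arch}: sends {sent}, receives {received}")
```

For a ten-person call, a P2P client sends and receives 9 streams, an SFU client sends 1 and receives 9, and an MCU client sends 1 and receives 1. The MCU's client-side saving is paid for on the server, which has to decode, mix and re-encode the media.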
Disadvantages:
Decoding, mixing and re-encoding are much more taxing than simply routing/relaying streams as an SFU does. And since companies generally cannot afford to spend at least 10 times more money on the server side, SFU is the reasonable compromise which wins in most cases.
In the hybrid approach, different architectures are used depending on the number of participants in the call. Initially, P2P (Peer-to-Peer) is used for 1-1 calls, and as more participants join the call, the architecture switches to SFU (Selective Forwarding Unit) to accommodate the growing number of participants.
This approach helps optimise server resources, particularly during smaller 1-1 calls, where no intermediate media servers are required. Using P2P for smaller calls helps to save bandwidth and processing power. As soon as a third participant joins the call, the system transitions to SFU to handle the increased load efficiently.
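A minimal sketch of that switching rule is shown below. The `HybridCall` class and its method names are hypothetical, and it only tracks which architecture should be active; actually renegotiating the media connections is out of scope. Falling back to P2P when the call shrinks to two people again is an assumption of this sketch, not something the approach requires.

```python
def choose_architecture(participant_count: int) -> str:
    """P2P for calls of up to two people, SFU from the third participant on."""
    return "p2p" if participant_count <= 2 else "sfu"


class HybridCall:
    """Tracks participants and the architecture the call should currently use."""

    def __init__(self) -> None:
        self.participants: set[str] = set()
        self.architecture = "p2p"

    def join(self, user_id: str) -> bool:
        """Add a participant; return True if the architecture changed."""
        self.participants.add(user_id)
        return self._update()

    def leave(self, user_id: str) -> bool:
        """Remove a participant; return True if the architecture changed."""
        self.participants.discard(user_id)
        return self._update()

    def _update(self) -> bool:
        new_architecture = choose_architecture(len(self.participants))
        switched = new_architecture != self.architecture
        self.architecture = new_architecture
        return switched


call = HybridCall()
call.join("alice")
call.join("bob")
print(call.architecture)   # p2p
print(call.join("carol"))  # True: the third participant triggers the switch
print(call.architecture)   # sfu
```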
This hybrid approach can be visualised through a WebRTC architecture diagram, which clearly shows the transition from P2P to SFU as the participant count grows. The diagram illustrates how the architecture evolves from direct peer connections in P2P to the use of an SFU media server, forwarding streams to multiple participants.
Advantages:
Disadvantages:
In this article, we have explored the different architecture options that drive WebRTC technology and enable seamless video conferencing experiences. Now, let's take a closer look at how Digital Samba leverages WebRTC on the back end to provide a cutting-edge live video conferencing solution.
Digital Samba is a leading provider of a GDPR-compliant video conferencing API and SDK, offering a comprehensive platform for embedding video conferencing capabilities into software products or websites. Our solution is powered by WebRTC, an open-source project that facilitates peer-to-peer interactive web applications.
By integrating Digital Samba's video conferencing API and SDK into your platform, you can unlock the power of WebRTC and provide your users with high-quality, real-time video communication. Our solution is designed to be GDPR-compliant, ensuring the privacy and security of user data. With our EU-hosted infrastructure and end-to-end encryption, you can trust that sensitive information shared during video conferences is protected.
Whether you're building a remote collaboration tool, an online tutoring platform, or a virtual classroom, Digital Samba's video conferencing solution enables seamless communication and collaboration among participants. The WebRTC architecture allows for direct peer-to-peer connections, reducing latency and ensuring a smooth video conferencing experience.
With Digital Samba, you can leverage the advantages of both P2P and SFU architectures. For 1-1 calls, our solution utilises P2P, optimising server resources and maximising efficiency. As the number of participants increases, the architecture seamlessly transitions to SFU, leveraging the scalability and bandwidth efficiency it offers. This hybrid approach ensures optimal performance and cost-effectiveness for your video conferencing solution.
Digital Samba's WebRTC-powered live video conferencing also supports advanced features such as screen sharing, file sharing, interactive whiteboarding, and more. These features enhance collaboration and enable interactive learning experiences for virtual classrooms, remote training sessions, and online meetings.
Experience the power of Digital Samba's WebRTC-powered live video conferencing solution. Contact our sales team today to learn more and get started on enhancing your video conferencing platform.
P2P (Peer-to-Peer) directly connects participants without using a central server, which is suitable for small calls but can strain bandwidth and processing power as the number of participants grows. SFU (Selective Forwarding Unit) routes media streams through a server, optimising bandwidth and scalability, making it more suitable for larger calls.
SFU improves scalability by reducing the amount of data participants need to send. Each participant only sends their media stream to the SFU server once, and the server forwards the stream to all other participants, reducing upload bandwidth and CPU usage on the client side.
MCU (Multipoint Control Unit) is used when there’s a need to mix all participants' streams into a single stream for each participant. This can be useful when you need a consistent layout and minimal client-side processing, but it’s less scalable and more resource-intensive than SFU.
WebRTC enables real-time communication directly between browsers without the need for plugins. It’s cost-effective, supports high-quality video and audio, and encrypts all media in transit, which amounts to end-to-end encryption in direct peer-to-peer calls. Additionally, WebRTC is widely supported across platforms and browsers.
While WebRTC can handle large conferences, it becomes less efficient with more participants due to the increased strain on network bandwidth and processing power. SFU and MCU architectures help manage larger calls more efficiently, but the scalability depends on the infrastructure used.