So you know how a livestream works, right? You click on the stream and you’re taken to a live video feed being shot in real time, where you can chat with others and even interact with the host themselves. But what goes on under the hood to actually accomplish that? Well, a lot. So let’s take a look.
A livestream is really just a connection from the streamer’s computer to an “ingest” server, run by a service such as Twitch, YouTube, or Mixer, that is capable of receiving that amount of data. From that server, the video is sent to you in a standard media streaming format (“streaming” in a different sense of the word), like a normal YouTube video. The only difference is that the file the server is reading from is still being created as it’s sent to you.
The interesting stuff happens between streamer and ingest, and we have an entire network protocol for it: enter RTMP.
The Real-Time Messaging Protocol, or RTMP for short, was originally developed by Macromedia (now owned by Adobe) for streaming audio and video content over the internet. The original use case was communication between Flash players and servers, but it now has a broader, general-purpose use, though it’s still primarily used for streaming.
RTMP divides an audio or video stream into “fragments” or “chunks” and then sends them over. These chunks are usually small enough that they can be processed quickly and without much delay. RTMP can also multiplex multiple “channels” over one connection, allowing you to send audio and video and still have a separate channel for control messages, which the client and server use to negotiate things like chunk sizes.
There’s nothing fancy, no real techno-wizardry going on; the secret sauce is just... send it smaller. Despite being created for what is, at this point, a dead product, RTMP still lives on today. Now, there is an alternative: the FTL (Faster Than Light) streaming protocol, proprietary to Mixer (Microsoft). But since the specs for it haven’t been released (at least in a format that I can find), I’ll leave that one as just a footnote for now.
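To make the “send it smaller” idea concrete, here is a minimal Python sketch of chunking and multiplexing. It is a toy model, not the real RTMP wire format: the Chunk class, the channel numbers, and the helper function names are all made up for illustration, and real RTMP chunks carry proper headers that this skips. The one detail borrowed from the protocol is the default chunk size of 128 bytes.

```python
from dataclasses import dataclass
from itertools import zip_longest

DEFAULT_CHUNK_SIZE = 128  # RTMP's default chunk size in bytes (negotiable via control messages)


@dataclass
class Chunk:
    """Hypothetical, simplified chunk: a channel tag plus a slice of the payload."""
    channel: int
    payload: bytes


def split_into_chunks(channel, message, chunk_size=DEFAULT_CHUNK_SIZE):
    """Cut one message into pieces of at most chunk_size bytes."""
    return [
        Chunk(channel, message[i:i + chunk_size])
        for i in range(0, len(message), chunk_size)
    ]


def interleave(*streams):
    """Multiplex several chunk lists onto one 'connection' by taking turns."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(chunk for chunk in group if chunk is not None)
    return wire


def reassemble(wire):
    """Receiver side: glue the payloads back together, per channel."""
    messages = {}
    for chunk in wire:
        messages[chunk.channel] = messages.get(chunk.channel, b"") + chunk.payload
    return messages


if __name__ == "__main__":
    VIDEO, AUDIO = 1, 2          # made-up channel numbers
    video_frame = bytes(300)     # stand-in for encoded video data
    audio_frame = b"a" * 130     # stand-in for encoded audio data

    wire = interleave(
        split_into_chunks(VIDEO, video_frame),
        split_into_chunks(AUDIO, audio_frame),
    )
    restored = reassemble(wire)
    print(len(wire), restored[VIDEO] == video_frame, restored[AUDIO] == audio_frame)
```

Because the pieces take turns on the wire, a big video frame never blocks a small audio packet for long, which is the practical payoff of keeping chunks small.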
All I can say about FTL is that it’s aptly named for its purpose: FTL is focused on interactivity. Whereas RTMP can take anywhere from 5 to 30 seconds for everything to be broken up, transmitted, and reassembled, FTL works within just a few. The major downside (besides not being public) is that it’s extremely sensitive to connection instability, and you can get huge drops if there’s a little bit of jitter.
This is speculation, but I have a feeling that it’s doing a fair amount of on-the-fly compression (only sending the parts that it needs to) along with some prediction in the video stream to guess what’s going to come next. If your connection sags a little, it suddenly has to throw everything out and pick back up again, which causes delays.