Facebook Live is Facebook's live video product. During its initial testing phase it was open only to celebrities, who could interact with their fans via live streaming; once opened to the public, the number of concurrent viewers became enormous, and a popular broadcast may be watched online by millions of people at once. Handling that level of concurrency while keeping video latency low is a very hard architecture problem.
Federico Larumbe, from the Facebook Traffic team, shared the design of Facebook Live's caching and load balancing systems at a Networking @Scale meeting:
Facebook Live launched in April 2015, initially only for celebrities. Over roughly a year of product evolution the engineering team grew from 10 to 150 people, and by the end of 2015 the product was open to users in multiple countries.
Protocol Selection
At the protocol level they started with HLS (HTTP Live Streaming): the iPhone supports it well, and because it is HTTP-based it can leverage the existing CDN architecture. At the same time they investigated RTMP (a TCP-based protocol), which creates separate video and audio streams between the phone and the server. The benefit of RTMP is low latency from broadcaster to audience, which matters a great deal for the user experience in interactive live streams; the drawback is that it is not HTTP, so a dedicated RTMP proxy service has to be built in order to scale it out. They also looked at MPEG-DASH, which is noticeably more space-efficient than HLS.
Technical challenges of live streaming
Live traffic has a shape unlike that of other Internet products, as shown in the following figure:
When a live stream starts, concurrent traffic rises steeply and then keeps climbing until the broadcast ends, at which point it drops straight down. In other words, traffic ramps up extremely fast. In addition, various product factors make the audience of a live stream much larger than that of an ordinary video, typically more than 3 times as large.
Such an instantaneous surge in access causes problems for both the caching system and the load balancing system.
Issues with caching
Everyone has probably heard of the thundering herd problem: when a cache entry misses or expires, a flood of concurrent accesses turns into a flood of back-to-origin requests, and those requests pile up at the data source.
The video is split into one-second fragment files for delivery to users, and the cache system caches these fragments, but when traffic is too large the cache gets overloaded.
Problems with load balancing
Facebook operates many POPs (edge points of presence) around the world to distribute its traffic. When traffic is too large, a POP can be overloaded.
Overall architecture
The path a live stream takes from broadcaster to audience looks like this:
1. The broadcaster starts a live video on their phone.
2. The phone sends an RTMP audio/video stream to a Live server.
3. The Live server decodes the video and transcodes it into several different bitrates.
4. For each bitrate, the server continuously produces one-second MPEG-DASH fragments.
5. These one-second fragments are stored in the data center cache.
6. The data center cache sends the fragments on to the caches in multiple POP servers.
7. A viewer sees the live notification and starts to watch.
8. The viewer's phone fetches the one-second fragments from a POP and plays them.
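To make the last step concrete, here is a toy sketch of the viewer side: it repeatedly fetches the next one-second fragment from a POP and hands it to a player. The URL pattern and the play_fragment callback are hypothetical stand-ins; real clients speak MPEG-DASH and drive playback from a manifest.

```python
import time
from urllib.request import urlopen

def watch(pop_base_url, stream_id, bitrate, play_fragment):
    """Toy viewer loop: pull one-second fragments from the POP and feed the player."""
    seq = 0
    while True:
        # Hypothetical fragment naming scheme, one file per second of video.
        url = f"{pop_base_url}/{stream_id}/{bitrate}/seg_{seq:06d}.m4s"
        try:
            with urlopen(url, timeout=2) as resp:
                play_fragment(resp.read())   # hand one second of video to the decoder
            seq += 1
        except Exception:
            time.sleep(0.5)                  # fragment not produced yet; retry shortly
```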
How to scale?
To handle concurrent traffic on this scale, multiple levels of caching and load balancing have to be introduced. There is a traffic multiplication between the data center cache and the many POP caches, and within each POP there is another layer of multiplication: a POP consists of two tiers, a layer of HTTP proxies and a layer of caches. Viewers request fragments from an HTTP proxy; the proxy checks whether the fragment is already in the POP's cache, returns it if so, and otherwise forwards the request to the data center cache. Because different fragments may land on different cache hosts, this also helps balance load across the cache tier.
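The talk does not say exactly how fragments are spread across the cache hosts inside a POP; one common way to get this kind of load spreading is consistent hashing on the fragment name. A minimal sketch, with made-up host names:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map fragment names to cache hosts so load spreads across the cache tier."""

    def __init__(self, hosts, vnodes=100):
        self._ring = []  # sorted list of (hash, host), with virtual nodes per host
        for host in hosts:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{host}#{i}"), host))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def host_for(self, fragment_name):
        h = self._hash(fragment_name)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

# Hypothetical cache hosts inside one POP.
ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.host_for("stream_42/720p/seg_000123.m4s"))
```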
If many viewers request the same fragment at the same time and that fragment is not yet in the POP cache, every one of those requests (one per viewer) gets sent to the data center cache, which is exactly the thundering herd problem. The fix is request coalescing: only the first request is forwarded to the data center, the other requests wait in a queue at the POP for that first response, and once the data comes back it is returned to all the waiting requesters.
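A minimal sketch of request coalescing, assuming a hypothetical fetch_from_origin callable standing in for the call to the data center cache; only the first requester for a fragment goes to the origin, everyone else waits on that same in-flight result:

```python
import threading
from concurrent.futures import Future

class CoalescingCache:
    """One back-to-origin fetch per fragment; concurrent callers share the result."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin      # e.g. a request to the data center cache
        self._lock = threading.Lock()
        self._cache = {}                     # fragment name -> bytes
        self._inflight = {}                  # fragment name -> Future being filled

    def get(self, name):
        with self._lock:
            if name in self._cache:
                return self._cache[name]
            fut = self._inflight.get(name)
            if fut is None:                  # first requester becomes the leader
                fut = Future()
                self._inflight[name] = fut
                leader = True
            else:                            # followers wait on the leader's future
                leader = False
        if leader:
            try:
                data = self._fetch(name)
                with self._lock:
                    self._cache[name] = data
                    del self._inflight[name]
                fut.set_result(data)
            except Exception as exc:
                with self._lock:
                    del self._inflight[name]
                fut.set_exception(exc)
                raise
            return data
        return fut.result()                  # blocks until the leader fills the future
```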
Even so, having every viewer in a POP wait on a single cache service can still overload that service. So a local cache is added at the proxy tier as well, using the same request coalescing technique, so that the proxies themselves absorb a large share of the requests directly.
Even with the measures above, a POP can still be overloaded: live traffic ramps up so quickly that a POP's load can spike before the load balancing system has had time to react. Handling this falls to the global load balancing system, Cartographer.
Cartographer maps subnets of the Internet to POPs. It measures the latency between each subnet and each POP, and it measures the load of each POP (each proxy node keeps a request counter, and the sum of the counters of all proxy nodes in a POP is that POP's load). Given these two measurements, Cartographer faces an optimization problem: minimize the latency of user requests subject to the constraint that no POP exceeds its load capacity.
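The talk does not describe how Cartographer actually solves this optimization problem; as a toy illustration only, here is a greedy assignment that sends each subnet to its lowest-latency POP that still has spare capacity, with made-up measurements:

```python
def assign_subnets(latency_ms, expected_load, capacity):
    """Toy assignment: each subnet goes to its lowest-latency POP that still has room.

    latency_ms:    {subnet: {pop: measured latency}}
    expected_load: {subnet: predicted requests/sec from that subnet}
    capacity:      {pop: requests/sec the POP can absorb}
    """
    remaining = dict(capacity)
    assignment = {}
    # Place the heaviest subnets first so they get the best feasible POP.
    for subnet in sorted(expected_load, key=expected_load.get, reverse=True):
        for pop in sorted(latency_ms[subnet], key=latency_ms[subnet].get):
            if remaining[pop] >= expected_load[subnet]:
                assignment[subnet] = pop
                remaining[pop] -= expected_load[subnet]
                break
    return assignment

# Hypothetical measurements for two subnets and two POPs.
assignment = assign_subnets(
    latency_ms={"10.1.0.0/16": {"pop-a": 12, "pop-b": 40},
                "10.2.0.0/16": {"pop-a": 25, "pop-b": 18}},
    expected_load={"10.1.0.0/16": 8000, "10.2.0.0/16": 5000},
    capacity={"pop-a": 10000, "pop-b": 10000},
)
print(assignment)   # {'10.1.0.0/16': 'pop-a', '10.2.0.0/16': 'pop-b'}
```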
Because live traffic rises so quickly, this control system has to measure and react very quickly too. For this reason they reduced Cartographer's measurement window from 1.5 minutes to 3 seconds, but a POP can still be overloaded within those 3 seconds. What to do? The load has to be predicted before the traffic actually arrives.
They implemented a load prediction program based on cubic splines, fitting the trend of future traffic from past and current load. This makes it possible to predict a future drop even while current traffic is still rising: looking at the first and second derivatives of the fitted curve, if the rate of change is positive but the acceleration is negative, the rate of increase is shrinking and will eventually reach zero, which marks the start of the load decline.
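A minimal sketch of that derivative check on a fitted cubic spline, using SciPy and made-up load samples; this illustrates the idea rather than Facebook's actual prediction code:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Recent load samples for one POP (requests/sec, one sample every 3 seconds) -- made-up numbers.
t = np.array([0, 3, 6, 9, 12, 15], dtype=float)
load = np.array([1000, 2600, 4600, 6200, 7300, 7900], dtype=float)

spline = CubicSpline(t, load)
rate = spline.derivative(1)    # first derivative: how fast load is growing
accel = spline.derivative(2)   # second derivative: is the growth speeding up or slowing down

now = t[-1]
if rate(now) > 0 and accel(now) < 0:
    # Load is still rising, but the rise is decelerating: a peak is coming.
    print("growth is slowing; expect the load to level off and then drop")

# Extrapolate a few seconds ahead to feed into the load balancing decision.
print(spline(np.array([18.0, 21.0])))
```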
Their experiments showed that cubic splines can predict more complex traffic patterns than linear interpolation, and that they are also more tolerant of jitter.
How to test?
The previous sections were about how to keep a POP from being overloaded; the goal of testing is the opposite, deliberately overloading a POP.
They built a stress testing service deployed on POPs worldwide that simulates real user traffic and can generate up to 10 times the load of the production environment. This system helped the team find and fix bugs in the traffic prediction program, and it also verified how well the various cache tiers hold up against thundering herds.
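The internals of their load generator were not described; as a rough illustration, a toy generator might replay fragment requests against a POP endpoint at some multiple of the observed production rate, for example:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def hammer_pop(base_url, fragment_names, target_rps, duration_s):
    """Toy load generator: replay fragment requests against a POP at a fixed rate."""
    def fetch(name):
        try:
            with urlopen(f"{base_url}/{name}", timeout=2) as resp:
                return resp.status
        except Exception:
            return None

    interval = 1.0 / target_rps
    deadline = time.monotonic() + duration_s
    with ThreadPoolExecutor(max_workers=200) as pool:
        i = 0
        while time.monotonic() < deadline:
            pool.submit(fetch, fragment_names[i % len(fragment_names)])
            i += 1
            time.sleep(interval)

# Hypothetical usage: push 10x the observed production rate at a POP endpoint.
# hammer_pop("https://pop-a.example.com/live", ["seg_0001.m4s", "seg_0002.m4s"],
#            target_rps=10 * 500, duration_s=60)
```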
Upload stability issues
Upload stability is also tricky, because the broadcaster's upload bandwidth is limited: audio needs 64 Kbps of bandwidth, while standard-definition video needs about 500 Kbps.
The solution is adaptive encoding: the mobile app dynamically adjusts the video bitrate based on the currently available bandwidth, which is estimated as a weighted average of the upload rate measured on the RTMP connection over the past several intervals.
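A minimal sketch of this kind of adaptive adjustment, assuming an exponentially weighted moving average for the bandwidth estimate (the exact weighting scheme isn't given in the talk) and a hypothetical bitrate ladder:

```python
AUDIO_KBPS = 64                       # from the talk: audio needs 64 Kbps
VIDEO_LADDER_KBPS = [200, 350, 500]   # hypothetical ladder; 500 Kbps ~= standard definition

class UploadBitrateController:
    """Weight recent upload-rate samples and pick the highest video bitrate that fits."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha            # weight given to the newest sample
        self.estimate_kbps = None     # smoothed estimate of available upload bandwidth

    def on_upload_sample(self, measured_kbps):
        """Feed the upload rate measured on the RTMP connection over the last interval."""
        if self.estimate_kbps is None:
            self.estimate_kbps = measured_kbps
        else:
            self.estimate_kbps = (self.alpha * measured_kbps
                                  + (1 - self.alpha) * self.estimate_kbps)

    def video_bitrate_kbps(self, headroom=0.8):
        """Highest video bitrate that, together with audio, fits within the estimate."""
        if self.estimate_kbps is None:
            return VIDEO_LADDER_KBPS[0]
        budget = self.estimate_kbps * headroom - AUDIO_KBPS
        candidates = [b for b in VIDEO_LADDER_KBPS if b <= budget]
        return candidates[-1] if candidates else VIDEO_LADDER_KBPS[0]

ctrl = UploadBitrateController()
for sample in (900, 700, 450, 400):   # Kbps measured over successive intervals
    ctrl.on_upload_sample(sample)
    print(ctrl.estimate_kbps, ctrl.video_bitrate_kbps())
```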