Keywords: OTT, streaming media, HTTP adaptive streaming
This article was published in World Broadband Network, 2011.6, 18th, 5th, and 200
HTTP adaptive streaming (hereinafter "HAS") combines the characteristics of traditional streaming media technology with those of HTTP progressive download, delivering media content to users over HTTP. It can greatly improve the playback experience and reduce the technical complexity of the head-end server, and because it is HTTP-based, media content passes through network devices more easily. HAS has become a development trend in the streaming media and video industry.
I. Traditional Streaming Media Technology
In recent years, Internet video has developed rapidly, and video now accounts for half of Internet traffic. Any discussion of Internet video has to begin with streaming media technology, whose steady progress has driven that growth.
Traditional media delivery technologies fall into two main categories: connection-oriented streaming media technologies, represented by RTSP/RTP (Real Time Streaming Protocol/Real-time Transport Protocol), and the stateless HTTP progressive download used by mainstream video websites.
1. RTSP/RTP Streaming Media Solution
RTSP is a traditional streaming media control protocol. It is stateful: from the moment a client connects until the connection is closed, the server keeps track of the client's state. The client uses RTSP to send control commands to the server, such as play, pause, or stop.
RTP and RTCP (RTP Control Protocol) are end-to-end application-layer protocols that also support multicast. RTP carries the media data, while RTCP handles statistics, management, and control of the RTP transmission. Working together, the two can significantly improve the efficiency of real-time data transmission over the network.
In this architecture, once a connection is established between the server and the client, the server sends media packets continuously, encapsulated in RTP, while client control information is carried in RTSP messages over UDP or TCP.
Similar streaming media protocols include Adobe's RTMP (Real Time Messaging Protocol) and RealNetworks' RTSP over RDT (Real Data Transport). This article does not cover them one by one.
2. HTTP progressive download
Unlike stateful RTSP/RTP, HTTP progressive download uses the stateless HTTP protocol. When an HTTP client requests data, the server sends the requested data but does not record the client's state. Each HTTP request is a one-off, independent session.
Progressive download is supported by mainstream terminal players such as Adobe Flash, Microsoft Silverlight, and Windows Media Player. Progressive download means that the player can start playing the media before the whole file has been downloaded. If both the client and the server support HTTP/1.1, the terminal can also start playback from a point in the part that has not yet been downloaded.
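Seeking into a part of the file that has not been downloaded relies on HTTP/1.1 byte-range requests. The following is a minimal sketch of how a player might turn a seek position into a Range header. The function name and the constant-bit-rate assumption are illustrative only; a real player would normally look up the offset in the container's index rather than estimate it.

```python
def range_header_for_seek(seek_seconds, duration_seconds, file_size_bytes):
    """Build an HTTP/1.1 Range header for an approximate seek position.

    Assumption: the file has a roughly constant bit rate, so the byte
    offset is proportional to the playback time.
    """
    if not 0 <= seek_seconds < duration_seconds:
        raise ValueError("seek position is outside the media duration")
    offset = int(file_size_bytes * seek_seconds / duration_seconds)
    # An open-ended range asks the server for everything from offset onward.
    return {"Range": f"bytes={offset}-"}
```

A server that supports range requests answers such a request with status 206 (Partial Content) and only the requested bytes.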
Mainstream video websites such as Youku and Tudou currently use HTTP progressive download to distribute streaming media.
3. Comparison of solutions
As the simplest and earliest streaming media solution, HTTP progressive download has one notable advantage: it needs only a standard web server, which is far simpler to install and maintain than a dedicated streaming media server.
Its shortcomings are just as obvious. First, it wastes bandwidth easily: if a user stops watching partway through, whatever has been downloaded but not watched is wasted bandwidth. Second, HTTP progressive download is suited mainly to on-demand content and does not lend itself to live content. Finally, it lacks flexible session control and intelligent traffic adjustment mechanisms.
The RTSP/RTP-based streaming media system is designed specifically for large-scale live and on-demand streaming applications and requires a dedicated streaming media server. It has the following advantages over HTTP progressive download.
- Real-time playback. A progressive download client must buffer a certain amount of media data before playing, whereas an RTSP/RTP client can start playing as soon as it receives the first frame of media data.
- Advanced VCR-style controls such as seeking on the progress bar, fast forward, and rewind.
- Smooth audio/video playback. During an RTSP session, the client and server keep the session connected, so the server can respond dynamically to feedback from the client. When available bandwidth falls short because of network congestion or other reasons, the server can adjust its sending rate intelligently, for example by reducing the frame rate.
- Scaling to large numbers of users. An ordinary web server is optimized mainly for serving large numbers of small HTML files and has no performance advantage when transferring large media files. A professional streaming media server is optimized for reading large media files from disk, buffering them in memory, and sending them over the network, so it can support large-scale user access.
- Content copyright protection. With progressive download, the downloaded file is cached in a temporary directory on the client's hard disk, and the user can copy it elsewhere for playback. In an RTSP/RTP system, the client keeps only a small decoding buffer in memory and discards media data once it has been played, which makes it difficult for users to capture and copy. DRM and other copyright protection systems can also be used for encryption.
Even so, RTSP/RTP-based streaming media systems have run into many problems in real deployments, mainly the following:
- Compared with web servers, streaming media servers are complex to install, configure, and maintain. For carriers that already have CDN (Content Delivery Network) infrastructure in particular, installing and configuring RTSP/RTP streaming media servers on top of it is a lot of work;
- The RTSP/RTP protocol stack is more complex to implement. Compared with HTTP, client software and hardware that support RTSP/RTP are harder to build, especially for embedded terminals;
- The network port used by RTSP (554) may be blocked by firewalls and NAT in some users' networks, making the service unavailable. Some streaming media servers can tunnel RTSP over HTTP port 80, but this is not very convenient to deploy in practice.
II. HTTP Bit Rate Adaptation
The previous section covered RTSP/RTP-based streaming and HTTP progressive download, and showed that each has its own shortcomings.
HAS emerged against this background. It combines the advantages of traditional RTSP/RTP streaming and HTTP progressive download, and offers high efficiency, scalability, and compatibility. Figure 2 shows the implementation principle of HAS.
HAS is a hybrid media delivery method that gives users a streaming-like experience. Like HTTP progressive download, it downloads and distributes content over HTTP, but the media content is cut into a series of media segments for transmission.
The key to HAS is how the media data is segmented. Every segment covers the same length of time, typically 2 to 10 seconds. At the video encoding layer, this means each segment consists of several complete GOPs (each segment starts with a key I-frame), which ensures that each segment can be decoded independently of the segments before and after it.
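The GOP-aligned cutting described above can be sketched as follows. This is an illustration under stated assumptions, not any vendor's segmenter: the function name is hypothetical, the input is simply a list of keyframe timestamps in seconds, and a segment is closed at the first keyframe that falls at or after the target duration.

```python
def cut_segments(keyframe_times, total_duration, target_seconds=10.0):
    """Group whole GOPs into segments of roughly target_seconds each.

    keyframe_times: sorted timestamps (seconds) at which a GOP starts.
    Returns a list of (start, end) pairs; every start is a keyframe,
    so each segment can be decoded without its neighbours.
    """
    if not keyframe_times or keyframe_times[0] != 0.0:
        raise ValueError("the stream must start with a keyframe at time 0")
    segments = []
    start = keyframe_times[0]
    for t in keyframe_times[1:]:
        # Cut only on a keyframe, once the segment is long enough.
        if t - start >= target_seconds:
            segments.append((start, t))
            start = t
    # The last segment runs to the end of the stream and may be shorter.
    segments.append((start, total_duration))
    return segments
```

With a fixed keyframe interval that divides the target duration evenly, every segment except possibly the last has the same length, which matches the equal-length segments described above.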
As shown in Figure 3, media segments are stored on HTTP web servers. The client requests segments from the web server one after another, downloads them over ordinary HTTP, and plays them in sequence as they arrive. Because the segments are encoded according to agreed rules, there is no overlap or gap between consecutive segments, so the user sees seamless, smooth playback.
If a piece of content is encoded at several bit rates, the segmentation module cuts each bit-rate version into its own set of media segments. Because the web server sends content as fast as the network allows and applies no flow control of its own, the client can easily measure the bandwidth available between the web server and itself, and then request larger or smaller (higher or lower bit rate) segments accordingly. This is how bit rate adaptation is achieved.
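The client-side measurement and selection step can be sketched in a few lines. The function names and the 0.8 safety factor are assumptions made for illustration; real players typically smooth the estimate over several segments.

```python
def measured_throughput_bps(segment_bytes, download_seconds):
    """Estimate available bandwidth from one finished segment download."""
    return segment_bytes * 8 / download_seconds


def choose_bitrate(available_bitrates_bps, throughput_bps, safety_factor=0.8):
    """Pick the highest encoded bit rate that fits within the estimate.

    safety_factor (an assumed value) leaves headroom so that a small
    drop in bandwidth does not immediately stall playback.
    """
    budget = throughput_bps * safety_factor
    candidates = [b for b in sorted(available_bitrates_bps) if b <= budget]
    # Fall back to the lowest bit rate when nothing fits the budget.
    return candidates[-1] if candidates else min(available_bitrates_bps)
```

For example, a 1,000,000-byte segment downloaded in 4 seconds gives an estimate of 2 Mbit/s, and with bit rates of 400 kbit/s, 800 kbit/s, 1.5 Mbit/s, and 3 Mbit/s on offer the sketch selects 1.5 Mbit/s for the next request.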
As shown in Figure 4, the key technologies of HAS fall into two parts. The first is content preparation, which includes the multi-screen transcoding platform and the media segmentation module. The second is content delivery, which includes the HTTP-based origin servers and the terminal-facing content delivery network that scales up the number of concurrent streams.
III. Analysis of HAS technical features
1. Advantages of using HAS
Like other HTTP-based media transmission methods, HAS has the following advantages over traditional streaming media distribution technologies:
- Deployment is easier: HAS uses the common HTTP protocol, so standard web servers can be used and traditional network devices such as HTTP caches/proxies and firewalls are fully compatible;
- It offers better compatibility and delivery rate, because the bit rate can be adjusted dynamically to the bandwidth of the last-mile access network when distributing content;
- The user experience is better, and the service provider does not need to take each viewer's bandwidth into account.
Beyond these advantages, HAS also has features that no earlier technology offers:
- Fast start-up with a shorter wait. By default the client starts at a low bit rate and switches gradually to higher bit rates once playback has begun, so service quality is continually adjusted and optimized within the available bandwidth;
- No large buffer is needed, and playback is uninterrupted, smooth, and free of jitter;
- Seamless bit rate switching based on network conditions and CPU decoding capability;
- The client does not need to download more content than it actually consumes.
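The start-low, step-up behaviour and the CPU-aware switching listed above can be sketched as a small state machine. The class name, the one-level-at-a-time rule, and the boolean CPU flag are all assumptions made for illustration; none of the commercial players discussed later is documented here as working exactly this way.

```python
class RampUpSelector:
    """Start at the lowest bit rate and move one level at a time."""

    def __init__(self, bitrates_bps):
        self.ladder = sorted(bitrates_bps)
        self.level = 0  # playback starts at the lowest bit rate

    def next_bitrate(self, throughput_bps, cpu_overloaded=False):
        """Return the bit rate to request for the next segment."""
        if cpu_overloaded and self.level > 0:
            # The decoder cannot keep up: step down regardless of bandwidth.
            self.level -= 1
        elif throughput_bps < self.ladder[self.level] and self.level > 0:
            # Bandwidth no longer covers the current bit rate: step down.
            self.level -= 1
        elif (not cpu_overloaded
              and self.level + 1 < len(self.ladder)
              and throughput_bps > self.ladder[self.level + 1]):
            # Bandwidth covers the next level: step up gradually.
            self.level += 1
        return self.ladder[self.level]
```

Moving a single level per segment is a deliberately conservative choice in this sketch; it trades a slower climb to full quality for fewer visible quality jumps.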
In short, HAS can provide better quality of service than traditional streaming media technology because it can use all of the available bandwidth, whereas non-adaptive streaming forces the client to choose a fixed bit rate below the available bandwidth. HAS can be expected to be widely deployed in the near future.
2. Problems to be faced
The previous sections covered the advantages of HAS and how it works. Implementing HAS looks simple. First, content preparation produces media files at several bit rates together with an index file that records the relationships between the bit-rate files and their characteristics. Then the terminal picks a bit rate based on its initial bandwidth, plays the media files in sequence, and adjusts the bit rate during playback according to network conditions and CPU load.
In practice, however, many questions have to be settled when a HAS solution is deployed. If they are not answered well, the best user experience cannot be delivered.
- How many bit rates should be offered?
- What resolution should each stream use?
- What key frame interval?
- VBR or CBR?
- What audio parameter settings?
IV. HAS Enterprise Solutions and technical standards
At present, HAS is implemented in two main ways. One is enterprise solutions, in which a vendor provides a complete technical solution, such as Apple's HTTP Live Streaming, Microsoft's Smooth Streaming, and Adobe's HTTP Dynamic Streaming. The other is technical standards developed by international standards groups, such as OIPF's HTTP Adaptive Streaming, MPEG's DASH (Dynamic Adaptive Streaming over HTTP), and the IETF draft proposed by Apple.
1. OIPF
The Open IPTV Forum (OIPF) defines bit rate adaptation in its technical specifications. The specifications standardize how HTTP bit rate adaptation is implemented and clarify how it is used and its scope of use. The standard builds on the 3GPP adaptive HTTP streaming specification and extends it to support the MPEG-2 TS format.
The OIPF bit rate adaptation standard defines the index file that the terminal downloads. In OIPF, the index file is called the MPD (Media Presentation Description) and is organized in XML format.
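To make the idea of an XML index file concrete, the sketch below parses a deliberately simplified, hypothetical document. The element and attribute names are assumptions loosely modeled on an MPD and are not taken from the OIPF schema, which is richer and uses XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified index file; not the real OIPF MPD schema.
SAMPLE_INDEX = """<MPD>
  <Period>
    <Representation bandwidth="1500000" width="1280" height="720"/>
    <Representation bandwidth="400000" width="480" height="270"/>
  </Period>
</MPD>"""


def list_representations(index_xml):
    """Return the encoded versions listed in the index, lowest bit rate first."""
    root = ET.fromstring(index_xml)
    reps = []
    for rep in root.iter("Representation"):
        reps.append({
            "bandwidth": int(rep.get("bandwidth")),
            "width": int(rep.get("width")),
            "height": int(rep.get("height")),
        })
    return sorted(reps, key=lambda r: r["bandwidth"])
```

The terminal would read such a list once at start-up and then choose among the listed bit rates as conditions change.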
The OIPF standard also specifies TS and MP4 as the media encapsulation formats and defines some details of the segments. For example, files at different bit rates for the same content must use the same encapsulation format, although their encoding profiles may differ.
The standard also covers live video scenarios and the fast forward, rewind, and seek operations.
2. MPEG
MPEG recently released DASH, a standard for HTTP streaming, as shown in Figure 5.
The DASH standard summarizes the existing HAS technical framework and describes its background, purpose, and application scenarios. It defines a range of application scenarios, such as 3D video, interactive 3D, dynamic bit rate adaptation, peer-to-peer, and multi-screen TV, and it also defines how to integrate with content protection technology.
DASH is designed to address the following goals:
- Distribute MPEG media content over HTTP more effectively, in adaptive, progressive, download, or streaming modes;
- Support live video services;
- Make more effective use of existing HTTP-based infrastructure such as CDNs, proxy servers, and firewalls;
- Support integration with content protection systems.
Overall, DASH sets out technical requirements for every aspect of transmitting MPEG media over HTTP, including media content formats, transmission methods, MPD files, service control, adaptation, and media protection.
3. Apple HTTP live streaming (IETF)
HTTP Live Streaming is Apple's complete HAS solution. Its goal is to deliver live or on-demand content through ordinary web servers to Apple terminal devices such as the iPhone, iPad, and Apple desktop computers. Apple has submitted its technical specification to the IETF for discussion, and it is still at the draft stage.
HTTP Live Streaming consists of three parts: server components, distribution components, and clients. First, the encoder takes the audio and video input, encodes it with H.264, and outputs an MPEG-2 TS stream. Segmenting software then cuts the TS stream at a set interval and saves each piece as a TS file. These TS files are deployed on the web server, and the segmenting software also creates an index file containing information about them. The URL of the index file is published on the web server. The client reads the index file, then requests the media files from the server in order and plays them without interruption. A simple HTTP Live Streaming configuration is shown in Figure 6.
In Apple's adaptive bit rate system, index files are saved as .m3u8 files, an extension of the .m3u format used for MP3 playlists. HTTP Live Streaming supports both live broadcast sessions and video-on-demand sessions.
For live sessions, the index file is updated whenever a new media file is created, and the old media files are usually removed from it. The updated index file therefore presents a moving window over the continuous stream. This type of session suits continuous live content. For VOD sessions, the media files stay unchanged for the whole session. The index file is static, contains the complete list of media files, and needs to be fetched only once before playback starts.
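The difference between the live and VOD index files can be illustrated with a small playlist generator. The tags used (#EXTM3U, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXTINF, #EXT-X-ENDLIST) come from the HTTP Live Streaming playlist format; the function name, its parameters, and the segment file names are assumptions for this sketch, and every segment is assumed to last exactly the target duration.

```python
def build_playlist(segment_names, target_duration, first_sequence=0,
                   window=None, ended=False):
    """Return the text of a simple .m3u8 index file.

    window: for live sessions, keep only the newest `window` segments
            so the index shows a moving window over the stream.
    ended:  for VOD sessions, mark the list as complete.
    """
    names = list(segment_names)
    if window is not None and len(names) > window:
        dropped = len(names) - window
        names = names[dropped:]
        # The sequence number of the first listed segment moves forward.
        first_sequence += dropped
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for name in names:
        lines.append(f"#EXTINF:{target_duration},")
        lines.append(name)
    if ended:
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

Calling it with `window=3` on five segments yields a live index that lists only the last three and starts at media sequence 2, while calling it with `ended=True` yields a static VOD index that lists every segment and ends with #EXT-X-ENDLIST.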
HTTP Live Streaming does not currently offer full DRM support, but it does support content encryption: content is encrypted with AES-128 using a 16-byte key. The specification defines only roughly how the key is obtained through a URI.
4. Microsoft smooth streaming
Smooth Streaming is the HAS solution provided by Microsoft. It is built on Microsoft's head-end web server, IIS 7, and on Silverlight on the terminal side. Smooth Streaming uses MP4 (MPEG-4) as its media encapsulation format. Each slice is encapsulated as an MP4 fragment, but the content is stored as one complete, contiguous MP4 file, so the media is only virtually fragmented. When a playback URL request arrives from the terminal, the head-end server has to parse the request, convert it into a precise offset in the file, locate the corresponding block of media data, and send it to the terminal.
MP4 was chosen as the media file format because it is easier to work with than ASF and is based on the widely used ISO base media file format specification. Most importantly, MP4 was designed to support carrying media content in a single file. The storage format and the transmission format of Microsoft Smooth Streaming are shown in Figure 7 and Figure 8 respectively.
As Figure 8 shows, Smooth Streaming uses virtual slicing. Microsoft's HTTP bit rate adaptation does not actually cut the media files into pieces: the content for each bit rate is stored as a single full-length file, and during playback each fragment is delivered to the terminal individually in response to the terminal's requests.
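The request-to-offset translation behind virtual slicing can be sketched as a lookup. The URL pattern below follows the QualityLevels(...)/Fragments(video=...) form commonly seen in Smooth Streaming requests, but it should be treated as an assumption here, and the in-memory index mapping (bit rate, start time) to (byte offset, length) is entirely hypothetical; a real server derives this information from the MP4 file itself.

```python
import re

# Assumed request shape: /<name>.ism/QualityLevels(<bps>)/Fragments(video=<start>)
FRAGMENT_URL = re.compile(
    r"/(?P<name>[^/]+)\.ism"
    r"/QualityLevels\((?P<bitrate>\d+)\)"
    r"/Fragments\(video=(?P<start>\d+)\)$"
)


def locate_fragment(url_path, fragment_index):
    """Map a fragment request to (byte_offset, length) in the stored file.

    fragment_index: hypothetical dict keyed by (bitrate, start_time),
    where start_time is in the stream's own time units.
    """
    match = FRAGMENT_URL.search(url_path)
    if match is None:
        raise ValueError("not a fragment request")
    key = (int(match["bitrate"]), int(match["start"]))
    # The server would then read exactly this byte range from the
    # full-length file and return it as one fragment.
    return fragment_index[key]
```

Because the lookup happens per request, the stored file never has to be physically split, which is the point of the virtual slicing approach described above.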
The Smooth Streaming terminal is built on Silverlight, which can parse the MPEG-4 file format, download over HTTP, and switch bit rates. Microsoft also exposes these functions to developers as .NET code, so developers can tune and optimize player behavior. The most complex module in a player is the bit rate switching module. Deciding when and how to switch is the core function of this module and also its main technical difficulty. To give users the best experience, the following questions must be considered.
- What should happen if the user has enough bandwidth but the CPU cannot decode fast enough?
- What should happen when the user pauses or hides the video?
- What should happen when the resolution of the best-quality video exceeds the resolution of the screen?
- How large should the buffer window for download and playback be?
- How can seamless switching be ensured when new media content, such as an advertisement, has to be inserted during playback?
5. Adobe HTTP dynamic streaming
Adobe's traditional streaming media solution, RTMP + FLV, has been widely used in the Internet video industry. In response to the demand for dynamic bit rate adaptation, Adobe first added bit rate adaptation to its traditional solution and soon afterwards launched an HTTP-based adaptive solution, HTTP Dynamic Streaming, as shown in Figure 9.
Adobe HTTP Dynamic Streaming comprises several components that prepare the content and deliver it over HTTP to Flash Player on the terminal.
The content preparation module has a VOD-oriented part and a live-oriented part. The VOD packaging module splits media files into fragments and stores them in F4F format, while the live packaging module writes live streams to F4V files in real time.
The HTTP origin module is a standard web server that stores the F4F files and the corresponding index file in F4M format. The index file contains parameters such as encoding, resolution, and bit rate.
6. Analysis and Summary
A study of the specifications proposed by the standards groups and of the enterprise solutions from companies such as Microsoft and Apple shows that they implement HTTP bit rate adaptation in much the same way. The main difference lies in the formats of the media files and index files, as shown in the table. This article has discussed many advantages of HAS, but the current technical systems still leave a good deal to be improved.
First, the systems above are all client-driven. The client assesses network conditions and the capabilities of its own hardware platform, parses the index description file, and then actively pulls content from the origin server. In live applications, the terminal has to refresh the description file constantly to learn about new content. With a server-driven model, the description file would not need to be refreshed so often: the server would keep obtaining the latest content and push media data to the terminal continuously, which suits live applications with strict real-time requirements better.
Second, the HAS systems above lack quality monitoring and control mechanisms. For example, when a user changes channels while watching live video, the GOP for the current moment and the initialization information the player needs should reach the terminal as quickly as possible, but HAS systems do not support accelerated transmission of important HTTP packets.
V. Looking to the future
With the rapid development of Internet technology, using the Internet as a transmission channel for video services has become a trend. Traditional radio and television operators, emerging video websites, Internet giants, TV set manufacturers, and consumer electronics vendors are all moving in. All of them are actively building new media service platforms to distribute content to TV, PC, and mobile phone screens as OTT services.
HAS provides an excellent solution for building new multi-terminal media service platforms. It can be expected to have very broad room for development and to play a key role in triple play.
From: http://hi.baidu.com/%CE%DE%D0%C4%CF%F2%BA%F3/blog/item/27bb00568f09943e3b29355b.html