crtmpserver Series, Part 1: Streaming Media Overview

Overview

The term "streaming media" literally means media that flows like a stream, which sounds almost tautological. Streaming media is so common today that most people never give it a second thought. Before streaming media existed, however, playing a movie over the Internet was impractical: you had to download the entire file to your computer before you could watch it, which meant a long wait. This is also one of the reasons download tools such as Thunder (Xunlei) became popular; most of what people downloaded was, in fact, video.

The defining feature of streaming media is the ability to play while downloading, without having to fetch the whole file in advance. This greatly improves the user experience and responsiveness, and it is what makes live broadcasting possible.

So how can media be played while it is still downloading? To answer that, we need to understand how a streaming media system is put together.

Composition of a Streaming Media System

A complete streaming media system consists of these stages: signal collection (capture), encoding, transmission, decoding, and output.

Signal collection: everything in the streaming media systems we are discussing has to be processed by a computer, and the most important elements are audio and video. Physically, audio is a mechanical wave produced by vibration. Audio acquisition converts this wave into an electrical signal and then into binary audio data; the raw audio data collected is PCM data. What about video? A video is simply a sequence of static images displayed one after another, so video acquisition is the process of continuously capturing these images, which are generally called frames. How are these frames captured? Unlike sound, an image is not carried by a mechanical wave. As we learned in school physics, we see an object because the light striking it is reflected into our eyes, reaches the retina, and is finally perceived by the brain through the optic nerve. Image acquisition, then, is the process of capturing this optical signal and converting it into binary image frames. The raw image data we obtain is generally in YUV format.
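
To get a feel for how large the raw data really is, here is a minimal back-of-the-envelope sketch in C++ (the sample rate, resolution, and frame rate below are illustrative assumptions, not fixed requirements):

    #include <iostream>

    // Rough estimate of raw (uncompressed) audio and video data rates.
    // All parameters are illustrative assumptions for this sketch.
    int main() {
        // PCM audio: 44.1 kHz sample rate, 16 bits per sample, 2 channels.
        const double audioBitsPerSec = 44100.0 * 16 * 2;

        // YUV 4:2:0 video: 1280x720 pixels, 12 bits per pixel on average, 25 frames per second.
        const double videoBitsPerSec = 1280.0 * 720 * 12 * 25;

        std::cout << "Raw PCM audio: " << audioBitsPerSec / 1e6 << " Mbit/s\n";  // about 1.4 Mbit/s
        std::cout << "Raw YUV video: " << videoBitsPerSec / 1e6 << " Mbit/s\n";  // about 276 Mbit/s
        return 0;
    }

Hundreds of megabits per second for a single 720p stream is far more than an ordinary Internet connection can carry, which is exactly why the next stage, encoding, exists.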

Encoding: what is encoding, and why do we need it? If network capacity and transmission speed were infinite, encoding would of course be unnecessary. In reality, the raw audio and video data we capture is enormous, so we need to shrink it as much as possible before sending it over the network, while still being able to restore (decode) it to something as close as possible to the original (encoding can be lossy or lossless). Encoding is therefore often called compression encoding. The idea is similar to compressing an ordinary file. For example, suppose a text file contains a run of fifty '0' characters. Instead of storing all fifty characters, we can store a short description such as "50 '0'", meaning there are fifty consecutive '0' characters; when restoring, we simply write out fifty '0' characters again. If the run were ten thousand '0' characters instead of fifty, the compression ratio would be even higher: the more redundancy the data contains, the better it compresses. Real compression algorithms are of course far more sophisticated; this toy example only illustrates the principle. There are many video encoding algorithms, and they differ in efficiency, compression ratio, and loss, but the underlying principles are the same. The two most common concepts are intra-frame compression and inter-frame compression.
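
As a minimal sketch of the run-length idea described above (a toy illustration of the principle only, nothing resembling a real audio or video codec):

    #include <iostream>
    #include <string>

    // Toy run-length encoder: a run of repeated characters is replaced by
    // "<count>x<character>", e.g. fifty '0' characters become "50x0".
    std::string runLengthEncode(const std::string& input) {
        std::string out;
        for (size_t i = 0; i < input.size();) {
            size_t run = 1;
            while (i + run < input.size() && input[i + run] == input[i]) ++run;
            out += std::to_string(run) + "x" + input[i] + " ";
            i += run;
        }
        return out;
    }

    int main() {
        std::cout << runLengthEncode(std::string(50, '0')) << "\n";  // prints "50x0"
    }

The more redundancy the input contains, the better this works; the same intuition, in vastly more sophisticated form, underlies the intra-frame and inter-frame compression described next.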

What is intra-frame compression? Imagine an image with a pure red background and a person standing in front of it (like an ID photo). During encoding, the image is divided into many small blocks (macroblocks). Because many adjacent blocks in the background are all pure red, a pure red block can be predicted from the blocks around it instead of being encoded independently. This is intra-frame compression: it works entirely within a single frame and has nothing to do with the frames before or after it.

What is inter-frame compression? Suppose a video contains two adjacent images, both with a pure red background, and a ball that is at position A in image 1 and at position B in image 2. If we overlay the two images, we find that they are identical everywhere except where the ball is; in other words, a large part of two adjacent images is the same. When encoding the second image, we only need to encode the parts that differ from the previous one. Image 1, which must be encoded in full, is called a key frame (generally an I-frame); image 2, which must refer back to image 1 in order to be restored, is a predicted frame (generally called a P-frame). Without the key frame, the predicted frame cannot be restored. During encoding, a frame can also refer not only to the previous frame but to the following frame as well (bidirectionally predicted frames): a ball rolling from left to right, for example, moves in a way that can be predicted. Encoding the current frame with reference to neighboring frames is what inter-frame compression means.
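
A minimal sketch of the inter-frame idea, assuming each frame is just a flat array of pixel bytes (a real encoder works with motion-compensated macroblocks rather than raw per-pixel differences):

    #include <cstdint>
    #include <utility>
    #include <vector>

    using Frame = std::vector<uint8_t>;                     // raw pixel bytes of one frame
    using Delta = std::vector<std::pair<size_t, uint8_t>>;  // (pixel index, new value) pairs

    // Encode frame 2 as only the pixels that differ from the key frame (frame 1).
    // Both frames are assumed to have the same size.
    Delta encodeDelta(const Frame& keyFrame, const Frame& nextFrame) {
        Delta delta;
        for (size_t i = 0; i < keyFrame.size(); ++i)
            if (nextFrame[i] != keyFrame[i])
                delta.push_back({i, nextFrame[i]});
        return delta;  // usually far smaller than a full frame
    }

    // Restore frame 2: start from the key frame and apply the changes.
    // Without the key frame, restoration is impossible, just as a P-frame
    // cannot be decoded without its I-frame.
    Frame decodeDelta(const Frame& keyFrame, const Delta& delta) {
        Frame restored = keyFrame;
        for (const auto& change : delta)
            restored[change.first] = change.second;
        return restored;
    }

If only the ball moves against a static red background, the delta contains just the handful of pixels around positions A and B, which is the whole point of inter-frame compression.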

Transmission: after capture and encoding we have audio and video data frames, but viewers are usually not sitting where the capture and encoding happen (otherwise no transmission would be needed). Transmission is the process of delivering the encoded audio and video data over a network (the Internet, or a cable TV network; here we only discuss the Internet) to the people who want to watch. The most important element of this process is the streaming media protocol. Why do we need one? First, playback has control logic: play, pause, seek, stop, fast forward, rewind, and so on. Second, once the encoded data reaches the other end, it must be decoded back to raw data before it can be played, and to decode it the receiver must know which algorithm was used for encoding; this information has to be communicated from one end to the other. The protocol also carries other descriptive information, such as the video frame rate and the interval between key frames (the GOP size). In short, encoded audio and video data is sent from one end to the other over the network, the receiving end needs certain information to restore it, and various playback scenarios must be supported; all of this has to be described by the streaming media protocol. The most popular streaming media protocols today are Adobe's RTMP, RTSP, and Microsoft's MMS.
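
As a rough illustration of the kind of descriptive information a streaming protocol has to deliver before the receiver can decode anything (the field names below are invented for this sketch and do not correspond to any real protocol message; RTMP, for instance, carries comparable information in its own metadata messages):

    #include <cstdint>
    #include <string>

    // Hypothetical stream description record. A real protocol defines its own
    // wire format, but some equivalent of these fields must reach the receiver.
    struct StreamInfo {
        std::string videoCodec;       // which decoder to use for video, e.g. "h264"
        std::string audioCodec;       // which decoder to use for audio, e.g. "aac"
        uint32_t    width;            // video resolution
        uint32_t    height;
        double      frameRate;        // frames per second
        uint32_t    gopSize;          // interval between key frames (GOP size)
        uint32_t    audioSampleRate;  // e.g. 44100 Hz
        uint32_t    audioChannels;    // e.g. 2 for stereo
    };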

Decoding: the compressed data must be restored to the original data before it can be played and displayed. This restoration process is the decoding process.

Output: the output stage is where the video is actually played. It is roughly the reverse of acquisition: the raw audio and video data is converted back into physical signals through digital-to-analog conversion, the video signal is shown on a display, and the audio signal is played through the speakers.

Media File Encapsulation

We have been discussing streamed playback, where "stream" mainly refers to streamed transmission: video frames can be transmitted and decoded at the same time. If you want to save the content to disk, however, you need a file format that organizes the data, storing the audio and video in a defined structure. Why can't we simply write everything received from the network straight into a file? If all the transmitted data is dumped without any structure, how would a player play it back? How would it know which bytes are audio and which are video, where the boundaries of each audio and video frame lie, which encoding algorithms were used, how long to wait before presenting the next frame after the current one, or how to carry multiple subtitle tracks the way some movie files do? There must therefore be a file format that organizes the audio and video data and attaches the required information. This is media file encapsulation, also known as the container format. There are many container formats; some are proprietary to a particular company and some are international standards, for example MP4, MP3, AVI, and RMVB. So if you want to save streaming media data to a file, the audio and video data must be stored in a media file according to some container format.
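
As a sketch of what "attaching the required information" can look like, here is a hypothetical per-frame record in a toy container format (the layout is invented for illustration; real containers such as MP4, FLV, or AVI are considerably more elaborate):

    #include <cstdint>
    #include <vector>

    // Hypothetical per-frame record in a toy container. Each stored frame must
    // answer the questions raised above: is it audio, video, or subtitle data,
    // which codec was used, where does the frame end, and when is it presented?
    struct MediaTag {
        uint8_t  trackType;            // 0 = audio, 1 = video, 2 = subtitle
        uint8_t  codecId;              // identifies the encoding algorithm
        uint32_t timestampMs;          // presentation time of this frame
        uint32_t payloadSize;          // frame boundary: how many bytes follow
        std::vector<uint8_t> payload;  // the encoded frame data itself
    };

A container is essentially a long sequence of records like this plus a header describing the tracks, which is why a bare dump of network bytes with no such structure cannot be played back.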

Transmission Protocol

Currently the most popular streaming media transmission protocols are RTMP and RTSP; Microsoft's MMS is rarely seen in practice, especially in today's popular Internet live streaming. HTML5 also deserves a mention: many playback paths that used to rely on Flash clients speaking RTMP have been replaced by HLS. Live streams with strict latency requirements, however, still tend to use RTMP or RTSP. Strictly speaking, HLS is not really a streaming media protocol: it is essentially a playlist of TS file segments and has none of the control logic found in streaming protocols, such as play, pause, and stop. HLS is mainly useful in specific scenarios, chiefly cross-platform live playback in browsers; most of the HLS we encounter is played in browsers on mobile devices and PCs.
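
For a concrete sense of why HLS is essentially "a playlist of TS files", this is roughly what a minimal live HLS playlist (.m3u8) looks like; the segment file names are made up for illustration:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:100
    #EXTINF:10.0,
    segment100.ts
    #EXTINF:10.0,
    segment101.ts
    #EXTINF:10.0,
    segment102.ts

The player simply downloads the listed segments over plain HTTP and, for a live stream, keeps re-fetching the playlist as new segments appear; there is no play, pause, or seek signaling in the protocol itself.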

It should be pointed out that several similar technologies competed with Apple's HLS for the same market, all built on HTTP and generally referred to as HTTP progressive download. For example, Microsoft's Live Smooth Streaming requires IIS 7 or later on the server and a Silverlight client. Another approach, commonly called HTTP pseudo-streaming and not tied to any single company, currently appears mainly as the open-source h264 streaming module plug-in for the Apache web server, again with Flash Player on the client. There is also Adobe's HTTP Dynamic Streaming, supported by Flash Media Server on the server side and Flash Player on the client; Flash Media Server itself now also supports HLS. Apple's HLS is not unique among these technologies, but many servers and open-source projects support it, and with HTML5 browsers able to play it, HLS has become the mainstream choice. Finally, there is a newer technical standard, MPEG-DASH, which aims to unify these schemes and is still being standardized; if it is truly standardized it may eventually replace HLS, since HLS has never become a formal standard, only a draft submitted by Apple.

The spread of the RTMP protocol was driven by the ubiquity of the Flash player, while RTSP benefited from being an open protocol. The crtmpserver covered in this series is an open-source streaming media server based on the RTMP protocol and written in C++. Comparable products include Red5, which is written in Java and is also open source, and the influential Wowza, which is closed source. On the client side, rtmpdump is a widely used open-source RTMP client project, and its librtmp library, written in C, is used everywhere. There is also OpenRTMFP, an open-source project built around RTMFP, the P2P-oriented relative of RTMP.


We look forward to the subsequent chapters of the crtmpserver series.
