Structure and principle of streaming media client


Streaming media is a technology for playing multimedia online over a network. Its download-while-playing behavior shortens the user's initial waiting delay, but because the data must arrive as a continuous stream under strict real-time requirements, it is more complicated than local playback. The client is a basic component of a streaming media system; it is generally a player with network communication functions. The best-known players with streaming capability are RealPlayer and Windows Media Player. Here we take a client player developed on Windows as an example to introduce the system structure and working principle of a streaming media client.

I. System Structure

The client takes various forms depending on the platform: besides PCs, it can run on set-top boxes or wireless portable devices. Its workflow, however, is similar in every case: the client receives the various media data streams sent by the streaming server, stores them in a buffer queue, calls the appropriate decoders to reconstruct each frame into its original format, and, after synchronization, plays the data on the output device.

Functionally, the player's main module can be divided into four layers: the RTSP session control layer, the RTP data transmission layer, the decoding layer, and the display/playback layer (Figure 1). Communication between the player and the server is implemented mainly through the RTSP protocol at the application layer and RTP (Real-time Transport Protocol) at the transport layer.

The RTSP session control layer runs in the player's main thread and is responsible for sending and receiving RTSP control commands. The RTP data transmission layer and the decoding layer are handled by receiving and decoding threads spawned by the main thread; video and audio each get their own receiving and decoding threads, so the two streams are processed independently. The display layer likewise runs two independent playback tasks, one for video and one for audio.

As for the information flow between layers: the RTSP session control layer first sends a request to the streaming media server and establishes a connection. The RTP data transmission layer is then responsible for preprocessing the real-time audio and video data arriving over the network: it gathers statistics on the data and sorts packets in the buffer queue according to the RTP header. Based on the timestamp in each RTP packet header, data is handed to the decoding layer on schedule; the decoding thread selects a matching decoder, and the display/playback layer completes the final playback.

II. Working Principle

1. RTSP session connection

RTSP [2] is a real-time stream control protocol that runs over TCP. Through it, the client establishes a session control connection with the server and gains remote control over the multimedia stream: play, pause, seek, and stop. The client therefore first connects to the server's RTSP port. Once the RTSP connection is established, the client sends a DESCRIBE request containing the URL of the VOD file. If authentication is required, the server returns an error code; the client then includes the username and password entered by the user in the RTSP message and sends DESCRIBE again. On success, the server returns a media description in SDP format (conforming to RFC 2327) to the client player. The client reads the SDP description to configure audio and video decoding and synchronization information, such as the file name, network type, RTP transport port numbers, encoding type, and sampling rate. After this configuration, the client sends a SETUP request specifying the transport protocol, transmission mode, ports, and related parameters. Once the receiving and decoding threads are created, the client sends PLAY to tell the server to start sending audio and video data to the local RTP receiving ports. When the session ends, the client sends TEARDOWN to disconnect. During a session, the client can also vary the parameters of the PLAY and PAUSE commands to implement VCR functions such as pause and seek. The test, resend, and echo commands in Figure 2 are RTSP commands we added for the smart stream service.
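The request sequence above can be sketched as plain RTSP/1.0 messages. This is a minimal illustration of the message format only (the URL, ports, and session ID are hypothetical examples, and no network I/O is performed):

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request line plus headers, as the client would send it."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

url = "rtsp://example.com/vod/movie.mp4"   # hypothetical VOD URL
session = [
    rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
    rtsp_request("SETUP", url + "/track1", 2,
                 {"Transport": "RTP/AVP;unicast;client_port=5000-5001"}),
    rtsp_request("PLAY", url, 3, {"Session": "12345678", "Range": "npt=0-"}),
    rtsp_request("TEARDOWN", url, 4, {"Session": "12345678"}),
]
for req in session:
    print(req.splitlines()[0])   # prints the DESCRIBE/SETUP/PLAY/TEARDOWN request lines
```

In a real client each of these strings would be written to the TCP connection and the server's reply (status line, headers, and for DESCRIBE the SDP body) parsed before the next step.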

2. RTP data processing before Decoding

RTP [3] is generally carried over UDP, which offers high transmission efficiency but low reliability; it is a transport protocol designed for real-time data. An RTP header precedes the media payload inside each UDP packet and carries the information needed to keep the stream continuous in real time, notably a sequence number and a timestamp. The sequence number lets the client restore the ordering of RTP packets as they arrive, and the timestamp is used to synchronize audio and video.
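The fixed RTP header is 12 bytes (per RFC 3550), and the fields the article relies on, sequence number and timestamp, can be pulled out with a straightforward unpack. A minimal sketch, using a synthetic packet rather than real network data:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the 12-byte fixed RTP header (RFC 3550)."""
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "marker": b1 >> 7,            # often set on the last packet of a video frame
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Build a sample header: version 2, payload type 96, seq 7, ts 90000
pkt = struct.pack("!BBHII", 0x80, 96, 7, 90000, 0xDEADBEEF) + b"payload"
h = parse_rtp_header(pkt)
print(h["sequence"], h["timestamp"])  # → 7 90000
```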

In the RTSP SETUP request, the client tells the server its local RTP receiving port. When the receiving thread is created, it therefore creates a local UDP socket and binds it to that port. It then loops, receiving RTP audio and video packets from the server and inserting each one into a buffer queue ordered by sequence number; a newly arrived packet is placed at the correct position in the queue according to its sequence number. The initial buffer length can be set by the user.
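The ordered insertion described above might be sketched like this (sequence-number wraparound at 65535, which a real client must handle, is ignored for brevity):

```python
import bisect

class JitterBuffer:
    """Keeps received RTP packets ordered by sequence number."""
    def __init__(self):
        self._seqs = []    # sorted sequence numbers
        self._pkts = {}    # seq -> payload

    def insert(self, seq, payload):
        if seq in self._pkts:
            return                       # drop duplicate packets
        bisect.insort(self._seqs, seq)   # place seq at its sorted position
        self._pkts[seq] = payload

    def pop_front(self):
        seq = self._seqs.pop(0)          # lowest outstanding sequence number
        return seq, self._pkts.pop(seq)

buf = JitterBuffer()
for seq in (3, 1, 2):                    # out-of-order arrival
    buf.insert(seq, f"pkt{seq}")
print([buf.pop_front()[0] for _ in range(3)])  # → [1, 2, 3]
```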

Once the buffer reaches its initial threshold, the client starts the decoding thread, which cyclically reads data from the head of the buffer. Each read hands all buffered data sharing the same timestamp to the decoder as one unit: a video frame is split across several RTP packets carrying the same timestamp, whereas audio is not split this way, so every audio RTP packet has a distinct timestamp. Each delivery to the decoder is therefore either one complete video frame or one audio RTP packet, as shown in Figure 3.
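The regrouping step, collecting consecutive same-timestamp packets into one decoder unit, can be sketched as follows (packets are assumed to already be in sequence order, as the buffer queue guarantees):

```python
def group_frames(packets):
    """Group ordered (seq, timestamp, payload) tuples: all packets sharing
    one timestamp are concatenated into a single unit for the decoder."""
    frames, current_ts, current = [], None, []
    for seq, ts, payload in packets:
        if current_ts is not None and ts != current_ts:
            frames.append((current_ts, b"".join(current)))  # flush finished unit
            current = []
        current_ts = ts
        current.append(payload)
    if current:
        frames.append((current_ts, b"".join(current)))      # flush the last unit
    return frames

pkts = [(1, 1000, b"a"), (2, 1000, b"b"), (3, 4000, b"c")]
print(group_frames(pkts))  # → [(1000, b'ab'), (4000, b'c')]
```

For video, each resulting tuple is one frame; for audio, each RTP packet has its own timestamp, so each tuple is exactly one packet.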

Because audio and video are handled in independent threads from reception through decoding, synchronization between them may be lost due to network conditions or the terminal environment.

3. Decoded Data Processing

Each time the decoder produces a video frame or an audio packet (collectively, a data unit), the decoded data may not need to be played immediately. To be safe, a buffering stage can be inserted between decoding a unit and displaying it.

One can design a cache built on a fixed-length array (16 entries for video, 32 for audio) that stores the decoded data, its playback time, and the current fill status of each slot. Every decoded data unit is stored in the cache, and the corresponding unit is retrieved from the cache at its playback time. Whenever a unit is taken out, its slot becomes free for a new unit to be filled in, so the fixed-length cache is used cyclically.

For video, the decoded data of one frame fills one array slot; for audio, the decoded data of one RTP packet unit fills one slot. Two indexes are maintained: one for filling in data and one for taking data out.

Take video as an example. In the initial stage, 16 frames are decoded consecutively, filling the whole array. In Figure 4(a), a 1 marks a slot holding data and a 0 marks a slot whose data has been taken out.

After slot 15 is filled, the fill index wraps around to slot 0. The player then decodes the next frame, but slot 0 still holds data, so it cannot be written and the decoding thread waits. At this point the take index also starts at slot 0: when the current time reaches slot 0's playback time, its decoded data is taken out and played, the take index moves to slot 1, and slot 0 becomes empty.

Once slot 0 has been played, the decoding thread is woken, the next decoded frame is filled into slot 0, and the fill index moves to slot 1. The player then decodes the following frame, but the data in slot 1 has not yet been taken out and displayed, so it cannot be overwritten and the decoding thread waits again, as the figure illustrates. This cycle carries data from decoding through to display.
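The fill/take cycle above amounts to a bounded ring buffer in which each side waits on the other. A minimal sketch using a condition variable (slot count and the `(play_time, data)` slot layout are illustrative choices, not prescribed by the article):

```python
import threading

class DecodedCache:
    """Fixed-length cache of decoded units (e.g. 16 video frames).
    The fill index writes decoded data; the take index retrieves it
    at playback time; each side waits when its slot is not ready."""
    def __init__(self, length=16):
        self.slots = [None] * length        # each slot: (play_time, data) or None
        self.fill_idx = 0
        self.take_idx = 0
        self.cond = threading.Condition()

    def put(self, play_time, data):
        with self.cond:
            while self.slots[self.fill_idx] is not None:  # slot not played yet
                self.cond.wait()                          # decoder waits
            self.slots[self.fill_idx] = (play_time, data)
            self.fill_idx = (self.fill_idx + 1) % len(self.slots)
            self.cond.notify_all()

    def take(self):
        with self.cond:
            while self.slots[self.take_idx] is None:      # nothing decoded yet
                self.cond.wait()                          # player waits
            unit = self.slots[self.take_idx]
            self.slots[self.take_idx] = None              # free the slot
            self.take_idx = (self.take_idx + 1) % len(self.slots)
            self.cond.notify_all()
            return unit

cache = DecodedCache(length=4)
for i in range(4):                          # decoder fills the whole array
    cache.put(i * 0.04, f"frame{i}".encode())
print(cache.take())                         # → (0.0, b'frame0')
```

In the player, `put` runs on the decoding thread and `take` on the playback thread, so the waits in each method implement exactly the blocking behavior the walkthrough describes.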

For audio, the difference is that each playback pass extracts a fixed length of audio data (a fixed number of samples) from the cache.
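That fixed-size read might look like the following (the chunk sizes are illustrative assumptions; the final, possibly shorter, chunk is kept as the last read):

```python
def audio_chunks(pcm: bytes, samples_per_read=1024, bytes_per_sample=2):
    """Split decoded PCM into the fixed-size reads that audio playback performs."""
    step = samples_per_read * bytes_per_sample
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

chunks = audio_chunks(b"\x00" * 20, samples_per_read=4, bytes_per_sample=2)
print([len(c) for c in chunks])  # → [8, 8, 4]
```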

4. Audio and Video Synchronization

As noted above, the audio and video data that reach the cache are decoded independently and may drift out of sync, which degrades the quality of service during playback; audio and video must therefore be re-synchronized when data is taken from the cache for playback. The synchronization mechanism uses a timed loop driven by a system clock. Because audio demands a strictly uniform playback rate, audio playback continuously pulls data at the rate dictated by its sampling parameters, and the system clock is updated from the audio position. Video playback is then gated by this clock: a video frame is displayed once the system clock reaches or passes its playback time. If the video falls too far behind the clock, frames are skipped to catch up as quickly as possible; if it runs too far ahead, it waits for the clock to advance. Audio is handled analogously when exceptions occur. With this mechanism, audio and video stay synchronized against a common reference and resist the effect of external disturbances.
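The per-frame decision, play, skip, or wait, reduces to comparing the frame's playback time with the audio-driven system clock. A sketch, where the threshold values are illustrative assumptions rather than figures from the article:

```python
def video_action(system_clock, frame_pts, jump_threshold=0.5, wait_threshold=0.01):
    """Decide what to do with the next decoded video frame, given the
    audio-driven system clock (all times in seconds)."""
    lag = system_clock - frame_pts
    if lag > jump_threshold:
        return "skip"        # far behind the clock: drop frames to catch up
    if lag < -wait_threshold:
        return "wait"        # ahead of the clock: hold the frame until its time
    return "play"            # close enough: display now

print(video_action(10.0, 9.2))   # → skip
print(video_action(10.0, 10.3))  # → wait
print(video_action(10.0, 10.0))  # → play
```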

5. Audio and Video Playback

Audio and video can be played by calling the DirectShow interfaces; DirectDraw and DirectSound drive playback through the system's hardware devices. DirectShow technology is widely used in audio/video capture, video chat, video on demand, video overlay, and media playback. When the program starts, it first initializes the audio and video playback configuration. For video, when a decoded frame reaches its playback time and passes the synchronization check, its data is passed as a parameter to be displayed. For audio, a playback thread is started after initialization; it loops continuously, reading audio data from the cache and playing it.

III. Conclusion

Streaming media is one of the most promising and widely used technologies for transmitting multimedia data over the Internet, and as a key component, the performance of the client player directly affects the quality of service users experience. In the client, audio and video data are processed in independent threads from reception through decoding, with data timestamps used to protect synchronization. Client communication and transport follow RTP and RTSP, the key standards supporting streaming media playback.
