Using DirectShow to implement QQ's audio/video chat function



Popular instant-messaging tools such as MSN and QQ all provide video and audio chat, which lets us communicate with friends over the network far more naturally. This article uses DirectShow to do what QQ does: capture and transmit video and audio, essentially implementing QQ's audio/video chat function.

The core of a network video/audio system is capturing the audio and video and transmitting them over the network. With the video capture APIs you can capture video quite easily; transmitting it over the network, however, takes more effort. Simply pushing audio and video data through a datagram socket is not workable: on top of UDP we also need RTP (the real-time transport protocol) and RTCP (the real-time transport control protocol) to improve the quality of service. RTP provides end-to-end data delivery with real-time characteristics. Before each block of audio or video data we insert an RTP header containing the payload type, sequence number, timestamp, and synchronization source identifier (SSRC), and then send the RTP packet over the IP network through a datagram socket; this improves continuous playback and audio/video synchronization. RTCP is used to control RTP. Its most basic function is to use sender reports and receiver reports to estimate the quality of service of the network: if congestion becomes serious, the sender can switch to a lower-rate encoding standard or reduce the transmission bit rate, lightening the network load and providing better QoS.
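For reference, the fixed RTP header mentioned above is only 12 bytes. The following is a minimal sketch of its fields (widths follow RFC 3550; the struct name and bit-field layout are illustrative only and are not part of the filters discussed below):

#include <cstdint>

// Illustrative sketch of the 12-byte fixed RTP header (per RFC 3550).
// Bit-field ordering is compiler dependent; this only shows which fields
// precede every audio/video payload.
struct RtpHeader {
    uint8_t  cc : 4;          // CSRC count
    uint8_t  x  : 1;          // extension flag
    uint8_t  p  : 1;          // padding flag
    uint8_t  version : 2;     // always 2
    uint8_t  pt : 7;          // payload type (e.g. H.263, G.729)
    uint8_t  marker : 1;      // e.g. marks the last packet of a video frame
    uint16_t sequenceNumber;  // increments per packet; detects loss and reordering
    uint32_t timestamp;       // sampling instant; drives smooth playback and A/V sync
    uint32_t ssrc;            // synchronization source identifier
};                            // multi-byte fields are sent in network byte order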


DirectShow provides good support for audio and video capture. With the ICaptureGraphBuilder2 interface you can easily build a video capture graph, and by enumerating the audio capture device filters you can capture audio just as easily. The slightly troublesome part is transmitting the audio and video data. We could wrap the RTP and RTCP protocols in a filter of our own, but DirectShow also supplies a set of filters that send and receive audio and video data over RTP, so we can simply use those RTP filters to transmit the data.
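As a taste of how little code the capture side needs, here is a minimal sketch of a preview-only graph built with ICaptureGraphBuilder2. The helper name BuildPreviewGraph is hypothetical; the capture filter is assumed to have been obtained by device enumeration, as the FindDeviceFilter function near the end of this article shows:

#include <dshow.h>
#include <atlbase.h>

// Minimal sketch: build a preview graph for an already-enumerated capture filter.
HRESULT BuildPreviewGraph(IBaseFilter* pVideoCap)
{
    CComPtr<IGraphBuilder>         pGraph;
    CComPtr<ICaptureGraphBuilder2> pBuilder;
    HRESULT hr = pGraph.CoCreateInstance(CLSID_FilterGraph);
    if (FAILED(hr)) return hr;
    hr = pBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2);
    if (FAILED(hr)) return hr;
    pBuilder->SetFiltergraph(pGraph);                 // attach the filter graph to the builder
    pGraph->AddFilter(pVideoCap, L"Video Capture");
    // Ask the builder to connect the capture filter's preview pin to a default video renderer.
    return pBuilder->RenderStream(&PIN_CATEGORY_PREVIEW, &MEDIATYPE_Video,
                                  pVideoCap, NULL, NULL);
}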


The following is an analysis of these RTP filters.


The filters defined for this purpose are the RTP Source filter, the RTP Render filter, the RTP Demux filter, the RTP Receive Payload Handler (RPH) filter, and the RTP Send Payload Handler (SPH) filter. With these five filters it is straightforward to build graphs that transmit audio and video data over RTP.


The RTP Source filter receives the RTP and RTCP packets of a single RTP session. It provides interfaces for specifying the network address and port of the session to receive, and for directing the RTCP receiver reports that are sent back to the other hosts.


The RTP Render filter sends data out to the network. It provides interfaces similar to those of the RTP Source filter.


The RTP Demux filter demultiplexes the RTP packets delivered by the RTP Source filter. It has one or more output pins and provides an interface for controlling the demultiplexing, that is, how particular streams are assigned to specific output pins.


The RTP RPH filter restores RTP packets received from the network to their original data format. It supports H.261, H.263, Indeo, G.711, G.723, G.729, and a variety of other common audio and video payload types.


The RTP SPH filter does the opposite of the RPH filter: it splits the output of an audio or video compression filter into RTP packets. It provides interfaces for specifying the maximum packet size and the payload type (PT) value.
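To illustrate what such a packetizer does, the sketch below (illustrative C++ only, not the actual SPH filter interface) splits one compressed frame into payloads no larger than the configured maximum packet size; a real packetizer would also prepend the RTP header and set the PT value:

#include <vector>
#include <cstdint>

// Illustrative only: split one compressed frame into RTP-sized payloads.
// maxPacketSize corresponds to the SPH filter's "maximum packet size" setting.
std::vector<std::vector<uint8_t>> Packetize(const uint8_t* frame, size_t frameLen,
                                            size_t maxPacketSize)
{
    std::vector<std::vector<uint8_t>> packets;
    for (size_t offset = 0; offset < frameLen; offset += maxPacketSize) {
        size_t chunk = (frameLen - offset < maxPacketSize) ? frameLen - offset : maxPacketSize;
        packets.emplace_back(frame + offset, frame + offset + chunk);
        // A real packetizer would prepend an RTP header here, fill in the configured
        // PT value, and set the marker bit on the last packet of the frame.
    }
    return packets;
}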


Next we will look at how to use these filters to build the capture and transmission graphs.






Figure 1 and Figure 2 show how to use the filters defined by DirectShow RTP. Figure 1 shows a filter graph that captures local multimedia data and sends it over the network using RTP. It contains a video capture filter that outputs raw video frames, followed by an encoding filter that compresses them. Once compressed, the frames are passed to the RTP SPH filter, which fragments them into RTP packets and hands them to the RTP Render filter for transmission over the network. Figure 2 shows a filter graph that receives RTP packets containing a video stream and plays the video. It is composed of an RTP Source filter that receives the packets, an RTP Demux filter that classifies them by source and payload type, an RTP RPH filter that converts the RTP packets back into compressed video frames, a decoding filter that decompresses the frames, and a video rendering filter that displays the uncompressed frames.
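Incidentally, once the encoder and the RTP Render filter have been added to the graph, ICaptureGraphBuilder2::RenderStream can wire up the whole send chain of Figure 1 in a single call. This is only a sketch: pBuilder and pVideoCap are the capture graph builder and capture filter from the preview sketch earlier, while pEncoder and pRtpRender are assumed to be an installed H.263 compression filter and the RTP Render filter:

// Capture pin -> H.263 compressor -> RTP Render filter, connected in one call.
HRESULT hr = pBuilder->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                                    pVideoCap,    // source: the video capture filter
                                    pEncoder,     // intermediate: compression filter
                                    pRtpRender);  // sink: RTP Render filter that sends the packets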


With the help of these RTP filters we can implement a QQ-like exchange of video and audio over the network. The graph in Figure 3 shows the audio and video interaction between two clients on the network, A and B. Here I encapsulate the RTP filters of Figure 1 and Figure 2 myself: the codec filters are folded directly into the RTP Source filter and the RTP Render filter, which keeps the graph very concise. The RTP Source filter simply receives audio and video data from the network and passes it on to the client program, while the RTP Render filter sends the captured audio and video data to the other client on the network; the codecs are encapsulated inside these two filters.




Figure 3: Network video and audio interaction graph



If you want to encapsulate your own source and render filters, you must first settle on your codecs. For video, choose H.261, H.263, or MPEG-4; for audio, choose G.729 or G.711. Once the codecs are chosen, the encapsulation itself is easy.


Rather than discussing this further, let us take a look at the code I have provided.

First, define the CLSIDs of the four RTP filters used.

#include <dshow.h>      // DirectShow base interfaces and device category CLSIDs
#include <atlbase.h>    // CComPtr
// IRTPOption, IVideoOption and tagVideoInfo are the custom interfaces/types exposed
// by the encapsulated RTP filters described above. Connect(pFilterA, pFilterB) is a
// helper (not shown) that connects the first filter's output pin to the second
// filter's input pin.

// CLSIDs of the four custom RTP filters.
static const GUID CLSID_FG729Render =   // audio send
    { 0x3556f7d8, 0x5b5, 0x4015, { 0xb9, 0x40, 0x65, 0xb8, 0x8, 0x94, 0xc8, 0xf9 } };
static const GUID CLSID_FG729Source =   // audio receive
    { 0x290bf11a, 0x93b4, 0x4662, { 0xb1, 0xa3, 0xa, 0x53, 0x51, 0xeb, 0xe5, 0x8e } };
static const GUID CLSID_FH263Source =   // video receive
    { 0xa0431ccf, 0x75db, 0x463e, { 0xb1, 0xcd, 0xe, 0x9d, 0xb6, 0x67, 0xba, 0x72 } };
static const GUID CLSID_FH263Render =   // video send
    { 0x787969cf, 0xc1b6, 0x41c5, { 0xba, 0xa8, 0x4e, 0xff, 0xa3, 0xdb, 0xe4, 0x1f } };

// Filters for sending and receiving audio and video data.
CComPtr<IBaseFilter> m_pAudioRtpRender;
CComPtr<IBaseFilter> m_pAudioRtpSource;
CComPtr<IBaseFilter> m_pVideoRtpRender;
CComPtr<IBaseFilter> m_pVideoRtpSource;
char szClientA[100];            // address of the peer
int  iVideoPort = 9937;
int  iAudioPort = 9938;

// ---- Build the video graph and send the captured data ----
CComPtr<IGraphBuilder>         m_pVideoGraphBuilder;    // video graph manager
CComPtr<ICaptureGraphBuilder2> m_pVideoCaptureBuilder;
CComPtr<IBaseFilter>           m_pFilterVideoCap;       // video capture filter
CComPtr<IVideoWindow>          m_pVideoWindow;
CComPtr<IMediaControl>         m_pVideoMediaCtrl;
CComPtr<IBaseFilter>           m_pRenderFilterVideo;    // local preview renderer

HRESULT CMyDialog::VideoGraphInitAndSend()
{
    HRESULT hr;
    hr = m_pVideoGraphBuilder.CoCreateInstance(CLSID_FilterGraph);
    if (FAILED(hr)) return hr;
    hr = m_pVideoCaptureBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2);
    if (FAILED(hr)) return hr;
    m_pVideoCaptureBuilder->SetFiltergraph(m_pVideoGraphBuilder);
    m_pVideoGraphBuilder->QueryInterface(IID_IMediaControl, (void**)&m_pVideoMediaCtrl);
    m_pVideoGraphBuilder->QueryInterface(IID_IVideoWindow,  (void**)&m_pVideoWindow);

    // Find the video capture device filter and add it to the graph.
    FindDeviceFilter(&m_pFilterVideoCap, CLSID_VideoInputDeviceCategory);
    if (m_pFilterVideoCap)
        m_pVideoGraphBuilder->AddFilter(m_pFilterVideoCap, L"VideoCap");

    // Create a preview (video renderer) filter and connect the capture filter to it.
    hr = m_pRenderFilterVideo.CoCreateInstance(CLSID_VideoRenderer);
    if (FAILED(hr)) return hr;
    m_pVideoGraphBuilder->AddFilter(m_pRenderFilterVideo, L"VideoRenderFilter");
    Connect(m_pFilterVideoCap, m_pRenderFilterVideo);

    // Size the preview window, keeping a 4:3 aspect ratio inside the owner window.
    RECT rc;
    ::GetClientRect(m_hOwnerWnd, &rc);
    int iWidth  = rc.right - rc.left;
    int iHeight = rc.bottom - rc.top;
    int iLeft, iTop;
    if ((iHeight * 1.0) / (iWidth * 1.0) >= 0.75)
    {   // window is relatively tall: fit by width
        int tmpHeight = iWidth * 3 / 4;
        iTop = (iHeight - tmpHeight) / 2;
        iHeight = tmpHeight;
        iLeft = 0;
    }
    else
    {   // window is relatively wide: fit by height
        int tmpWidth = iHeight * 4 / 3;
        iLeft = (iWidth - tmpWidth) / 2;
        iWidth = tmpWidth;
        iTop = 0;
    }
    m_pVideoWindow->put_Owner((OAHWND)m_hPreviewWnd);
    m_pVideoWindow->put_Visible(OATRUE);
    m_pVideoWindow->put_WindowStyle(WS_CHILD | WS_CLIPSIBLINGS);
    m_pVideoWindow->SetWindowPosition(iLeft, iTop, iWidth, iHeight);

    // Create the RTP render filter (H.263 over RTP), configure it, and send to the network.
    CComPtr<IRTPOption>   pRenderOption;
    CComPtr<IVideoOption> pVideoOption;
    tagVideoInfo vif(160, 120, 24);                     // 160x120 video, 24 bits per pixel
    hr = ::CoCreateInstance(CLSID_FH263Render, NULL, CLSCTX_INPROC,
                            IID_IBaseFilter, (void**)&m_pVideoRtpRender);
    if (FAILED(hr)) return hr;
    m_pVideoRtpRender->QueryInterface(IID_IRTPOption,   (void**)&pRenderOption);
    m_pVideoRtpRender->QueryInterface(IID_IVideoOption, (void**)&pVideoOption);
    pVideoOption->SetProperty(&vif);
    pVideoOption->SetSendFrameRate(m_iFrameRate, 1);    // 1: do not send data, 0: actually send data
    Connect(m_pFilterVideoCap, m_pVideoRtpRender);

    // Connect to the peer and run the graph.
    hr = pRenderOption->Connect(szClientA, iVideoPort, 1024);
    if (FAILED(hr)) return hr;
    m_pVideoMediaCtrl->Run();
    return S_OK;
}

// ---- Receive the video data and play it ----
CComPtr<IGraphBuilder> m_pVideoRecvGraphBuilder;        // video graph manager (receive side)
HWND m_hRenderWnd;

HRESULT VideoReceive()
{
    HRESULT hr;
    hr = ::CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC,
                            IID_IGraphBuilder, (void**)&m_pVideoRecvGraphBuilder);
    if (FAILED(hr)) return hr;
    m_pVideoRecvGraphBuilder->QueryInterface(IID_IMediaControl, (void**)&m_pVideoMediaCtrl);
    m_pVideoRecvGraphBuilder->QueryInterface(IID_IVideoWindow,  (void**)&m_pVideoWindow);

    // Create the RTP source filter (H.263 over RTP) and add it to the graph.
    hr = ::CoCreateInstance(CLSID_FH263Source, NULL, CLSCTX_INPROC,
                            IID_IBaseFilter, (void**)&m_pVideoRtpSource);
    if (FAILED(hr)) return hr;
    m_pVideoRecvGraphBuilder->AddFilter(m_pVideoRtpSource, L"My Custom Source");

    // Configure the source filter and open the receiving port.
    CComPtr<IRTPOption>   pSourceOption;
    CComPtr<IVideoOption> pVideoOption;
    m_pVideoRtpSource->QueryInterface(IID_IRTPOption,   (void**)&pSourceOption);
    m_pVideoRtpSource->QueryInterface(IID_IVideoOption, (void**)&pVideoOption);
    tagVideoInfo vif(160, 120, 24);
    pVideoOption->SetProperty(&vif);
    hr = pSourceOption->Connect(szClientA, iVideoPort, 1024);
    if (FAILED(hr)) return hr;

    // Create the video renderer and connect the source filter to it.
    hr = m_pRenderFilterVideo.CoCreateInstance(CLSID_VideoRenderer);
    if (FAILED(hr)) return hr;
    m_pVideoRecvGraphBuilder->AddFilter(m_pRenderFilterVideo, L"VideoRenderFilter");
    Connect(m_pVideoRtpSource, m_pRenderFilterVideo);

    // Size the render window, keeping a 4:3 aspect ratio.
    RECT rc;
    ::GetClientRect(m_hOwnerWnd, &rc);
    int iWidth  = rc.right - rc.left;
    int iHeight = rc.bottom - rc.top;
    int iLeft, iTop;
    if ((iHeight * 1.0) / (iWidth * 1.0) >= 0.75)
    {
        int tmpHeight = iWidth * 3 / 4;
        iTop = (iHeight - tmpHeight) / 2;
        iHeight = tmpHeight;
        iLeft = 0;
    }
    else
    {
        int tmpWidth = iHeight * 4 / 3;
        iLeft = (iWidth - tmpWidth) / 2;
        iWidth = tmpWidth;
        iTop = 0;
    }
    m_pVideoWindow->put_Owner((OAHWND)m_hRenderWnd);
    m_pVideoWindow->put_Visible(OATRUE);
    m_pVideoWindow->put_WindowStyle(WS_CHILD | WS_CLIPSIBLINGS);
    m_pVideoWindow->SetWindowPosition(iLeft, iTop, iWidth, iHeight);

    m_pVideoMediaCtrl->Run();
    return S_OK;
}

// ---- Find a device filter of the given category (capture device or renderer) ----
HRESULT FindDeviceFilter(IBaseFilter** ppSrcFilter, GUID deviceGuid)
{
    HRESULT hr;
    IBaseFilter* pSrc = NULL;
    CComPtr<IMoniker> pMoniker = NULL;
    ULONG cFetched;
    if (!ppSrcFilter) return E_POINTER;

    // Create the system device enumerator.
    CComPtr<ICreateDevEnum> pDevEnum = NULL;
    hr = ::CoCreateInstance(CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC,
                            IID_ICreateDevEnum, (void**)&pDevEnum);
    if (FAILED(hr)) return hr;

    // Create an enumerator for the requested device category.
    CComPtr<IEnumMoniker> pClassEnum = NULL;
    hr = pDevEnum->CreateClassEnumerator(deviceGuid, &pClassEnum, 0);
    if (FAILED(hr)) return hr;
    if (pClassEnum == NULL) return E_FAIL;      // no device of this category is installed

    // Bind the first device found to a filter object.
    if (S_OK == pClassEnum->Next(1, &pMoniker, &cFetched))
    {
        hr = pMoniker->BindToObject(0, 0, IID_IBaseFilter, (void**)&pSrc);
        if (FAILED(hr)) return hr;
    }
    else
        return E_FAIL;

    *ppSrcFilter = pSrc;
    return S_OK;
}

// ---- Build the audio graph and send the captured data ----
CComPtr<IGraphBuilder>         m_pAudioGraphBuilder;    // audio graph manager
CComPtr<ICaptureGraphBuilder2> m_pAudioCaptureBuilder;
CComPtr<IBaseFilter>           m_pFilterAudioCap;       // audio capture filter
CComPtr<IMediaControl>         m_pAudioMediaCtrl;

HRESULT AudioGraphInit()
{
    HRESULT hr;
    hr = m_pAudioGraphBuilder.CoCreateInstance(CLSID_FilterGraph);
    if (FAILED(hr)) return hr;
    hr = m_pAudioCaptureBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2);
    if (FAILED(hr)) return hr;
    m_pAudioCaptureBuilder->SetFiltergraph(m_pAudioGraphBuilder);
    m_pAudioGraphBuilder->QueryInterface(IID_IMediaControl, (void**)&m_pAudioMediaCtrl);

    // Find the audio capture device filter and add it to the graph.
    FindDeviceFilter(&m_pFilterAudioCap, CLSID_AudioInputDeviceCategory);
    if (m_pFilterAudioCap)
        m_pAudioGraphBuilder->AddFilter(m_pFilterAudioCap, L"AudioCap");

    // Create the RTP render filter (G.729 over RTP) and send to the network.
    hr = ::CoCreateInstance(CLSID_FG729Render, NULL, CLSCTX_INPROC,
                            IID_IBaseFilter, (void**)&m_pAudioRtpRender);
    if (FAILED(hr)) return hr;
    m_pAudioGraphBuilder->AddFilter(m_pAudioRtpRender, L"FilterRtpSendAudio");
    Connect(m_pFilterAudioCap, m_pAudioRtpRender);

    CComPtr<IRTPOption> pOption;
    m_pAudioRtpRender->QueryInterface(IID_IRTPOption, (void**)&pOption);
    hr = pOption->Connect(szClientA, iAudioPort, 1024);
    if (FAILED(hr)) return hr;
    m_pAudioMediaCtrl->Run();
    return S_OK;
}

// ---- Receive the audio data and play it ----
CComPtr<IGraphBuilder> m_pAudioRecvGraphBuilder;        // audio graph manager (receive side)
CComPtr<IBaseFilter>   m_pAudioRender;                  // sound card renderer

HRESULT AudioReceive()
{
    HRESULT hr;
    hr = m_pAudioRecvGraphBuilder.CoCreateInstance(CLSID_FilterGraph);
    if (FAILED(hr)) return hr;
    m_pAudioRecvGraphBuilder->QueryInterface(IID_IMediaControl, (void**)&m_pAudioMediaCtrl);

    // Create the RTP source filter (G.729 over RTP) and add it to the graph.
    hr = m_pAudioRtpSource.CoCreateInstance(CLSID_FG729Source);
    if (FAILED(hr)) return hr;
    m_pAudioRecvGraphBuilder->AddFilter(m_pAudioRtpSource, L"AudioRtp");

    // Create the sound card render filter and add it to the graph.
    FindDeviceFilter(&m_pAudioRender, CLSID_AudioRendererCategory);
    m_pAudioRecvGraphBuilder->AddFilter(m_pAudioRender, L"AudioRender");

    // Open the receiving port and connect the source filter to the renderer.
    CComPtr<IRTPOption> pRtpOption;
    m_pAudioRtpSource->QueryInterface(IID_IRTPOption, (void**)&pRtpOption);
    hr = pRtpOption->Connect(szClientA, iAudioPort, 1024);
    if (FAILED(hr)) return hr;
    Connect(m_pAudioRtpSource, m_pAudioRender);
    m_pAudioMediaCtrl->Run();
    return S_OK;
}
