live555 supports MPEG-4 ES (elementary stream) video; the relevant classes are MPEG4VideoStreamFramer and MPEG4ESVideoRTPSink. I wanted to extend AVI support by parsing the MPEG-4 data packets out of the AVI file and handing them to MPEG4VideoStreamFramer for processing. Later I found that this does not work at all: the problem is that MPEG4VideoStreamFramer only processes strict MPEG-4 ES streams.
Let's briefly describe the MPEG-4 elementary stream, which is composed as follows:
VOS -> VO -> VOL -> GOV (optional) -> VOP
VOS: Visual Object Sequence
VO: Visual Object
VOL: Visual Object Layer
GOV: Group of VOPs (Visual Object Plane group)
VOP: Visual Object Plane
Immediately after the VOP start code there is a 2-bit field (vop_coding_type) that indicates whether the frame is an I frame, P frame, B frame, or S frame (S(GMC)-VOP).
The flag values are:
00: I frame
01: P frame
10: B frame
11: S frame
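To make the bit layout concrete, here is a minimal standalone sketch (not live555 code; the helper name is my own) that reads the 2-bit vop_coding_type from a buffer that begins with a VOP start code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return the 2-bit vop_coding_type (0 = I, 1 = P, 2 = B, 3 = S) of a buffer
// that begins with a VOP start code (00 00 01 B6), or -1 otherwise.
// Hypothetical helper, not part of live555.
int vopCodingType(const uint8_t* buf, size_t len) {
  if (len < 5) return -1;
  if (buf[0] != 0 || buf[1] != 0 || buf[2] != 1 || buf[3] != 0xB6) return -1;
  return buf[4] >> 6; // the two bits immediately after the start code
}
```

For example, a byte of 0x80 right after the start code (binary 10xxxxxx) marks a B frame.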
The start and end codes are defined as follows:

#define VISUAL_OBJECT_SEQUENCE_START_CODE 0x000001B0
#define VISUAL_OBJECT_SEQUENCE_END_CODE   0x000001B1
#define GROUP_VOP_START_CODE              0x000001B3
#define VISUAL_OBJECT_START_CODE          0x000001B5
#define VOP_START_CODE                    0x000001B6
Opening the AVI file in binary mode, I found that only the VOP start code is present. This means only the VOP level exists, rather than a strict ES stream; one VOP corresponds to one frame.
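This observation can be reproduced with a small scanner (my own sketch, not live555 code) that collects every start-code suffix byte found in a chunk of data:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Collect the byte following every 00 00 01 start-code prefix in a buffer.
// Hypothetical inspection helper, not part of live555.
std::set<uint8_t> startCodesIn(const uint8_t* buf, size_t len) {
  std::set<uint8_t> codes;
  for (size_t i = 0; i + 3 < len; ++i) {
    if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 1) {
      codes.insert(buf[i+3]);
    }
  }
  return codes;
}
```

A strict ES stream would also yield 0xB0, 0xB5, and so on; a typical AVI video chunk yields only 0xB6 (VOP_START_CODE).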
Later, I found that live555 implements another class, MPEG4VideoStreamDiscreteFramer, inherited from MPEG4VideoStreamFramer. It can process data with VOS or GOV headers as well as bare VOPs, which meets the requirement.
Let's take a look at MPEG4VideoStreamDiscreteFramer's handling of MPEG-4 data.
void MPEG4VideoStreamDiscreteFramer
::afterGettingFrame1(unsigned frameSize, unsigned numTruncatedBytes,
                     struct timeval presentationTime,
                     unsigned durationInMicroseconds) {
  // Check that the first 4 bytes are a system code:
  if (frameSize >= 4 && fTo[0] == 0 && fTo[1] == 0 && fTo[2] == 1) {
    fPictureEndMarker = True; // assume that we have a complete 'picture' here

    unsigned i = 3;
    // Visual Object Sequence: parse it as a complete MPEG-4 elementary stream
    if (fTo[i] == 0xB0) { // VISUAL_OBJECT_SEQUENCE_START_CODE
      // The next byte is the "profile_and_level_indication":
      if (frameSize >= 5) fProfileAndLevelIndication = fTo[4];

      // The start of this frame - up to the first GROUP_VOP_START_CODE
      // or VOP_START_CODE - is stream configuration information.  Save this:
      for (i = 7; i < frameSize; ++i) {
        if ((fTo[i] == 0xB3 /*GROUP_VOP_START_CODE*/ ||
             fTo[i] == 0xB6 /*VOP_START_CODE*/)
            && fTo[i-1] == 1 && fTo[i-2] == 0 && fTo[i-3] == 0) {
          break; // The configuration information ends here
        }
      }
      fNumConfigBytes = i < frameSize ? i-3 : frameSize;
      delete[] fConfigBytes;
      fConfigBytes = new unsigned char[fNumConfigBytes];
      for (unsigned j = 0; j < fNumConfigBytes; ++j) fConfigBytes[j] = fTo[j];

      // This information (should) also contain a VOL header, which we need
      // to analyze, to get "vop_time_increment_resolution" (which we need
      // - along with "vop_time_increment" - in order to generate accurate
      // presentation times for "B" frames).
      analyzeVOLHeader();
    }

    if (i < frameSize) {
      u_int8_t nextCode = fTo[i];

      // VOP group
      if (nextCode == 0xB3 /*GROUP_VOP_START_CODE*/) {
        // Skip to the following VOP_START_CODE (if any):
        for (i += 4; i < frameSize; ++i) {
          if (fTo[i] == 0xB6 /*VOP_START_CODE*/
              && fTo[i-1] == 1 && fTo[i-2] == 0 && fTo[i-3] == 0) {
            nextCode = fTo[i];
            break;
          }
        }
      }

      // Visual Object Plane
      if (nextCode == 0xB6 /*VOP_START_CODE*/ && i+5 < frameSize) {
        ++i;

        // Get the "vop_coding_type" from the next byte:
        u_int8_t nextByte = fTo[i++];
        u_int8_t vop_coding_type = nextByte>>6; // the 2 bits after the VOP
                                                // start: frame type I/P/B/S

        // Next, get the "modulo_time_base" by counting the '1' bits that
        // follow.  We look at the next 32 bits only.
        // This should be enough in most cases.
        u_int32_t next4Bytes
          = (fTo[i]<<24)|(fTo[i+1]<<16)|(fTo[i+2]<<8)|fTo[i+3];
        i += 4;
        u_int32_t timeInfo = (nextByte<<(32-6))|(next4Bytes>>6);
        unsigned modulo_time_base = 0;
        u_int32_t mask = 0x80000000;
        while ((timeInfo&mask) != 0) {
          ++modulo_time_base;
          mask >>= 1;
        }
        mask >>= 2;

        // Then, get the "vop_time_increment".
        unsigned vop_time_increment = 0;
        // First, make sure we have enough bits left for this:
        if ((mask>>(fNumVTIRBits-1)) != 0) {
          for (unsigned i = 0; i < fNumVTIRBits; ++i) {
            vop_time_increment |= timeInfo&mask;
            mask >>= 1;
          }
          while (mask != 0) {
            vop_time_increment >>= 1;
            mask >>= 1;
          }
        }

        // If this is a "B" frame, the timestamp has to be corrected:
        // we have to tweak "presentationTime":
        if (vop_coding_type == 2/*B*/
            && (fLastNonBFramePresentationTime.tv_usec > 0 ||
                fLastNonBFramePresentationTime.tv_sec > 0)) {
          int timeIncrement
            = fLastNonBFrameVop_time_increment - vop_time_increment;
          if (timeIncrement < 0) timeIncrement += vop_time_increment_resolution;
          unsigned const MILLION = 1000000;
          double usIncrement = vop_time_increment_resolution == 0 ? 0.0
            : ((double)timeIncrement*MILLION)/vop_time_increment_resolution;
          unsigned secondsToSubtract = (unsigned)(usIncrement/MILLION);
          unsigned uSecondsToSubtract = ((unsigned)usIncrement)%MILLION;

          presentationTime = fLastNonBFramePresentationTime;
          if ((unsigned)presentationTime.tv_usec < uSecondsToSubtract) {
            presentationTime.tv_usec += MILLION;
            if (presentationTime.tv_sec > 0) --presentationTime.tv_sec;
          }
          presentationTime.tv_usec -= uSecondsToSubtract;
          if ((unsigned)presentationTime.tv_sec > secondsToSubtract) {
            presentationTime.tv_sec -= secondsToSubtract;
          } else {
            presentationTime.tv_sec = presentationTime.tv_usec = 0;
          }
        } else {
          fLastNonBFramePresentationTime = presentationTime;
          fLastNonBFrameVop_time_increment = vop_time_increment;
        }
      }
    }
  }

  // Complete delivery to the client:
  fFrameSize = frameSize;
  fNumTruncatedBytes = numTruncatedBytes;
  fPresentationTime = presentationTime;
  fDurationInMicroseconds = durationInMicroseconds;
  afterGetting(this);
}
The code above really only does one thing: when the current VOP is a B frame, it adjusts the timestamp.
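Stripped of the bit parsing, the correction amounts to stepping back from the last non-B frame's presentation time by the tick difference, converted to microseconds via vop_time_increment_resolution. A standalone restatement (function and parameter names are my own, not live555's):

```cpp
#include <cassert>
#include <sys/time.h> // struct timeval

// Compute a B frame's presentation time from the last non-B frame's time,
// the two frames' vop_time_increment values, and the tick rate
// vop_time_increment_resolution. Sketch of live555's arithmetic only.
struct timeval correctedBFrameTime(struct timeval lastNonB,
                                   unsigned lastNonBIncrement,
                                   unsigned bIncrement,
                                   unsigned resolution) {
  int ticks = (int)lastNonBIncrement - (int)bIncrement;
  if (ticks < 0) ticks += resolution; // wrapped around modulo_time_base
  const unsigned MILLION = 1000000;
  double us = resolution == 0 ? 0.0 : ((double)ticks * MILLION) / resolution;
  unsigned secSub = (unsigned)(us / MILLION);
  unsigned usecSub = ((unsigned)us) % MILLION;

  struct timeval t = lastNonB;
  if ((unsigned)t.tv_usec < usecSub) { // borrow a second if needed
    t.tv_usec += MILLION;
    if (t.tv_sec > 0) --t.tv_sec;
  }
  t.tv_usec -= usecSub;
  if ((unsigned)t.tv_sec > secSub) t.tv_sec -= secSub;
  else t.tv_sec = t.tv_usec = 0;
  return t;
}
```

With a tick rate of 25 and a tick difference of 1, the B frame is stamped 40 ms before the last non-B frame.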
Finally, let's look at how timestamps are handled for a genuine MPEG-4 ES stream. In that case MPEG4VideoStreamFramer is used as the source, and its parser, MPEG4VideoStreamParser, analyzes the complete MPEG-4 elementary stream, mainly to extract the timing information.
void MPEGVideoStreamFramer::continueReadProcessing() {
  unsigned acquiredFrameSize = fParser->parse();
  if (acquiredFrameSize > 0) {
    // We were able to acquire a frame from the input.
    // It has already been copied to the reader's space.
    fFrameSize = acquiredFrameSize;
    fNumTruncatedBytes = fParser->numTruncatedBytes();

    // "fPresentationTime" should have already been computed.

    // Calculate the frame duration from the picture count and frame rate.
    // Compute "fDurationInMicroseconds" now:
    fDurationInMicroseconds
      = (fFrameRate == 0.0 || ((int)fPictureCount) < 0) ? 0
      : (unsigned)((fPictureCount*1000000)/fFrameRate);
    fPictureCount = 0;

    // Call our own 'after getting' function.  Because we're not a 'leaf'
    // source, we can call this directly, without risking infinite recursion.
    afterGetting(this);
  } else {
    // We were unable to parse a complete frame from the input, because:
    // - we had to read more data from the source stream, or
    // - the source stream has ended.
  }
}
To calculate fDurationInMicroseconds, the frame-rate parameter fFrameRate is required. It is determined by analyzing the VOL header.
void MPEG4VideoStreamParser::analyzeVOLHeader() {
  // Parse the timing information from the VOL header.
  // Extract timing information (in particular,
  // "vop_time_increment_resolution") from the VOL Header:
  ...
  do {
    ...
    // Use "vop_time_increment_resolution" as the 'frame rate'
    // (really, 'tick rate'):
    usingSource()->fFrameRate = (double)vop_time_increment_resolution;
    return;
  } while (0);
  ...
}
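So with vop_time_increment_resolution acting as the tick rate, the duration computed in continueReadProcessing() reduces to the following arithmetic (a sketch with my own names, mirroring the original expression and its guards):

```cpp
#include <cassert>

// duration_us = pictureCount * 1000000 / frameRate, guarded the same way
// as in MPEGVideoStreamFramer::continueReadProcessing(). Names are mine.
unsigned durationMicroseconds(unsigned pictureCount, double frameRate) {
  if (frameRate == 0.0 || (int)pictureCount < 0) return 0;
  return (unsigned)((pictureCount * 1000000) / frameRate);
}
```

At a tick rate of 25, for example, one picture lasts 40000 microseconds.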