Ogg Audio Format Analysis

Source: Internet
Author: User
I. Ogg audio format overview

Ogg is a free and open standard container format maintained by the Xiph.org Foundation. The Ogg format is not restricted by software patents and is designed for efficient streaming and handling of high-quality digital multimedia.

"Ogg" refers to a container file format that can hold streams from a wide variety of free and open source codecs, covering audio, video, text (such as subtitles), and metadata.

Within the Ogg multimedia framework, Theora provides the lossy video layer, and the music-oriented Vorbis codec usually serves as the audio layer. The speech codec Speex, the lossless audio codec FLAC, and OggPCM can also be used as the audio layer.

The term "ogg" usually refers to the Ogg Vorbis audio file format, that is, Vorbis-encoded audio stored in an Ogg container. In the past, the .ogg extension was used for every format that Ogg supports. In 2007, however, the Xiph.org Foundation recommended that, for backward compatibility, .ogg be used only for Vorbis audio. The foundation defined new extensions and media types for the other kinds of content: .oga for audio-only files, .ogv for video with or without sound (including Theora), and .ogx for applications.

Ogg Vorbis is a newer audio compression format, similar in purpose to MP3. Ogg Vorbis is completely free, open, and without patent restrictions. Ogg Vorbis files use the .ogg extension. The format can keep improving in file size and sound quality without breaking older encoders or players. Ogg Vorbis also supports multichannel audio.

II. Ogg audio format analysis

1. The organization of the Ogg file

Ogg organizes each logical stream into pages. Every page consists of a page header and page data, as shown in Figure 1 below:

a*   b*   c*   ...   a#   ...   b#   c#   d*   d#
BOS  BOS  BOS        EOS        EOS  EOS  BOS  EOS

Figure 1 The organization of the Ogg file

The file in the figure above chains two physical streams: logical streams a, b, and c together make up the first physical stream, and logical stream d is a separate physical stream on its own. The BOS pages of all logical streams in a physical stream must be physically adjacent, as with the three BOS pages a*, b*, and c* in Figure 1.

BOS: Beginning of Stream;

EOS: End of Stream.
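The chaining rule described for Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pages are reduced to hypothetical `(serial, header_type)` tuples, the flag values 0x02 (BOS) and 0x04 (EOS) are the ones used by the Ogg page header, and the function name `split_links` is invented for this example. A new physical stream (link) is assumed to start whenever a BOS page arrives while no logical stream is open.

```python
BOS = 0x02  # header_type bit: first page of a logical stream
EOS = 0x04  # header_type bit: last page of a logical stream


def split_links(pages):
    """Split (serial, header_type) records into chained physical streams.

    Returns a list of links; each link maps a serial number to the
    number of pages seen for that logical stream.
    """
    links = []
    open_serials = set()  # logical streams that have started but not ended
    for serial, header_type in pages:
        # A BOS page seen while nothing is open begins a new link.
        starts_link = bool(header_type & BOS) and not open_serials
        if starts_link or not links:
            links.append({})
        if header_type & BOS:
            open_serials.add(serial)
        links[-1][serial] = links[-1].get(serial, 0) + 1
        if header_type & EOS:
            open_serials.discard(serial)
    return links


# Page order modelled on Figure 1 (a, b, c multiplexed, then d alone).
figure_1 = [
    ("a", BOS), ("b", BOS), ("c", BOS),
    ("a", 0), ("b", 0), ("a", EOS), ("c", 0), ("b", EOS), ("c", EOS),
    ("d", BOS), ("d", EOS),
]
links = split_links(figure_1)
```

With the Figure 1 ordering, `links` holds two entries: one for streams a, b, and c, and one for stream d. The sketch does not handle a logical stream whose only page carries both BOS and EOS.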

2. Ogg page structure

Each page is self-contained and carries the information needed to process it. The page size is variable, usually 4 KB to 8 KB, and cannot exceed 65307 bytes (27 + 255 + 255 × 255 = 65307). The page header format is shown in Figure 2:

Field               Size (bytes)
OggS (page ID)      4
Version             1
Header_type         1
Granule_position    8
Serial_number       4
Page_sequence       4
CRC_checksum        4
Num_segments        1
Segment_table       Num_segments
Payload             sum of the Segment_table entries

Figure 2 Ogg Page header structure

1) Page ID: the ASCII characters 'O' (0x4f), 'g' (0x67), 'g' (0x67), 'S' (0x53), 4 bytes, marking the start of a page. It lets a decoder recognize where a new page begins when it unpacks the Ogg container to recover the encoded media.

2) Version ID: 1 byte; the current version is 0.

3) Header_type: 1 byte of flags identifying the type of the current page:

0x01: the data at the start of this page continues a packet from the previous page of the same logical stream; if this bit is not set, the page starts with a new packet;

0x02: this page is the first page of the logical stream (the BOS flag); if this bit is not set, it is not the first page;

0x04: this page is the last page of the logical stream (the EOS flag); if this bit is not set, it is not the last page.

4) Granule_position: 8 bytes, little-endian; a codec-specific position value. For an audio stream it stores the number of PCM samples that decoding the logical stream produces up to this page, so it can serve as a timestamp. For a video stream it stores the number of video frames encoded up to this page. A value of -1 means that no packet of the logical stream finishes on this page.

5) Serial_number: 4 bytes, little-endian; the ID of the logical stream that this page belongs to. It distinguishes this logical stream from the others, so pages can be separated into streams by this value.

6) Page_sequence: 4 bytes; the sequence number of this page within its logical stream. It lets the Ogg decoder detect any lost page.

7) CRC_checksum: 4 bytes; the 32-bit cyclic redundancy check of the whole page (header and page data, computed with the CRC field itself set to 0). The generator polynomial is 0x04c11db7.

8) Num_segments: 1 byte; the number of segments listed in the segment_table field. Its maximum value is 255, so the maximum physical size of a page is 65307 bytes, which is less than 64 KB.

9) Segment_table: a table with one byte per segment giving the length of each segment; each value ranges from 0 to 255.
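The nine fields listed above map directly onto a fixed 27-byte layout followed by the segment table, which can be decoded with Python's `struct` module. The sketch below is illustrative rather than a reference parser: the names `PageHeader`, `parse_page_header`, and `flag_names` are made up for this example, and it assumes that every multi-byte integer, including Page_sequence and CRC_checksum, is little-endian (the text states this only for Granule_position and Serial_number).

```python
import struct
from collections import namedtuple

PageHeader = namedtuple(
    "PageHeader",
    "version header_type granule_position serial_number "
    "page_sequence crc_checksum num_segments segment_table",
)

# Header_type flag bits as listed above.
CONTINUED, BOS, EOS = 0x01, 0x02, 0x04


def parse_page_header(buf, offset=0):
    """Decode the page header that starts at buf[offset]."""
    if buf[offset:offset + 4] != b"OggS":
        raise ValueError("no OggS page ID at this offset")
    # "<" = little-endian, no padding; B = 1 byte, q = signed 8, I = unsigned 4.
    fields = struct.unpack_from("<BBqIIIB", buf, offset + 4)
    num_segments = fields[-1]
    table = bytes(buf[offset + 27:offset + 27 + num_segments])
    if len(table) != num_segments:
        raise ValueError("segment table is truncated")
    return PageHeader(*fields, table)


def flag_names(header_type):
    """Return the names of the flag bits set in header_type."""
    known = ((CONTINUED, "continued"), (BOS, "BOS"), (EOS, "EOS"))
    return [name for bit, name in known if header_type & bit]


# A hand-built header: BOS page, granule -1, serial 0x1234, two segments.
raw = b"OggS" + struct.pack("<BBqIIIB", 0, BOS, -1, 0x1234, 7, 0, 2) + bytes([255, 10])
header = parse_page_header(raw)
```

Granule_position is read as a signed value so that the special value -1 comes back as -1 rather than as 2^64 - 1.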

Packet lengths are recovered from the segments: a packet ends at the first segment whose length is not 255, so the length of each packet can be read from the segment_table in the page header. For example, if the segment table begins FF 45 FF FF FF 40 ..., the first packet is 255 + 69 = 324 bytes long, the second is 3 × 255 + 64 = 829 bytes, and so on.
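The rule that a packet ends at the first segment shorter than 255 bytes can be written as a short loop. This is a minimal sketch; the function name `packet_lengths` is invented here, and the table bytes in the example are chosen to reproduce the 324-byte and 829-byte packet lengths quoted above.

```python
def packet_lengths(segment_table):
    """Return (completed_packet_lengths, unfinished_bytes) for one page.

    unfinished_bytes is non-zero when the page ends on a 255-byte
    segment, meaning the last packet continues on the next page.
    """
    lengths = []
    current = 0
    for lacing in segment_table:
        current += lacing
        if lacing < 255:  # a segment shorter than 255 closes the packet
            lengths.append(current)
            current = 0
    return lengths, current


done, pending = packet_lengths(bytes.fromhex("ff45ffffff40"))
```

Here `done` is `[324, 829]` and `pending` is `0`. A packet whose length is an exact multiple of 255 is closed by a zero-length segment, so the table `FF 00` describes one complete 255-byte packet.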

The page header consists of the fields above, so the length of the page header and the length of the entire page are:

Header_size = 27 + num_segments (bytes)

Page_size = Header_size + the sum of the segment sizes in segment_table (bytes)
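The two size formulas, together with the checksum field described earlier, are enough to measure and validate a page held in memory. The sketch below makes several assumptions that go beyond the text: the CRC register starts at 0 with no bit reflection and no final XOR, the checksum is stored little-endian at byte offset 22, and it is computed over the whole page with those four bytes zeroed. The function names are hypothetical.

```python
def page_sizes(page):
    """Return (header_size, page_size) for a page starting at page[0]."""
    num_segments = page[26]
    header_size = 27 + num_segments
    body_size = sum(page[27:27 + num_segments])
    return header_size, header_size + body_size


def ogg_crc32(data):
    """Bitwise CRC-32, polynomial 0x04c11db7, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def with_checksum(page):
    """Return a copy of page with its CRC field filled in."""
    zeroed = page[:22] + bytes(4) + page[26:]
    return page[:22] + ogg_crc32(zeroed).to_bytes(4, "little") + page[26:]


def checksum_ok(page):
    """Recompute the CRC with the stored field zeroed and compare."""
    stored = int.from_bytes(page[22:26], "little")
    return ogg_crc32(page[:22] + bytes(4) + page[26:]) == stored


# One-segment page: ID, version 0, BOS flag, granule 0, serial 1,
# sequence 0, empty CRC, one 3-byte segment, then the payload.
blank = (b"OggS" + bytes([0, 2]) + bytes(8) + (1).to_bytes(4, "little")
         + bytes(4) + bytes(4) + bytes([1, 3]) + b"abc")
sealed = with_checksum(blank)
sizes = page_sizes(sealed)
```

For this page `sizes` is `(28, 31)`. The largest possible page has 255 segments of 255 bytes each, which gives 27 + 255 + 255 × 255 = 65307 bytes.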

3. Ogg encapsulation process (supplementary)

1) Encoded audio and video data is handed to the Ogg encapsulation layer as packets with defined boundaries; where the boundaries fall depends on the specific encoding format. See Figure 3.

2) Each packet of the logical stream is divided into segments. Every segment is 255 bytes long except the last segment of a packet, which is shorter than 255 bytes (and may be 0 bytes), because a packet can be of any length, as determined by the specific media encoder.

3) Page encapsulation: a page header is added to each group of segments to form a page. Pages may differ in length, depending on the circumstances. The segment_table field of the page header holds the lacing values, one per segment, each giving the length of that segment; the last segment of a packet has a lacing value below 255 (possibly 0). Packets are processed one at a time, and each packet is encapsulated in one or more pages (the page length is capped, typically at about 4 KB). In the flow shown in Figure 3 each new packet starts on a new page, although in general a page may carry segments of several packets; whether a page starts with a new packet or continues one is indicated by the header_type field.

The pages of multiple logical streams (voice, text, pictures, audio, video, and so on) are then multiplexed into a physical stream in the time order required by the application.

logical bitstream with packet boundaries
-----------------------------------------------------------------
> |       packet_1             | packet_2         | packet_3 |  <
-----------------------------------------------------------------

                    | segmentation (logically only)
                    v

     packet_1 (5 segments)          packet_2 (4 segs)    p_3 (2 segs)
    ------------------------------ -------------------- ------------
..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
    ------------------------------ -------------------- ------------

                    | page encapsulation
                    v

 page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream

Figure 3 Ogg encapsulation flow diagram
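The segmentation and page encapsulation steps of Figure 3 can be imitated with a small Python sketch. It follows the simplified policy drawn in the figure, where every packet starts on a fresh page, and it models a page as a plain `[header_type, segment_table, payload]` list instead of real header bytes. The names `lacing_values` and `paginate` and the `max_segments` parameter are assumptions made for this illustration; a real muxer may place several packets on one page.

```python
CONTINUED, BOS, EOS = 0x01, 0x02, 0x04  # header_type flag bits


def lacing_values(packet_len):
    """Segment lengths for one packet: 255s, then a final value below 255."""
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]  # rest may be 0, which still ends the packet


def paginate(packets, max_segments=255):
    """Wrap each packet in one or more [header_type, table, payload] pages."""
    pages = []
    for packet in packets:
        lacing = lacing_values(len(packet))
        pos = 0
        for start in range(0, len(lacing), max_segments):
            table = lacing[start:start + max_segments]
            size = sum(table)
            # Pages after the first one for this packet continue it.
            flags = CONTINUED if start > 0 else 0
            pages.append([flags, table, packet[pos:pos + size]])
            pos += size
    if pages:
        pages[0][0] |= BOS   # first page of the logical stream
        pages[-1][0] |= EOS  # last page of the logical stream
    return pages


# Packets sized like Figure 3: 5, 4, and 2 segments respectively.
packets = [bytes(4 * 255 + 100), bytes(3 * 255), bytes(255 + 10)]
pages = paginate(packets, max_segments=3)
```

With at most three segments per page, packet_1 spans two pages just as in the figure, and the zero-length fourth segment of packet_2 ends up alone on a continuation page.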

4. Ogg Vorbis bitstream structure

The Vorbis bitstream starts with three header packets. In order, they are the identification header, the comment header, and the setup header. All three are needed to decode a Vorbis audio file.

1) Packet header structure

Each header packet starts with the same header structure:

[packet_type]: 8-bit value
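As a rough illustration of how the common header could be checked, the sketch below classifies a packet by its first byte. The details it relies on are not given in the text above and are taken from the Vorbis I specification as assumptions: the packet_type byte is followed by the six ASCII characters "vorbis", and the types 1, 3, and 5 denote the identification, comment, and setup headers. The function name `vorbis_header_kind` is hypothetical.

```python
# Assumed from the Vorbis I specification, not from the text above.
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def vorbis_header_kind(packet):
    """Return the header name for a Vorbis header packet, else None."""
    if len(packet) < 7 or packet[1:7] != b"vorbis":
        return None  # too short or missing the "vorbis" marker
    return HEADER_TYPES.get(packet[0])


kinds = [vorbis_header_kind(bytes([t]) + b"vorbis") for t in (1, 3, 5)]
```

Anything that does not match, such as an audio packet, is reported as `None` rather than raising an error.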
