3D Audio Theory Research (2): 3D Introduction

Source: Internet (it168.com)

Author: reny

I. 3D Sound Effects
As software and hardware have advanced, the traditional two-channel, single-plane stereo sound field can no longer satisfy listeners. To deliver a stronger sense of depth and space, engineers use digital audio to generate an entirely new kind of simulated sound: 3D sound effects.
In daily life we listen with two ears, gathering cues from every direction, and the brain computes the location of each sound. A computer can simulate this computation and play the result through a digital audio source, making us feel as if we were inside a virtual world.
Since a single pair of ears is enough to resolve a 3D sound field in the real world, two headphone drivers held close to the ears can achieve an approximation of it. You may ask: can two loudspeakers do the same? Not as well, because speakers sit far from the ears and the sound is degraded on its way through the air. The result is certainly inferior to a multi-speaker system; otherwise vendors would not bother making four-speaker products.
3D sound implemented with two speakers places heavy demands on the algorithm but needs no complex hardware. 3D sound implemented with multiple speakers requires expensive hardware and careful speaker placement, but relatively little from the software.
II. Human Hearing
To explain what 3D sound does for us, we first need a sketch of the human auditory system. The two basic localization cues are IID and ITD. IID (interaural intensity difference) means that the ear closer to the source receives a higher sound intensity than the other ear. ITD (interaural time difference) means that, depending on the source's azimuth, sound reaches the two ears at different times, and we perceive the source on the side of the earlier arrival. Together, IID + ITD place the source somewhere within a cone whose axis is the line through the listener's two ears.
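As a rough illustration (the spherical-head model and the head radius below are assumptions for the sketch, not values from the text), the ITD for a distant source at a given azimuth can be approximated with Woodworth's formula:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m; assumes roughly 17.5 cm between the ears

def itd_seconds(azimuth_deg):
    """Woodworth's spherical-head approximation of the interaural
    time difference (ITD) for a distant source at a given azimuth."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {itd_seconds(az) * 1e6:5.0f} us")
```

A source directly ahead (0 degrees) yields no time difference; a source fully to one side (90 degrees) yields roughly 650 microseconds, the largest delay the head geometry allows.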
The outer ear (pinna) acts as a filter: it boosts or attenuates acoustic energy arriving from different angles before passing it on to the brain, letting us localize sources more precisely. The ear receives only a limited range of frequencies, usually 20 Hz to 20 kHz, corresponding to wavelengths from about 16 meters down to 1.6 cm. Below this range lies infrasound; above it, ultrasound.
Since the distance between the two ears is about 15 cm (unless your head is very large), IID and ITD cues weaken once the wavelength exceeds roughly 15 cm. Low-frequency sounds have long wavelengths, so the position of a bass source is hard to judge, while a high-frequency source is easy to place. The pinna itself plays a crucial role in localization: without it, pinpointing a sound becomes very difficult.
Wavelength = speed of sound / frequency. Take a frequency of 10 kHz, that is, 10,000 vibrations per second, and a sound speed of 330 m/s: the wavelength is 330/10000 = 0.033 m = 3.3 cm.
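The same arithmetic as a minimal sketch, using the 330 m/s figure from the text:

```python
SPEED_OF_SOUND = 330.0  # m/s, the value used in the text above

def wavelength_m(freq_hz):
    """wavelength = speed of sound / frequency"""
    return SPEED_OF_SOUND / freq_hz

print(wavelength_m(20))      # 16.5 m   -> deep bass, hard to localize
print(wavelength_m(20_000))  # 0.0165 m -> treble, easy to localize
print(wavelength_m(10_000))  # 0.033 m = 3.3 cm, the example above
```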
In many cases the sound we hear does not travel straight into the ears but arrives after several reflections. Along the way the wave loses energy, and the reflections add coloration and delay, so the sound is changed. This mixing of reflections is called interactive reverb, and in an enclosed space the reverberation is clearly audible. From these changes we can tell whether a sound has been reflected, infer the surrounding environment, even judge the position of a wall or whether a door is open or closed.
To simulate 3D sound, all of the above localization cues must be reproduced: IID, ITD, pinna filtering, and reflections. By analyzing how sound changes with angle, a virtual sound system, a digital sound field, is built through computer simulation and synthesis.
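The "interactive reverb" described above boils down to delayed, attenuated copies of a signal mixed back in. A minimal sketch of the idea, using a single feedback comb filter as in classic Schroeder reverberators (the parameter values are purely illustrative):

```python
def comb_reverb(dry, delay_samples, feedback=0.5, wet=0.3):
    """Feedback comb filter: each reflection is a delayed, attenuated
    copy of the signal, recirculated to build up a reverberant tail."""
    out = list(dry)
    for n in range(delay_samples, len(out)):
        out[n] += feedback * out[n - delay_samples]
    # wet/dry mix: reflected energy blended with the direct sound
    return [(1.0 - wet) * d + wet * o for d, o in zip(dry, out)]

impulse = [1.0] + [0.0] * 15
print([round(x, 3) for x in comb_reverb(impulse, delay_samples=4)])
```

Feeding in a single impulse shows echoes at samples 4, 8, 12, each half as loud as the last, which is exactly the decaying train of reflections a room produces.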
III. HRTF
The HRTF (head-related transfer function) describes how the head and ears transform a sound on its way to the eardrum, and thus how we determine a source's position from what we hear. Every person's HRTF is different, yet one person's set can still be used by another. If a set of HRTFs localizes sound well in the real world, the same set can deliver equally accurate positional information in a virtual one.
HRTF measurement is conceptually simple. First, place two miniature microphones in the listener's ear canals, then position a speaker near the listener and play a known signal, recording what the microphones pick up. Comparing the impulse characteristics of the source signal and the recorded signal yields the filter response. Finally, repeat the process at many positions around the listener to obtain the complete HRTF set.
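A hypothetical sketch of the comparison step, using simple FFT division (a crude form of deconvolution; real measurement rigs are far more careful about noise, and the toy "measurement" below is an assumption for illustration):

```python
import numpy as np

def estimate_hrtf(played, recorded, eps=1e-9):
    """Estimate the ear's filter as the spectral ratio of the signal
    recorded in the ear canal to the signal fed to the speaker."""
    P = np.fft.rfft(played)
    R = np.fft.rfft(recorded)
    H = R / (P + eps)                      # transfer function of head + pinna
    return np.fft.irfft(H, n=len(played))  # head-related impulse response

# Toy "measurement": the ear hears a delayed, attenuated copy.
rng = np.random.default_rng(0)
played = rng.standard_normal(512)
recorded = 0.6 * np.roll(played, 8)
hrir = estimate_hrtf(played, recorded)
print(int(np.argmax(np.abs(hrir))), round(float(hrir[8]), 2))  # -> 8 0.6
```

The recovered impulse response peaks at the 8-sample delay with amplitude 0.6, exactly the filtering the toy "head" applied.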
In the real world we do not rely on our ears alone; sight and hearing often cooperate to localize sound. Suppose we stand outside the open front door of a house and hear a source somewhere ahead. Even without seeing it, we easily judge that the sound comes from inside the house. A visual 3D system plus this psychological computation can produce 3D sound effects close to the real world.
The ears alert the brain to events outside the visual field, shifting our attention and making us turn our heads, so head movement also affects the HRTF. Because the interaural differences change as the head rotates, rotation itself becomes a localization cue: if you cannot tell whether a sound is in front of you or behind you, a small turn of the head to the left or right settles it.
Scientists have designed digital signal processing software and algorithms that apply these acoustically and psychologically derived filter effects to incoming audio signals in real time. HRTF is widely used, in video conferencing, games, fighter cockpit alarms, and air traffic control.
IV. Classification of 3D Sound Effects
Positioning and interaction are the two most important factors in 3D sound. Positioning lets the listener pinpoint the source of a sound; it can be achieved by recording sound in advance and then decoding it in a specific way. Positioning in real time is interaction: the sound is not pre-recorded but generated according to your actions. Interactive sound therefore places heavier demands on the playback hardware than pre-recorded material (for example, movies) does.

1. Extended Stereo

Extended stereo applies delay processing to a conventional stereo track, widening the sound field so that the sound seems to extend into the space beyond the speakers and the 3D world feels broader. It is a technology for passive playback of audio tracks and can, at best, be called a 3D positioning effect; a minimal sketch of the idea follows.
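One simple (assumed, illustrative) way to widen a stereo image is to feed a delayed, inverted copy of each channel into the opposite one:

```python
def widen_stereo(left, right, delay=12, amount=0.3):
    """Toy 'extended stereo': subtract a delayed copy of each channel
    from the opposite one so the image spreads beyond the speakers."""
    wide_l, wide_r = list(left), list(right)
    for i in range(delay, len(left)):
        wide_l[i] -= amount * right[i - delay]
        wide_r[i] -= amount * left[i - delay]
    return wide_l, wide_r
```

Subtracting the delayed opposite channel reduces the correlation between the two outputs, which the ear reads as a wider field; no positional information is added, which is why this counts as widening rather than true 3D.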

2. Surround Sound

Surround sound uses audio compression technology (such as Dolby AC-3) to encode a multichannel source into a single program stream, which is then decoded through a multi-speaker system to create a surround effect covering several zones. This too is passive playback technology and is best suited to movies; the core work of surround sound is encoding and decoding. Two speakers can, of course, simulate the surround effect of five through special algorithms.

3. Interactive 3D Audio

Interactive 3D audio reproduces, as faithfully as algorithms allow, what the ears would hear in the real world. Sounds can appear to originate anywhere in the 3D space and change accordingly as the listener moves. It is the most realistic form of 3D sound and is typically used in first-person 3D games.

V. Microsoft DirectSound 3D

3D sound control is implemented in software through APIs (application programming interfaces). Common APIs include Microsoft DirectSound 3D, Aureal A3D, Creative EAX, and QSound.

To help Windows run multimedia programs well, Microsoft provides DirectX to software developers, hardware makers, and consumers. DirectSound 3D (DS3D) is the part that handles 3D sound. Unlike Direct3D on the graphics side, it is extensible, which won it broad vendor support (Direct3D lost ground to OpenGL precisely because it lacked extensibility).

DS3D is a command set that lets game developers define a sound's position and volume through specific computations. If your sound card does not support DS3D, a software 3D engine can mix the stereo output and simulate the 3D processing; Intel RSX and QSound QMixer work exactly this way. The biggest drawback of software processing is CPU usage. A card with DS3D hardware support carries its own algorithms (usually licensed from Aureal, CRL, and others) and can render the 3D sound field while keeping CPU usage low.
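A minimal sketch of what such a host-based engine does per source: constant-power panning plus a small interaural delay. This is an assumed illustration of the general technique, not the actual RSX or QMixer code:

```python
import math

def place_mono_source(samples, azimuth_deg, rate=44100):
    """Software 3D mix for one source: constant-power pan plus a
    sub-millisecond delay to the ear farther from the source."""
    pan = math.radians(azimuth_deg)            # -90 (left) .. +90 (right)
    gain_l = math.cos((pan + math.pi / 2) / 2)
    gain_r = math.sin((pan + math.pi / 2) / 2)
    delay = int(abs(math.sin(pan)) * 0.0007 * rate)  # up to ~0.7 ms ITD
    left = [gain_l * s for s in samples]
    right = [gain_r * s for s in samples]
    if pan > 0:    # source on the right: delay the far (left) ear
        left = [0.0] * delay + left[:len(left) - delay]
    elif pan < 0:  # source on the left: delay the right ear
        right = [0.0] * delay + right[:len(right) - delay]
    return left, right

l, r = place_mono_source([1.0] * 1000, azimuth_deg=60)
print(round(max(l), 2), round(max(r), 2))  # left quieter than right
```

Every active 3D voice needs this processing on every sample, which is why software mixing eats CPU time and hardware acceleration was worth fighting over.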

Because 3D voices are a scarce resource, developers reserve them for the important sounds (the cry of a monster, say) and generally use 2D audio for background music. The latest games request many simultaneous 3D streams, while older sound cards support only a few, which causes abrupt changes in the sound during play.

DS3D positioning centers on the sound source and on the player's position and orientation. In the virtual 3D world, the software does not need to compute the true propagation of sound; a 3D sound field can be simulated using the Doppler effect. The programmer chooses a factor between 0 and 1: 0 disables the Doppler effect, 1 applies the physically correct one, and the results this produces sometimes sound even better than the real world.
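A sketch of that 0-to-1 dial (standard Doppler physics; the scaling scheme is an assumed illustration of the idea, not DS3D's internal code):

```python
def doppler_pitch(freq_hz, source_speed, listener_speed=0.0,
                  factor=1.0, c=343.0):
    """Doppler-shifted frequency for a source approaching the listener.
    'factor' dials the effect from 0 (off) to 1 (physically correct)."""
    shifted = freq_hz * (c + listener_speed) / (c - source_speed)
    return freq_hz + factor * (shifted - freq_hz)

print(doppler_pitch(440.0, source_speed=30.0))            # ~482 Hz, realistic
print(doppler_pitch(440.0, source_speed=30.0, factor=0))  # 440 Hz, disabled
```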

Within the limits of plain 16-bit audio, a game cannot give each sound its own loudness scale, so a car and a human voice end up at the same volume. With DS3D, programmers can define a reference distance for each sound, the range within which it plays at full volume: a space shuttle might be assigned 300 feet, an insect 1 inch. Distance attenuation is also defined: distant sounds are quiet, nearby sounds loud. By controlling volume this way, the loudness relationships of the real world can be simulated.
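A sketch of this rolloff model (full volume inside the reference distance, then halving with each doubling of distance; the numbers are illustrative):

```python
def distance_gain(distance, min_distance, rolloff=1.0):
    """Inverse-distance rolloff: full volume inside min_distance, then
    the level halves (-6 dB) each time the distance doubles."""
    if distance <= min_distance:
        return 1.0
    return (min_distance / distance) ** rolloff

print(distance_gain(0.02, min_distance=0.025))  # 1.0, an insect right at the ear
print(distance_gain(200.0, min_distance=100.0)) # 0.5, a loud source heard far off
```

A tiny reference distance makes a sound audible only up close; a huge one keeps it loud across the whole map, which is exactly the insect-versus-shuttle contrast above.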

Volume control alone suits point sources that radiate in all directions. For a sound aimed in a specific direction, adjusting volume is not enough: a conical sound band (conceptually similar to IID + ITD) is needed to simulate such sources properly. Volume is maximal inside the cone, and outside the cone it falls off further with distance and angle.

A sound cone is more realistic than plain volume control and is well suited to creating dynamic effects. For example, place a source in the middle of a room, facing the door: standing outside the room (outside the cone) you hear nothing, but the moment you pass through the doorway the sound of the room suddenly appears, giving a vivid sense of interaction.
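A sketch of the cone model (the inner/outer angles and outside gain are illustrative parameters, not values from the text):

```python
def cone_gain(angle_deg, inside_deg=60.0, outside_deg=120.0,
              outside_gain=0.1):
    """Sound-cone attenuation: full volume inside the inner cone, a
    fixed reduced volume outside the outer cone, a blend in between."""
    if angle_deg <= inside_deg / 2:
        return 1.0
    if angle_deg >= outside_deg / 2:
        return outside_gain
    t = (angle_deg - inside_deg / 2) / (outside_deg / 2 - inside_deg / 2)
    return 1.0 + t * (outside_gain - 1.0)

# Listener in front of the open door vs. off to the side of the wall:
print(cone_gain(10.0))  # 1.0 - inside the cone, full volume
print(cone_gain(90.0))  # 0.1 - outside the cone, barely audible
```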

VI. Aureal A3D

When Microsoft launched DirectX 3 in the summer of 1996, it gave DS3D no hardware acceleration at all. The rapid growth of multimedia applications was driving a revolution in 3D sound, and Aureal could not wait for Microsoft to release a new DS3D, so it developed its own 3D algorithm, A3D, along with simple tools that made it easier for developers to create games with 3D sound.

At the time, A3D had advantages DS3D lacked: it used less CPU and offered effects DirectX 3's DS3D did not support, and it quickly took over the market. Microsoft moved slowly, as ever, and only with DirectX 5 in the fall of 1997 did DS3D gain hardware acceleration for 3D audio streams.

1. Compared with DS3D, A3D 1.x had the following advantages:

1) A sound-resource manager for developers, making it easier to control 3D voices and the sound card: the A3D algorithm handles the important effects while plain stereo mixing handles the background music. This partially relieves the momentary dropouts that occur when a new game requests more audio streams than the card itself can process.

The new DirectX 7.0 now implements similar functions with help from QSound and Sensaura, and has made itself an extensible API. QSound even develops SDKs (software development kits) for all sound cards: QMDX and QMixer.

2) A long-range sound model that simulates atmospheric environments such as fog and underwater scenes.

2. A3D Wavetracing and Positioning

In the real world, many sources influence one another: besides interfering, part of a sound wave is absorbed or blocked by surrounding objects. Aureal's wavetracing (waveform tracing) technology simulates these effects; typical applications are stage and room reflections. Wavetracing builds a virtual space and computes how the waveform reflects off specific objects. It can even reproduce realistic details such as the sound leaking in from the next room or the deadening effect of a carpet.
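The core geometric idea behind computing a reflection can be sketched with the classic image-source method (an assumed illustration of the technique in general, not Aureal's implementation):

```python
import math

def first_reflection(src, listener, wall_x, c=343.0):
    """Image-source method: a wall at x = wall_x reflects the source,
    which is equivalent to a 'mirror' source behind the wall."""
    mirror = (2 * wall_x - src[0], src[1])
    direct = math.dist(src, listener)
    reflected = math.dist(mirror, listener)
    return direct / c, reflected / c  # arrival times of both paths

t_direct, t_reflect = first_reflection(src=(1.0, 0.0),
                                       listener=(4.0, 0.0), wall_x=0.0)
print(f"direct {t_direct*1000:.1f} ms, reflection {t_reflect*1000:.1f} ms")
```

The engine then renders the mirror source as a second, delayed and attenuated voice; doing this for every wall and every order of reflection is why wavetracing is so computationally expensive.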


3. Features of A3D

Although A3D was the first 3D sound standard, filter-adjustment technology alone cannot fully simulate reality. A3D 2.0 strengthens positioning, adds a sound-reflection library, and compensates the sound in real time. Its most striking trait is instantaneous volume change during play, the level sometimes jumping or dropping suddenly; stand at different distances from a wall, for instance, and you can hear the difference A3D 2.0 produces.

EAX (Environmental Audio Extensions) lets developers customize echo effects such as early reflections, delayed reverberation, diffusion, decay ratio, depth, and high-frequency damping. With EAX, developers build special effects for wide-open spaces, wall reflections, and so on, and because it is an application interface it is easy to call.

By contrast, the 3D sound field A3D produces in real time is more authentic than EAX's (real-time beats pre-baked). A3D is therefore well suited as the standard for 3D games, and many famous titles announced support for A3D 2.0, such as Quake 3 and Diablo 2.

4. A3D Sound Card Classification

A3D sound cards fall into four classes: emulated, DSP, Vortex1, and Vortex2. An emulated card runs A3D in software: a substitute a3d.dll intercepts A3D commands and translates them into DS3D calls. Not every A3D command has a DS3D equivalent, so emulation often leaves a game silent, Descent: FreeSpace for example.

DSP-class cards decode A3D on a programmable DSP chip, as in the Diamond Monster Sound, Xitel Storm 3D, and Shark Predator 3D; their advantage is low CPU usage. Different vendors' DSP cards handle different stream counts. The Monster Sound, for example, can mix 23 DirectSound 2D streams when A3D is idle; with six A3D sources active, the remaining resources handle only four 2D streams, and with eight 16-bit 22 kHz A3D sources it cannot process any 2D audio at all.

Vortex1 is the standard A3D chip released by Aureal itself. It processes 8 3D streams and 8 2D streams simultaneously, or 48 2D streams if no 3D audio is used. Vortex1 differs slightly from an ordinary DSP: it borrows some CPU resources for A3D rendering and cannot hardware-accelerate 2D audio, and since it is not a programmable DSP, "semi-software chip" describes it better.

Vortex2 is the enhanced successor to Vortex1. A3D no longer runs in a small programmable area of the chip; dedicated hardware decodes it outright. Vortex2 can accelerate 16 A3D/DS3D streams and 80 DirectSound 2D streams simultaneously without occupying much CPU.

Powerful as Vortex2 is, A3D's wavetracing is so complex that processing 2D audio with A3D enabled inevitably costs performance. The effect is most obvious in Quake 3, where enabling the A3D option drops the frame rate by about 5 frames per second.

VII. Creative EAX

We hear two kinds of sound: direct and reflected. Direct sound travels straight from the source to the ear; reflected sound bounces off obstacles (such as walls) on the way, and the blocking of sound by such obstacles is known as occlusion. Unless you stand in an anechoic chamber, everything we normally hear is a blend of the two, and the ratio of reflected to direct sound is called the wet/dry ratio.

The shape, size, and materials of a 3D scene largely determine the reflected sound, and Creative built its environmental effects on exactly this: EAX is, in essence, a reflection engine. EAX 1.0 supports only a limited set of environments and cannot reproduce every acoustic space, but the set can be extended at any time, and 2.0 provides many more environments. As noted above, A3D began as a direct-sound engine and added reflections in 2.0, processing up to 60 reflected streams at once; reflection effects are the second great leap in the history of virtual 3D sound.

EAX extends DS3D, letting developers manipulate effects at will, for instance switching one effect off to test its impact on the game. That used to be hard to imagine; today it takes a few mouse clicks.

In an EAX game, the developer determines the reflections from the player's position, the environment, the source distance, and the wet/dry ratio. Sound from far away consists mostly of reflections, and although the wet share rises with distance, EAX computes the ratio automatically with no manual intervention, which greatly reduces programming effort. By varying the reflection time and the wet/dry ratio, developers can create many different-sounding environments without changing the size of the space; a rough sketch of such a distance-driven mix follows.
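A minimal sketch of a distance-dependent wet/dry mix (the linear model and max_distance are assumptions for illustration; EAX derives the ratio by its own means):

```python
def wet_ratio(distance, max_distance=30.0):
    """Reflected share of the mix grows with source distance,
    clamped to [0, 1]. Assumed linear model."""
    return min(1.0, max(0.0, distance / max_distance))

def mix_wet_dry(direct, reflected, w):
    """Blend the direct path with the reflected field at wet share w."""
    return [(1.0 - w) * d + w * r for d, r in zip(direct, reflected)]

print(wet_ratio(3.0))   # 0.1 -> a near source is mostly direct sound
print(wet_ratio(27.0))  # 0.9 -> a far source is mostly reflections
```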

EAX's biggest advantage is low CPU usage. Under license from Creative, other companies' sound cards can also be EAX-compatible, and many vendors began moving from the Vortex2, QSound, and Sensaura camps to EAX. Unfortunately, each vendor uses a different reflection engine, and not every reflection algorithm achieves that low CPU cost.

A reminder: a card that supports EAX does not necessarily impose a low CPU load. Without a powerful DSP (digital signal processing) chip such as the EMU10K1 on the SB Live!, EAX can only be emulated in software, which drags down the performance of the entire system (the same applies to cards that decode A3D in software).

Some games (Unreal, Half-Life, and others) ship their own reflection engines, and it is wise to check whether EAX actually beats them. In addition, EAX can be applied to music CDs, MP3s, and non-EAX games, so its appeal can be enjoyed everywhere.

VIII. Sensaura

Do not assume 3D sound means only Creative EAX and Aureal A3D: Sensaura's history is much longer than either. It was founded by several audio-equipment manufacturers ten years ago and served professional audio for most of that time, turning to 3D game sound only in recent years. According to a Mercury survey, Sensaura-licensed chips made by ESS, Yamaha, Cirrus Logic/Crystal Semiconductor, and others hold over 70% of the PC sound-chip market; low price, good results, and wide availability are why vendors choose Sensaura.

Like Aureal and others, Sensaura applies HRTF (head-related transfer function) technology to headphones, and crosstalk-cancelled HRTF to two-speaker systems. The latest Sensaura MultiDrive extends positional audio to four speakers, and ESS has already released the Canyon3D chip supporting it. Note: do not mix up the speaker positions, or the sound becomes indescribable and plain two-speaker playback would serve you better.

In theory, humans cannot distinguish more than six simultaneous sources, so 8 3D streams should suffice for a game. To achieve better results, however, game makers keep adding sources; Unreal, for example, supports 16 3D streams. Sensaura can provide as many as 32 channels, and its 3D kernel is efficient enough that its decoding outpaces DS3D.

That matters at a time when games demand ever richer sound; 8-channel audio hardly satisfies players' appetites any more. Sensaura also provides developers a DS3D sound-management layer for better control over 3D streams.

To win support from more vendors, Sensaura is compatible with the A3D 1.x standard and with EAX. However, it has no DSP (digital signal processing) chip like the EMU10K1 on Creative's SB Live! or Aureal's Vortex2, so decoding falls on host resources, CPU and memory, and is naturally slower than hardware-accelerated A3D or EAX; running EAX decoding on a Pentium II 300 occupies roughly 2% of the CPU. In addition, Sensaura implements occlusion and obstruction reflection effects with a variable low-pass filter and supports the I3DL2 standard.

Older Sensaura versions could only apply HRTF beyond about 1 meter. To improve this, MacroFX technology was invented, providing a much finer volume model. It divides space into six zones: zone 0 is the ultra-far field, and zone 1, the far field, matches DS3D's standard distance model. The remaining four cover the near field, the left ear, the right ear, and the center, enabling whispers, wind (very useful for racing and skiing games), headset simulation, and bullets or rockets whizzing past.

The essence of 3D sound is time difference: sound from a source reaches the two ears at different moments, and that difference is what places us inside a virtual 3D world. Suppose a source sits 2 meters away at roughly 85 degrees to one side: there is a slight difference between what the two ears receive. Most sound engines do not render this subtlety, but MacroFX does, and it needs no extra programming instructions, so better sound comes almost for free.

DS3D treats every sound as a single point source, which works well for large objects far away; for an object moving from far to near, however, the inadequacy of the point-source model becomes obvious. Sensaura's ZoomFX technology solves this. As the name suggests, ZoomFX "zooms" the sound image, simulating, say, a train swelling from a distant point into a broad, close source. Unlike MacroFX, ZoomFX requires extra programming and more than three 3D streams.

Cards supporting Sensaura include the Yamaha 724, ESS Maestro 2/2E, and Canyon3D. They share the same algorithms and filter library, but different DSPs and codecs affect each card's signal-to-noise ratio (SNR) and distortion.

At present most Sensaura implementations use only eight channels, so MacroFX and ZoomFX are not widely exploited and cannot show their full power. Fortunately, Sensaura is a pure technology vendor, not tied to its own hardware and market position the way Creative and Aureal are, so it is free to innovate and bring us better 3D sound. I am eager to see what Sensaura delivers in the new century.

IX. QSound Q3D

Like Sensaura, Q3D is a veteran 3D sound API. Since entering the market in 1991 it has gone through several major revisions and matured considerably. Recent applications include the Sega Dreamcast console, VLSI's Thunderbird 128, and Trident's 4DWave-DX and 4DWave-NX. QSound's 3D audio technology reaches beyond sound cards into professional software such as IQ and QCreator; in fact it is used more widely in game consoles than in PCs, and you can find QSound in many arcade machines.

Q3D uses HRTF filtering, with a two-speaker desktop mode (Q1) and a headphone mode (Q2). QSound's hallmark is adding HRTF to music to enhance its 3D feel, which does nothing for actual games; for 3D games, though, the QSound approach is still effective, delivering 32 3D channels at a low CPU cost of about 5%. Q3D 2.0 adds a reflection engine and supports DS3D, EAX, A3D 1.x, and the increasingly popular four-speaker setups.

QSurround is QSound's surround branch. With two or four speakers it achieves an effect similar to Dolby Surround, making it well suited to DVD playback.

To open up the professional market, QSound provides QMDX and QMixer. QMDX is a free software development kit; QMixer lets sound cards without DS3D support use Q3D. Better still, both also support EAX and can be used to emulate EAX on non-EAX cards.

QSound's three effects are QXpander, QMSS, and 2D-to-3D remapping. QXpander converts a stereo mix into 3D sound: QX is the two-speaker implementation and Q2X the headphone one. QXpander is not true interactive 3D and cannot support games directly; it merely widens and deepens the sound field (especially for music), extended stereo at best. Even so, it noticeably improves the playback of music CDs, MIDI, WAV files, and DVDs.

QXpander's companion software includes IQ, IQFX, and UltraQ. IQ can add QXpander processing to most sounds, including Windows system sounds, MP3 files, and RealPlayer, though it does not yet handle music CDs or MIDI files, a real pity; with the AC-97 specification and new MIDI standards emerging, QSound should close that gap soon. IQFX is a special build of IQ that integrates a RealAudio player and bass-boost technology. UltraQ drives a separate mixing output device (a power amplifier) to provide an extra-wide sound field.

With four speakers you need QMSS, which copies the front-speaker signal to the rear pair, though without reflection effects.

To use QXpander in DirectSound games, 2D-to-3D remapping is required: a mono (2D) stream is positioned independently by mapping the flat mixed data onto a variable-width 3D orientation, hence the name. It is a powerful complement to QXpander, and every Q3D engine supports it.

The newest Q3D 2.0 cards support headphones, two speakers, four speakers, and QEM (QSound Environmental Modeling) speaker groups; in short, Q3D 2.0 = Q3D 1.0 + QEM. It also offers higher-quality sampling and faster interpolation than 1.0. Current Q3D 2.0 cards include the Trident 4DWave-DX/NX and the VLSI Thunderbird 128.