Chapter One: Fundamentals
This chapter shows you how to get started with the Web Audio API: which browsers support it, how to detect whether it is available, what an audio graph is, what an audio node is, how to connect audio nodes to each other, the basic node types, and finally how to load sound files and play sounds.
Introduction to the history of Web audio
The first way to play sound on a Web page was the <bgsound> tag, which let a site author automatically play background music when a visitor opened the page. This feature was only available in Internet Explorer, and it was never standardized or picked up by other browsers. Netscape implemented a similar feature with the <embed> tag, providing basically the same functionality.
Flash was the first cross-browser way to play sound on a Web page, but it had the significant drawback of requiring a plugin. More recently, browser vendors have rallied around the HTML5 <audio> element, which provides native audio support in modern browsers.
Although the <audio> tag eliminates the need for plugins on the Web, it still has significant limitations when it comes to complex games and interactive applications. The following are some of the limitations of the <audio> element:
• No precise timing control
• Very low limit on the number of sounds that can be played at once
• No reliable way to pre-buffer a sound
• No ability to apply real-time effects
• No way to analyze sounds
Historically there have been several attempts to create a powerful Web audio API that addresses these limitations. One notable example is the Audio Data API, a prototype designed and implemented in Mozilla Firefox. Mozilla's approach was to start with the <audio> element and extend its JavaScript API with additional functionality. This API had only a limited audio graph (more on audio graphs later, in "Audio context"), and it was not adopted beyond its first implementation. It has since been deprecated in Firefox in favor of the Web Audio API.
Unlike the Audio Data API, the Web Audio API is a completely new model, fully separate from <audio>, although there are integration points with other Web APIs (see Chapter 7). It is a high-level JavaScript API for processing and synthesizing audio in Web applications. The goal of the API is to include the capabilities found in modern game audio engines, as well as some of the mixing, processing, and filtering tasks found in modern desktop audio production applications. The result is a versatile API that can be used for a variety of audio-related tasks, from games to interactive applications to very advanced music synthesis and visualization applications.
Games and interactivity
Audio is a huge part of what makes interactive experiences so compelling. If you don't believe me, try watching a movie with the sound turned off.
Games are no exception! My fondest video game memories are filled with music and sound effects. Now, nearly 20 years after the release of my favorite games, I still can't forget Koji Kondo's Zelda themes or Matt Uelmen's Diablo soundtrack.
Sound effects also matter enormously outside of games. They have been part of user interfaces since the command line, where certain kinds of errors make the computer beep. The same idea carries through to modern UIs, where well-placed sounds are critical for notifications, chimes, and, of course, audio and video communication applications such as Skype. Assistant software such as Google Now and Siri provides rich, audio-based feedback. As we move further into a world of ubiquitous computing, speech- and gesture-based interfaces that lend themselves to screen-free interaction rely increasingly on audio feedback. Finally, for computer users with visual impairments, audio cues, speech synthesis, and speech recognition are critical to creating a usable experience.
Interactive audio presents some interesting challenges. To create convincing in-game music, designers need to adjust to all of the potentially unpredictable game states a player can find themselves in. In practice, sections of a game can run for an unknown duration, and sounds interact with the environment and mix in complex ways, such as environment-specific effects and relative sound positioning. Finally, there can be a large number of sounds playing at once, all of which need to sound good together and render without introducing quality or performance penalties.
Audio context
The Web Audio API is built around the concept of an audio context. An audio context is a directed graph of audio nodes that defines how an audio stream flows from its source (often an audio file) to its destination (often your speakers). As audio passes through each node, its properties can be modified or inspected. The simplest audio context is a connection directly from a source node to a destination node (Figure 1-1).
An audio context can become very complex, containing many nodes between the source and destination (Figure 1-2), which allows for arbitrarily advanced synthesis and analysis.
Figures 1-1 and 1-2 show audio nodes as blocks. The arrows represent connections between nodes. Nodes can often have multiple incoming and outgoing connections. By default, if there are multiple incoming connections into a node, the Web Audio API simply mixes the incoming audio signals together.
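As a minimal sketch of this default mixing behavior (assuming an audio context named context already exists; initialization is covered in the next section, and the variable names here are purely illustrative), two oscillator sources can be connected to the same gain node, whose output goes to the speakers:
var mix = context.createGain();   // a single node that will receive two inputs
var oscA = context.createOscillator();
var oscB = context.createOscillator();
oscA.frequency.value = 440;       // A4
oscB.frequency.value = 659;       // roughly E5
oscA.connect(mix);                // both connections feed the same node...
oscB.connect(mix);                // ...so the two signals are summed automatically
mix.connect(context.destination);
oscA.start(0);
oscB.start(0);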
The concept of an audio node graph is not new. It dates back to popular audio frameworks such as Apple's Core Audio, which has a similar audio processing graph API. The idea itself is even older, originating in audio environments of the 1960s, such as Moog modular synthesizer systems.
Initializing the audio context
var contextClass = (window.AudioContext ||
  window.webkitAudioContext ||
  window.mozAudioContext ||
  window.oAudioContext ||
  window.msAudioContext);
if (contextClass) {
  // Web Audio API is available.
  var context = new contextClass();
} else {
  // Web Audio API is not available. Ask the user to use a supported browser.
}
A single audio context can support multiple sound inputs and complex audio graphs, so generally speaking, we need only one audio context per audio application we create. The audio context instance includes many methods for creating audio nodes and manipulating global audio preferences. Luckily, these methods are not webkit-prefixed and are relatively stable. The API is still changing, though, so be aware of breaking changes (see Appendix A for details).
Types of Web Audio nodes
One of the main uses of an audio context is to create new audio nodes. Broadly speaking, there are several kinds of audio nodes, and a brief sketch combining them follows the list:
Source nodes
Sound sources such as audio buffers, live audio inputs, <audio> tags, oscillators, and JS processors
Modification nodes
Filters, convolvers, panners, JS processors, and so on
Analysis nodes
Analysers and JS processors
Destination nodes
Audio outputs and offline processing buffers
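As a brief sketch (the variable names are illustrative, and a context created as shown above is assumed), here is one node from each broad category chained together: an oscillator as the source, a low-pass filter as the modification node, an analyser as the analysis node, and the context's destination as the end point:
var osc = context.createOscillator();      // source node
var filter = context.createBiquadFilter(); // modification node (low-pass by default)
var analyser = context.createAnalyser();   // analysis node
osc.connect(filter);                       // source -> filter
filter.connect(analyser);                  // filter -> analyser
analyser.connect(context.destination);     // analyser -> destination (speakers)
osc.start(0);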
Connecting the audio graph
The output of any audio node can be connected to the input of any other audio node by using the connect() function. In the following example, we connect a source node's output to a gain node, and connect the gain node's output to the context's destination:
// Create the source.
var source = context.createBufferSource();
// Create the gain node.
var gain = context.createGain();
// Connect the source to the gain node, and the gain node to the destination.
source.connect(gain);
gain.connect(context.destination);
Note that context.destination is a special node that is associated with the default audio output of your system. The resulting audio graph of the above code looks like Figure 1-3.
Once we've connected a graph like this, we can modify it dynamically. We can disconnect a node with node.disconnect(outputNumber). For example, to reroute the graph so that the source connects directly to the destination, bypassing the intermediate node, we can do the following:
source.disconnect(0);
gain.disconnect(0);
source.connect(context.destination);
The power of modular routing
In many games, the final mix is a combination of multiple sound sources. These sources include background music, game sound effects, UI feedback sounds, and, in a multiplayer setting, voice chat from other players. An important feature of the Web Audio API is that it lets you separate all of these channels and gives you full control over each one, and over all of them together. The audio graph for such a setup might look like Figure 1-4.
We have attached a separate gain node to each channel and also created a master gain node to control them all. With this setup, it is easy to give your players fine, individual control over each channel. For example, many people prefer to play games with the background music turned off.
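A minimal sketch of the routing in Figure 1-4 might look like the following (the channel and variable names are hypothetical, chosen only for illustration):
// A master gain node that everything passes through on the way to the speakers.
var masterGain = context.createGain();
masterGain.connect(context.destination);
// One gain node per channel, all feeding the master gain.
var musicGain = context.createGain(); // background music
var sfxGain = context.createGain();   // game sound effects
var voiceGain = context.createGain(); // voice chat
musicGain.connect(masterGain);
sfxGain.connect(masterGain);
voiceGain.connect(masterGain);
// Individual sources connect to the gain node for their channel, for example:
// musicSource.connect(musicGain);
// Turning off only the background music is then a one-liner:
musicGain.gain.value = 0;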
What is sound?
In physical terms, sound is a longitudinal wave that travels through air or other media. The source of the sound causes molecules in the air to vibrate and collide with one another. This creates regions of high and low pressure, which come together and fall apart in bands. If you could freeze time and look at the pattern of a sound wave, you would see something like Figure 1-5.
Mathematically, sound can be represented as a function of pressure values over time. Figure 1-6 shows a graph of such a function. You can see that it is analogous to Figure 1-5, with high values corresponding to areas of dense particles (high pressure) and low values corresponding to areas of sparse particles (low pressure).
Electronics dating back to the early twentieth century made it possible for us to capture and recreate sound for the first time. A microphone takes the pressure wave and converts it into an electric signal, where (for example) +5 volts corresponds to the highest pressure and -5 volts to the lowest. Conversely, an audio speaker takes that voltage and converts it back into a pressure wave that we can hear.
Whether we are synthesizing sound or analyzing it, the interesting part for audio programmers happens inside the black box in Figure 1-7, which is where the audio signal is manipulated. In the early days of audio, this space was occupied by analog filters and other hardware that would be considered arcane by today's standards. Today, there are modern digital equivalents of many of those old analog devices. But before we can use software to do the fun stuff, we need to represent sound in a way that computers can work with.
What is digital sound?
We can sample an analog signal at some frequency and encode each sample as a number. The rate at which we sample the analog signal is called the sample rate. A common sample rate in many sound applications is 44.1 kHz, which means that 44,100 numbers are recorded for each second of sound. The numbers themselves must fall within some range. Usually each value is assigned a certain number of bits, called the bit depth. For most recorded digital audio, including CDs, the bit depth is 16, which is generally enough for most listeners. Audiophiles prefer a 24-bit depth, which gives enough precision that people's ears can't hear the difference between it and a higher bit depth.
The process of converting an analog signal into a digital one is called digitizing (or sampling), and it is illustrated in Figure 1-8.
In Figure 1-8, the digitized signal doesn't look quite the same as the analog one because of the difference between the bars and the smooth line. The difference (shown in blue) decreases with higher sample rates and bit depths. However, increasing these values also increases the amount of storage required to keep the sounds in memory, on disk, or on the Web. To save space, telephone systems often use sample rates as low as 8 kHz, since the range of frequencies needed to make the human voice intelligible is far smaller than our full audible-frequency range.
By digitizing sound, computers can treat a sound as a long array of numbers. This sort of encoding is called pulse-code modulation (PCM). Because computers are so good at processing arrays, PCM turns out to be a very powerful primitive for most digital audio applications. In the world of the Web Audio API, this long array of numbers representing a sound is abstracted as an AudioBuffer. An AudioBuffer can store multiple audio channels (often in stereo, meaning a left and a right channel), represented as arrays of floating-point numbers normalized between -1 and 1. The same signal can also be represented as an array of integers, which, at 16-bit depth, range from (-2^15) to (2^15 - 1).
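As a small illustration (not from the text), here is how 16-bit integer samples map onto the normalized floating-point form used by an AudioBuffer:
// Convert an array of 16-bit integer samples (-32768..32767)
// into floats normalized between -1 and 1.
function int16ToFloat32(int16Samples) {
  var floats = new Float32Array(int16Samples.length);
  for (var i = 0; i < int16Samples.length; i++) {
    floats[i] = int16Samples[i] / 32768; // -32768 -> -1.0, 32767 -> ~0.99997
  }
  return floats;
}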
Audio encoding formats
Raw PCM-encoded audio is quite large: it uses extra memory, wastes space on a hard drive, and takes up extra bandwidth when downloaded. Because of this, audio is generally stored in compressed formats. There are two kinds of compression: lossy and lossless. Lossless compression (such as FLAC) guarantees that the bits are exactly the same after the sound is compressed and then decompressed. Lossy audio compression (such as MP3) exploits characteristics of human hearing to save additional space by throwing away bits that we probably won't be able to hear anyway. Lossy compression is usually good enough for most people, with the exception of some audiophiles.
A common measure of compression is the bit rate, which refers to the number of bits of audio required per second of playback. The higher the bit rate, the more data is allocated per unit of time, so the less compression is required. Lossy formats such as MP3 are often described by their bit rate (common rates are 128 kb/s and 192 kb/s), and lossy codecs can be encoded at essentially arbitrary bit rates. For example, telephone-quality human speech is often compared to 8 kb/s MP3s. Some formats, such as OGG, support variable bit rates, where the bit rate changes over time. Be careful not to confuse bit rate with sample rate or bit depth (see "What is sound?").
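To get a feel for the numbers involved, here is a rough, illustrative calculation comparing the bit rate of raw CD-quality PCM with a typical MP3 bit rate:
var sampleRate = 44100; // samples per second
var bitDepth = 16;      // bits per sample
var channels = 2;       // stereo
var pcmBitRate = sampleRate * bitDepth * channels; // 1,411,200 bits/s, about 1,411 kb/s
var mp3BitRate = 128 * 1000;                       // a common MP3 bit rate: 128 kb/s
console.log(pcmBitRate / mp3BitRate);              // roughly an 11x size reduction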
Browser support for different audio formats varies considerably. Generally, if the Web Audio API is implemented in a browser, it uses the same loading code that the <audio> tag would, so the browser/format support matrix for <audio> applies to the Web Audio API as well. Generally speaking, WAV (a simple, lossless, and typically uncompressed format) is supported in all browsers. MP3 is still patent-encumbered, and is therefore not available in purely open source browsers such as Firefox and Chromium. Unfortunately, the less popular but patent-unencumbered OGG format was still not supported by Safari at the time of this writing.
For an up-to-date table of audio format support across browsers, see http://mzl.la/13kGelS.
Loading and playing sounds
To load an audio sample into the Web Audio API, we can use an XMLHttpRequest and process the results with context.decodeAudioData. This all happens asynchronously and doesn't block the main UI thread:
var request = new XMLHttpRequest();
request.open('GET', url, true);
request.responseType = 'arraybuffer';
// Decode asynchronously.
request.onload = function() {
  context.decodeAudioData(request.response, function(theBuffer) {
    buffer = theBuffer;
  }, function(e) {
    console.error('Error decoding audio data', e);
  });
};
request.send();
An audio buffer is only one possible source for playback. Other source nodes include direct input from a microphone or line-in device, or an <audio> tag, among others (see Chapter 7).
Once you have loaded your buffer, you can create a source node (AudioBufferSourceNode) for it, connect the source node into your audio graph, and call start(0) on the source node. To stop a sound, call stop(0) on the source node. Note that both of these function calls take a time in the coordinate system of the current audio context (see Chapter 2).
function playSound(buffer) {
  var source = context.createBufferSource();
  source.buffer = buffer;
  source.connect(context.destination);
  source.start(0);
}
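Building on playSound, here is a short sketch (the function name and parameter are hypothetical) of how the time argument works: start(0) means "play now," and because times are expressed on the audio context's own clock (context.currentTime), a stop can be scheduled a few seconds ahead:
function playSoundFor(buffer, seconds) {
  var source = context.createBufferSource();
  source.buffer = buffer;
  source.connect(context.destination);
  source.start(0);                            // start immediately
  source.stop(context.currentTime + seconds); // schedule the stop a few seconds out
  return source;
}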
Games often have looping background music. However, be careful about being overly repetitive with your selection: if a player is stuck in an area or level and the same sample keeps playing in the background, it may be worthwhile to gradually fade the track out to prevent frustration. Another strategy is to have mixes of various intensities that gradually crossfade into one another depending on the game situation (see the later discussion of gradually varying audio parameters).
Putting it all together
As you can see from the previous code listings, there is a bit of setup involved before you can play sounds with the Web Audio API. For a real game, consider implementing a JavaScript abstraction around the Web Audio API. An example of this idea is the BufferLoader class used in the following example. It puts everything together into a simple loader, which, given a set of paths, returns a set of audio buffers. Here's how such a class can be used:
window.onload = init;
var context;
var bufferLoader;

function init() {
  context = new webkitAudioContext();
  bufferLoader = new BufferLoader(
    context,
    [
      '../sounds/hyper-reality/br-jam-loop.wav',
      '../sounds/hyper-reality/laughter.wav',
    ],
    finishedLoading
  );
  bufferLoader.load();
}
function finishedLoading(bufferList) {
  // Create two sources and play them both together.
  var source1 = context.createBufferSource();
  var source2 = context.createBufferSource();
  source1.buffer = bufferList[0];
  source2.buffer = bufferList[1];
  source1.connect(context.destination);
  source2.connect(context.destination);
  source1.start(0);
  source2.start(0);
}
For a simple implementation of BufferLoader, see http://webaudioapi.com/samples/shared.js.
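For reference, a minimal sketch of what such a loader might look like is shown below. It follows the usage above (a constructor taking a context, a list of URLs, and a completion callback, plus a load() method), but it is only an illustration under those assumptions, not the implementation from shared.js:
function BufferLoader(context, urlList, callback) {
  this.context = context;
  this.urlList = urlList;
  this.onload = callback;
  this.bufferList = [];
  this.loadCount = 0;
}

BufferLoader.prototype.loadBuffer = function(url, index) {
  // Fetch the file as binary data and decode it into an AudioBuffer.
  var request = new XMLHttpRequest();
  request.open('GET', url, true);
  request.responseType = 'arraybuffer';
  var loader = this;
  request.onload = function() {
    loader.context.decodeAudioData(request.response, function(buffer) {
      loader.bufferList[index] = buffer;
      // Once every file has been decoded, hand the full list to the callback.
      if (++loader.loadCount == loader.urlList.length) {
        loader.onload(loader.bufferList);
      }
    }, function(error) {
      console.error('decodeAudioData error', error);
    });
  };
  request.send();
};

BufferLoader.prototype.load = function() {
  for (var i = 0; i < this.urlList.length; i++) {
    this.loadBuffer(this.urlList[i], i);
  }
};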