SoX: audio file conversion command

Source: Internet
Author: User

When developing a call center, you need to play voice prompts, so the recorded WAV files have to be converted to GSM format. Asterisk also supports WAV, but for reasons that remain unclear, larger WAV files would not play, so the recordings were converted to GSM instead.
Command: sox 00.wav -r 8000 -c 1 00.gsm resample -ql
The following is an article found on the Internet.
SoX is the best-known open source tool for converting audio file formats. It has been ported to many operating system platforms, including DOS, Windows, OS/2, Sun, NeXT, Unix, and Linux.
The SoX project was created by Lance Norskog and gradually improved by many developers. It now supports many audio file formats and audio processing effects, including most common sound formats. Beyond format conversion, SoX can filter audio and convert sample rates, which is very useful for anyone who develops or maintains a sound platform. SoX also contains a number of DSP algorithms; if you are interested, you can download the source and study them. SoX may be used for any purpose. However, when the source code is distributed, the copyright notice must be included, and when binaries are distributed, the authors must be credited.

First, a simple command:
sox -v 0.6 file1.wav file2.wav
-v is the volume option and 0.6 is its argument; it applies to the input file that follows it, and file2.wav is the output file. The adjustment is a linear scaling of the amplitude, so 0.6 does not mean 60% of the perceived loudness. A factor greater than 1 increases the volume, and a factor less than 1 reduces it. A negative factor also inverts the phase. The factor cannot be raised arbitrarily, though: if it is too large, the audio clips. How do you find a safe value? Type the following command:

sox file1.wav -n stat -v

Command output:

1.003

This is the largest factor that can be applied without clipping. Here stat is an effect that performs a statistical analysis of the audio and prints the result to standard error. With the -v option it prints only the number from the line that begins with "Volume adjustment:". -n stands for a null output file, so the information is obtained without writing any audio file.
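
Because -v scales the amplitude linearly, it can be easier to judge a factor after converting it to decibels. The following Python sketch shows that conversion. The helper name factor_to_db is an illustrative assumption and is not part of SoX.

```python
import math

def factor_to_db(factor: float) -> float:
    """Convert a linear amplitude factor (as passed to sox -v) to decibels."""
    # A negative factor only inverts the phase, so the level depends on the magnitude.
    return 20.0 * math.log10(abs(factor))

print(round(factor_to_db(0.6), 2))    # the factor used in the first example
print(round(factor_to_db(1.003), 3))  # the factor reported by stat -v
```

A factor of 0.6 is a cut of a little more than 4 dB, while the 1.003 reported by stat -v is a boost of only a few hundredths of a dB, which shows how little headroom that file has.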

The syntax of sox is as follows:
sox [global-options] [format-options] infile1 [[format-options] infile2] ... [format-options] outfile [effect [effect-options]] ...
Global options come first. Each input file is preceded by its own format options, and several input files can be given. The output file comes next, also preceded by its format options, and the effects come last; they are discussed later.
The syntax is clean and should be easy to understand.
Before writing an output file with sox, it is a good idea to audition the result with the play command that ships with SoX.
Next, let's trim an audio file. Suppose the file starts with about 10 seconds that we do not want. First, look at the length of the file:

sox yangwang.wav -n stat

The following output information is displayed:
Samples read: 20889600
Length (seconds): 236.843537
Scaled by: 2147483647.0
Maximum amplitude: 0.996857
Minimum amplitude: -0.993195
Midline amplitude: 0.001831
Mean norm: 0.084509
Mean amplitude: -0.000000
RMS amplitude: 0.119258
Maximum delta: 0.729645
Minimum delta: 0.000000
Mean delta: 0.058931
RMS delta: 0.080600
Rough frequency: 4743
Volume adjustment: 1.003
We are interested in the line "Length (seconds): 236.843537", which says that the file is 236.843537 seconds long, or about 237 seconds. Type the following command:
sox yangwang.wav yangwang1.wav trim 0 10

In this command, yangwang1.wav is the output file, 0 is the start position, and 10 is a duration in seconds, not an end position. Use SoX's play command with headphones to confirm that yangwang1.wav contains exactly the part to be discarded. Once the length is confirmed, do the actual trim:
rm -rfv yangwang1.wav
sox yangwang.wav yangwang1.wav trim 10 227
The output file yangwang1.wav is the one we want. The 227 above is the length of the final file: 237 minus 10.
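
The arithmetic above can be scripted. The following Python sketch, offered as one possible approach rather than anything SoX provides, extracts the length from saved stat output and builds the matching trim arguments. The names STAT_OUTPUT, length_seconds, and trim_args are hypothetical, and the sample text is copied from the listing above.

```python
import re

# Two lines copied from the stat listing above.
STAT_OUTPUT = """\
Samples read: 20889600
Length (seconds): 236.843537
"""

def length_seconds(stat_text: str) -> float:
    """Extract the 'Length (seconds)' value from saved stat output."""
    match = re.search(r"Length \(seconds\):\s*([0-9.]+)", stat_text)
    if match is None:
        raise ValueError("no 'Length (seconds)' line found")
    return float(match.group(1))

def trim_args(total: float, skip: float) -> str:
    """Build 'trim START LENGTH' arguments that drop the first `skip` seconds."""
    return f"trim {skip:g} {total - skip:g}"

# Round to whole seconds, as the text does (237), then skip the first 10.
print(trim_args(round(length_seconds(STAT_OUTPUT)), 10))
```

Since stat prints to standard error, a script that calls sox directly would need to capture that stream before parsing it.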

SoX can perform most common audio format conversions, such as:
sox yangwang.wav yangwang.mp3
Once mp3lame or libmad library support is installed, WAV files can be converted to MP3.

Now for file concatenation. With no special options, for example:

sox file1.mp3 file2.mp3 file3.mp3

file1.mp3 and file2.mp3 are joined in sequence, and the output file is file3.mp3. The play command that ships with SoX works in essentially the same way when no special options are given, for example, play file1.mp3 file2.mp3.
The rec command follows the same conventions as sox.
However, with the -m option:

sox -m file1.mp3 file2.mp3 file3.mp3

file1.mp3 and file2.mp3 are mixed, that is, overlaid on each other. The inputs do not need to have the same number of channels. The output may have fewer channels than the inputs, and the mix cannot be undone.
Note that whether files are concatenated or merged, the input files must have the same sample rate; otherwise the command fails. For example, merging two files with -M:
sox -M test.wav yangwang.wav test1.wav
Result:
sox: Input files must have the same sample-rate
The -M option is mainly used to merge several channels into one multichannel file, for example, two mono files into one stereo file.
Before merging, convert the inputs to the same sample rate.

To see a file's header information without a lot of other output, combine -V and -n, for example:
sox -V *.wav -n
Input File: 'yangwang.wav'
Sample Size: 16-bit (2 bytes)
Sample Encoding: signed (2's complement)
Channels: 2
Sample Rate: 44100
Duration: 00:03:56.84 = 10444800 samples = 17763.3 CDDA sectors
Endian Type: little
Reverse Nibbles: no
Reverse Bits: no
The header information of all wav files in the current directory is printed.
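
The figures in the two listings can be cross-checked with a little arithmetic. The Python sketch below assumes CD-style sectors of 588 samples per channel (44100 Hz divided by 75 sectors per second); the function name duration is illustrative only.

```python
def duration(samples: int, rate: int) -> str:
    """Format a per-channel sample count as hh:mm:ss.ff at the given sample rate."""
    seconds = samples / rate
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:05.2f}"

SAMPLES = 10444800   # per-channel samples from the header listing
RATE = 44100         # sample rate from the header listing
CHANNELS = 2         # channel count from the header listing

print(duration(SAMPLES, RATE))       # playing time
print(SAMPLES * CHANNELS)            # samples counted across both channels
print(round(SAMPLES / 588, 1))       # CD sectors, assuming 588 samples per sector
```

The second value matches the "Samples read" figure printed by stat earlier, which counts the samples of both channels.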

To change a file's sample rate, type:
sox file1.wav -r NEW_RATE file2.wav
where NEW_RATE is the target sample rate. For example, to set the sample rate to 48000 Hz, type:
sox file1.wav -r 48000 file2.wav

One option is particularly useful: --interactive. If the output file has the same name as an existing file, sox asks whether to overwrite it. Without this option, sox overwrites files of the same name without asking. It is therefore best to enable it permanently through a shell alias or a batch file.

The transfer functions of some effects can be plotted. This is done with the global option --plot, followed by the name of the plotting program the output is intended for, either gnuplot or octave. For example:
sox --plot octave yangwang.wav -n lowpass 1320 > plot.m
Running octave plot.m then displays the graph of the effect's transfer function.

Sometimes the loudness of the output is uncomfortable. In that case, --replay-gain can be used to apply replay gain to the input files. Its argument is track for track gain, album for album gain, or off to disable it.

If you have a mono file and want to convert it to stereo, type the following:
sox file1.wav -c 2 file2.wav
Here -c is the channel option; -c 2 can also be written as -c2. In the same way, -c1 means one channel and -c4 means four channels. Channel conversion, sample-rate conversion, volume adjustment, and verbose output can be combined as follows:
sox -V4 -v 1.2 file1.wav -r 48000 -c 2 file2.wav
where -V4 prints the most detail.

It often happens that an audio file has a nonstandard extension, or that its header does not identify the file type. In that case we have to specify the type ourselves. This is done with the -t option, placed before the file it describes, for example:
sox -v 1.0 -V -t wav file1 -r 44100 -c2 file2.wav

Type man 7 soxformat to view the list of supported file types.

Here are a few examples. The following example uses the dither effect:
sox recital.au -r 12000 -1 -c 1 recital.wav vol 0.7 dither 4
This converts a file in Sun's AU format to a Microsoft WAV file. -1 selects 1-byte samples; -2, -3, -4, and -8 select the larger sample sizes. -c 1 selects a single channel. vol 0.7 is the volume effect, here scaling the volume to 0.7. dither is the dither effect, and 4 is the dither depth.

sox -r 8000 -u -1 -c 1 file1.raw file2.wav
This reads file1.raw as headerless audio with a sample rate of 8000 Hz, unsigned (-u) 1-byte (-1) samples, and a single channel (-c 1), and adds header information to produce file2.wav. Note that lowercase -u selects unsigned samples; u-law encoding is selected with uppercase -U.

sox file1.wav file2.wav speed 1.29
This speeds the audio up to 1.29 times the original speed; pitch and tempo change together.

Try the following two commands and compare the results:

play file.wav bass -20
play file.wav bass +20

Both apply the bass effect to the output. -20 is the lowest practical setting and +20 the highest. The lower the value, the thinner the sound; the higher the value, the fuller the sound.

The examples above have already used one of SoX's companion programs. There are two of them: rec, which records, and play, which auditions. Their syntax is similar to that of sox, except that rec takes its input from an internal or external audio device, and play sends its output to one instead of to a file. The syntax is as follows:

play [global-options] [format-options] infile1 [[format-options] infile2] ... [format-options] [effect [effect-options]] ...

rec [global-options] [format-options] outfile [effect [effect-options]] ...
The use of play is illustrated throughout the effects section. Here is an example of rec:
rec file.wav
With more options:
rec -r 44100 -4 -u -c2 -t mp3 testbench

The examples above have covered the command-line format, global options, and input and output options of the SoX package.
The next section describes the effects and their parameters.

Section 2 Advanced SoX: the effects
This section describes SoX's effects, which cover filtering, sample-rate conversion, chorus, reverb, phasing, pitch adjustment, and more. They are the most exciting part of SoX, and they are what make SoX the Swiss Army knife of audio on Linux. On the command line they come after the output file. You can use a single effect or chain several together. However, we recommend trying and tuning each effect on its own before combining them; chaining many effects also demands more CPU performance. In most of the examples we use the play command of the SoX package to listen to the result through headphones or speakers, rather than studying the contents of the audio files.

As before, we use a short self-recorded sample (3.15 minutes long, WAV format, 44.1 kHz sample rate, 16-bit, single channel). The sample should not already contain any special effects. If you record from tape, radio, or CD and the material sounds like a concert, or like ten people playing drums or other instruments at the same pitch, choose a different sample. (A typical good sample has fewer than four instruments and no synthesizer, for example a combination of drums, vocals, bass, or guitar.) Only then can the effects be heard clearly: if the audio already contains many effects, you will not be able to appreciate what SoX's effects do.
For example:

play yangwang.wav mixer 0.3,0.5,0.8,0.6
The mixer effect reduces the number of channels by mixing them down, or increases it by duplicating channels. The four numbers above are, in order: 0.3, the volume from the left input channel to the left output channel; 0.5, from the left input channel to the right output channel; 0.8, from the right input channel to the left output channel; and 0.6, from the right input channel to the right output channel. Writing l for left, r for right, b for back, and f for front, the order is l→l, l→r, r→l, r→r. That is the two-channel case.
For four channels, the numbers come in groups of four. The first four feed the left-front output channel: lf→lf, rf→lf, lb→lf, rb→lf. The next four feed the right-front output channel: lf→rf, lb→rf, rf→rf, rb→rf. Then comes the left-back output channel: lf→lb, rf→lb, lb→lb, rb→lb. The last four feed the right-back output channel: lf→rb, lb→rb, rf→rb, rb→rb. These are the special cases. A four-channel mix can therefore take as many as 16 numbers.
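
The two-channel case amounts to a small weighted sum per output channel. The Python sketch below illustrates that arithmetic for a single pair of samples; it is a plain model of the description above, not SoX's code, and the function name mix_stereo and the sample values are made up for the illustration.

```python
def mix_stereo(left, right, ll, lr, rl, rr):
    """Apply a 2x2 mix: each output channel is a weighted sum of both inputs."""
    out_left = ll * left + rl * right    # l→l and r→l contributions
    out_right = lr * left + rr * right   # l→r and r→r contributions
    return out_left, out_right

# Weights in the order used above: l→l, l→r, r→l, r→r.
print(mix_stereo(1.0, 0.5, 0.3, 0.5, 0.8, 0.6))
```

With the weights from the example, a large r→l value such as 0.8 means that most of the left output actually comes from the right input.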

In the following example, the tempo effect is applied:
play *.wav tempo -q 0.8 82 20 16

Here 0.8 is the ratio of the new tempo to the old tempo. 82 is the size, in milliseconds, of the segments into which the algorithm cuts the audio. 20 is the length of audio, in milliseconds, that is searched for the best overlap position, and 16 is the overlap length in milliseconds.

The following is an example of the tremolo effect:
play file.wav tremolo 3.5 60
3.5 is the tremolo frequency in Hz, and 60 is the depth as a percentage, that is, how deep the "trembling" goes.

In a movie, there is a kind of effect called fade-in and fade-out, which also works in music:
play file.wav fade t 00:00:10.09
In this example, fade is the name of the effect and t is the shape of the envelope: t is a linear slope, q is a quarter of a sine wave, h is half a sine wave, l is logarithmic, and p is an inverted parabola. The default is a linear slope. The last argument is the fade-in length, written in hh:mm:ss.frac form. It can also be given as a number of samples by appending s; for example, 8000s means 8000 samples.
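
To make the two time notations concrete, the following Python sketch converts either form to seconds and to a sample count. It only illustrates the notation described above and does not reproduce SoX's own parser; the names parse_time and to_samples are hypothetical.

```python
def parse_time(spec: str, rate: int) -> float:
    """Return seconds for an 'hh:mm:ss.frac' spec or an 'Ns' sample-count spec."""
    if spec.endswith("s"):
        # A trailing 's' marks a sample count, so divide by the sample rate.
        return int(spec[:-1]) / rate
    seconds = 0.0
    for part in spec.split(":"):
        # Each colon-separated field is worth 60 times the next one.
        seconds = seconds * 60 + float(part)
    return seconds

def to_samples(spec: str, rate: int) -> int:
    """Return the same time expressed as a whole number of samples."""
    return round(parse_time(spec, rate) * rate)

print(parse_time("00:01:00", 44100))
print(parse_time("8000s", 8000))
print(to_samples("00:00:06", 8000))
```

At a sample rate of 8000 Hz, 8000s is exactly one second, which is why the sample form is convenient for telephony-rate audio.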

That was a fade-in. What if you also want a fade-out? Look at the following example:
play *.wav fade t 00:00:50.09 00:01:00 00:00:06

t was explained above. 00:00:50.09 is the fade-in length, measured from the start of the audio. 00:01:00 is the stop position, the point at which the audio ends and the fade-out is complete. 00:00:06 is the fade-out length, so the fade-out begins 6 seconds before the stop position, at 00:00:54. As before, each time can also be given as a number of samples.

You may have noticed this yourself: after listening to CD music through headphones for a long time, your ears start to ring, and the music seems to sit inside your head instead of coming from outside. This is a result of how stereo sounds on headphones. The SoX package has an effect that counteracts it, called earwax. For example:

play file.mp3 earwax

This is a simple way to soften the stereo effect on headphones.

Sometimes, when the sample size is smaller than 24 bits, quantization artifacts become audible. The dither effect can mask them by deliberately adding white noise to the signal, for example:

play file.wav dither 100

Here 100 is the depth value.

Echoes are everywhere in nature. If you stand in the mountains and shout at the surrounding peaks, an echo comes back. The interval between the shout and the echo is the delay, and the loudness of the echo relative to the shout is the decay. The following is an example of the echo effect:

play file.xxx echo 0.8 0.88 60 0.4

It sounds as if two instruments were playing the same part. 0.8 is the input gain, 0.88 is the output gain, 60 is the delay in milliseconds, and 0.4 is the decay relative to the input volume.

With a longer delay, it sounds more like an open-air concert on a hilltop:

play file.wav echo 0.8 0.88 1000 0.4

It is recommended that the decay not exceed 0.5; otherwise the output may saturate.
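
The roles of the four numbers can be seen in a toy model. The Python sketch below is a deliberately simplified, single-tap echo written for illustration; it is an assumption about the general technique, not SoX's implementation, and the names delay_in_samples and echo are hypothetical.

```python
def delay_in_samples(delay_ms: float, rate: int) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * rate / 1000)

def echo(samples, gain_in, gain_out, delay, decay):
    """Mix each sample with a decayed copy of the input from `delay` samples earlier."""
    out = []
    for n, x in enumerate(samples):
        delayed = samples[n - delay] if n >= delay else 0.0
        out.append(gain_out * (gain_in * x + decay * delayed))
    return out

print(delay_in_samples(60, 44100))   # the 60 ms delay at a 44.1 kHz sample rate

# A single impulse makes the direct sound and its echo easy to tell apart.
impulse = [1.0, 0.0, 0.0, 0.0]
print(echo(impulse, 0.8, 0.88, 2, 0.4))
```

In this model the direct sound is scaled by both gains, and the echo arrives `delay` samples later, scaled by the decay and the output gain.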

With a short delay, it sounds like a (metallic) robot playing:

play file.wav echo 0.8 0.88 6 0.4

You can also use more than one echo:

play file.wav echo 0.8 0.9 1000 0.3 1800 0.25

If you stand between mountains, you may also hear a chain of echoes: the echo itself hits a neighbouring peak, bounces back, and is heard again. The effect for this is echos, which produces a sequence of echoes. With a single delay, echos sounds the same as echo. Here are some echos examples:

play file.wav echos 0.8 0.7 700 0.25 700 0.3

Here echos is the name of the effect. The sound is bounced back twice, and because both delays are the same, 700, this is called a symmetric echo. To produce an asymmetric echo:

play file.xxx echos 0.8 0.7 700 0.25 900 0.3

The following example sounds like playing in a car:

play file.wav echos 0.8 0.7 40 0.25 63 0.3

Because of the short delays, it sounds a bit dull, doesn't it?

In music there is the concept of harmony: the combination of two or more different tones that sound together according to certain rules. It comprises (1) chords, the basic material of harmony, which consist of three or more different tones stacked in thirds or by other methods and form the vertical structure of harmony; and (2) progression, the succession of chords, which is the horizontal motion of harmony. In addition, harmony gives the sound a strong, light, thick, or thin colour, and it shapes the phrases, sections, and cadences of the music. SoX has a related effect named chorus. It works like echo with a short delay, but the delay is not constant: it is modulated by a sine or triangle wave. The modulation depth defines how far the delay swings before or after its nominal value. As a result, the delayed sound plays slightly faster or slower, so it sounds tuned around the original, like a choir in which some voices are slightly off. Let's look at the following example:

play file.wav chorus 0.7 0.9 55 0.4 0.25 2 -t

Here 0.7 and 0.9 are the input and output gains, 55 is the delay in milliseconds, 0.4 is the decay, 0.25 is the modulation speed in Hz, and 2 is the modulation depth in milliseconds. A typical delay is 40 ms to 60 ms, the modulation speed is best near 0.25 Hz, and the modulation depth is about 2 ms. -t selects triangle-wave modulation. With only a single delayed voice, the output sounds a little overloaded. Here is an example with two voices:

play yangwang.wav chorus 0.6 0.9 50 0.4 0.25 2 -t 60 0.32 0.4 1.3 -s

The -s in this command selects sine-wave modulation.

The following example uses three voices:

play file.xxx chorus 0.5 0.9 50 0.4 0.25 2 -t 60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s

In horror films, eerie music is often played to build atmosphere just before the ghost appears. SoX has an effect for this kind of sound, named flanger. It mixes two copies of the same sound, one of which is delayed by an amount that keeps changing over time, always by less than 20 ms. It sounds like the wind blowing, with the speed wavering between fast and slow. Flanging is widely used in horror and soul music, where it makes the guitar sound waver. Let's look at a simple example:

play yangwang.wav flanger

By default the delay is modulated with a sine wave. Listen carefully to the difference between sine and triangle modulation:

play yangwang.wav flanger triangle

Next we use quadratic interpolation:

play yangwang.wav flanger quadratic

The following example chains flangers with different sweep wave shapes and different interpolation methods:

play yangwang.wav flanger quadratic flanger lin flanger sine flanger triangle

Finally, here is an example with every parameter given, followed by an explanation of each:

play yangwang.wav flanger 8 5 90 90 8 triangle 80 quadratic

Here the first 8 is the base delay in milliseconds (range 0 to 10, default 0). 5 is the added swept delay in milliseconds (range 0 to 10, default 2). The first 90 is the regeneration percentage, that is, the percentage of delayed-signal feedback (range -95 to 95, default 0). The second 90 is the percentage of delayed signal mixed with the original signal (range 0 to 100, default 71). The second 8 is the sweep frequency in Hz (range 0.1 to 10, default 0.5). triangle selects triangle-wave modulation; sine selects a sine wave. 80 is the percentage phase shift of the swept wave between channels, where 0 = 100 = the same phase on every channel (range 0 to 100, default 25). quadratic selects quadratic interpolation; linear interpolation (lin) is the alternative. In practice there is no need to specify this many parameters, because most of them have defaults, but the right values depend on the material.

Next, a brief introduction to another effect: reverb, or reverberation. Reverb is often used for concert halls that are too small or hold too large an audience, which dampens the reflection of sound off the walls. Reverb makes the sound feel as if it were in a large concert hall. To experience reverberation yourself, shout a few words in a bathroom, in a car, or in a gym: you will hear them reflected from the walls. For example:

play yangwang.wav reverb 1 600 180

Here 1 is the output gain, 600 is the reverberation time, and 180 is the delay. The delay is best kept between 1/4 and 1/2 of the reverberation time. The example above takes only one wall into account. To add more walls, append more delays, for example play yangwang.wav reverb 1 600 180 200, and so on.

Music processing often calls for phase shifting, which can be done with SoX's phaser effect. It resembles the flanger effect, but it uses a reverb in place of the echo and shifts the phase. It works with multiple instruments. See the following example:

play file.wav phaser 0.8 0.74 3 0.4 0.5 -t

3 is the delay in milliseconds, which must be less than 5 ms. 0.4 is the decay, which is recommended to be less than 0.5. 0.5 is the modulation speed in Hz, which must be less than 2 Hz. -t selects triangle modulation; for sine modulation, use -s.

Listen to how the following example differs, and notice how the sound seems to bounce in your ears:

play yangwang.wav phaser 0.6 0.66 3 0.6 2 -t

A commonly used phaser sound is produced as follows:

play file.wav phaser 0.89 0.85 1 0.24 2 -t

To play the audio repeatedly, use the repeat effect:

play file.wav repeat 2

This repeats the audio twice; 0 means an unlimited number of times.

People often listen to music in a car or in public places. When a quiet passage comes along, they turn the volume up so that it is not drowned out by the surrounding noise, but when a loud passage suddenly follows, the sound becomes far louder than the ears can stand. This is especially noticeable with symphonic music. Is there a way to keep the quiet passages audible without the loud passages hurting your ears? That is the job of a compander (compressor-expander), which compresses or expands the dynamic range of a signal. The attack and decay times determine the period over which the level of the input signal is averaged, and the level of the output signal is then set according to the given transfer function. The compander in SoX is named compand. See the following example:

sox asz.flac asz-car.flac compand 0.3,1 6:-70,-60,-20 -5 -90 0.2

Here compand is the name of the effect. 0.3 is the attack time (how quickly the compander responds when the sound suddenly gets louder) and 1 is the decay time. The attack time should be shorter than the decay time, because our ears are more sensitive to sudden loud sounds than to sudden soft ones. 6:-70,-60,-20 is the compander's transfer function table, in dB relative to the maximum amplitude of the audio signal; the leading 6: selects a 6 dB soft knee. It means that very soft sounds (below -70 dB) remain unchanged, which keeps the compander from boosting the volume of silent passages, such as the gaps between movements. Sounds between -60 dB and 0 dB (maximum volume), however, are boosted, so that the original 60 dB dynamic range is compressed into 20 dB, wide enough to enjoy the music yet narrow enough to stay above the road noise. That is what -60 and -20 mean. -5 is an additional gain, in dB, that avoids clipping. -90 is the initial volume, in dB; it suits audio that starts from near silence and helps to suppress clipping at the start. The delay of 0.2 seconds lets the compander react more quickly to a sudden rise in volume.
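
The level mapping described above can be worked through numerically. The Python sketch below models only the static transfer function as straight lines through the points (-70, -70), (-60, -20), and (0, 0); treating the first and last of these as implied by the table is an assumption of this sketch. It ignores the soft knee, the attack and decay averaging, and the -5 dB gain, and the names POINTS and transfer are hypothetical.

```python
# Breakpoints (input dB, output dB) assumed for the table -70,-60,-20.
POINTS = [(-70.0, -70.0), (-60.0, -20.0), (0.0, 0.0)]

def transfer(level_db: float, points=POINTS) -> float:
    """Map an input level to an output level (both in dB) by linear interpolation."""
    if level_db <= points[0][0]:
        return level_db  # below the first point the level is left unchanged
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if level_db <= x1:
            return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0)
    return points[-1][1]  # levels above the last point are held at its output

for level in (-80, -70, -60, -30, 0):
    print(level, transfer(level))
```

Between -60 dB and 0 dB the output moves only 20 dB while the input moves 60 dB, which is the 3-to-1 compression the text describes.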

To visualize the transfer function, SoX can be called with the --plot option. For example:

sox --plot gnuplot *.wav -n compand 0.3,1 6:-70,-60,-20 -5 -90 > my.plt

Run the command gnuplot my.plt to view it.

The following long command shows how multiband companding is typically used in FM radio:

play file.xxx vol -3dB filter 8000- 32 100 mcompand \
"0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
"0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
"0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
"0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
"0,0.025 -38,-31,-28,-28,-0,-25" \
vol 15dB highpass 22 highpass 22 filter -17500 256 \
vol 9dB lowpass -1 17801

filter is a windowed sinc filter, an approximation of the ideal electronic filter that removes every signal component outside a given band. Its first argument gives the band as low-high corner frequencies in Hz: 8000- sets only the lower corner, and -17500 sets only the upper corner, so that only the components below 17500 Hz are retained. The numbers 32 and 256 that follow are the window lengths, and the optional number after the window length, here 100, is the beta parameter of the window. The number after highpass is its cutoff frequency.

In practice it is sometimes necessary to change the playback speed of a sound while keeping its pitch, for dramatic effect. The stretch effect does this. For example, to stretch the audio to twice its original length:

play file.wav stretch 2

A similar effect is speed, which changes the pitch and the tempo together. For example:

play file.wav speed 2

To raise the pitch by one semitone (100 cents), do this:

play file.wav pitch 100
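
The pitch effect takes its shift in cents, that is, hundredths of a semitone. The Python sketch below shows the standard equal-temperament relation between cents and frequency ratio; the function name cents_to_ratio is illustrative only.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift given in cents (1200 cents per octave)."""
    return 2.0 ** (cents / 1200.0)

print(round(cents_to_ratio(100), 5))   # one semitone up
print(cents_to_ratio(1200))            # one octave up
print(cents_to_ratio(-1200))           # one octave down
```

A shift of 100 cents raises every frequency by just under 6 percent, and 1200 cents doubles it.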
