Implementing a text-based Morse code generator in PHP

Source: Internet
Author: User
This article describes a PHP implementation of a text-based Morse code generator. I recently needed to generate Morse code audio files from input text. After several unsuccessful searches for an existing tool, I decided to write a generator myself.

I chose PHP as the main programming language because I wanted to access the Morse code audio files through the web. The page shown above is the web front end for generating Morse code. The downloadable zip file contains a web page for submitting text and a PHP source file that generates and plays the audio files. To test the PHP code, copy the web page and the PHP file to a PHP-enabled server.

To many people, Morse code is a sequence of "dots" and "dashes", or a series of beeps like those heard in old movies. That level of understanding is clearly not enough if you want to generate Morse code with a computer. This article introduces the elements of Morse code generation, explains how to generate audio files in WAVE format, and shows how to use PHP to convert Morse code into audio files.

Morse code

Morse code is a method of encoding text. It is easy to send and can be readily decoded by the human ear. Essentially, an audio (or radio-frequency) tone is switched on and off to form short and long pulses, generally called dots and dashes, or in radio terms "dits" and "dahs". In terms of modern digital communication, Morse code is a form of amplitude-shift keying (ASK), specifically on-off keying.

In Morse code, characters (letters, digits, punctuation marks, and special symbols) are encoded as sequences of dits and dahs. So, to convert text into Morse code, we must first decide how to represent dits and dahs. An obvious choice is to use 0 for a dit and 1 for a dah, or vice versa. Unfortunately, Morse code is a variable-length code, so we must either use variable-length sequences or pack the data into a fixed-size format that suits computer memory. Note also that Morse code is case-insensitive and has no codes for some special characters. In our implementation, undefined characters and symbols are ignored.

Memory usage is not a particular concern in this project, so we use a simple encoding scheme: "0" represents each dit and "1" represents each dah, and the patterns are stored as strings in an associative array. The PHP code that defines the Morse code table is as follows:

// Morse code table: '0' = dit, '1' = dah. Keys are upper case.
// Note: ITU-R M.1677 codes the hyphen as 100001 and '@' as 011010; the entries
// for '-', '~' and '@' below differ (10001 and 01010 are the ITU codes for '=' and '+').
$CWCODE = array(
    'A'=>'01',     'B'=>'1000',   'C'=>'1010',   'D'=>'100',    'E'=>'0',
    'F'=>'0010',   'G'=>'110',    'H'=>'0000',   'I'=>'00',     'J'=>'0111',
    'K'=>'101',    'L'=>'0100',   'M'=>'11',     'N'=>'10',     'O'=>'111',
    'P'=>'0110',   'Q'=>'1101',   'R'=>'010',    'S'=>'000',    'T'=>'1',
    'U'=>'001',    'V'=>'0001',   'W'=>'011',    'X'=>'1001',   'Y'=>'1011',
    'Z'=>'1100',
    '0'=>'11111',  '1'=>'01111',  '2'=>'00111',  '3'=>'00011',  '4'=>'00001',
    '5'=>'00000',  '6'=>'10000',  '7'=>'11000',  '8'=>'11100',  '9'=>'11110',
    '.'=>'010101', ','=>'110011', '/'=>'10010',  '-'=>'10001',  '~'=>'01010',
    '?'=>'001100', '@'=>'00101'
);
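To show how the table is meant to be used, here is a minimal sketch of a lookup that upper-cases the input (Morse code is case-insensitive) and skips characters that have no entry, as described above. The helper name textToPatterns() and the use of ' ' as a word-break marker are assumptions for illustration, and the table is cut down to four letters to keep the example short.

```php
<?php
// A subset of the $CWCODE table ('0' = dit, '1' = dah).
$CWCODE = ['E' => '0', 'O' => '111', 'S' => '000', 'T' => '1'];

// Hypothetical helper: map text to a list of dit/dah patterns, one per character.
// Input is upper-cased because Morse code is case-insensitive; characters with
// no entry in the table are skipped. A ' ' entry marks a word break.
function textToPatterns(string $text, array $table): array
{
    $out = [];
    foreach (str_split(strtoupper($text)) as $ch) {
        if ($ch === ' ') {
            $out[] = ' ';           // word break marker
        } elseif (isset($table[$ch])) {
            $out[] = $table[$ch];   // known character
        }                           // anything else is ignored
    }
    return $out;
}

echo implode('|', textToPatterns('sos te!', $CWCODE)), "\n"; // 000|111|000| |1|0
```

The '!' has no entry in the table, so it simply disappears from the output.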

Note that if memory usage is a real concern, the codes above can be interpreted as bit patterns. Prepend a start bit (1) to each code to form a single bit pattern; since the longest code has six elements, each character then fits in one byte. When decoding, discard the start bit and every bit to its left to recover the original variable-length code.
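The start-bit packing can be sketched in a few lines. The function names packCode() and unpackCode() are hypothetical; the sketch relies only on PHP's built-in bindec() and decbin(). Because decbin() never emits leading zeros, the start bit is always the first character of its result.

```php
<?php
// Pack: prefix a '1' (the start bit) and read the string as a binary number.
// The longest code in the table has 6 elements, so 7 bits always suffice.
function packCode(string $pattern): int
{
    return bindec('1' . $pattern);
}

// Unpack: drop the start bit; decbin() has no leading zeros to strip.
function unpackCode(int $byte): string
{
    return substr(decbin($byte), 1);
}

$packed = packCode('010');                          // 'R' (.-.)
printf("%d -> %s\n", $packed, unpackCode($packed)); // 10 -> 010
```

Without the start bit, 'E' ('0') and 'I' ('00') would both pack to 0; the start bit keeps them distinct (2 and 4).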

Although many people don't realize it, timing is the main factor that defines Morse code, and understanding it is the key to generating Morse code. So the first thing we need to do is define the timing of the code elements. For convenience, we define the length of a dit as one time unit, dt, and the gap between the dits and dahs within a character is also one unit. A dah lasts 3 dt, and the gap between characters (letters) is also 3 dt; the gap between words is 7 dt. In summary, our timing table is as follows:

Element                                      Duration
dit                                          1 dt
dah                                          3 dt
gap between dits and dahs in a character     1 dt
gap between characters                       3 dt
gap between words                            7 dt
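The timing table translates directly into code. The sketch below, with the hypothetical name patternsToSegments(), expands one word's dit/dah patterns into [isTone, units] segments using exactly the durations listed above. Summing the segments for the word "PARIS" gives 50 units, the average word length used in the next paragraph.

```php
<?php
// Expand a word's dit/dah patterns ('0' = dit, '1' = dah) into [isTone, units]
// segments: dit 1, dah 3, gap inside a character 1, gap between characters 3,
// gap after the word 7.
function patternsToSegments(array $word): array
{
    $segments = [];
    foreach ($word as $i => $pattern) {
        if ($i > 0) {
            $segments[] = [false, 3];            // gap between characters
        }
        foreach (str_split($pattern) as $j => $element) {
            if ($j > 0) {
                $segments[] = [false, 1];        // gap between dits and dahs
            }
            $segments[] = [true, $element === '1' ? 3 : 1];
        }
    }
    $segments[] = [false, 7];                    // gap after the word
    return $segments;
}

// "PARIS": P .--.  A .-  R .-.  I ..  S ...
$paris = patternsToSegments(['0110', '01', '010', '00', '000']);
echo array_sum(array_column($paris, 1)), "\n";   // 50
```

Of those 50 units, 31 belong to the characters themselves (including their internal gaps) and 19 are character and word spacing.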

In Morse code, the sending speed is usually expressed in words per minute (WPM). Because English words vary in length and in the number of characters, converting WPM into (audio) digital samples is not as simple as it looks. In one scheme adopted by international organizations, five characters are taken as the average word length, and a digit or punctuation mark counts as two characters. On this basis, an average word is 50 time units (dt) long. So if you specify a speed in WPM, the code runs at 50 × WPM time units per minute, and each dit (one time unit, dt) lasts 60 / (50 × WPM) = 1.2/WPM seconds. Once the length of a dit is known, the lengths of all the other elements follow easily.
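The WPM arithmetic is a one-liner. In this sketch (ditSeconds() is a hypothetical name), every other duration is a multiple of the dit time, following the timing table.

```php
<?php
// 50 * WPM units per minute, so one unit lasts 60 / (50 * WPM) = 1.2 / WPM seconds.
function ditSeconds(float $wpm): float
{
    return 1.2 / $wpm;
}

foreach ([5, 15, 20] as $wpm) {
    $dit = ditSeconds($wpm);
    printf("%2d WPM: dit %.3f s, dah %.3f s, word gap %.3f s\n",
        $wpm, $dit, 3 * $dit, 7 * $dit);
}
```

At 15 WPM this gives a dit of 0.08 seconds, the value used for Farnsworth spacing below.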

You may have noticed that the page shown above uses "Farnsworth spacing" for speeds below 15 WPM. So what is Farnsworth spacing?

When operators learn to decode Morse code by ear, they find that the rhythm of each character changes as the sending speed changes. Below about 10 WPM, a listener can easily pick out the individual dits and dahs and work out which character was sent. Above about 10 WPM, however, this element-by-element counting breaks down, and the operator has to recognize each character by its overall sound pattern. Someone who learned Morse code at low speed therefore often struggles with high-speed code: the rhythm has changed, so the patterns they learned subconsciously no longer match.

Farnsworth spacing was invented to solve this problem. The letters and symbols themselves are still sent at 15 WPM or faster, but extra space is inserted between characters to reduce the overall speed. The learner thus hears each character at a realistic speed and rhythm. Once all the characters have been learned, the overall speed can be raised, and the receiver only has to recognize characters faster. In short, Farnsworth spacing avoids the rhythm-change problem and lets students learn quickly.

Therefore, for lower speeds, the system generates the characters at 15 WPM. Correspondingly, a dit lasts 0.08 seconds, but the gaps between characters and words are no longer 3 and 7 dits; instead, they are stretched to achieve the requested overall speed.
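The article does not spell out how the longer gaps are computed. One widely used formula, published by the ARRL, keeps the element timing at the character speed and spreads the remaining time over the 19 spacing units of the 50-unit standard word (four 3-unit character gaps plus one 7-unit word gap). The sketch below uses that formula; farnsworthTiming() is a hypothetical name, and this may not match what the project's source does.

```php
<?php
// Farnsworth timing in seconds: characters at $charWpm, overall speed $overallWpm.
function farnsworthTiming(float $charWpm, float $overallWpm): array
{
    $dit = 1.2 / $charWpm;                       // element timing at character speed
    if ($overallWpm >= $charWpm) {
        // No stretching needed: standard 3- and 7-unit gaps.
        return ['dit' => $dit, 'charGap' => 3 * $dit, 'wordGap' => 7 * $dit];
    }
    // A 50-unit word has 31 units of characters and 19 units of spacing.
    // Stretch the spacing so that the whole word takes 60 / $overallWpm seconds.
    $spacingUnit = (60.0 / $overallWpm - 31 * $dit) / 19;
    return [
        'dit'     => $dit,
        'charGap' => 3 * $spacingUnit,
        'wordGap' => 7 * $spacingUnit,
    ];
}

$t = farnsworthTiming(15, 5);
printf("dit %.3f s, char gap %.3f s, word gap %.3f s\n",
    $t['dit'], $t['charGap'], $t['wordGap']);
```

At 15/5 WPM the dit stays at 0.08 s, while the character gap grows to about 1.5 s instead of 0.24 s.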

Generating sound

In the PHP code, each character (that is, each key of the array above) stands for a group of Morse sounds made up of dits, dahs, and silent gaps. We build the audio sequence from digital samples, write it to a file, and add the header information that identifies it as a WAVE file.
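Once one dit, one dah and one dit-length silence have been rendered as sample strings (the project's $ditstr, $dahstr and $spcstr, built by the loops later in this section), a whole message can be assembled by concatenation. This is a sketch under that assumption; buildSamples() and its parameter names are hypothetical. Every element is followed by one unit of silence, so two more units complete a 3-unit character gap, and four more after that complete a 7-unit word gap.

```php
<?php
// $dit, $dah and $gap are pre-rendered sample strings; $gap is one dit-time of silence.
function buildSamples(string $text, array $table, string $dit, string $dah, string $gap): string
{
    $out = '';
    foreach (str_split(strtoupper($text)) as $ch) {
        if ($ch === ' ') {
            $out .= str_repeat($gap, 4);                  // 3 + 4 = 7 units between words
            continue;
        }
        if (!isset($table[$ch])) {
            continue;                                     // undefined characters are ignored
        }
        foreach (str_split($table[$ch]) as $element) {
            $out .= ($element === '1' ? $dah : $dit) . $gap; // element + 1-unit gap
        }
        $out .= str_repeat($gap, 2);                      // 1 + 2 = 3 units between characters
    }
    return $out;
}
```

With one-character stand-ins ('.' for a dit, '---' for a dah, '_' for a gap), the string length equals the duration in units, which makes the timing easy to check.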

The code that generates the sound is actually quite simple, and you can find it in the project's PHP file. I found it convenient to define a "digital oscillator": every call to Osc() returns the next sample of a sine wave. With the sample rate and the tone frequency specified, this is all that is needed to generate audio in WAVE format. The sine wave's range of -1 to +1 is scaled and shifted so that each sample can be stored as a byte from 0 to 255, with 128 representing zero amplitude.
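The project's Osc() is not shown in the article, so the following is only a hypothetical reconstruction: a phase-accumulator sine oscillator that keeps its state in a static variable. The 600 Hz tone (TONE_HZ) and the SAMPLERATE value are assumptions; toByte() applies the same 120 * x + 128 scaling used in the dit/dah code.

```php
<?php
const SAMPLERATE = 11050;   // assumed samples per second
const TONE_HZ = 600.0;      // assumed tone frequency

// Hypothetical digital oscillator: each call returns the next sine sample.
function Osc(): float
{
    static $phase = 0.0;
    $x = sin($phase);
    $phase += 2.0 * M_PI * TONE_HZ / SAMPLERATE;
    if ($phase >= 2.0 * M_PI) {
        $phase -= 2.0 * M_PI;           // keep the phase bounded
    }
    return $x;
}

// Map [-1, +1] to an unsigned 8-bit sample; 128 is zero amplitude.
function toByte(float $x): string
{
    return chr((int) floor(120 * $x + 128));
}
```

Scaling by 120 rather than 127 leaves some headroom, so samples stay between 8 and 248.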

There is another issue to consider when generating the sound. Morse code is normally produced by switching a sine wave on and off. If you do this abruptly, however, the resulting signal occupies a very large bandwidth. Radio equipment therefore shapes the keyed signal to reduce its bandwidth.

Our project makes the same kind of correction, but digitally. Since we know the duration of the shortest sound element (the dit), it can be shown that the bandwidth is smallest when the element follows a half-cycle of a sine wave whose length equals the dit duration. In effect, this low-pass filters the audio signal; but because we know every signal element in advance, we can simply shape each element directly.

The PHP code that generates the dit, dah, and space signals is as follows:

// $DitTime (seconds per dit), $sampleDT (seconds per sample, 1/SAMPLERATE)
// and Osc() are defined elsewhere in the project.
$ditstr = $dahstr = $spcstr = '';
$dt = 0;
while ($dt < $DitTime) {
    $x = Osc();
    if ($dt < (0.5*$DitTime)) {
        // Generate the rising part of a dit and dah up to half the dit-time
        $x = $x*sin((M_PI/2.0)*$dt/(0.5*$DitTime));
        $ditstr .= chr((int) floor(120*$x+128));
        $dahstr .= chr((int) floor(120*$x+128));
    } else if ($dt > (0.5*$DitTime)) {
        // For a dah, the second part of the dit-time is constant amplitude
        $dahstr .= chr((int) floor(120*$x+128));
        // For a dit, the second half decays with a sine shape
        $x = $x*sin((M_PI/2.0)*($DitTime-$dt)/(0.5*$DitTime));
        $ditstr .= chr((int) floor(120*$x+128));
    } else {
        $ditstr .= chr((int) floor(120*$x+128));
        $dahstr .= chr((int) floor(120*$x+128));
    }
    // a space has an amplitude of 0 shifted to 128
    $spcstr .= chr(128);
    $dt += $sampleDT;
}
// At this point the dit sound has been generated
// For another dit-time unit the dah sound has a constant amplitude
$dt = 0;
while ($dt < $DitTime) {
    $x = Osc();
    $dahstr .= chr((int) floor(120*$x+128));
    $dt += $sampleDT;
}
// Finally during the 3rd dit-time, the dah sound must be completed
// and decay during the final half dit-time
$dt = 0;
while ($dt < $DitTime) {
    $x = Osc();
    if ($dt > (0.5*$DitTime)) {
        $x = $x*sin((M_PI/2.0)*($DitTime-$dt)/(0.5*$DitTime));
    }
    $dahstr .= chr((int) floor(120*$x+128));
    $dt += $sampleDT;
}
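The envelope that these loops apply can also be written as a single function of time, which makes the shape easier to see. In this restatement (envelope() is a hypothetical name), each edge is a quarter-sine ramp half a dit long. For a dit, the two ramps meet in the middle and form one half-cycle sine spanning the whole dit. For a dah, a constant-amplitude stretch lies between the ramps.

```php
<?php
// Keying envelope: $t = seconds since the element started, $dit = dit time,
// $units = element length in dit times (1 for a dit, 3 for a dah).
function envelope(float $t, float $dit, int $units): float
{
    $len = $units * $dit;
    $ramp = 0.5 * $dit;                                  // each edge lasts half a dit
    if ($t < 0 || $t >= $len) {
        return 0.0;                                      // outside the element: silence
    }
    if ($t < $ramp) {
        return sin((M_PI / 2) * $t / $ramp);             // rising edge
    }
    if ($t > $len - $ramp) {
        return sin((M_PI / 2) * ($len - $t) / $ramp);    // falling edge
    }
    return 1.0;                                          // steady tone
}
```

Multiplying each Osc() sample by envelope($t, $DitTime, 1) or envelope($t, $DitTime, 3) yields the same dit and dah shapes as the three loops.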

The WAVE file format

WAVE is a common audio format. In its simplest form, a WAVE file consists of a header followed by a sequence of integers giving the audio amplitude at a specified sample rate. For more information about the WAVE format, see the Audio File Format Specifications website. To generate Morse code, we don't need all the options of the WAVE format; a single 8-bit channel is enough, so it is quite easy. Note that multi-byte values must be stored in little-endian byte order. A WAVE file uses the RIFF format, which consists of records called chunks.

A WAVE file starts with the ASCII identifier RIFF, followed by a 4-byte chunk size, then the ASCII characters WAVE, and finally the chunks that define the data format and hold the sound data.

In our program, the first chunk is the format chunk. It consists of the ASCII characters "fmt " and a 4-byte chunk size; because I use plain vanilla PCM, the size is 16 bytes. The chunk then holds the audio format code (1 for PCM), the number of channels, the sample rate (samples per second), the average bytes per second, a block alignment value, and the bits per sample. We don't need high-quality stereo sound, so we use a single channel. The sound is generated at 11,050 samples per second (roughly a quarter of the 44,100 samples per second of standard CD-quality audio) and stored with 8 bits per sample.

Finally, the actual audio data is stored in the next chunk. It consists of the ASCII characters "data", a 4-byte chunk size, and then the audio data itself as a sequence of bytes (because we use 8 bits per sample).

In the program, the sound, a sequence of 8-bit amplitudes, is stored in the variable $soundstr. Once the audio data has been generated, all the chunk sizes can be calculated, and the pieces can be concatenated and written to a disk file. The following code builds the header information and the audio chunk. Here $riffstr is the RIFF header, $fmtstr is the format chunk, and $soundstr is the audio data chunk.

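// $n (number of samples in $soundstr) and $NSizeStr (4-byte little-endian RIFF
// size, i.e. file size minus 8, which is 36 + $n for this layout) are not
// defined in this excerpt.
$riffstr = 'RIFF'.$NSizeStr.'WAVE';
// Sample rate as a 4-byte little-endian integer
$x = SAMPLERATE;
$SampRateStr = '';
for ($i=0; $i<4; $i++) {
    $SampRateStr .= chr($x % 256);
    $x = floor($x/256);
}
// fmt chunk: size 16, PCM (1), mono (1), sample rate, bytes per second
// (equal to the sample rate for 8-bit mono), block align 1, 8 bits per sample
$fmtstr = 'fmt '.chr(16).chr(0).chr(0).chr(0).chr(1).chr(0).chr(1).chr(0)
    .$SampRateStr.$SampRateStr.chr(1).chr(0).chr(8).chr(0);
// data chunk: 'data', the sample count as a 4-byte little-endian integer, the samples
$x = $n;
$NSampStr = '';
for ($i=0; $i<4; $i++) {
    $NSampStr .= chr($x % 256);
    $x = floor($x/256);
}
$soundstr = 'data'.$NSampStr.$soundstr;
// The file contents are $riffstr.$fmtstr.$soundstr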
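The byte-by-byte loops above can also be written with PHP's built-in pack() function, whose 'v' and 'V' format codes produce unsigned 16-bit and 32-bit little-endian integers. The sketch below is an alternative under that assumption, not the project's own code, and wavBytes() is a hypothetical name. It computes the RIFF size from the assembled body and adds the pad byte that RIFF requires after an odd-length chunk, which the original snippet does not handle.

```php
<?php
// Build a complete 8-bit mono PCM WAVE file from a string of unsigned samples.
function wavBytes(string $samples, int $sampleRate): string
{
    $n = strlen($samples);                        // one byte per sample
    $fmt = 'fmt ' . pack('V', 16)                 // format chunk, 16 bytes for PCM
         . pack('v', 1)                           // audio format: 1 = PCM
         . pack('v', 1)                           // channels: mono
         . pack('V', $sampleRate)                 // samples per second
         . pack('V', $sampleRate)                 // bytes per second (1 byte/sample)
         . pack('v', 1)                           // block align
         . pack('v', 8);                          // bits per sample
    $data = 'data' . pack('V', $n) . $samples;
    if ($n % 2 === 1) {
        $data .= "\0";                            // pad odd-length chunk (not counted in $n)
    }
    $body = 'WAVE' . $fmt . $data;
    return 'RIFF' . pack('V', strlen($body)) . $body;
}

// One second of silence (128 = zero amplitude) at 11050 samples/s.
$wav = wavBytes(str_repeat(chr(128), 11050), 11050);
echo strlen($wav), "\n";                          // 11094 (44-byte header + 11050 samples)
```

Writing the result to disk is then a single file_put_contents() call.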

Summary and comments

Our text-to-Morse code generator works well now. Of course, many modifications and improvements are possible, such as supporting other character sets, reading text directly from files, and generating compressed audio. Since our goal was something easy to use on the web, this simple solution already meets it.

As always, I welcome your suggestions on this simple, rough code.
