A small summary of speech recognition under Linux


Last Update:2018-08-02 Source: Internet

Author: User


I posted my Snake game code earlier and wanted to control it by voice (up, down, left, right), so I chose the iFlytek (Xunfei) SDK. It comes with some documentation, but a few details can confuse newcomers developing on Linux, as they did me, so I am summarising them here.

First download the Linux version of the iFlytek SDK (you need to register first). The downloaded include folder contains four files: msp_errors.h, msp_types.h, qisr.h and qtts.h. The first two hold generic data structures; qisr.h is the header for speech recognition and qtts.h is the header for speech synthesis. Since I only need speech recognition, including qisr.h in my code is enough. The bin folder is messier, but what matters are the two shared libraries libmsc.so and libspeex.so, which I copied straight into /usr/lib.

Note the asr_keywords_utf8.txt file in the bin folder. The SDK's approach is this: write the words you want to recognise into asr_keywords_utf8.txt and upload the file to the iFlytek server, which returns a GrammarID. An upload is said to be "valid for life", meaning you do not need to upload again and take up more server space; afterwards, any program that wants to recognise the same words can use that GrammarID directly. For example, I wanted to recognise "left, right, up, down, library, alone", so I wrote those words into asr_keywords_utf8.txt. The file must be UTF-8 encoded, which is the default on Linux anyway. Here is the code I wrote to upload the file and obtain the GrammarID:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <qisr.h>

#define TRUE 1
#define FALSE 0
#define MAX_KEYWORD_LEN 4096

int main()
{
    /* Replace xxxxxxx with your own appid. */
    int ret = QISRInit("appid=xxxxxxx");
    if (ret != MSP_SUCCESS) {
        printf("QISRInit with errorCode: %d\n", ret);
        return 0;
    }

    char GrammarID[128];
    memset(GrammarID, 0, sizeof(GrammarID));

    const char *sessionID = QISRSessionBegin(NULL, "ssm=1,sub=asr", &ret);
    if (ret != MSP_SUCCESS) {
        printf("QISRSessionBegin with errorCode: %d\n", ret);
        return ret;
    }

    /* Read the keyword list, leaving room for the terminating NUL. */
    char UserData[MAX_KEYWORD_LEN];
    memset(UserData, 0, MAX_KEYWORD_LEN);
    FILE *fp = fopen("asr_keywords_utf8.txt", "rb");
    if (fp == NULL) {
        printf("keyword file cannot open\n");
        return -1;
    }
    unsigned int len = (unsigned int)fread(UserData, 1, MAX_KEYWORD_LEN - 1, fp);
    UserData[len] = 0;
    fclose(fp);

    /* Upload the keywords; the returned string is the grammar ID. */
    const char *testID = QISRUploadData(sessionID, "contact", UserData, len, "dtt=keylist", &ret);
    if (ret != MSP_SUCCESS) {
        printf("QISRUploadData with errorCode: %d\n", ret);
        return ret;
    }
    /* Bounded copy: GrammarID was zero-filled, so it stays NUL-terminated. */
    strncpy(GrammarID, testID, sizeof(GrammarID) - 1);

    printf("GrammarID: \"%s\"\n", GrammarID);
    QISRSessionEnd(sessionID, "normal");
    return 0;
}

The GrammarID is printed to the terminal; write it down. Next, record the audio you want recognised. I did not know the requirements at first, so I recorded a clip with Ubuntu's sound recorder and found it could never be recognised. After asking on the forum I learned what the service expects: a sampling rate of 16 kHz or 8 kHz, 16-bit samples, mono, in PCM or WAV format. The recorder defaults to 32-bit samples, so you have to use ffmpeg or write your own recording code. The ffmpeg command is:

ffmpeg -f alsa -i hw:0 -ar 16000 -ac 1 lib.wav
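Since a wrong sample format was what made recognition fail, it can help to check a recording before sending it. The following is a minimal sketch, not part of the SDK: the names wav_info, wav_parse_header and wav_is_acceptable are made up for illustration, and it assumes a WAV file whose "fmt " chunk comes first, right after the RIFF/WAVE header, so the fields sit at their usual fixed offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of interest from a WAV header (hypothetical helper type). */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} wav_info;

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the start of a WAV file. Returns 0 on success, -1 if the buffer
 * is shorter than 36 bytes or is not RIFF/WAVE with "fmt " as the first
 * chunk (an assumption of this sketch). */
int wav_parse_header(const unsigned char *hdr, size_t n, wav_info *out)
{
    if (n < 36) return -1;
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0) return -1;
    if (memcmp(hdr + 12, "fmt ", 4) != 0) return -1;
    out->channels = rd16(hdr + 22);
    out->sample_rate = rd32(hdr + 24);
    out->bits_per_sample = rd16(hdr + 34);
    return 0;
}

/* Returns 1 if the format matches the requirements listed above:
 * 16 kHz or 8 kHz, 16-bit samples, mono. */
int wav_is_acceptable(const wav_info *w)
{
    return (w->sample_rate == 16000 || w->sample_rate == 8000) &&
           w->bits_per_sample == 16 && w->channels == 1;
}
```

To use it, read the first 44 bytes of lib.wav with fread() into an unsigned char buffer and pass them to wav_parse_header(); if wav_is_acceptable() returns 0, re-record or convert the file before uploading it.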

I recorded myself saying "library" in Mandarin for about 2 seconds, and the following code recognises it:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <qisr.h>

#define TRUE 1
#define FALSE 0
#define BUFFER_NUM 4096
#define MAX_KEYWORD_LEN 4096

int run_asr(const char *asrfile);

int main(int argc, char *argv[])
{
    int ret = MSP_SUCCESS;
    const char *asrfile = "lib.wav";

    ret = QISRInit("appid=xxxxxx");
    if (ret != MSP_SUCCESS) {
        printf("QISRInit with errorCode: %d\n", ret);
        return 0;
    }
    ret = run_asr(asrfile);
    QISRFini();
    getchar();    /* wait for a key press before exiting */
    return 0;
}

int run_asr(const char *asrfile)
{
    int ret = MSP_SUCCESS;
    FILE *fp = NULL;
    char buff[BUFFER_NUM];
    unsigned int len;
    int status = MSP_AUDIO_SAMPLE_CONTINUE, ep_status = -1, rec_status = -1, rslt_status = -1;

    /* GrammarID printed by the upload program; the first one is an earlier upload. */
    /* const char *grammarID = "E7EB1A443EE143D5E7AC52CB794810FE"; */
    const char *grammarID = "C66D4EECD37D4FE1C8274A2224B832D5";
    /* Note sub=asr; rst=json makes the result come back as UTF-8 JSON. */
    const char *param = "rst=json,sub=asr,ssm=1,aue=speex,auf=audio/L16;rate=16000";
    const char *sess_id = QISRSessionBegin(grammarID, param, &ret);
    if (MSP_SUCCESS != ret) {
        printf("QISRSessionBegin err %d\n", ret);
        return ret;
    }

    fp = fopen(asrfile, "rb");
    if (NULL == fp) {
        printf("failed to open file, please check the file.\n");
        QISRSessionEnd(sess_id, "normal");
        return -1;
    }

    printf("Writing audio...\n");
    while (!feof(fp)) {
        len = (unsigned int)fread(buff, 1, BUFFER_NUM, fp);
        /* The block that hits end-of-file is flagged as the last one. */
        status = feof(fp) ? MSP_AUDIO_SAMPLE_LAST : MSP_AUDIO_SAMPLE_CONTINUE;
        if (status == MSP_AUDIO_SAMPLE_LAST) printf("MSP_AUDIO_SAMPLE_LAST\n");
        if (status == MSP_AUDIO_SAMPLE_CONTINUE) printf("MSP_AUDIO_SAMPLE_CONTINUE\n");
        ret = QISRAudioWrite(sess_id, buff, len, status, &ep_status, &rec_status);
        if (ret != MSP_SUCCESS) {
            printf("\nQISRAudioWrite err %d\n", ret);
            break;
        }
        /* A partial result may already be available while still writing. */
        if (rec_status == MSP_REC_STATUS_SUCCESS) {
            const char *result = QISRGetResult(sess_id, &rslt_status, 0, &ret);
            if (ret != MSP_SUCCESS) {
                printf("error code: %d\n", ret);
                break;
            } else if (rslt_status == MSP_REC_STATUS_NO_MATCH) {
                printf("get result nomatch\n");
            } else if (result != NULL) {
                printf("get result[%d/%d]:len:%d\n %s\n", ret, rslt_status, (int)strlen(result), result);
            }
        }
        printf(".");
    }
    printf("\n");

    if (ret == MSP_SUCCESS) {
        printf("Get reuslt~~~~~~~\n");
        char asr_result[1024] = "";
        unsigned int pos_of_result = 0;
        int loop_count = 0;
        /* Poll every 0.5 s until recognition completes, about 30 times at most. */
        do {
            const char *result = QISRGetResult(sess_id, &rslt_status, 0, &ret);
            if (ret != 0) {
                printf("QISRGetResult err %d\n", ret);
                break;
            }
            if (rslt_status == MSP_REC_STATUS_NO_MATCH) {
                printf("get result nomatch\n");
            } else if (result != NULL) {
                /* The raw JSON could also be dumped to a file here, e.g. data.txt. */
                printf("~~~%d\n", (int)strlen(result));
                printf("[%d]:get result[%d/%d]:%s\n", loop_count, ret, rslt_status, result);
                /* Append only if the result still fits in asr_result. */
                if (pos_of_result + strlen(result) < sizeof(asr_result)) {
                    strcpy(asr_result + pos_of_result, result);
                    pos_of_result += (unsigned int)strlen(result);
                }
            } else {
                printf("[%d]:get result[%d/%d]\n", loop_count, ret, rslt_status);
            }
            usleep(500000);
        } while (rslt_status != MSP_REC_STATUS_COMPLETE && loop_count++ < 30);
        if (strcmp(asr_result, "") == 0) {
            printf("no result\n");
        }
    }

    QISRSessionEnd(sess_id, NULL);
    printf("QISRSessionEnd.\n");
    fclose(fp);
    return 0;
}

The output results are as follows:

kl@kl-latitude:~/xunfeisdk$ ./a.out
Writing audio...
MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_CONTINUE
.MSP_AUDIO_SAMPLE_LAST
.
Get reuslt~~~~~~~
[0]:get result[0/2]
~~~123
[1]:get result[0/5]:{"sn":1,"ls":true,"bg":0,"ed":0,"ws":[{"bg":0,"cw":[{"sc":" "","gm":"0","w":"Library","mn":[{"contact":"Library"}]}]}
QISRSessionEnd.
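Each status line in the output corresponds to one 4096-byte block, and only the final block is flagged as the last one. The sketch below shows the same chunking idea for audio already held in memory. It is an illustration under assumptions, not SDK code: CHUNK_CONTINUE and CHUNK_LAST are stand-ins for the SDK's MSP_AUDIO_SAMPLE_CONTINUE and MSP_AUDIO_SAMPLE_LAST, and the for_each_chunk function and its callback type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flags for this sketch only; real code would use the SDK's
 * MSP_AUDIO_SAMPLE_CONTINUE and MSP_AUDIO_SAMPLE_LAST constants. */
enum { CHUNK_CONTINUE = 0, CHUNK_LAST = 1 };

/* Hypothetical callback: receives one chunk and its status flag and
 * returns 0 on success. In real code it would wrap the audio-write call. */
typedef int (*chunk_fn)(const char *data, size_t len, int status, void *ctx);

/* Walk buf in pieces of at most chunk bytes, flagging only the final
 * piece as CHUNK_LAST. Returns the number of chunks delivered, or -1 if
 * the arguments are invalid or the callback reports an error. */
int for_each_chunk(const char *buf, size_t total, size_t chunk,
                   chunk_fn fn, void *ctx)
{
    if (chunk == 0 || total == 0 || fn == NULL) return -1;
    int count = 0;
    size_t off = 0;
    while (off < total) {
        size_t len = (total - off < chunk) ? total - off : chunk;
        int status = (off + len == total) ? CHUNK_LAST : CHUNK_CONTINUE;
        if (fn(buf + off, len, status, ctx) != 0) return -1;
        off += len;
        count++;
    }
    return count;
}

/* Example callback: counts how many chunks were flagged as last. */
static int count_last(const char *data, size_t len, int status, void *ctx)
{
    (void)data;
    (void)len;
    if (status == CHUNK_LAST) (*(int *)ctx)++;
    return 0;
}
```

One difference from a `while (!feof(fp))` loop is worth noting: when the file size is an exact multiple of the block size, the feof pattern only detects the end on an extra read that returns zero bytes, so the block flagged as last is empty. Working from a known total length, as here, always attaches the flag to the final block that carries data.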

The result format was a pitfall. The official sample prints the recognition result directly, but that result is GB2312-encoded and shows up as garbage in a Linux terminal. I later learned that setting rst=json in param, the second argument of QISRSessionBegin(), makes all results come back as JSON with the Chinese text in UTF-8, which can then be parsed with a JSON module. The overall flow of the code is straightforward. In short:

1. Call QISRInit() first. Its parameter is your own appid, which is issued when you register and download the SDK and is therefore unique. It identifies the user, and users at different levels get different daily limits on SDK calls; after all, the more people use the service, the more recognition performance suffers;

2. Next, pass the GrammarID, the input/output parameter string param and the call-status return value ret to QISRSessionBegin() to start a session. The return value is the session ID, which is one of the main arguments of every later call;

3. Open your audio file and write it with QISRAudioWrite(), either in segments or all at once. The first argument is the session ID returned by the initialization function above, the second is a pointer to the start of the audio data, the third is the size of the audio data, and the fourth is the state of the audio being sent, indicating whether this is the last block or not. The remaining two receive the speech-detection status and the recognition status reported by the server;

4. Call QISRGetResult() to get the recognition result. The first argument is the session ID, the second receives the recognition result status, the third is the interval for interacting with the server (the official recommendation is 5000; I use 0), and the fourth is the call-status return value ret. The function's return value is the JSON data shown in the output above;

5. Finally, do the clean-up work. After all, this is C, not Java. Haha
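Step 4 returns JSON, and the recognised word sits in the "w" field of the sample result. To act on it, for example to steer the snake, that field has to be pulled out. The following is a minimal sketch using plain string scanning; json_first_string is a hypothetical helper, it does not handle escaped quotes, and a proper JSON library would be the more robust choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of the first "key":"value" pair found in json into out
 * (capacity cap bytes). Returns 0 on success, -1 if no such string field
 * exists or the value does not fit. Sketch only: spaces around ':' are
 * tolerated, escaped quotes inside values are not handled. */
int json_first_string(const char *json, const char *key, char *out, size_t cap)
{
    char pat[64];
    int n = snprintf(pat, sizeof pat, "\"%s\"", key);
    if (n < 0 || (size_t)n >= sizeof pat) return -1;

    for (const char *p = strstr(json, pat); p != NULL; p = strstr(p + 1, pat)) {
        const char *q = p + n;
        while (*q == ' ') q++;
        if (*q != ':') continue;          /* matched a value, not a key */
        q++;
        while (*q == ' ') q++;
        if (*q != '"') continue;          /* value is not a string */
        q++;
        const char *end = strchr(q, '"');
        if (end == NULL) return -1;
        size_t len = (size_t)(end - q);
        if (len + 1 > cap) return -1;     /* no room for value plus NUL */
        memcpy(out, q, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

Called as json_first_string(result, "w", word, sizeof word) on the string returned by QISRGetResult(), this leaves the recognised word in word, which can then be compared against the uploaded keywords with strcmp().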

When reprinting, please credit the source: http://blog.csdn.net/littlethunder/article/details/17047663
