Implementing a dialogue robot with Raspberry Pi
Recently I built a robot that can talk to people using a Raspberry Pi; here is a brief introduction.
The Raspberry Pi is the world's most popular micro-computer board and a flagship product of open-source hardware. It was designed for teaching students computer programming, is only the size of a credit card, and is inexpensive. It supports operating systems such as Linux (Debian). Best of all, the documentation is thorough and the community is active.
I am using the Raspberry Pi 2 Model B, whose basic configuration is a Broadcom BCM2836 processor with four cores at 900 MHz and 1 GB of RAM.
My goal is to make a robot that talks to people, which means it needs an input device and an output device. The input device is a microphone; the output can be HDMI, headphones, or a speaker, and I use a speaker here. Below is a picture of my Raspberry Pi: the four USB ports are connected to a wireless network adapter, a wireless keyboard, a microphone, and the speaker's power supply.
We can divide the robot's side of a conversation into three parts: listen, think, say.
"Listen" means recording what the person says and converting it into text.
"Think" means producing different outputs for different inputs. For example, if the other person asks "what time is it now", the robot can answer "It is now xx:xx Beijing time."
"Say" means converting the text back into speech and playing it.
These three parts involve speech recognition, speech synthesis, artificial intelligence, and other technologies that would take a great deal of time and effort to build from scratch. Fortunately, some companies have opened up their interfaces for others to use, and here I chose Baidu's APIs. The implementation of each of the three parts is described below.
Listen
The first step is to record what the person says; I use the arecord tool. The command is as follows:
    arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
The -D option specifies the recording device. Once the microphone is plugged in, the Raspberry Pi has two sound devices, the on-board one and the external USB one; plughw:1 selects the external device. -f specifies the sample format and -r the sampling rate. Baidu's speech recognition service (described below) has requirements on the audio file format, so we record directly into a format it accepts. I do not specify a recording duration here, so arecord keeps recording until the user presses Ctrl-C. The recorded audio is saved as test.wav.
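If you want to trigger the recording from code instead of typing the command by hand, arecord can also be invoked from Python. This is just a sketch, assuming a fixed five-second recording per run (using arecord's -d duration option) instead of stopping with Ctrl-C:

    import subprocess

    # Record five seconds from the external USB microphone into test.wav.
    # -D selects the device, -f the sample format, -r the sample rate;
    # -d 5 is an assumption here: record for a fixed 5 seconds rather than
    # waiting for Ctrl-C.
    subprocess.call(["arecord", "-D", "plughw:1", "-f", "S16_LE",
                     "-r", "16000", "-d", "5", "test.wav"])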
Next, we need to convert the audio into text, i.e. speech recognition (ASR). Baidu's open speech platform provides this service for free and exposes a REST API.
See the documentation: http://yuyin.baidu.com/docs/asr/57
The basic flow is: obtain an access token, then send the audio data, the token, and some information describing the speech to Baidu's recognition server, which returns the corresponding text. Because the server exposes a REST API, the client can be written in any language; I use Python here:
    # coding: utf-8
    import urllib.request
    import json
    import base64
    import sys


    def get_access_token():
        # Exchange the API key and secret key for an access token.
        url = "https://openapi.baidu.com/oauth/2.0/token"
        grant_type = "client_credentials"
        client_id = "xxxxxxxxxxxxxxxxxx"
        client_secret = "xxxxxxxxxxxxxxxxxxxxxx"
        url = (url + "?grant_type=" + grant_type
               + "&client_id=" + client_id
               + "&client_secret=" + client_secret)
        resp = urllib.request.urlopen(url).read()
        data = json.loads(resp.decode("utf-8"))
        return data["access_token"]


    def baidu_asr(data, id, token):
        # The audio is base64-encoded; "len" is the length of the raw audio.
        speech_data = base64.b64encode(data).decode("utf-8")
        speech_length = len(data)
        post_data = {
            "format": "wav",
            "rate": 16000,
            "channel": 1,
            "cuid": id,
            "token": token,
            "speech": speech_data,
            "len": speech_length
        }
        url = "http://vop.baidu.com/server_api"
        json_data = json.dumps(post_data).encode("utf-8")
        json_length = len(json_data)
        #print(json_data)
        req = urllib.request.Request(url, data=json_data)
        req.add_header("Content-Type", "application/json")
        req.add_header("Content-Length", json_length)
        print("asr start request\n")
        resp = urllib.request.urlopen(req)
        print("asr finish request\n")
        resp = resp.read()
        resp_data = json.loads(resp.decode("utf-8"))
        if resp_data["err_no"] == 0:
            return resp_data["result"]
        else:
            print(resp_data)
            return None


    def asr_main(filename):
        f = open(filename, "rb")
        audio_data = f.read()
        f.close()
        #token = get_access_token()
        token = "xxxxxxxxxxxxxxxxxx"
        uuid = "xxxx"
        resp = baidu_asr(audio_data, uuid, token)
        print(resp[0])
        return resp[0]
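Note that get_access_token() is commented out in asr_main and the token is hard-coded: I fetch it once and paste it in, so each recognition call skips the extra round trip. Assuming the code above is saved as asr.py and the client_id/client_secret placeholders are filled in, a one-off fetch could look like this:

    # Hypothetical one-off helper: assumes the module above is saved as
    # asr.py and the client_id / client_secret placeholders are filled in.
    import asr

    print(asr.get_access_token())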
Think
For this part I use the Turing Robot API from the Baidu API Store. See the documentation at: http://apistore.baidu.com/apiworks/servicedetail/736.html
It is very simple to use, so I will not go into detail here; the code is as follows:
    import urllib.request
    import urllib.parse
    import sys
    import json


    def robot_main(words):
        # Send the recognized text to the Turing Robot API and return its reply.
        url = "http://apis.baidu.com/turing/turing/turing?"
        key = "879a6cb3afb84dbf4fc84a1df2ab7319"
        userid = "1000"
        words = urllib.parse.quote(words)
        url = url + "key=" + key + "&info=" + words + "&userid=" + userid
        req = urllib.request.Request(url)
        req.add_header("apikey", "xxxxxxxxxxxxxxxxxxxxxxxxxx")
        print("robot start request")
        resp = urllib.request.urlopen(req)
        print("robot stop request")
        content = resp.read()
        if content:
            data = json.loads(content.decode("utf-8"))
            print(data["text"])
            return data["text"]
        else:
            return None
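As with the ASR code, this can be saved as robot.py and tried on its own once the apikey header is filled in; a minimal sketch (the question string is just an example):

    # Assumes the code above is saved as robot.py and a real apikey is set.
    import robot

    answer = robot.robot_main("现在几点")  # "what time is it now?"
    print(answer)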
Say
First, the text needs to be converted into speech, i.e. speech synthesis (TTS); then the sound is played.
Baidu's open speech platform also provides a TTS interface, which lets you choose a male or female voice and adjust the pitch, speed, and volume. The server returns audio data in MP3 format, which we write to a file in binary mode.
See http://yuyin.baidu.com/docs/tts/136
The code is as follows:
    # coding: utf-8
    import urllib.request
    import urllib.parse
    import json
    import sys


    def baidu_tts_by_post(data, id, token):
        # Required fields: tex = text, lan = language, ctp = client type,
        # cuid = device id, tok = access token.
        post_data = {
            "tex": data,
            "lan": "zh",
            "ctp": 1,
            "cuid": id,
            "tok": token,
        }
        url = "http://tsn.baidu.com/text2audio"
        post_data = urllib.parse.urlencode(post_data).encode("utf-8")
        #print(post_data)
        req = urllib.request.Request(url, data=post_data)
        print("tts start request")
        resp = urllib.request.urlopen(req)
        print("tts finish request")
        resp = resp.read()
        return resp


    def tts_main(filename, words):
        token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        text = urllib.parse.quote(words)
        uuid = "xxxx"
        resp = baidu_tts_by_post(text, uuid, token)
        f = open(filename, "wb")
        f.write(resp)
        f.close()
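The documentation linked above also describes optional fields for the speaker, speed, pitch, and volume mentioned earlier. Below is a sketch of how the request parameters could be built with those knobs; the field names spd, pit, vol and per come from my reading of the docs and should be treated as assumptions, not values verified on this setup:

    import urllib.parse

    def build_tts_params(text, cuid, token):
        # Same required fields as baidu_tts_by_post above, plus the optional
        # voice controls. spd/pit/vol/per are assumed names from the TTS docs.
        params = {
            "tex": text,
            "lan": "zh",
            "ctp": 1,
            "cuid": cuid,
            "tok": token,
            "spd": 5,   # speaking speed
            "pit": 5,   # pitch
            "vol": 5,   # volume
            "per": 0,   # speaker: 0 female, 1 male (per the docs)
        }
        return urllib.parse.urlencode(params).encode("utf-8")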
After you get the audio file, you can play it using the mpg123 player.
    mpg123 test.mp3
Integration
Finally, we put these three parts together.
First, integrate the Python code above into main.py, as follows:
    import asr
    import tts
    import robot

    words = asr.asr_main("test.wav")
    new_words = robot.robot_main(words)
    tts.tts_main("test.mp3", new_words)
Then use a shell script to invoke the relevant tools:
    #!/bin/bash
    arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
    python3 main.py
    mpg123 test.mp3
Okay, now you can talk to the robot. Run the script, say something into the microphone, then press Ctrl-C, and the robot will answer you.
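If you prefer to keep everything in one place, the whole loop can also be driven from a single Python file. This is only a sketch, assuming a fixed five-second recording per turn instead of Ctrl-C and that the modules above are saved as asr.py, robot.py and tts.py:

    # Sketch of a single-file driver. Assumes asr.py, robot.py and tts.py from
    # the sections above, a fixed 5-second recording per turn, and that the
    # arecord and mpg123 tools are installed.
    import subprocess

    import asr
    import robot
    import tts

    while True:
        subprocess.call(["arecord", "-D", "plughw:1", "-f", "S16_LE",
                         "-r", "16000", "-d", "5", "test.wav"])
        words = asr.asr_main("test.wav")      # listen
        if not words:
            continue
        answer = robot.robot_main(words)      # think
        if not answer:
            continue
        tts.tts_main("test.mp3", answer)      # say
        subprocess.call(["mpg123", "test.mp3"])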