Using a Raspberry Pi to Build a Chatbot
I recently used a Raspberry Pi to build a robot that can talk to people. Here is a brief introduction to how it works.
The Raspberry Pi is the world's most popular single-board computer and a flagship product of the open-source hardware movement. Designed for teaching computer programming to students, it is only the size of a credit card and inexpensive. It runs Linux (Debian) and other operating systems. Best of all, documentation is plentiful and the community is active.
I used a Raspberry Pi 2 Model B. The basic configuration is a BCM2836 processor with four cores at 900 MHz, and 1 GB of RAM.
My goal is a robot that can talk with people, which means it needs an input device and an output device. The input device is a microphone; the output can be HDMI, earphones, or a speaker. I used a speaker. In my setup, the four USB ports are occupied by a wireless network adapter, a wireless keyboard, the microphone, and power for the speaker.
We can divide robot conversations into three parts: listening, thinking, and speaking.
"Listening" means recording what the person says and converting it into text.
"Thinking" means producing different outputs for different inputs. For example, if the other party asks for the current time, the robot can answer with the current Beijing time.
"Speaking" means converting text into speech and playing it aloud.
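Viewed as code, the whole conversation is just these three stages chained together. Here is a minimal sketch with placeholder implementations; the real ones, built on Baidu's APIs, are described in the rest of this article:

```python
def listen():
    # Placeholder: record audio and run speech recognition (ASR)
    return "现在几点"

def think(words):
    # Placeholder: map the recognized text to a reply
    return "现在是北京时间 12 点"

def speak(words):
    # Placeholder: synthesize the reply (TTS) and play it
    print(words)

def converse():
    """One round of conversation: listen -> think -> speak."""
    words = listen()
    reply = think(words)
    speak(reply)
    return reply
```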
These three parts involve speech recognition, speech synthesis, artificial intelligence, and other technologies that would take enormous time and effort to build from scratch. Fortunately, several companies expose these capabilities through public APIs. I chose Baidu's APIs. The following sections describe how each part is implemented.
"Listening"
First, record what the person says. I used the arecord tool:

```shell
arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
```
The -D option selects the recording device. With the microphone plugged in, the Raspberry Pi has two devices: the on-board one and the external USB one; plughw:1 selects the external device. -f sets the recording format and -r the audio sampling rate. Baidu's speech recognition service (described below) has requirements on the audio format, so we record directly in a compliant format. I did not specify a recording duration, so arecord keeps recording until the user presses Ctrl-C. The recording is saved as test.wav.
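Since the ASR service is strict about the audio format, it can be worth checking a recording before uploading it. Here is a small sketch using Python's standard wave module; the expected values mirror the arecord flags above (S16_LE mono at 16000 Hz):

```python
import wave

def check_wav(filename):
    """Return True if the file is 16-bit mono PCM at 16 kHz."""
    with wave.open(filename, "rb") as w:
        return (w.getnchannels() == 1 and
                w.getsampwidth() == 2 and      # S16_LE = 2 bytes per sample
                w.getframerate() == 16000)
```

asr_main (below) could call this before uploading and fail early with a clear message instead of a server-side error.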
Next, we convert the audio into text, i.e. ASR (automatic speech recognition). Baidu's open speech platform provides this service for free and supports a REST API.
See http://yuyin.baidu.com/docs/asr/57 for documentation
The flow is: obtain an access token, then send the audio parameters, the voice data, and the token to Baidu's speech recognition server, which returns the recognized text. Since the server exposes a REST API, the client can be written in any language; here I use Python.
```python
# coding: utf-8
import urllib.request
import json
import base64

def get_access_token():
    url = "https://openapi.baidu.com/oauth/2.0/token"
    grant_type = "client_credentials"
    client_id = "xxxxxxxxxxxxxxxxxx"
    client_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    url = (url + "?grant_type=" + grant_type
           + "&client_id=" + client_id
           + "&client_secret=" + client_secret)
    resp = urllib.request.urlopen(url).read()
    data = json.loads(resp.decode("utf-8"))
    return data["access_token"]

def baidu_asr(data, id, token):
    # The ASR API expects the raw audio base64-encoded in a JSON body
    speech_data = base64.b64encode(data).decode("utf-8")
    speech_length = len(data)
    post_data = {
        "format": "wav",
        "rate": 16000,
        "channel": 1,
        "cuid": id,
        "token": token,
        "speech": speech_data,
        "len": speech_length
    }
    url = "http://vop.baidu.com/server_api"
    json_data = json.dumps(post_data).encode("utf-8")
    req = urllib.request.Request(url, data=json_data)
    req.add_header("Content-Type", "application/json")
    req.add_header("Content-Length", len(json_data))
    print("asr start request\n")
    resp = urllib.request.urlopen(req)
    print("asr finish request\n")
    resp_data = json.loads(resp.read().decode("utf-8"))
    if resp_data["err_no"] == 0:
        return resp_data["result"]
    else:
        print(resp_data)
        return None

def asr_main(filename):
    with open(filename, "rb") as f:
        audio_data = f.read()
    #token = get_access_token()
    token = "xxxxxxxxxxxxxxxxxx"
    uuid = "xxxx"
    resp = baidu_asr(audio_data, uuid, token)
    if resp is None:
        return None
    print(resp[0])
    return resp[0]
```
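One possible refinement, not part of the original flow: a standard OAuth token response also carries an expires_in field giving the token's lifetime in seconds, so the token could be cached on disk instead of hardcoded or re-fetched every run. A sketch, where the fetch argument would be a get_access_token-style function (passed in so the caching logic stays testable):

```python
import json
import os
import time

def cached_token(fetch, cache_file="token.json"):
    """Return a cached access token, calling fetch() only when the
    cache is missing or expired. fetch() must return a dict with
    "access_token" and "expires_in" (seconds), as an OAuth token
    endpoint does."""
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            cached = json.load(f)
        if cached["expires_at"] > time.time():
            return cached["access_token"]
    data = fetch()
    with open(cache_file, "w") as f:
        json.dump({"access_token": data["access_token"],
                   "expires_at": time.time() + data["expires_in"]}, f)
    return data["access_token"]
```

For this to work, get_access_token would need to return the whole response dict rather than only data["access_token"].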
"Thinking"
Here I use the Turing robot from the Baidu API store. See the documentation: http://apistore.baidu.com/apiworks/servicedetail/736.html
It is very simple to use, so I won't describe it in detail. The code is as follows:
```python
import urllib.request
import urllib.parse
import json

def robot_main(words):
    url = "http://apis.baidu.com/turing/turing/turing?"
    key = "879a6cb3afb84dbf4fc84a1df2ab7319"
    userid = "1000"
    # Percent-encode the Chinese text before putting it in the URL
    words = urllib.parse.quote(words)
    url = url + "key=" + key + "&info=" + words + "&userid=" + userid
    req = urllib.request.Request(url)
    req.add_header("apikey", "xxxxxxxxxxxxxxxxxxxxxxxxxx")
    print("robot start request")
    resp = urllib.request.urlopen(req)
    print("robot stop request")
    content = resp.read()
    if content:
        data = json.loads(content.decode("utf-8"))
        print(data["text"])
        return data["text"]
    else:
        return None
```
"Speaking"
First, the text needs to be converted into speech, i.e. speech synthesis (TTS). Then the sound is played.
Baidu's open speech platform also provides a TTS interface, with configuration options for a male or female voice, pitch, speed, and volume. The server returns audio data in MP3 format, which we write to a file in binary mode.
See http://yuyin.baidu.com/docs/tts/136
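The docs linked above describe the configurable voice options, and building the POST body with urllib.parse.urlencode also takes care of escaping the Chinese text. A sketch; the parameter names spd, pit, vol, and per (speed, pitch, volume, speaker) are my reading of the TTS docs and should be checked against them before use:

```python
import urllib.parse

def build_tts_body(text, cuid, token, spd=5, pit=5, vol=5, per=0):
    # spd/pit/vol/per are the optional speed/pitch/volume/speaker
    # knobs from the TTS docs; the names and defaults here are
    # assumptions, not verified values.
    params = {
        "tex": text, "lan": "zh", "ctp": 1,
        "cuid": cuid, "tok": token,
        "spd": spd, "pit": pit, "vol": vol, "per": per,
    }
    # urlencode() percent-escapes the UTF-8 text for us
    return urllib.parse.urlencode(params).encode("utf-8")
```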
The code is as follows:
```python
# coding: utf-8
import urllib.request
import urllib.parse
import json

def baidu_tts_by_post(data, id, token):
    post_data = {
        "tex": data,
        "lan": "zh",
        "ctp": 1,
        "cuid": id,
        "tok": token,
    }
    url = "http://tsn.baidu.com/text2audio"
    # urlencode() already escapes the text, so don't quote() it beforehand
    post_data = urllib.parse.urlencode(post_data).encode("utf-8")
    req = urllib.request.Request(url, data=post_data)
    print("tts start request")
    resp = urllib.request.urlopen(req)
    print("tts finish request")
    return resp.read()

def tts_main(filename, words):
    token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    uuid = "xxxx"
    resp = baidu_tts_by_post(words, uuid, token)
    # The response body is MP3 data; write it out in binary mode
    with open(filename, "wb") as f:
        f.write(resp)
```
After obtaining the audio file, it can be played with the mpg123 player:

```shell
mpg123 test.mp3
```
Integration
Finally, combine the three parts.
First, tie the Python pieces together in main.py:
```python
import asr
import tts
import robot

words = asr.asr_main("test.wav")
if words:
    new_words = robot.robot_main(words)
    if new_words:
        tts.tts_main("test.mp3", new_words)
```
Then use a shell script to call the tools in order:

```shell
#!/bin/bash
arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
python3 main.py
mpg123 test.mp3
```
Now you can talk to the robot: run the script, say something into the microphone, and press Ctrl-C. The robot will then reply to you.
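The script runs a single exchange and exits; the same steps can also be driven from Python so the conversation keeps going. A sketch, with the external commands and the three stage functions passed in as parameters (asr_main, robot_main, and tts_main from earlier would be the real arguments):

```python
import subprocess

RECORD_CMD = ["arecord", "-D", "plughw:1", "-f", "S16_LE", "-r", "16000",
              "test.wav"]
PLAY_CMD = ["mpg123", "test.mp3"]

def one_round(recognize, reply, synthesize, runner=subprocess.call):
    """Run one record -> ASR -> robot -> TTS -> play exchange.
    recognize/reply/synthesize stand in for asr_main/robot_main/
    tts_main; runner executes the external commands and is a
    parameter so the flow can be tested without the hardware."""
    runner(RECORD_CMD)               # record until the user hits Ctrl-C
    words = recognize("test.wav")
    if not words:
        return None
    answer = reply(words)
    if not answer:
        return None
    synthesize("test.mp3", answer)
    runner(PLAY_CMD)                 # play the synthesized reply
    return answer
```

Wrapping one_round(...) in a while loop would keep the robot listening round after round.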