Voice recognition is booming on PCs and smartphones these days, and ROS, sitting at the forefront of robotics, does not miss out on such a useful technology. Building on the open-source CMU Sphinx and Festival projects, ROS provides a standalone speech recognition package that converts recognized speech into text, so the robot can then process what was said intelligently.
I. Speech Recognition package
1. Installation. Install the dependency libraries directly with the Ubuntu package manager:
$ sudo apt-get install gstreamer0.10-pocketsphinx
$ sudo apt-get install ros-fuerte-audio-common
$ sudo apt-get install libasound2
Then install the ROS package:
$ svn checkout http://albany-ros-pkg.googlecode.com/svn/trunk/rharmony
$ rosmake pocketsphinx
The core file is recognizer.py in the nodes folder. It collects voice input from the microphone, calls the speech recognition library to turn it into text, and publishes the result on the /recognizer/output topic; other nodes can subscribe to this topic and process the text accordingly.
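Any node can act on those results. As a quick illustration, here is a minimal subscriber sketch of my own (the node name echo_recognizer is an arbitrary choice, and the load_manifest line simply mirrors the rosbuild-era scripts used elsewhere in this post):

#!/usr/bin/env python
# Minimal sketch: print every phrase published by the recognizer.
import roslib; roslib.load_manifest('rbx1_speech')
import rospy
from std_msgs.msg import String

def callback(msg):
    # msg.data holds the recognized phrase as plain text
    rospy.loginfo("Heard: %s" % msg.data)

rospy.init_node('echo_recognizer')
rospy.Subscriber('/recognizer/output', String, callback)
rospy.spin()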
2. Testing. After installation is complete, we can run a test.
First, plug in your microphone and check in the system settings that it actually picks up voice input.
Then, run the test program in the package:
$ roslaunch pocketsphinx robocup.launch
A large amount of information is then printed in the terminal. Try saying a few simple phrases, which of course must be in English, such as "bring me the glass" or "come with me", and see whether they are recognized.
The test in ROS By Example is reported to be quite accurate, but in my own test the recognition was rather poor; perhaps my English pronunciation is just too bad. We can also view the final recognition results directly:
$ rostopic echo /recognizer/output
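The messages on this topic are of type std_msgs/String, so for each recognized phrase the echo output looks roughly like this (the phrase shown is only an illustration, not a guaranteed result):

data: bring me the glass
---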
II. Voice Database
1. Viewing the voice database
This speech recognition is an offline method: a set of commonly used phrases is put into a file that serves as the recognition vocabulary, the speech signal is recognized segment by segment, and the matching text is then looked up in this database. To view the text in the speech recognition database, use the following commands:
$ roscd pocketsphinx/demo
$ more robocup.corpus
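The corpus format is simply one candidate phrase per line. The actual contents of robocup.corpus will differ, but the file looks along these lines (the phrases here are illustrative only):

bring me the glass
come with me
go to the kitchen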
2. Adding a voice database
We can add our own text to the recognizer by building a new voice database. Besides the voice recognition routine that comes with ROS By Example, an example of adding a voice database is also provided.
First, let's take a look at the text information to be added in the example:
$ roscd rbx1_speech/config
$ more nav_commands.txt
This is the text to be added. You can modify some of the text and change it to what you need.
Then we need to generate the language model and dictionary files for this text online. This step requires uploading the file to the CMU Sphinx language model tool (lmtool) website and downloading the files it generates.
Decompress all the downloaded files and place them in the config folder of the rbx1_speech package. We can then rename these files:
$ roscd rbx1_speech/config
$ rename -f 's/3026/nav_commands/' *
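Here 3026 happens to be the serial number the online tool assigned to my download; substitute whatever number appears in your own file names. Assuming that numbering, the rename has the following effect on the two files the launch file below needs:

3026.dic  ->  nav_commands.dic
3026.lm   ->  nav_commands.lm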
In the rbx1_speech/launch folder, check the voice_nav_commands.launch file:
<launch>
  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">
    <param name="lm" value="$(find rbx1_speech)/config/nav_commands.lm"/>
    <param name="dict" value="$(find rbx1_speech)/config/nav_commands.dic"/>
  </node>
</launch>
As you can see, this launch file points the recognizer.py node at the generated language model and dictionary files, so recognition now uses our own voice library.
Run the following command to test the effect:
$ roslaunch rbx1_speech voice_nav_commands.launch
$ rostopic echo /recognizer/output
III. Speech Control
With speech recognition in place, we can build many impressive applications. First, let's try using speech to control the robot's movement.
1. Robot control node
As mentioned earlier, recognizer.py publishes the final recognized text as a message, so we can write a robot control node that subscribes to that message and performs the corresponding control.
The pocketsphinx package contains a routine, voice_cmd_vel.py, that publishes Twist messages under voice control. The rbx1_speech package provides a slightly modified version of it; see the voice_nav.py file in the nodes folder:
#!/usr/bin/env python

"""
  voice_nav.py

  Allows controlling a mobile base using simple speech commands.

  Based on the voice_cmd_vel.py script by Michael Ferguson in
  the pocketsphinx ROS package.

  See http://www.ros.org/wiki/pocketsphinx
"""

import roslib; roslib.load_manifest('rbx1_speech')
import rospy
from geometry_msgs.msg import Twist
from std_msgs.msg import String
from math import copysign

class VoiceNav:
    def __init__(self):
        rospy.init_node('voice_nav')

        rospy.on_shutdown(self.cleanup)

        # Set a number of parameters affecting the robot's speed
        self.max_speed = rospy.get_param("~max_speed", 0.4)
        self.max_angular_speed = rospy.get_param("~max_angular_speed", 1.5)
        self.speed = rospy.get_param("~start_speed", 0.1)
        self.angular_speed = rospy.get_param("~start_angular_speed", 0.5)
        self.linear_increment = rospy.get_param("~linear_increment", 0.05)
        self.angular_increment = rospy.get_param("~angular_increment", 0.4)

        # We don't have to run the script very fast
        self.rate = rospy.get_param("~rate", 5)
        r = rospy.Rate(self.rate)

        # A flag to determine whether or not voice control is paused
        self.paused = False

        # Initialize the Twist message we will publish.
        self.cmd_vel = Twist()

        # Publish the Twist message to the cmd_vel topic
        self.cmd_vel_pub = rospy.Publisher('cmd_vel', Twist)

        # Subscribe to the /recognizer/output topic to receive voice commands.
        rospy.Subscriber('/recognizer/output', String, self.speech_callback)

        # A mapping from keywords or phrases to commands
        self.keywords_to_command = {'stop': ['stop', 'halt', 'abort', 'kill', 'panic', 'off',
                                             'freeze', 'shut down', 'turn off', 'help', 'help me'],
                                    'slower': ['slow down', 'slower'],
                                    'faster': ['speed up', 'faster'],
                                    'forward': ['forward', 'ahead', 'straight'],
                                    'backward': ['back', 'backward', 'back up'],
                                    'rotate left': ['rotate left'],
                                    'rotate right': ['rotate right'],
                                    'turn left': ['turn left'],
                                    'turn right': ['turn right'],
                                    'quarter': ['quarter speed'],
                                    'half': ['half speed'],
                                    'full': ['full speed'],
                                    'pause': ['pause speech'],
                                    'continue': ['continue speech']}

        rospy.loginfo("Ready to receive voice commands")

        # We have to keep publishing the cmd_vel message if we want the robot to keep moving.
        while not rospy.is_shutdown():
            self.cmd_vel_pub.publish(self.cmd_vel)
            r.sleep()

    def get_command(self, data):
        # Attempt to match the recognized word or phrase to the
        # keywords_to_command dictionary and return the appropriate
        # command
        for (command, keywords) in self.keywords_to_command.iteritems():
            for word in keywords:
                if data.find(word) > -1:
                    return command

    def speech_callback(self, msg):
        # Get the motion command from the recognized phrase
        command = self.get_command(msg.data)

        # Log the command to the screen
        rospy.loginfo("Command: " + str(command))

        # If the user has asked to pause/continue voice control,
        # set the flag accordingly
        if command == 'pause':
            self.paused = True
        elif command == 'continue':
            self.paused = False

        # If voice control is paused, simply return without
        # performing any action
        if self.paused:
            return

        # The list of if-then statements should be fairly
        # self-explanatory
        if command == 'forward':
            self.cmd_vel.linear.x = self.speed
            self.cmd_vel.angular.z = 0

        elif command == 'rotate left':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = self.angular_speed

        elif command == 'rotate right':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = -self.angular_speed

        elif command == 'turn left':
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z += self.angular_increment
            else:
                self.cmd_vel.angular.z = self.angular_speed

        elif command == 'turn right':
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z -= self.angular_increment
            else:
                self.cmd_vel.angular.z = -self.angular_speed

        elif command == 'backward':
            self.cmd_vel.linear.x = -self.speed
            self.cmd_vel.angular.z = 0

        elif command == 'stop':
            # Stop the robot!  Publish a Twist message consisting of all zeros.
            self.cmd_vel = Twist()

        elif command == 'faster':
            self.speed += self.linear_increment
            self.angular_speed += self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x += copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z += copysign(self.angular_increment, self.cmd_vel.angular.z)

        elif command == 'slower':
            self.speed -= self.linear_increment
            self.angular_speed -= self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x -= copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z -= copysign(self.angular_increment, self.cmd_vel.angular.z)

        elif command in ['quarter', 'half', 'full']:
            if command == 'quarter':
                self.speed = copysign(self.max_speed / 4, self.speed)

            elif command == 'half':
                self.speed = copysign(self.max_speed / 2, self.speed)

            elif command == 'full':
                self.speed = copysign(self.max_speed, self.speed)

            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x = copysign(self.speed, self.cmd_vel.linear.x)

            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z = copysign(self.angular_speed, self.cmd_vel.angular.z)

        else:
            return

        self.cmd_vel.linear.x = min(self.max_speed, max(-self.max_speed, self.cmd_vel.linear.x))
        self.cmd_vel.angular.z = min(self.max_angular_speed, max(-self.max_angular_speed, self.cmd_vel.angular.z))

    def cleanup(self):
        # When shutting down be sure to stop the robot!
        twist = Twist()
        self.cmd_vel_pub.publish(twist)
        rospy.sleep(1)

if __name__=="__main__":
    try:
        VoiceNav()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Voice navigation terminated.")
As you can see, the code defines a control policy for each recognized command.
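The heart of that policy is the keywords_to_command lookup. Here is a stripped-down, standalone sketch of the same matching idea (the dictionary entries below are abbreviated examples, not the full table):

# A minimal sketch of the keyword matching used by voice_nav.py.
keywords_to_command = {'stop': ['stop', 'halt', 'freeze'],
                       'forward': ['forward', 'ahead', 'straight'],
                       'backward': ['back', 'backward', 'back up']}

def get_command(data):
    # Return the first command whose keyword appears in the recognized text
    for command, keywords in keywords_to_command.items():
        for word in keywords:
            if data.find(word) > -1:
                return command

print(get_command('go straight ahead'))   # prints: forward

To support a new spoken command, add the phrase to nav_commands.txt, regenerate the .lm and .dic files, and add a matching entry to this dictionary.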
2. Simulation test
As before, we run the simulation test in rviz.
First, run a robot model:
$ roslaunch rbx1_bringup fake_turtlebot.launch
Then open rviz:
$ rosrun rviz rviz -d `rospack find rbx1_nav`/sim_fuerte.vcg
If you prefer not to read the output in the terminal, you can open the GUI console instead:
$ rxconsole
Then open the speech recognition node:
$ roslaunch rbx1_speech voice_nav_commands.launch
Finally, launch the robot's voice control node:
$ roslaunch rbx1_speech turtlebot_voice_nav.launch
OK. With all the nodes above running, you can start issuing the voice commands from nav_commands.txt. In my own test, though, the recognition accuracy still felt poor.
IV. Playing Voice
Now the robot can do what we say. It would be even better if the robot could talk back to us. ROS already integrates such a package; let's try it.
Run the following command:
$ rosrun sound_play soundplay_node.py
$ rosrun sound_play say.py "Greetings Humans. Take me to your leader."
Did you hear the sound? ROS lets the robot read out the input text. The default voice is kal_diphone; if you don't like it, you can have another voice read it:
$ sudo apt-get install festvox-don
$ rosrun sound_play say.py "Welcome to the future" voice_don_diphone
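Under the hood, say.py is a thin wrapper around the sound_play client library, so you can also speak from your own node. A minimal sketch, with the node name and sleep durations being my own choices:

#!/usr/bin/env python
# Minimal sketch: text-to-speech through the sound_play client library.
import rospy
from sound_play.libsoundplay import SoundClient

rospy.init_node('say_hello')
soundhandle = SoundClient()
rospy.sleep(1)   # give the client time to connect to soundplay_node.py
soundhandle.say("Welcome to the future", "voice_don_diphone")
rospy.sleep(2)   # keep the node alive while the sound plays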
Ha, a different voice indeed. But that is not our main point here.
In the rbx1_speech/nodes folder there is a node, talkback.py, that lets the robot repeat what it hears:
#!/usr/bin/env python

"""
    talkback.py - Version 0.1 2012-01-10

    Use the sound_play client to say back what is heard by the pocketsphinx recognizer.

    Created for the Pi Robot Project: http://www.pirobot.org
    Copyright (c) 2012 Patrick Goebel.  All rights reserved.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details at:

    http://www.gnu.org/licenses/gpl.html
"""

import roslib; roslib.load_manifest('rbx1_speech')
import rospy
from std_msgs.msg import String
from sound_play.libsoundplay import SoundClient
import sys

class TalkBack:
    def __init__(self, script_path):
        rospy.init_node('talkback')

        rospy.on_shutdown(self.cleanup)

        # Set the default TTS voice to use
        self.voice = rospy.get_param("~voice", "voice_don_diphone")

        # Set the wave file path if used
        self.wavepath = rospy.get_param("~wavepath", script_path + "/../sounds")

        # Create the sound client object
        self.soundhandle = SoundClient()

        # Wait a moment to let the client connect to the
        # sound_play server
        rospy.sleep(1)

        # Make sure any lingering sound_play processes are stopped.
        self.soundhandle.stopAll()

        # Announce that we are ready for input
        self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")
        rospy.sleep(1)
        self.soundhandle.say("Ready", self.voice)

        rospy.loginfo("Say one of the navigation commands...")

        # Subscribe to the recognizer output and set the callback function
        rospy.Subscriber('/recognizer/output', String, self.talkback)

    def talkback(self, msg):
        # Print the recognized words on the screen
        rospy.loginfo(msg.data)

        # Speak the recognized words in the selected voice
        self.soundhandle.say(msg.data, self.voice)

        # Uncomment to play one of the built-in sounds
        #rospy.sleep(2)
        #self.soundhandle.play(5)

        # Uncomment to play a wave file
        #rospy.sleep(2)
        #self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")

    def cleanup(self):
        self.soundhandle.stopAll()
        rospy.loginfo("Shutting down talkback node...")

if __name__=="__main__":
    try:
        TalkBack(sys.path[0])
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Talkback node terminated.")
Let's take a look at the effect:
$ roslaunch rbx1_speech talkback.launch
Then I spoke again, and ROS not only recognized the text but also read it back. Of course, without artificial intelligence algorithms the machine cannot yet speak intelligently on its own, but this is the foundation; it would be even better to build some AI on top of it when time permits.
----------------------------------------------------------------
You are welcome to repost my articles. When reprinting, please note the source: ancient-month, http://blog.csdn.net/hcx25909, and keep following my blog.