ROS探索總結（十）—

ROS探索總結（十）——語音控制

最後更新：2018-12-05 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

如今語音辨識在PC機和智能手機上炒的火熱，ROS走在技術的最前沿當然也不會錯過這麼帥的技術。ROS中使用了CMU Sphinx和Festival開源項目中的代碼，發布了獨立的語音辨識包，而且可以將識別出來的語音轉換成文字，然後讓機器人智能處理後說話。

一、語音辨識包
1、安裝安裝很簡單，直接使用ubuntu命令即可，首先安裝依賴庫：

$ sudo apt-get install gstreamer0.10-pocketsphinx$ sudo apt-get install ros-fuerte-audio-common$ sudo apt-get install libasound2

然後來安裝ROS包：

$ svn checkout http://albany-ros-pkg.googlecode.com/svn/trunk/rharmony$ rosmake pocketsphinx

其中的核心檔案就是nodes檔案夾下的recognizer.py檔案了。這個檔案通過麥克風收集語音資訊，然後調用語音辨識庫進行識別產生文本資訊，通過/recognizer/output訊息發布，其他節點就可以訂閱該訊息然後進行相應的處理了。
2、測試安裝完成後我們就可以運行測試了。
首先，插入你的麥克風裝置，然後在系統設定裡測試麥克風是否有語音輸入。
然後，運行包中的測試程式：

$ roslaunch pocketsphinx robocup.launch

此時，在終端中會看到一大段的資訊。嘗試說一些簡單的語句，當然，必須是英語，例如：bring me the glass，come with me，看看能不能識別出來。
《ros by example》這本書中寫得測試還是很準確的，但是我在測試中感覺識別相當不準確，可能是我的英語太差了吧。我們也可以直接看ROS最後發布的結果訊息：

$ rostopic echo /recognizer/output

二、語音庫
1、查看語音庫這個語音辨識時一種離線識別的方法，將一些常用的詞彙放到一個檔案中，作為識別的文本庫，然後分段識別語音訊號，最後在庫中搜尋對應的文本資訊。如果想看語音辨識庫中有哪些文本資訊，可以通過下面的指令進行查詢：

$ roscd pocketsphinx/demo$ more robocup.corpus

2、添加語音庫我們可以自己向語音庫中添加其他的文本識別資訊，《ros by example》內建的常式中是帶有語音辨識的常式的，而且有添加語音庫的例子。
首先看看例子中要添加的文本資訊：

$ roscd rbx1_speech/config$ more nav_commands.txt

這就是需要添加的文本，我們也可以修改其中的某些文本，改成自己需要的。
然後我們要把這個檔案線上產生語音資訊和庫檔案，這一步需要登陸網站http://www.speech.cs.cmu.edu/tools/lmtool-new.html，根據網站的提示上傳檔案，然後線上編譯產生庫檔案。
把下載的檔案都解壓放在rbx1_speech包的config檔案夾下。我們可以給這些檔案改個名字：

$ roscd rbx1_speech/config$ rename -f 's/3026/nav_commands/' *

在rbx1_speech/launch檔案夾下看看voice_nav_commands.launch這個檔案：

<launch><node name="recognizer" pkg="pocketsphinx" type="recognizer.py"output="screen"><param name="lm" value="$(find rbx1_speech)/config/nav_commands.lm"/><param name="dict" value="$(find rbx1_speech)/config/nav_commands.dic"/></node></launch>

可以看到，這個launch檔案在運行recognizer.py節點的時候使用了我們產生的語音辨識庫和檔案參數，這樣就可以實用我們自己的語音庫來進行語音辨識了。
通過之前的命令來測試一下效果如何吧：

$ roslaunch rbx1_speech voice_nav_commands.launch$ rostopic echo /recognizer/output

三、語音控制有了語音辨識，我們就可以來做很多犀利的應用了，首先先來嘗試一下用語音來控制機器人動作。
1、機器人控制節點前面說到的recognizer.py會將最後識別的文本資訊通過訊息發布，那麼我們來編寫一個機器人控制節點接收這個訊息，進行相應的控制即可。
在pocketsphinx包中本身有一個語音控制發布Twist訊息的常式voice_cmd_vel.py，rbx1_speech包對其進行了一些簡化修改，在nodes檔案夾裡可以查看voice_nav.py檔案：

#!/usr/bin/env python"""  voice_nav.py    Allows controlling a mobile base using simple speech commands.    Based on the voice_cmd_vel.py script by Michael Ferguson in  the pocketsphinx ROS package.    See http://www.ros.org/wiki/pocketsphinx"""import roslib; roslib.load_manifest('rbx1_speech')import rospyfrom geometry_msgs.msg import Twistfrom std_msgs.msg import Stringfrom math import copysignclass VoiceNav:    def __init__(self):        rospy.init_node('voice_nav')                rospy.on_shutdown(self.cleanup)                # Set a number of parameters affecting the robot's speed        self.max_speed = rospy.get_param("~max_speed", 0.4)        self.max_angular_speed = rospy.get_param("~max_angular_speed", 1.5)        self.speed = rospy.get_param("~start_speed", 0.1)        self.angular_speed = rospy.get_param("~start_angular_speed", 0.5)        self.linear_increment = rospy.get_param("~linear_increment", 0.05)        self.angular_increment = rospy.get_param("~angular_increment", 0.4)                # We don't have to run the script very fast        self.rate = rospy.get_param("~rate", 5)        r = rospy.Rate(self.rate)                # A flag to determine whether or not voice control is paused        self.paused = False                # Initialize the Twist message we will publish.        self.cmd_vel = Twist()        # Publish the Twist message to the cmd_vel topic        self.cmd_vel_pub = rospy.Publisher('cmd_vel', Twist)                # Subscribe to the /recognizer/output topic to receive voice commands.        rospy.Subscriber('/recognizer/output', String, self.speech_callback)                # A mapping from keywords or phrases to commands        self.keywords_to_command = {'stop': ['stop', 'halt', 'abort', 'kill', 'panic', 'off', 'freeze', 'shut down', 'turn off', 'help', 'help me'],                                    'slower': ['slow down', 'slower'],                                    'faster': ['speed up', 'faster'],                                    'forward': ['forward', 'ahead', 'straight'],                                    'backward': ['back', 'backward', 'back up'],                                    'rotate left': ['rotate left'],                                    'rotate right': ['rotate right'],                                    'turn left': ['turn left'],                                    'turn right': ['turn right'],                                    'quarter': ['quarter speed'],                                    'half': ['half speed'],                                    'full': ['full speed'],                                    'pause': ['pause speech'],                                    'continue': ['continue speech']}                rospy.loginfo("Ready to receive voice commands")                # We have to keep publishing the cmd_vel message if we want the robot to keep moving.        while not rospy.is_shutdown():            self.cmd_vel_pub.publish(self.cmd_vel)            r.sleep()                                       def get_command(self, data):        # Attempt to match the recognized word or phrase to the         # keywords_to_command dictionary and return the appropriate        # command        for (command, keywords) in self.keywords_to_command.iteritems():            for word in keywords:                if data.find(word) > -1:                    return command            def speech_callback(self, msg):        # Get the motion command from the recognized phrase        command = self.get_command(msg.data)                # Log the command to the screen        rospy.loginfo("Command: " + str(command))                # If the user has asked to pause/continue voice control,        # set the flag accordingly         if command == 'pause':            self.paused = True        elif command == 'continue':            self.paused = False                # If voice control is paused, simply return without        # performing any action        if self.paused:            return                       # The list of if-then statements should be fairly        # self-explanatory        if command == 'forward':                self.cmd_vel.linear.x = self.speed            self.cmd_vel.angular.z = 0                    elif command == 'rotate left':            self.cmd_vel.linear.x = 0            self.cmd_vel.angular.z = self.angular_speed                        elif command == 'rotate right':              self.cmd_vel.linear.x = 0                  self.cmd_vel.angular.z = -self.angular_speed                    elif command == 'turn left':            if self.cmd_vel.linear.x != 0:                self.cmd_vel.angular.z += self.angular_increment            else:                        self.cmd_vel.angular.z = self.angular_speed                        elif command == 'turn right':                if self.cmd_vel.linear.x != 0:                self.cmd_vel.angular.z -= self.angular_increment            else:                        self.cmd_vel.angular.z = -self.angular_speed                        elif command == 'backward':            self.cmd_vel.linear.x = -self.speed            self.cmd_vel.angular.z = 0                    elif command == 'stop':             # Stop the robot!  Publish a Twist message consisting of all zeros.                     self.cmd_vel = Twist()                elif command == 'faster':            self.speed += self.linear_increment            self.angular_speed += self.angular_increment            if self.cmd_vel.linear.x != 0:                self.cmd_vel.linear.x += copysign(self.linear_increment, self.cmd_vel.linear.x)            if self.cmd_vel.angular.z != 0:                self.cmd_vel.angular.z += copysign(self.angular_increment, self.cmd_vel.angular.z)                    elif command == 'slower':            self.speed -= self.linear_increment            self.angular_speed -= self.angular_increment            if self.cmd_vel.linear.x != 0:                self.cmd_vel.linear.x -= copysign(self.linear_increment, self.cmd_vel.linear.x)            if self.cmd_vel.angular.z != 0:                self.cmd_vel.angular.z -= copysign(self.angular_increment, self.cmd_vel.angular.z)                        elif command in ['quarter', 'half', 'full']:            if command == 'quarter':                self.speed = copysign(self.max_speed / 4, self.speed)                    elif command == 'half':                self.speed = copysign(self.max_speed / 2, self.speed)                        elif command == 'full':                self.speed = copysign(self.max_speed, self.speed)                        if self.cmd_vel.linear.x != 0:                self.cmd_vel.linear.x = copysign(self.speed, self.cmd_vel.linear.x)            if self.cmd_vel.angular.z != 0:                self.cmd_vel.angular.z = copysign(self.angular_speed, self.cmd_vel.angular.z)                        else:            return        self.cmd_vel.linear.x = min(self.max_speed, max(-self.max_speed, self.cmd_vel.linear.x))        self.cmd_vel.angular.z = min(self.max_angular_speed, max(-self.max_angular_speed, self.cmd_vel.angular.z))    def cleanup(self):        # When shutting down be sure to stop the robot!        twist = Twist()        self.cmd_vel_pub.publish(twist)        rospy.sleep(1)if __name__=="__main__":    try:        VoiceNav()        rospy.spin()    except rospy.ROSInterruptException:        rospy.loginfo("Voice navigation terminated.")

可以看到，代碼中定義了接收到各種命令時的控制策略。
2、模擬測試和前面一樣，我們在rviz中進行模擬測試。
首先是運行一個機器人模型：

$ roslaunch rbx1_bringup fake_turtlebot.launch

然後開啟rviz：

$ rosrun rviz rviz -d `rospack find rbx1_nav`/sim_fuerte.vcg

如果不喜歡在終端裡看輸出，可以開啟gui介面：

$ rxconsole

再開啟語音辨識的節點：

$ roslaunch rbx1_speech voice_nav_commands.launch

最後就是機器人的控制節點了：

$ roslaunch rbx1_speech turtlebot_voice_nav.launch

OK，開啟上面這一堆的節點之後，就可以開始了。可以使用的命令如下：

是我的測試結果，不過感覺準確度還是欠佳：

四、播放語音現在機器人已經可以按照我們說的話行動了，要是機器人可以和我們對話就更好了。ROS中已經整合了這樣的包，下面就來嘗試一下。
運行下面的命令：

$ rosrun sound_play soundplay_node.py$ rosrun sound_play say.py "Greetings Humans. Take me to your leader."

有沒有聽見聲音！ROS通過識別我們輸入的文本，讓機器人讀了出來。發出這個聲音的人叫做kal_diphone，如果不喜歡，我們也可以換一個人來讀：

$ sudo apt-get install festvox-don$ rosrun sound_play say.py "Welcome to the future" voice_don_diphone

哈哈，這回換了一個人吧，好吧，這不也是我們的重點。
在rbx1_speech/nodes檔案夾中有一個讓機器人說話的節點talkback.py:

#!/usr/bin/env python"""    talkback.py - Version 0.1 2012-01-10        Use the sound_play client to say back what is heard by the pocketsphinx recognizer.        Created for the Pi Robot Project: http://www.pirobot.org    Copyright (c) 2012 Patrick Goebel.  All rights reserved.    This program is free software; you can redistribute it and/or modify    it under the terms of the GNU General Public License as published by    the Free Software Foundation; either version 2 of the License, or    (at your option) any later version.5        This program is distributed in the hope that it will be useful,    but WITHOUT ANY WARRANTY; without even the implied warranty of    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    GNU General Public License for more details at:http://www.gnu.org/licenses/gpl.htmlPoint"""import roslib; roslib.load_manifest('rbx1_speech')import rospyfrom std_msgs.msg import Stringfrom sound_play.libsoundplay import SoundClientimport sysclass TalkBack:    def __init__(self, script_path):        rospy.init_node('talkback')        rospy.on_shutdown(self.cleanup)                # Set the default TTS voice to use        self.voice = rospy.get_param("~voice", "voice_don_diphone")                # Set the wave file path if used        self.wavepath = rospy.get_param("~wavepath", script_path + "/../sounds")                # Create the sound client object        self.soundhandle = SoundClient()                # Wait a moment to let the client connect to the        # sound_play server        rospy.sleep(1)                # Make sure any lingering sound_play processes are stopped.        self.soundhandle.stopAll()                # Announce that we are ready for input        self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")        rospy.sleep(1)        self.soundhandle.say("Ready", self.voice)                rospy.loginfo("Say one of the navigation commands...")        # Subscribe to the recognizer output and set the callback function        rospy.Subscriber('/recognizer/output', String, self.talkback)            def talkback(self, msg):        # Print the recognized words on the screen        rospy.loginfo(msg.data)                # Speak the recognized words in the selected voice        self.soundhandle.say(msg.data, self.voice)                # Uncomment to play one of the built-in sounds        #rospy.sleep(2)        #self.soundhandle.play(5)                # Uncomment to play a wave file        #rospy.sleep(2)        #self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")    def cleanup(self):        self.soundhandle.stopAll()        rospy.loginfo("Shutting down talkback node...")if __name__=="__main__":    try:        TalkBack(sys.path[0])        rospy.spin()    except rospy.ROSInterruptException:        rospy.loginfo("Talkback node terminated.")

我們來運行看一下效果：

$ roslaunch rbx1_speech talkback.launch

然後再說話，ROS不僅將文本資訊識別出來了，而且還讀了出來，厲害吧。當然了，現在還沒有加入什麼人工智慧的演算法，不能讓機器人和我們聰明的說話，不過這算是基礎了，以後有時間再研究一下人工智慧就更犀利了。

----------------------------------------------------------------

歡迎大家轉載我的文章。

轉載請註明：轉自古-月

http://blog.csdn.net/hcx25909

歡迎繼續關注我的部落格

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

ROS探索總結（十）——語音控制

聯繫我們

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support