Alexa is Amazon's own voice recognition technology, and it is used through Amazon's Echo speakers. Developers can build their own programs (called Skills) on Amazon's platform to connect Alexa to their apps or hardware. For example, suppose a user has a set of XX-brand smart lights at home and now wants to control them by voice. The user first wakes the Echo speaker with the wake word (the default is "Alexa"; it can also be set to "Echo") and then says a command such as "Ask XX to turn on the lights", where XX is the keyword (the invocation name) of the application. After identifying the user's intention, Amazon sends a POST request to the developer's server; the developer's server responds to the request and controls the corresponding light bulb. This is a typical request and response cycle.
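The request and response cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete Skill server: it leaves out the HTTPS endpoint and request verification, the intent name TurnOnIntent and the function turn_on_light are hypothetical, and the JSON shown is a trimmed version of the envelope Alexa sends.

```python
import json


def turn_on_light() -> str:
    # Hypothetical stand-in for the call that would reach the real bulb.
    return "The lights are on."


def handle_alexa_post(body: str) -> str:
    """Take the JSON body of Alexa's POST and return the JSON reply."""
    envelope = json.loads(body)
    request = envelope["request"]

    speech = "Sorry, I did not understand that."
    if request["type"] == "IntentRequest":
        # Alexa delivers an intent name, not the words the user spoke.
        if request["intent"]["name"] == "TurnOnIntent":
            speech = turn_on_light()

    return json.dumps({
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    })


# A trimmed example of what the POST body might look like.
sample_body = json.dumps({
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "TurnOnIntent", "slots": {}},
    },
})

reply = json.loads(handle_alexa_post(sample_body))
print(reply["response"]["outputSpeech"]["text"])
```

In a real deployment a function like handle_alexa_post would sit behind a web framework route or a serverless function, and the text in outputSpeech is what the Echo speaks back to the user.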
The recognition process is also fairly clear. The Echo first sends the voice data to Alexa for recognition. Alexa converts the result into one of the developer's custom intents (not text; the original text is not available to the developer) plus any slots, and POSTs them to the developer's server. The developer's server parses the intent and slots, then acts on and responds to each kind of request accordingly. Three mappings are involved in this process. The first is the voice-to-text mapping, which is done by Alexa and which the developer cannot control. The second is the text-to-intent mapping, which is defined in the sample utterances file and which the developer can modify; this mapping is usually many-to-one, that is, several phrasings may correspond to the same intent. The third is the intent-to-action mapping: the number and names of an application's intents are defined in the intent schema in the skill's back-end configuration, while how each intent is handled is decided by the developer's own code on the server.
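The second and third mappings can be illustrated with a small sketch. Everything named here is an assumption for illustration: the three intents, the Level slot, and the handler functions are hypothetical, and the sample utterances use the simple "IntentName phrase" line layout. The voice-to-text step is not shown because it happens entirely inside Alexa.

```python
# Sample utterances: several phrasings may point at the same intent.
SAMPLE_UTTERANCES = """\
TurnOnIntent turn on the lights
TurnOnIntent switch the lights on
TurnOffIntent turn off the lights
SetBrightnessIntent set the brightness to {Level}
"""

# Intent schema: declares how many intents exist and what they are called.
INTENT_SCHEMA = {
    "intents": [
        {"intent": "TurnOnIntent"},
        {"intent": "TurnOffIntent"},
        {
            "intent": "SetBrightnessIntent",
            "slots": [{"name": "Level", "type": "AMAZON.NUMBER"}],
        },
    ]
}


def utterances_by_intent(text: str) -> dict:
    """Group phrasings under their intent to show the many-to-one mapping."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        intent, phrase = line.split(maxsplit=1)
        mapping.setdefault(intent, []).append(phrase)
    return mapping


# Intent-to-action mapping: this part lives on the developer's server.
HANDLERS = {
    "TurnOnIntent": lambda slots: "Lights on",
    "TurnOffIntent": lambda slots: "Lights off",
    "SetBrightnessIntent": lambda slots: (
        f"Brightness set to {slots['Level']['value']} percent"
    ),
}


def dispatch(intent_name: str, slots: dict) -> str:
    """Run the handler registered for an intent, if there is one."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return "Unknown intent"
    return handler(slots)


grouped = utterances_by_intent(SAMPLE_UTTERANCES)
print(grouped["TurnOnIntent"])
print(dispatch("SetBrightnessIntent", {"Level": {"name": "Level", "value": "40"}}))
```

Keeping the handlers in a dictionary keyed by intent name makes it easy to check that every intent declared in the schema has a matching action on the server.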
Amazon Alexa Speech Recognition 1: Introduction