Google releases an open Speech Commands dataset to help beginners apply deep learning to audio recognition

Source: Internet
Author: User

Speech Commands dataset: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

Audio recognition tutorial: https://www.tensorflow.org/versions/master/tutorials/audio_recognition


At Google, we are often asked how to get started using deep learning for speech recognition and other audio recognition problems, such as detecting keywords or commands. There are some large open-source speech recognition systems, such as Kaldi, that can use neural networks as a component, but their complexity makes them hard to use as a guide for simpler tasks. More importantly, there are not many free, openly available datasets that are ready for a beginner's tutorial (many need preprocessing before a neural network model can be built on them) or that are well suited to simple keyword detection.


To address these issues, the TensorFlow and AIY teams created the Speech Commands dataset and used it to add training and inference sample code to TensorFlow. The dataset contains 65,000 one-second utterances of 30 short words, contributed by thousands of different people through the AIY website. It is released under a Creative Commons BY 4.0 license, and new versions will be released as more audio is contributed. The dataset is designed to help you build basic but useful voice interfaces for applications, and it includes common words such as "yes" and "no", digits, and directions. We have also open-sourced the infrastructure used to create the dataset, and we hope more people will use it to build their own datasets, especially for underserved languages and applications.
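
As a rough way to check these figures against a local copy, the Python sketch below counts clips per word and measures the length of a clip. It assumes the archive has already been extracted so that each word has its own folder of WAV files; that layout is an assumption here, not something stated above. `clip_duration_seconds` and `summarize_dataset` are hypothetical helper names used only for illustration.

```python
import wave
from collections import Counter
from pathlib import Path


def clip_duration_seconds(wav_path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_dataset(root):
    """Count WAV clips per word folder under `root`.

    Assumed layout (not confirmed by the article): root/<word>/<clip>.wav.
    Folders whose names start with "_" are skipped, on the assumption that
    they hold auxiliary audio rather than spoken words.
    """
    counts = Counter()
    for wav_path in Path(root).glob("*/*.wav"):
        word = wav_path.parent.name
        if not word.startswith("_"):
            counts[word] += 1
    return counts
```

Calling `summarize_dataset` on the extracted folder and summing the values gives the total number of clips, which can be compared with the 65,000 utterances and 30 words mentioned above. Running `clip_duration_seconds` on a few files is a quick check that the clips are about one second long.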


To try it yourself, download the prebuilt TensorFlow Android demo app (http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk) and open "TF Speech". You will be asked for permission to access the microphone, and then you will see a list of 10 words, each of which should light up when you say it.


The recognition results depend on whether your speech patterns are covered by the dataset, so the app will not be perfect, and commercial speech recognition systems are much more complex than this teaching example. But we hope that as more accents and variations are added to the dataset, and as the community contributes improved models to TensorFlow, the dataset and models will continue to improve and expand.


You can also learn how to train your own model by following the new audio recognition tutorial on tensorflow.org. With the latest development version of the framework (https://hub.docker.com/r/tensorflow/tensorflow/) and a modern desktop machine, you can download the dataset and train the model within a few hours. You also have a variety of options for customizing the neural network for different problems, trading off latency, model size, and accuracy to suit different platforms.
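
For readers who want to fetch the data by hand before following the tutorial, here is a minimal Python sketch that uses only the standard library. The URL is the dataset address listed at the top of this article. The function names and the destination folder are illustrative assumptions, and the tutorial's own scripts may handle downloading differently.

```python
import tarfile
import urllib.request
from pathlib import Path

# Dataset address as given at the top of this article.
DATA_URL = "http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz"


def download_archive(url, archive_path):
    """Download `url` to `archive_path` unless the file already exists."""
    archive_path = Path(archive_path)
    if not archive_path.exists():
        archive_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into `dest_dir` and return that directory."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(dest_dir))
    return dest_dir


def fetch_speech_commands(dest_dir="speech_commands"):
    """Download (if needed) and extract the dataset into `dest_dir`.

    The default folder name is a hypothetical choice for this sketch.
    """
    archive_path = download_archive(DATA_URL, Path(dest_dir) / "archive.tar.gz")
    return extract_archive(archive_path, dest_dir)
```

Calling `fetch_speech_commands()` once should leave the extracted audio under the chosen folder. Because the download step is skipped when the archive file is already present, rerunning the function does not fetch the data again.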


We look forward to seeing the new applications built with the help of this dataset and tutorial, and we hope you get a chance to take advantage of these resources and start working on audio recognition tasks.


The architecture of the network is described in the paper "Convolutional Neural Networks for Small-footprint Keyword Spotting" (http://www.isca-speech.org/archive/interspeech_2015/papers/i15_1478.pdf), presented at Interspeech 2015.
