Google opens a voice command dataset to help beginners use deep learning for audio recognition

Last Update: 2018-07-26 Source: Internet

Author: User

Voice Command Data set address: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

Audio Recognition Tutorial Address: https://www.tensorflow.org/versions/master/tutorials/audio_recognition

At Google, we are often asked how to use deep learning to solve speech recognition and other audio recognition problems, such as detecting keywords or commands. There are already large open-source speech recognition systems, such as Kaldi, that can use neural networks as a module, but their complexity makes them hard to use as a guide for simpler tasks. More importantly, few free, open datasets are ready for beginners to use (many need preprocessing before a neural network model can be built on them), and few are well suited to simple keyword detection.

To address these issues, the TensorFlow and AIY teams created a voice command dataset and used it to add training and inference sample code to TensorFlow. The dataset contains 65,000 one-second utterances of 30 short words, contributed by thousands of people through the AIY website. It is released under a Creative Commons BY 4.0 license, and new versions will be released as more audio is contributed. The dataset is designed to help build basic but useful voice interfaces for applications, and it includes common words such as "yes" and "no", digits, and directions. The infrastructure used to create the dataset has also been opened up, in the hope that more people will use it to create their own datasets, especially for underserved languages and applications.
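As a rough illustration of the dataset described above, the following Python sketch counts the clips available for each word and measures the length of a single clip. It assumes the archive has already been extracted to a local directory, that each word has its own folder of WAV files, and that helper folders start with an underscore; these layout details and the function names `count_clips_per_word` and `clip_duration_seconds` are assumptions made for this example, not something the announcement specifies.

```python
import os
import wave
from collections import Counter


def count_clips_per_word(dataset_dir):
    """Count the .wav files in each word folder under dataset_dir.

    Assumed layout: dataset_dir/<word>/<clip>.wav
    """
    counts = Counter()
    for entry in sorted(os.listdir(dataset_dir)):
        folder = os.path.join(dataset_dir, entry)
        # Skip loose files and helper folders whose names start with "_".
        if not os.path.isdir(folder) or entry.startswith("_"):
            continue
        counts[entry] = sum(
            1 for name in os.listdir(folder) if name.lower().endswith(".wav")
        )
    return counts


def clip_duration_seconds(wav_path):
    """Return the duration of one WAV clip in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `count_clips_per_word` on the extracted directory gives a quick view of how evenly the words are represented, and `clip_duration_seconds` can be used to spot clips that are shorter than the nominal one second.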

To try it yourself, download the prebuilt TensorFlow Android demo app (http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastsuccessfulbuild/artifact/out/tensorflow_demo.apk) and open "TF Speech". The app asks for permission to use the microphone and then shows a list of 10 words, each of which lights up when you say it.

How well recognition works depends on whether your speech patterns are covered by the dataset, so the results will not be perfect; commercial speech recognition systems are far more complex than this teaching example. We hope that as more accents and variations are added to the dataset, and as the community contributes improved models to TensorFlow, the dataset will keep improving and expanding.

You can also learn how to train your own models with the new audio recognition tutorial on tensorflow.org. With the latest development version of the framework (https://hub.docker.com/r/tensorflow/tensorflow/) and a modern desktop machine, you can download the dataset and train the model within a few hours. There are also several options for customizing the neural network for different problems, trading off latency, model size, and accuracy to suit different platforms.
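The download step mentioned above can be sketched with the Python standard library alone. This is a minimal sketch rather than the tutorial's own tooling: the helper names `download` and `extract` are hypothetical, the URL is the dataset address given at the top of this article, and the word folders are assumed to land directly under the output directory.

```python
import os
import tarfile
import urllib.request

# Dataset address as listed at the top of this article.
DATASET_URL = "http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz"


def download(url, archive_path):
    """Fetch the archive unless a file already exists at archive_path."""
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path, out_dir):
    """Unpack a .tar.gz archive into out_dir and return the resolved path."""
    os.makedirs(out_dir, exist_ok=True)
    root = os.path.realpath(out_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside out_dir.
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError("unsafe path in archive: " + member.name)
        # Newer Python versions offer a stricter extraction filter; use it if present.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return root
```

A typical call would be `extract(download(DATASET_URL, "speech_commands.tar.gz"), "speech_dataset")`, where both local paths are placeholders chosen for this example.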

We look forward to seeing new applications built with the help of this dataset and tutorial, and we hope you get the chance to use these resources to start on audio recognition tasks.

The paper "Convolutional Neural Networks for Small-footprint Keyword Spotting" (http://www.isca-speech.org/archive/interspeech_2015/papers/i15_1478.pdf), presented at Interspeech 2015, describes the architecture of the network.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
