160413. Neural network processor

Source: Internet
Author: User
Tags: svm, xeon e5

Chen Yunji

http://novel.ict.ac.cn/ychen/
Chen Yunji, male, born in 1983 in Nanchang, Jiangxi, is a researcher and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences (CAS). He also serves as a distinguished researcher at the CAS Center for Excellence in Brain Science and as an adjunct professor at the University of Chinese Academy of Sciences. He currently leads his laboratory in developing the Cambricon series of deep learning processors. Before that, he worked on domestic processor development for more than ten years, leading or participating in the design of several Godson (Loongson) processors. He has published more than 60 papers in academic conferences and journals including ISCA, HPCA, MICRO, ASPLOS, ICSE, ISSCC, Hot Chips, IJCAI, FPGA, SPAA, IEEE Micro, and eight IEEE/ACM Transactions. Chen Yunji won the first National Natural Science Foundation "Outstanding Youth Fund", was selected among the first national Ten Thousand Talents Program "Young Top-Notch Talents", and received the China Computer Federation Young Scientist Award and the CAS Young Talent Award. He also led his research team to win the national "Youth Civilization" title and the corresponding title for central state organs.

Smart Apps
    • Intelligent processing is the core problem
    • The human brain runs on only about 20 W of power
    • Multilayer large-scale neural network ≈ convolutional neural network + LRN (local response normalization: different feature maps extract different features, then are normalized across maps) + pooling (down-sampling) + classifier (fully connected, 2-3 layers)
    • DeepMind: deep learning + reinforcement learning = mastered 49 games
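The pipeline named above (convolution + pooling + a small fully connected classifier) can be illustrated with a minimal NumPy sketch. The shapes, layer sizes, and random weights here are illustrative assumptions for a toy forward pass, not the architecture of any chip discussed in these notes:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Down-sample by taking the max of each size x size tile."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

def classifier(features, weights, bias):
    """A one-layer fully connected classifier with softmax."""
    logits = weights @ features + bias
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
fmap = np.maximum(conv2d(image, kernel), 0)    # convolution + ReLU
pooled = max_pool(fmap)                        # pooling (down-sampling)
feat = pooled.ravel()
probs = classifier(feat, rng.standard_normal((3, feat.size)), np.zeros(3))
print(probs)  # three class probabilities summing to 1
```

A real network stacks many such layers with many feature maps per layer; the sketch only shows how the stages compose.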
Requirements for neural network processors
    • Google cat: 16,000 CPU cores × 7 days = one cat-face recognition task
    • Google Brain: 100 billion synapses, versus roughly 100 trillion synapses in the human brain
Dedicated Neural network processor
    • Each computer requires a dedicated neural network processor
Cambrian 2008-2016
    • Architectural methods for computing neural networks
    • 2012: Results
      Using one-tenth the chip area of a CPU (Xeon E5-4620) and of a GPU (K20M), they achieved 117× the CPU's performance and 1.1× the GPU's performance, respectively.
    • 2013: First deep learning processor - DianNao
      Traditional neural network chips mapped hardware units one-to-one onto the algorithm's neurons, so each chip could compute only one fixed network. DianNao instead time-shares a small-scale hardware unit across the network, supporting neural networks of arbitrary scale and greatly improving the chip's adaptability to different algorithms.
    • 2014: Multicore deep learning processor - DaDianNao
    • 2015: General-purpose machine learning processor - PuDianNao (artificial neural networks, k-NN, SVM, Bayes, etc.)
    • 2016: Smart recognition IP for cameras - ShiDianNao
    • 2016: General-purpose neural network instruction set - DianNaoYu
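The central DianNao idea — a fixed small hardware unit time-shared over an arbitrarily large layer — can be modeled in software. Below is a minimal NumPy sketch assuming a hypothetical 16-wide unit; the real hardware's tiling, buffering, and dataflow are far more involved:

```python
import numpy as np

TILE = 16  # assumed width of the fixed hardware unit (illustrative, not the real Tn)

def layer_on_fixed_unit(weights, inputs):
    """Compute y = W @ x on a unit that can only multiply-accumulate
    a TILE x TILE block at a time, by time-sharing it over tiles."""
    n_out, n_in = weights.shape
    y = np.zeros(n_out)
    for o in range(0, n_out, TILE):          # tile over output neurons
        for i in range(0, n_in, TILE):       # tile over input synapses
            w_tile = weights[o:o+TILE, i:i+TILE]
            x_tile = inputs[i:i+TILE]
            y[o:o+TILE] += w_tile @ x_tile   # one pass through the small unit
    return y

rng = np.random.default_rng(1)
W = rng.standard_normal((100, 70))   # a layer far larger than the unit
x = rng.standard_normal(70)
assert np.allclose(layer_on_fixed_unit(W, x), W @ x)
```

Because partial sums accumulate in `y`, the same small unit handles any layer size — the software analogue of decoupling hardware scale from network scale.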
Methodological innovation
    • Fixing small-scale hardware + time-sharing = support for arbitrarily large, variable neural networks
    • Optimize the storage hierarchy to minimize the number of accesses to off-chip memory
    • Increase memory-access bandwidth
Time-sharing of hardware operation unit
    • DianNao: computes the neurons of a large network in sequence, in small-scale batches
    • DaDianNao:
      • eDRAM technology
      • Neural networks are structured, so scheduling can raise efficiency
      • Up to 21× faster than the K20
      • In a 28 nm process, DaDianNao runs at 606 MHz with an area of 67.7 mm² and a power consumption of about 16 W. A single chip delivers 21× the performance of a mainstream GPU while consuming only 1/330 of the energy. A high-performance computing system built from 64 chips can be up to 450× faster than a mainstream GPU, at only 1/150 of the total energy consumption.
      • Cons: 1. eDRAM worked well at 28 nm, but looks difficult at the 7 nm node (transistor leakage). 2. A fully connected network also requires full all-to-all communication between the chips. So the design has some flaws.
    • PuDianNao

      • Small-sample learning methods - Bayesian methods (large-sample learning is not a panacea)
      • Fields such as economics have no large samples to use
      • Because algorithms themselves keep evolving, the hardware chip must be strongly general-purpose
      • Vector inner product (SVM), vector distance, counting, nonlinear functions, and sorting ≈ 95% of the arithmetic involved in machine learning algorithms; the MLU (machine learning unit) was designed around these operations
    • ShiDianNao

      • Both the input/output and the model are kept on-chip, with no need for off-chip memory accesses
      • Essentially, it is still a von Neumann architecture
      • A supercomputer in the phone
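To see why the five primitives listed under PuDianNao (inner product, distance, counting, nonlinear functions, sorting) cover classic algorithms such as SVM and k-NN, here is an illustrative Python sketch. The function names and data are hypothetical examples, not the MLU's actual design:

```python
import numpy as np

# The five primitive operations the notes say cover ~95% of ML arithmetic.
dot = lambda a, b: float(np.dot(a, b))            # vector inner product (SVM)
dist = lambda a, b: float(np.sum((a - b) ** 2))   # squared vector distance (k-NN)
def count(labels):                                # counting (Bayes / voting)
    votes = {}
    for label in labels:
        votes[label] = votes.get(label, 0) + 1
    return votes
sign = lambda v: 1 if v >= 0 else -1              # nonlinear function (threshold)

def linear_svm_predict(w, b, x):
    """Linear SVM decision = inner product + nonlinear threshold."""
    return sign(dot(w, x) + b)

def knn_predict(train_x, train_y, x, k=3):
    """k-NN = distances + sorting + counting (majority vote)."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    votes = count(train_y[i] for i in order[:k])
    return max(votes, key=votes.get)
```

For example, with four training points in two clusters, `knn_predict` classifies a query near one cluster by sorting distances and counting the nearest labels — every step reduces to one of the five primitives.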

Brain-like computers and the von Neumann architecture
    • Given a universal hardware architecture, brain-like computers are not a breakthrough in essence
