Martin Wainwright: Accelerating the spread of artificial intelligence with statistical machine learning algorithms

Source: Internet
Author: User


(Martin Wainwright, professor at the University of California, Berkeley, USA)

Martin Wainwright is an internationally renowned expert in statistics and computational science. He is a professor at the University of California, Berkeley, where he holds appointments in both the Department of Statistics and the Department of Electrical Engineering and Computer Sciences (EECS), giving him a unique perspective that spans mathematics and computer science.

At the AI and Intelligent Logistics Roundtable Forum in July 2017, Martin introduced Newton Sketch, a new statistical machine learning algorithm that has emerged in the past two years. The algorithm enables rapid optimization, analysis, and understanding of very large datasets.

As a winner of the COPSS Presidents' Award, the top prize in statistics worldwide, Martin emphasized that Newton Sketch can process large-scale, high-dimensional datasets and high-dimensional neural networks with shorter computation time and lower computational cost, which matters for promoting the rapid spread of AI in the business world.

High-dimensional phenomena caused by big data

Statistics originated in ancient Greece more than 2,000 years ago. Modern statistics is represented by mathematical statistics, which is built on probability theory, a branch of pure mathematics; with it, statistics entered a phase in which statistics and mathematics became integrated.

The publication of the paper on the t-distribution opened a new era of statistics, in which small samples were used in place of complete populations for statistical research. The core task of statistics became: inferring the true situation of the whole from a sample. Such inference was necessary because limitations in computing equipment, storage devices, and computing power made it impossible to obtain the full data. In the past decade, however, video data, social data, industrial data, and all kinds of sensor data have nurtured the so-called big data phenomenon. According to a 2013 IBM study, the data generated in the preceding two years amounted to nearly 90% of all data ever created worldwide. IDC forecasts that, starting from 2013, the total amount of global data will double every two years.

In the past, the state of the world could only be inferred from very small samples; now data exists for virtually the whole world, and it is still expanding. Moreover, a single data object can have thousands or even tens of thousands of dimensions (attributes), that is, "high-dimensional data." Once computing and storage devices can capture all the data, the problem becomes how to reduce the dimensionality of the world's data so that the real world can be understood and represented within limited time and cost.

The significance of statistical machine learning to artificial intelligence

Data science emerged at the intersection of classical statistics, computer science, and artificial intelligence applications. Martin said that in the past few years academia and industry have witnessed the transformation of data science, and statistical machine learning has emerged from it.

Statistical machine learning is a new interdisciplinary subject that integrates computer science, optimization, and systems science, so many of its research problems come from practical applications. In practice, data streams keep growing in scale while becoming more dynamic and heterogeneous, placing ever higher demands on algorithms; statistical machine learning provides a very effective analytical approach to this. Related fields such as bioinformatics, artificial intelligence, signal processing, communications, finance, and control theory have all been profoundly influenced by statistical machine learning.

Martin pointed out that real-world big data problems are challenging because of data noise and missing data. The goal of machine learning is to extract reliable and useful information from data through automated software processes, while statistical inference can itself extract useful information from noisy data; combining the two yields better results.

Random projection (Randomized Projection) is an emerging algorithm in statistical machine learning that "projects" a high-dimensional large dataset into a low-dimensional one without losing the useful information during dimensionality reduction, so the data need only be studied in the low-dimensional space. Martin noted that random projection has been widely used in many fields and has proved to be an effective algorithm. On this basis, Martin applied it to the classical Newton iterative nonlinear optimization algorithm, producing Newton Sketch.
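As a minimal sketch of the idea (with made-up sizes, not from the article): a Gaussian random projection maps points from a high-dimensional space to a much lower-dimensional one while approximately preserving pairwise distances, so downstream analysis can work on the small projected data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: n points in d dimensions, projected down to m dimensions
n, d, m = 1000, 5000, 200
X = rng.normal(size=(n, d))

# Gaussian random projection: entries drawn from N(0, 1/m), so that
# E[||x S||^2] = ||x||^2 and pairwise distances are preserved in expectation
S = rng.normal(scale=1.0 / np.sqrt(m), size=(d, m))
X_low = X @ S                      # projected dataset, shape (n, m)

# Distances between points survive the projection approximately
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(X_low[0] - X_low[1])
print(orig, proj)
```

With m on the order of a few hundred, the projected distance typically agrees with the original to within roughly 1/sqrt(m) relative error, which is the Johnson–Lindenstrauss effect the algorithm relies on.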

In May 2015, Martin and his colleague Mert Pilanci published the paper "Newton Sketch: A Linear-time Optimization Algorithm with Linear-Quadratic Convergence". It uses random projection to build a very good approximation of the Hessian in Newton's iterative method, greatly reducing the cost of each Newton iteration. The method can be widely applied to large-scale linear programming, quadratic programming, and other nonlinear programming problems, such as logistic regression.
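The core idea can be illustrated with a toy logistic-regression example (synthetic data, Gaussian sketch for simplicity; the paper favors fast structured sketches such as randomized Hadamard transforms, and the sizes below are assumptions, not from the article): instead of forming the exact Hessian from all n samples, each iteration sketches the Hessian's square root down to m rows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logistic-regression problem with n samples >> d features
n, d, m = 5000, 50, 500            # m is the sketch size, d < m << n
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

y = (rng.random(n) < sigmoid(A @ w_true)).astype(float)

w = np.zeros(d)
for _ in range(15):
    p = sigmoid(A @ w)
    grad = A.T @ (p - y) / n
    # Square root of the Hessian: B = D^{1/2} A with D = diag(p(1-p))/n,
    # so the exact Hessian is B^T B
    B = np.sqrt(p * (1.0 - p) / n)[:, None] * A
    # Newton sketch: approximate B^T B by (S B)^T (S B), where S is an
    # m x n random sketch; only an m x d matrix enters the linear solve
    S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    SB = S @ B
    H_sketch = SB.T @ SB + 1e-8 * np.eye(d)
    w -= np.linalg.solve(H_sketch, grad)

# Gradient norm at the final iterate; near zero once the method has converged
print(np.linalg.norm(A.T @ (sigmoid(A @ w) - y)) / n)
```

Because the gradient is still exact and only the Hessian is sketched, the iterates converge to the true optimum, while each step touches an m x d matrix rather than an n x d one.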

What does Newton Sketch, as a representative statistical machine learning algorithm, mean for deep learning? Martin noted that deep neural networks require GPUs and other specialized hardware. Although Google and other companies have in recent years been developing new specialized hardware such as TPUs, and GPUs have also made significant progress, prices remain high. Moreover, deep neural networks lack engineering stability in real commercial applications and fail easily when data quality is poor. Most importantly, deep neural networks suffer from "data hunger": large amounts of data are needed to train a model. Newton Sketch can greatly simplify the application of deep neural networks.

Newton Sketch is also well suited to distributed machine learning tasks. In distributed machine learning, massive data is spread across the nodes of a computer cluster, and a learning algorithm must traverse the data repeatedly to find the optimal model. The Newton Sketch method instead uses randomized sketching to compute a "synthetic dataset" that summarizes the essential information of the original data; this dataset is usually small and can even be handled by a single machine. Further analysis and modeling on it is faster, cheaper, and more computationally efficient.
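A small sketch of why this fits the distributed setting (the shard layout and sizes are illustrative assumptions, not from the article): because a row-wise sketch is linear in the data, each node can sketch its own shard independently and send only the tiny result to a coordinator, whose sum equals one global sketch of the stacked data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setting: the rows of a tall data matrix are spread
# across 4 cluster "nodes", each holding a local shard
d, m = 20, 200
shards = [rng.normal(size=(5000, d)) for _ in range(4)]

# Each node sketches its shard with an independent Gaussian sketch and
# sends only the small m x d result; summing the local results is
# equivalent to applying one big sketch to the stacked matrix
synthetic = np.zeros((m, d))
for A_i in shards:
    S_i = rng.normal(scale=1.0 / np.sqrt(m), size=(m, A_i.shape[0]))
    synthetic += S_i @ A_i

# The synthetic dataset approximately preserves the Gram matrix A^T A,
# which is the quantity second-order methods like Newton Sketch need
A = np.vstack(shards)
rel_err = (np.linalg.norm(synthetic.T @ synthetic - A.T @ A)
           / np.linalg.norm(A.T @ A))
print(synthetic.shape, rel_err)
```

The coordinator never sees the 20,000-row original; it works with a 200 x 20 summary, which is what makes single-machine follow-up analysis feasible.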

Newton Sketch is representative of statistical machine learning algorithms, and it opens a road for the rapid popularization of artificial intelligence in the real business world. It has very important practical significance for studying and modeling complex giant systems such as urban traffic, intelligent logistics, and power grids, and it is also very valuable for e-commerce recommendation systems and social network scoring systems, because all of these involve high-dimensional data.

As Martin said at the "AI and Smart Logistics Roundtable," most real-world data "lives" in "high-dimensional space," and the simpler the handling of high-dimensional data, the more practical it becomes. With international scholars like Martin introducing statistical machine learning algorithms to China, we can expect the use of artificial intelligence to tackle China's big data challenges to accelerate, allowing AI algorithms to be engineered, truly land, and create business value. (Text/Ningchuang)


This article is from the "Cloud Technology Age" blog; please be sure to keep this source: http://cloudtechtime.blog.51cto.com/10784015/1948727

