Ten truths you must know about machine learning

As someone who often explains machine learning to non-experts, I have compiled the following ten points as a basic explanation of machine learning.

Machine learning means learning from data; AI is a buzzword. Machine learning lives up to the hype: by giving the right training data to the right learning algorithm, you can solve a very large number of problems. Call it AI if that helps you sell your system, but be aware that AI is a fashionable term that mostly reflects what people expect of it.

Machine learning is mainly about data and algorithms, but the data matters most. There is a lot to be excited about in machine learning algorithms, especially the advances in deep learning. But data is the key ingredient that makes machine learning possible. You can do machine learning without sophisticated algorithms, but not without good data.

Unless you have a lot of data, you should stick to simple models. Machine learning trains a model from patterns in the data by searching the space of possible models defined by its parameters. If that parameter space is too large, the model will overfit the training data and fail to generalize beyond it. A detailed explanation requires more mathematics, but as a rule of thumb you should keep your models as simple as possible.
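To make the overfitting point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the original article: the "true" relationship y = 2x + 1, the artificial alternating noise of +1 and -1, and the helper names `fit_line` and `fit_interpolating_polynomial`. It compares a two-parameter line with a polynomial that has as many parameters as there are training points.

```python
def fit_line(xs, ys):
    """Simple model: least-squares straight line (2 parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Complex model: Lagrange polynomial through every training point
    (as many parameters as training points)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_fn(x):
    # Assumed ground truth, chosen only for this illustration.
    return 2 * x + 1


# Eight noisy training points; the noise alternates +1, -1 (artificial).
train_x = [float(i) for i in range(8)]
train_y = [true_fn(x) + (1.0 if i % 2 == 0 else -1.0)
           for i, x in enumerate(train_x)]

# Noise-free test points halfway between the training points.
test_x = [i + 0.5 for i in range(7)]
test_y = [true_fn(x) for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

line_train_mse = mse(line, train_x, train_y)
line_test_mse = mse(line, test_x, test_y)
poly_train_mse = mse(poly, train_x, train_y)
poly_test_mse = mse(poly, test_x, test_y)

print(f"line: train MSE={line_train_mse:.3f}, test MSE={line_test_mse:.3f}")
print(f"poly: train MSE={poly_train_mse:.3f}, test MSE={poly_test_mse:.3f}")
```

The polynomial reproduces the training data perfectly because it has memorized the noise, yet it is far worse than the line on the unseen test points. That gap between training error and test error is what overfitting looks like in practice.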

A machine learning model is only as good as the data used to train it. As the saying goes, "garbage in, garbage out." Although the saying predates machine learning, it describes a key limitation of machine learning precisely: it can only find patterns that are present in the training data. For supervised machine learning tasks (such as classification), you need a robust, well-labeled, rich training data set.

Machine learning works only if the training data is representative. Just as a fund prospectus warns that "past performance does not guarantee future results," machine learning should carry a similar warning: it only works on data drawn from the same distribution as the training data. Therefore, watch for skew between training data and production data, and retrain your models frequently so they do not become stale.
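One simple way to watch for skew between training and production data is to compare summary statistics of a feature in both. The sketch below is a deliberately crude check, not a recommended production method: the function names, the sample values, and the threshold of 2.0 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumed cutoff: flag a feature whose production mean has moved more than
# two training standard deviations away from the training mean.
RETRAIN_THRESHOLD = 2.0


def shift_in_std_units(train_values, production_values):
    """Distance between production and training means,
    measured in training standard deviations."""
    return abs(mean(production_values) - mean(train_values)) / stdev(train_values)


def needs_retraining(train_values, production_values, threshold=RETRAIN_THRESHOLD):
    """True if the feature has drifted past the (assumed) threshold."""
    return shift_in_std_units(train_values, production_values) > threshold


# Hypothetical values of one numeric feature.
train = [10, 12, 11, 13, 9, 11, 12, 10]
production_similar = [11, 10, 12, 11]
production_shifted = [18, 19, 17, 20]

print(needs_retraining(train, production_similar))  # same distribution
print(needs_retraining(train, production_shifted))  # clearly shifted
```

A check like this only compares means, so it misses changes in spread or shape. It is meant to show the idea of monitoring production data against training data, nothing more.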

Most of the work in machine learning is data transformation. Given the hype around machine learning techniques, you might think it is mainly about selecting and tuning algorithms. The reality is more prosaic: most of your time and effort goes into data cleansing and feature engineering, that is, transforming raw features into features that better represent the signal in the data.
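As a small illustration of feature engineering, the sketch below turns one raw record into model-ready features. The record layout, the category vocabulary, and the function name `engineer_features` are hypothetical and chosen only for this example.

```python
import math
from datetime import datetime

# Hypothetical category vocabulary, assumed to be known in advance.
KNOWN_CATEGORIES = ["books", "electronics", "toys"]


def engineer_features(record):
    """Transform one raw record into numeric features."""
    ts = datetime.fromisoformat(record["timestamp"])
    features = {
        # A raw timestamp is hard to learn from; hour and weekend flag are not.
        "hour": ts.hour,
        "is_weekend": 1 if ts.weekday() >= 5 else 0,
        # log(1 + price) compresses the long tail of large prices.
        "log_price": math.log1p(record["price"]),
    }
    # One-hot encode the category so it becomes numeric.
    for category in KNOWN_CATEGORIES:
        features[f"category_{category}"] = 1 if record["category"] == category else 0
    return features


raw = {"timestamp": "2024-03-09T14:30:00", "price": 100.0, "category": "books"}
features = engineer_features(raw)
print(features)
```

None of these transformations involves a learning algorithm, yet choices like these often decide whether a model can pick up the signal at all.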

Deep learning is a revolutionary advance, but it is not a panacea. Deep learning has earned its hype by driving progress across many of the fields where machine learning is applied. It has also automated some of the work traditionally done through feature engineering, especially for image and video data. But deep learning is not a silver bullet: it does not work out of the box, and you still need to put a lot of effort into cleaning and transforming data.

Machine learning systems are highly susceptible to operator error. With apologies to the NRA: "Machine learning algorithms don't kill people; people kill people." When a machine learning system fails, it is rarely because of a problem with the algorithm. More likely, human error has crept into the training data, introducing bias or some other systematic error. Always be skeptical, and approach machine learning with the same discipline you apply to software engineering.

Machine learning can inadvertently create a self-fulfilling prophecy. In many applications of machine learning, the decisions you make today affect the training data you collect tomorrow. Once a machine learning system builds a bias into its model, it can keep generating new training data that reinforces that bias. And some biases can ruin people's lives. Be responsible: do not create self-fulfilling prophecies.
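The feedback loop described above can be shown with a toy simulation. The scenario is entirely hypothetical: two regions with the same true incident rate, a "model" that sends every inspection to whichever region has more recorded incidents, and incidents that are only recorded where inspections take place. The region names, rates, and counts are assumptions made for illustration.

```python
def simulate_feedback_loop(recorded, true_rate, inspections_per_round, rounds):
    """Each round, all inspections go to the region with the most recorded
    incidents, and only inspected regions produce new records."""
    recorded = dict(recorded)
    history = [dict(recorded)]
    for _ in range(rounds):
        # Today's decision is based on the data collected so far.
        target = max(recorded, key=recorded.get)
        # Tomorrow's data is produced only where we decided to look.
        recorded[target] += inspections_per_round * true_rate
        history.append(dict(recorded))
    return history


def share_of_a(records):
    """Fraction of all recorded incidents attributed to region A."""
    return records["A"] / (records["A"] + records["B"])


# Both regions have the same true rate; A merely starts with one more record.
history = simulate_feedback_loop(
    {"A": 11.0, "B": 10.0}, true_rate=0.5, inspections_per_round=10, rounds=5
)

initial_share = share_of_a(history[0])
final_share = share_of_a(history[-1])
print(f"share of records in A: {initial_share:.2f} -> {final_share:.2f}")
```

Although the two regions are identical by construction, the recorded data increasingly "confirms" that region A is the problem, because the system only collects data where its earlier decisions pointed.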

AI is not going to become self-aware, rise up, and destroy humanity. A surprising number of people seem to get their ideas about artificial intelligence from science fiction movies. We should be inspired by science fiction, but not so credulous that we mistake fiction for reality. There are enough real and present dangers to worry about, from consciously malicious humans to unconsciously biased machine learning models. So you can stop worrying about SkyNet and "superintelligence" (Translator's note: SkyNet is the hostile AI in the Terminator films; "superintelligence" refers to a hypothetical AI that far surpasses human intelligence).

There is far more to machine learning than the ten points above. I hope this introduction is useful to non-experts.

This article is an English translation of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
