Introduction to Machine Learning

Last Update:2014-10-12 Source: Internet

Author: User


Introduction

In real life, we may unknowingly use a variety of machine learning algorithms every day.

For example, every time you use Google it works well, and one important reason is that a learning algorithm implemented by Google has "learned" how to rank pages. Every time you use a Facebook or Apple photo application, it can automatically recognize your friends in photos, which is also machine learning. Whenever you read e-mail, your spam filter spares you a large amount of spam, and this too is achieved by a learning algorithm.

We have a dream that someday we can create machines as smart as humans. Many AI experts believe that the best way to achieve this goal is through learning algorithms that mimic the way the human brain learns.

Machine learning originated in the field of artificial intelligence, where the goal is to build intelligent machines. We can program a machine to do some basic tasks, such as finding the shortest path from A to B. But in most cases we do not know how to explicitly write AI programs for more interesting tasks, such as web search, photo tagging, and spam blocking. People realized that the only way to achieve these goals is to have the machine learn how to do them itself.

Today, machine learning has developed into a new capability in computing and is closely linked to both industry and basic science. In Silicon Valley, machine learning has an impact on a wide range of problems, such as autonomous robotics and computational biology. There are many examples of machine learning applications, such as data mining.

One of the reasons machine learning has become so popular is the explosive growth of the web and of automation, which means we have far larger data sets than ever before. For example, countless Silicon Valley companies collect data about web clicks (clickstream data) and use machine learning algorithms to better understand and serve their users, and this has become a huge industry.

With the spread of electronic automation we now have electronic medical records, and if we can turn these records into medical knowledge we will be able to understand diseases more deeply. Computational biology has also grown rapidly with the aid of automation: biologists have collected a great deal of data on gene and DNA sequences, and applying machine learning algorithms to this data can help us understand the human genome and what it means for us.

Almost every area of engineering is using machine learning algorithms to analyze its growing data sets. Some applications cannot be achieved by hand-written programs. For example, it is almost impossible to write by hand a program that makes a helicopter fly autonomously. The only viable solution is to have a computer learn by itself how to fly the helicopter.

Handwriting recognition is another example. Sending mail across the world is now far cheaper than it used to be, and one important reason is that whenever you address a letter, a machine learning algorithm has learned how to read your handwriting and automatically route the letter to its destination.

You may have come across natural language processing and computer vision. These fields try to understand human language and images through AI, and today most of natural language processing and computer vision is applied machine learning.

Machine learning algorithms are also widely used in self-customizing programs. Every time you use Amazon, Netflix, or iTunes Genius, the movies or products they recommend to you are chosen by learning algorithms. These applications have tens of millions of users, and it is clearly impossible to write a different program for each of them. The only effective solution is software that learns by itself, adapts to your preferences, and makes recommendations accordingly.

Finally, machine learning algorithms have been used to explore human learning patterns and to try to understand the human brain.

What is machine learning

What is machine learning? Different people define it differently. The following definition was given by Arthur Samuel:

Arthur Samuel (1959).

Machine Learning: field of study that gives computers the ability to learn without being explicitly programmed.
In other words, Arthur Samuel defines machine learning as a field of study in which computers gain the ability to "learn" without a program being explicitly written for the specific task.

Samuel is famous because in the 1950s he wrote a program that plays checkers. The remarkable thing about this program is that Samuel had it play thousands of games against itself. By observing which board positions tended to lead to wins and which tended to lead to losses, the program gradually learned what good and bad positions are. In the end, the program played checkers better than Samuel did.

This is a remarkable result. Samuel himself was not a very good checkers player, but because the computer could play thousands of games against itself, it accumulated a great deal of playing experience through this training and eventually became a better player than Samuel.

The above is a somewhat informal and older definition. Below is a more recent definition by Tom Mitchell of Carnegie Mellon University:

Tom Mitchell (1998).

Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

That is, if a computer program's performance on task T, as measured by P, improves with experience E, then we say that the program learns from experience E.

In the checkers example, the experience E is the experience of the program playing thousands of games against itself, the task T is playing checkers, and the performance measure P is the probability that the program wins its next game against a new opponent.

There are several types of learning algorithms. The two main categories are supervised learning and unsupervised learning, which I will describe in more detail in a later blog post. Roughly speaking, in supervised learning we teach the computer how to do something, while in unsupervised learning we let the program learn by itself.

In future posts we will also discuss other terms, such as reinforcement learning and recommender systems, but the two most commonly used types of learning algorithm are supervised learning and unsupervised learning.

Next, let's discuss what supervised learning and unsupervised learning are, and under what circumstances each of them is used.

Supervised learning

Let's start with an example of what supervised learning is, and the formal definition will be described later in this article.

Let's say you want to predict house prices and have some data on houses that have been sold, plotted on a chart. The horizontal axis represents the size of the house (in square feet), and the vertical axis represents the price (in thousands of dollars). If you have a 750-square-foot house to sell, how would you estimate its price from this data?

For this problem we can apply a machine learning algorithm that fits a straight line to the data, and from this line we can estimate the price the house may sell for. Of course, this is not the only choice: a quadratic function may fit the existing data better, and predicting from the quadratic curve may give a better result.
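The line-fitting step can be sketched in a few lines of Python. This is only an illustrative sketch: the data points are made up (the article's own chart is not available), the function names `fit_line` and `predict` are hypothetical, and ordinary least squares is assumed as the fitting method.

```python
# Hypothetical training data: house sizes in square feet and prices in
# thousands of dollars. These numbers are invented for illustration.
sizes = [500.0, 800.0, 1000.0, 1500.0, 2000.0]
prices = [100.0, 160.0, 190.0, 280.0, 370.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line
    # pass through the point of means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Price predicted by the fitted line for a house of size x."""
    return slope * x + intercept


slope, intercept = fit_line(sizes, prices)
estimate = predict(slope, intercept, 750.0)
print(f"Estimated price for 750 sq ft: about {estimate:.0f} thousand dollars")
```

Fitting a quadratic instead would follow the same pattern with one extra coefficient. Which curve predicts better depends on the data.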

This is an example of supervised learning. Supervised learning means that we give the learning algorithm a data set made up of "correct answers". In the house price example, we gave a set of houses together with the correct price of each one, that is, the price it actually sold for, and the algorithm's job is to produce more of these correct answers, such as the price of your new house. In technical terms, this is called a regression problem.

We are trying to predict a continuous-valued result, namely the price of the house. Prices are usually rounded to the nearest cent, so strictly speaking they are discrete values, but we normally treat a price as a real number, a scalar, and therefore as continuous. Regression means that we are trying to predict this kind of continuous-valued attribute.

Regression problem: The result we predict is a continuous value.

Let's look at another example of supervised learning. Suppose you want to predict from medical records whether a breast tumor is malignant or benign. In the data set, the horizontal axis is the size of the tumor and the vertical axis is 1 or 0, indicating whether the tumor is malignant or not. Tumors we have seen before are recorded as 1 if they were malignant and as 0 if they were not malignant (benign).

Now suppose a friend has unfortunately been found to have a breast tumor of a certain size. The machine learning question is whether you can estimate the probability that the tumor is malignant rather than benign. In technical terms, this is a classification problem.

Classification means that we are trying to predict a discrete output value: 0 or 1, benign or malignant. In fact, in a classification problem the output may take more than two values. For example, there may be three types of breast cancer, so you want to predict a discrete output of 0, 1, 2, or 3, where 0 means benign, 1 means the first type of cancer, 2 the second type, and 3 the third type. This is still a classification problem, because the output is one of a small set of discrete values.

In classification problems we can also plot the data in another way. Since tumor size is the feature we use to tell malignant from benign, we can place every sample on a single line according to its size and use different symbols for benign and malignant tumors, that is, for negative and positive samples. Instead of drawing every point as an X, we mark benign tumors with O and malignant tumors with X.
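With a single feature, one very simple classifier is a size threshold learned from the labeled samples. The sketch below illustrates that idea only. The data is invented, the names `learn_threshold` and `classify` are hypothetical, and the article does not say which classification algorithm it has in mind.

```python
# Hypothetical labeled data: (tumor size, label), 0 = benign, 1 = malignant.
samples = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
           (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]


def learn_threshold(data):
    """Pick the size threshold that misclassifies the fewest training samples.

    The rule being learned is: predict malignant (1) when size > threshold.
    Candidate thresholds are the midpoints between neighbouring sizes.
    """
    sizes = sorted(size for size, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
    best_threshold, best_errors = None, len(data) + 1
    for t in candidates:
        errors = sum(1 for size, label in data if (size > t) != (label == 1))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def classify(size, threshold):
    """Return 1 (malignant) if size exceeds the threshold, else 0 (benign)."""
    return 1 if size > threshold else 0


threshold = learn_threshold(samples)
print("learned threshold:", threshold)
print("prediction for size 3.2:", classify(3.2, threshold))
```

Real data is rarely this cleanly separated, which is why practical classifiers usually output a probability rather than a hard yes or no.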

Note that in this example we use only one feature, the size of the tumor, to predict whether it is malignant. In other machine learning problems there may be more than one feature. For example, we may know not only the size of the tumor but also the age of the patient, in which case each sample in the data set has two features.

In other words, the data set now contains the age of each patient, the size of the tumor, and whether the tumor is benign. We take tumor size as the horizontal axis and patient age as the vertical axis, and mark benign tumors with O and malignant tumors with X. The job of the learning algorithm is to find a line that separates the malignant tumors from the benign ones. If your friend's tumor falls on the benign side of the line, then according to the learning algorithm it is more likely to be benign than malignant.
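One classic way to find such a separating line is the perceptron rule, sketched below. This choice is an assumption made for illustration, since the article does not name an algorithm. The data points are invented and the names `train_perceptron` and `side` are hypothetical. The perceptron is only guaranteed to stop when the two classes can actually be separated by a line, which holds for this toy data.

```python
# Hypothetical data: ((tumor size, patient age in decades), label),
# with label -1 = benign ("O") and +1 = malignant ("X").
data = [((1.0, 3.0), -1), ((1.5, 2.5), -1), ((2.0, 4.0), -1), ((1.2, 5.0), -1),
        ((4.0, 5.5), +1), ((4.5, 4.0), +1), ((5.0, 6.5), +1), ((3.8, 7.0), +1)]


def train_perceptron(samples, max_epochs=5000):
    """Learn weights (w1, w2) and bias b so that sign(w.x + b) matches the labels."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in samples:
            # A sample is misclassified when y * (w.x + b) is not positive.
            if y * (w1 * x1 + w2 * x2 + b) <= 0:
                # Nudge the line toward classifying this sample correctly.
                w1 += y * x1
                w2 += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w1, w2, b


def side(w1, w2, b, x1, x2):
    """Return +1 (malignant side) or -1 (benign side) of the learned line."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1


w1, w2, b = train_perceptron(data)
print("a new tumor of size 2.2, age 4.5 decades falls on side:",
      side(w1, w2, b, 2.2, 4.5))
```

The learned line is the set of points where `w1 * size + w2 * age + b` equals zero. Points on one side are predicted malignant and points on the other side benign.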

In this example we have two features, patient age and tumor size. Other machine learning problems usually have more. For the breast cancer example, features such as clump thickness, uniformity of cell size, and uniformity of cell shape can also be used.

A later post introduces a learning algorithm that can handle not just 2, 3, or 5 features but even an infinite number of them. If you want your algorithm to draw on an infinite number of features or clues when making a prediction, then how to process them, or even how to store them, becomes a serious problem, since your computer's memory is certainly not large enough. The algorithm is called the SVM (support vector machine), and it relies on a clever mathematical technique that lets the computer work with an infinite number of features.
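The mathematical technique alluded to here is most likely the kernel trick, although the article does not name it, so treat that as an assumption. The idea is that some algorithms only need dot products between feature vectors, and a kernel function can compute that dot product directly from the raw inputs without ever building the features. The sketch below checks this for a small finite case (a degree-2 polynomial kernel) and also shows the Gaussian (RBF) kernel, whose implicit feature space is infinite-dimensional. All function names are illustrative.

```python
import math


def explicit_features(x):
    """Map a 2-D point to the 3 features whose dot product equals (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel: same value, without building the features."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, 0.5)
print(dot(explicit_features(x), explicit_features(z)))  # via explicit features
print(poly_kernel(x, z))                                 # via the kernel shortcut
print(rbf_kernel(x, z))
```

The first two printed values agree up to floating-point rounding. For the RBF kernel there is no finite feature list to write out, yet the kernel value is still cheap to compute, which is how an infinite number of features can be handled without storing them.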

Summary

In this chapter we introduced supervised learning. The basic idea is that every sample in the data set comes with a "correct answer", and we make predictions based on these samples, as in the house and tumor examples.

We also introduced the regression problem, in which the goal is to predict a continuous output, and the classification problem, in which the goal is to predict one of a discrete set of results.

Now for a quiz. Suppose you run a company and want to develop learning algorithms to handle the following two problems.

The first problem: you have a large stock of identical items and want to predict how many of them you will sell in the next three months. The second problem: you have many customers and want to write software that examines each customer's account and decides whether it has been hacked. Is each of these a classification problem or a regression problem?

Problem one is a regression problem: if there are thousands of items, we treat the number sold as a real number, that is, as a continuous value. Problem two is a classification problem: we can represent the predicted value as 0 if the account has not been hacked and 1 if it has, just as 0 meant benign and 1 meant malignant in the breast cancer example. The algorithm then predicts whether an account is 0 or 1, and because there are only a few discrete values, this is a classification problem.

That covers supervised learning. Next, let's look at unsupervised learning.

Unsupervised learning

We have already talked about supervised learning, and now we turn to unsupervised learning. Recall the earlier data set, in which every sample was labeled as a positive or a negative sample, that is, as a malignant or a benign tumor. In supervised learning, then, we are told explicitly the so-called correct answer for every sample: whether it is benign or malignant.

In unsupervised learning, the data looks a little different from the data in supervised learning. There are no labels, or every data point carries the same label, so nothing in the data itself tells one kind of point from another.

So in unsupervised learning we are given only a data set. Nobody tells us what to do with it, and we do not know what each data point means. We are simply told: here is a data set, can you find some structure in it?

For a given data set, an unsupervised learning algorithm may determine that the data falls into two different clusters and divide it accordingly. This is known as a clustering algorithm.
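A common clustering algorithm is k-means, sketched below as one possible illustration. The article does not say which clustering algorithm it means, so this choice is an assumption. The points are invented and the function names are hypothetical. Note that no labels are passed in: the algorithm sees only the points and a starting guess for the cluster centers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, initial_centroids, max_iterations=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return labels, centroids


# Hypothetical unlabeled data forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, [points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the starting centroids. In practice they are often chosen at random and the algorithm is run several times.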

Clustering Algorithm Example

Unsupervised learning is used in many places. One example of a clustering algorithm at work is Google News.

What does Google News do every day? It collects thousands of news stories from the web and automatically groups them, so that stories about the same topic are displayed together.

Clustering algorithms and unsupervised learning are used in many other problems as well. One application is in genomics, for example with gene chip (DNA microarray) data.

The basic idea is that, given a group of different individuals, you measure for each individual how much a particular gene is expressed. The colors in such a chart (red, green, gray, and so on) show the degree to which different individuals do or do not express a particular gene.

You can then run a clustering algorithm that groups the individuals into different categories or types of people. This is unsupervised learning because we do not tell the algorithm in advance which people belong to the first type, which to the second, which to the third, and so on. Instead we just say: here is a pile of data, I do not know what is in it, I do not know who belongs to which type, and I do not even know what the types are. Can you find the types in this data automatically and sort the individuals into them? Because we give the algorithm no correct answers for the samples, this is unsupervised learning.

Unsupervised learning and clustering algorithms are used in a number of other areas too. One is organizing large computer clusters. Some friends manage large data centers (big computer clusters) and try to work out which machines tend to work together, because placing those machines together makes the data center run more efficiently.

Another application is social network analysis. Given information about which friends you email the most, or about your Facebook friends or your Google+ friends, an algorithm can automatically identify which people form groups of close friends and which are merely groups of people who all know each other.

Another application is market segmentation. Many companies have large databases of customer information. Given such a customer data set, can you automatically discover the different market segments and assign each customer to one of them, so that you can sell more effectively in each segment? This is also unsupervised learning: we have the customer data, but we do not know in advance what the market segments are, or which customers belong to segment one, which to segment two, and so on. The algorithm has to discover all of this from the data itself.

Unsupervised learning is also used in astronomical data analysis, where clustering algorithms have produced surprising, interesting, and useful theories about how galaxies are born. All of these are examples of clustering algorithms.

Cocktail party problem

Another example of unsupervised learning is the cocktail party problem. Imagine a party in a room full of people who are all talking at the same time. So many voices are mixed together that you can hardly hear what the person in front of you is saying.

Consider a scene in which there are only two people at the party, both talking at the same time (yes, it is a very small cocktail party). We place two microphones in the room, and because the microphones are at different distances from the two speakers, each one records a different combination of the two voices.

Speaker A may sound a little louder in the first microphone and speaker B a little louder in the second, because the two microphones are positioned differently relative to the two speakers, but each microphone records the overlapping voices of both.

What we can do is give these two recordings to an unsupervised learning algorithm, the "cocktail party algorithm", and let it find the structure in the data. The algorithm analyzes the recordings and separates the two audio sources that were superimposed on each other. This is a simplified version of the cocktail party problem.
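The two-microphone setup can be written as a linear mixing model: each recording is a weighted sum of the two voices. The sketch below illustrates only that model, with made-up signals and mixing weights, and shows that the sources can be recovered exactly when the mixing weights are known. It is not the cocktail party algorithm itself: a real separation method (independent component analysis is a common choice) has to estimate the unmixing weights from the recordings alone. All names and numbers here are hypothetical.

```python
import math

# Two hypothetical source signals (speaker A and speaker B), 8 samples each.
source_a = [math.sin(0.5 * t) for t in range(8)]
source_b = [1.0 if t % 2 == 0 else -1.0 for t in range(8)]

# Hypothetical mixing weights: each microphone hears both speakers, but the
# nearer speaker is louder. Row i holds the weights for microphone i.
mixing = [[0.8, 0.3],
          [0.2, 0.7]]


def mix(matrix, s1, s2):
    """Apply a 2x2 matrix to a pair of signals, sample by sample."""
    out1 = [matrix[0][0] * a + matrix[0][1] * b for a, b in zip(s1, s2)]
    out2 = [matrix[1][0] * a + matrix[1][1] * b for a, b in zip(s1, s2)]
    return out1, out2


def invert_2x2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# What the two microphones record.
mic1, mic2 = mix(mixing, source_a, source_b)

# Unmixing with the *known* matrix. An unsupervised algorithm would have to
# estimate this matrix from mic1 and mic2 without seeing the sources.
recovered_a, recovered_b = mix(invert_2x2(mixing), mic1, mic2)
print(max(abs(r - s) for r, s in zip(recovered_a, source_a)))
```

The printed value is the largest reconstruction error for speaker A, which is at the level of floating-point rounding.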

To restate the cocktail party problem: in a room full of people talking to each other, we record the sound with several microphones and use an unsupervised learning algorithm to pick out what each individual person said.

Summary: from the recordings, the algorithm discovers the underlying structure, and it can then separate the mixed recordings into the individual sources they are made of.

Summary

Supervised learning (classification, regression)

Unsupervised Learning (clustering)


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
