Machine Learning is actually easier than you think.

Last Update: 2014-02-14  Source: Internet

Author: User

Many people think machine learning is out of reach: a mysterious technology understood only by a handful of academic specialists.

After all, you are asking a machine that lives in a binary world to form its own understanding of the real world; you are teaching it how to think. But machine learning is not nearly as obscure, complex, or formula-heavy as you might expect. Like all the fundamental ideas that help us understand the world (Newton's laws of motion, Jobs to Be Done, supply and demand, and so on), the best methods and concepts in machine learning are simple and clear. Unfortunately, most machine learning literature is filled with dense notation, obscure formulas, and unnecessary verbiage, and that is exactly what walls off how simple the foundations of machine learning really are.

Let's look at a practical example. Suppose we want to add a "you might also like" section at the end of each article. How could we build it?

A simple first approach:

1. Take the title of the current article and split it into individual words (note: the original text is in English, so splitting on spaces is enough; Chinese text would need a word segmenter)
2. Get all articles except the current one
3. Sort those articles by how many of the title's words appear in their body text

def similar_posts(post)
  # Words in the current article's title
  title_keywords = post.title.split(' ')
  # Every other article, ranked by how many title words appear in its body
  (Post.all.to_a - [post]).sort_by do |other|
    -(other.body.split(' ') & title_keywords).length
  end.first(10)
end
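The `similar_posts` listing above needs a `Post` model backed by a database. To try the same ranking idea on its own, here is a self-contained sketch that assumes articles are plain structs with hypothetical `title` and `body` fields: count the body words shared with the current title, leave out the current article, and keep the highest scores.

```ruby
# Hypothetical stand-in for the Post model: just a title and a body.
Article = Struct.new(:title, :body)

# Number of distinct title words that also occur in the other article's body.
def title_overlap(title, body)
  (body.split(' ') & title.split(' ')).length
end

# Rank every other article by title overlap, highest first, and keep the top `limit`.
def similar_articles(current, articles, limit = 10)
  (articles - [current])
    .sort_by { |a| -title_overlap(current.title, a.body) }
    .first(limit)
end

articles = [
  Article.new('support team product quality', 'how the support team shapes product quality'),
  Article.new('icon design', 'drawing icons for a product'),
  Article.new('support inbox', 'a support team answers every message'),
  Article.new('hiring', 'how to recruit designers')
]

# prints "support inbox" then "icon design"
similar_articles(articles[0], articles, 2).each { |a| puts a.title }
```

Because the score only counts shared words, an article that happens to repeat a common title word ranks highly even when its topic is unrelated, which is the weakness the results below expose.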

Running `similar_posts` for the blog post "How the support team can improve product quality" gives the following top 10 related articles:

How to implement a verified solution
Learn how your customers make decisions
Design a first-run interface that delights your users
How to recruit designers
Icon design
Interview with Ryan Singer
Actively support customers through internal communication
It doesn't matter why you become the first
Interview with Joshua Porter
Customer retention, cohort analysis and visualization

As you can see, the reference article is about how a support team can work effectively, which has little to do with cohort analysis or design discussions. We can do better.

Now let's tackle the problem with a genuine machine learning method, in two steps:

Represent each article mathematically;
Run the K-means clustering algorithm on the resulting data points.

1. Represent the articles mathematically

If we can represent each article mathematically, we can plot the articles so that similar ones sit close together, and then pick out distinct clusters.

Mapping each article to a point in a coordinate system is not difficult. It takes two steps:

Collect all the words that appear in the articles;
Create an array for each article. Each element is 0 or 1, indicating whether the corresponding word appears in that article. Every article's array lists the words in the same order; only the values differ.

The Ruby code is as follows:

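@posts = Post.all
# Vocabulary: every distinct word that appears in any article body
@words = @posts.map do |p|
  p.body.split(' ')
end.flatten.uniq
# One 0/1 vector per article, in @words order.
# include? on the body string is a substring test, so "publish" also matches "publisher".
@vectors = @posts.map do |p|
  @words.map do |w|
    p.body.include?(w) ? 1 : 0
  end
end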

Suppose @words is:

["Hello", "internal", "internal communication", "Reader", "blog", "publish"]

If an article's content is "Hello blog publisher", its array is:

[1, 0, 0, 0, 1, 1]
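Computing that vector directly makes the example concrete. This minimal sketch uses the same six-word list and the same substring test (`String#include?`) as the listing; the helper names are illustrative. For contrast, it also shows an exact whole-word match, under which "publisher" no longer counts as "publish".

```ruby
# Vocabulary from the example above.
words = ['Hello', 'internal', 'internal communication', 'Reader', 'blog', 'publish']

# 1 if the word occurs anywhere in the text (substring test, as in the listing), else 0.
def to_vector(text, words)
  words.map { |w| text.include?(w) ? 1 : 0 }
end

# Exact whole-word variant: split the text into tokens and compare whole tokens.
def to_vector_exact(text, words)
  tokens = text.split(' ')
  words.map { |w| tokens.include?(w) ? 1 : 0 }
end

p to_vector('Hello blog publisher', words)        # prints [1, 0, 0, 0, 1, 1]
p to_vector_exact('Hello blog publisher', words)  # prints [1, 0, 0, 0, 1, 0]
```

Which variant is preferable depends on the data: substring matching catches word forms like "publisher", but it also lets short words match inside unrelated longer ones.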

Of course, we can't plot a six-dimensional point the way we plot one in a two-dimensional coordinate system. But the underlying concepts, such as the distance between two points, carry over from two dimensions to any number of dimensions, so it is reasonable to illustrate the method with two-dimensional examples.
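The distance formula is a good example of this carry-over. Euclidean distance is the square root of the summed squared coordinate differences, whatever the number of coordinates. A minimal Ruby sketch (the method name is illustrative):

```ruby
# Euclidean distance between two points with the same number of coordinates.
def euclidean_distance(a, b)
  raise ArgumentError, 'dimension mismatch' unless a.size == b.size
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

puts euclidean_distance([0, 0], [3, 4])                          # prints 5.0
puts euclidean_distance([1, 0, 0, 0, 1, 1], [1, 1, 0, 0, 1, 0])  # prints 1.4142135623730951
```

For 0/1 article vectors, the squared distance is simply the number of words that appear in one article but not the other.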

2. Cluster the data with the K-means algorithm

Now that each article has coordinates, we can look for clusters of similar articles. Here we use a simple clustering algorithm, K-means, which has five steps:

1. Choose K, the number of clusters;
2. Randomly select K objects from the data as the initial cluster centers;
3. Go through every object and assign it to the cluster whose center is nearest;
4. Update each cluster center to the mean of the objects assigned to it;
5. Repeat steps 3 and 4 until no cluster center changes.
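The five steps map almost line for line onto code. The sketch below is a plain-Ruby version that stops as soon as no center moves (step 5) instead of running a fixed number of iterations. It assumes points are arrays of numbers; the method names and the `seed` and `max_iter` parameters are illustrative, not from the original.

```ruby
# Euclidean distance between two equal-length points.
def dist(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Component-wise mean of a non-empty list of points.
def mean(points)
  points.transpose.map { |col| col.sum.to_f / points.size }
end

# Returns [assignments, centers]; assignments[i] is the cluster index of points[i].
def kmeans(points, k, seed: 42, max_iter: 100)
  rng = Random.new(seed)
  # Step 2: pick k distinct points at random as the initial centers.
  centers = points.sample(k, random: rng).map { |pt| pt.map(&:to_f) }
  assignments = []
  max_iter.times do
    # Step 3: assign every point to its nearest center.
    assignments = points.map { |pt| (0...k).min_by { |c| dist(pt, centers[c]) } }
    # Step 4: move each center to the mean of its points (keep it if the cluster is empty).
    new_centers = (0...k).map do |c|
      members = points.each_index.select { |i| assignments[i] == c }.map { |i| points[i] }
      members.empty? ? centers[c] : mean(members)
    end
    # Step 5: stop once no center changes.
    break if new_centers == centers
    centers = new_centers
  end
  [assignments, centers]
end

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
assignments, centers = kmeans(points, 2)
p assignments
```

Which label (0 or 1) each group receives depends on the random start, but on this data the two points near the origin always share one label and the two near (10, 10) share the other.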

Let's walk through these steps visually. First, we randomly pick two of the article points as the initial cluster centers (K = 2).

Next, we assign each article to the cluster whose center is closest to it.

Then we compute the mean coordinate of the articles in each cluster and use it as that cluster's new center.
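The mean is taken coordinate by coordinate. A small sketch with three made-up article vectors that ended up in the same cluster:

```ruby
# Three hypothetical article vectors assigned to the same cluster.
cluster = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]

# New center: the average of each coordinate across the cluster's members.
center = cluster.transpose.map { |column| column.sum.to_f / cluster.size }
p center  # prints [1.0, 0.3333333333333333, 0.3333333333333333]
```

Note that the new center is no longer a 0/1 vector; it records how often each word appears among the cluster's articles, which is fine because only distances to it are ever needed.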

This completes the first iteration. Now we reassign each article to a cluster based on the new centers.

At this point we have found the cluster for each article! Running further iterations would no longer change which cluster any article belongs to.

The Ruby code for the above process is as follows:

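# distance, average and rand_point are not defined in the original listing.
# Assumed helpers: Euclidean distance and the component-wise mean.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def average(points)
  points.transpose.map { |column| column.sum.to_f / points.size }
end

# K = 2: pick two article vectors at random as the initial centers
@cluster_centers = @vectors.sample(2)
15.times do
  # Step 3: assign each article (by index) to its nearest center
  @clusters = [[], []]
  @vectors.each_with_index do |vector, i|
    nearest = (0...@cluster_centers.size).min_by { |c| distance(@cluster_centers[c], vector) }
    @clusters[nearest] << i
  end
  # Step 4: move each center to the mean of its members (keep it if the cluster is empty)
  @cluster_centers = @clusters.each_with_index.map do |members, c|
    members.empty? ? @cluster_centers[c] : average(members.map { |i| @vectors[i] })
  end
end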
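The article does not show how the final top-10 list is drawn from the clusters. One plausible approach, offered here only as an assumption, is to take the other articles in the same cluster as the current one and order them by distance to it. The helper names below are hypothetical.

```ruby
# Euclidean distance between two equal-length vectors.
def vector_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Hypothetical helper: indices of the other articles in the same cluster as
# article `index`, nearest first, capped at `limit`.
def related_in_cluster(vectors, assignments, index, limit = 10)
  vectors.each_index
         .select { |i| i != index && assignments[i] == assignments[index] }
         .sort_by { |i| vector_distance(vectors[i], vectors[index]) }
         .first(limit)
end

vectors     = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1]]
assignments = [0, 0, 0, 1]
p related_in_cluster(vectors, assignments, 0)  # prints [1, 2]
```

Restricting candidates to one cluster keeps recommendations on topic, while sorting by distance decides the order within that cluster.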

Using this method, here are the top 10 articles most similar to the blog post "How the support team can improve product quality":

You know this better or you are smarter.
Three guidelines for customer feedback
Obtain the information you want from the customer
Shipping is just the beginning
What do you think function extensions look like?
Understand your user base
Convert customers with the right information at the right time
Communicate with your customers
Does your application have a message push schedule?
Have you tried to communicate with the customer?

The results are self-explanatory.

We implemented this idea with fewer than 40 lines of code and a simple algorithm. Yet if you read an academic paper on the subject, you would never guess how simple it is. The abstract of the paper that first used the term "K-means" is a good example (I don't know who originally proposed the K-means algorithm, but that paper was the first to use the name).

If you like to express your ideas in mathematical notation, academic papers are undoubtedly useful. But there are plenty of high-quality resources that are more practical and approachable than dense formulas:

Wikipedia (for example, latent semantic indexing and cluster analysis)
Source code of open-source machine learning libraries (for example, SciPy's K-means, scikit-learn's DBSCAN)
Books written by programmers (for example, Programming Collective Intelligence and Machine Learning for Hackers)
Khan Academy

Try it yourself

How would you recommend tags in your app? How would you design your customer support tools? How would you group users in a social network? Problems like these can be solved with simple code and simple algorithms, and they are a good opportunity to practice. So if you think a problem in your project could be solved with machine learning, why hesitate?

Machine Learning is actually easier than you think!

Original article: Intercom. Translation: Bole Online (zhibinzeng)
http://blog.jobbole.com/53546/


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
