I recently came across a good article, reposted here from the Cloud Habitat community.
The K-means algorithm has a long history and is one of the most commonly used clustering algorithms. It is very simple to implement, which makes it ideal for newcomers to machine learning. We first review the origins of the K-means algorithm and then introduce its typical application scenarios.
Origins
In 1967, James MacQueen used the term "K-means" for the first time in his paper "Some Methods for Classification and Analysis of Multivariate Observations". The standard algorithm had already been proposed in 1957 by Stuart Lloyd at Bell Labs as a technique for pulse-code modulation. In 1965, E. W. Forgy published essentially the same algorithm, which is why it is sometimes called the Lloyd-Forgy algorithm.
What is the K-means algorithm?
Clustering is the process of dividing data into groups so that data points in the same group are more similar to one another than to data points in other groups. In short, clustering divides data points with similar characteristics into groups, each of which is called a cluster. The goal of the K-means algorithm is to find groups in the data, with the number of groups given by the variable K. Based on the features of the data, an iterative procedure assigns each data point to one of the K groups. In the figure below, K = 2, so two clusters are identified in the original data set.
[Figure: 10 interesting use cases of the K-means algorithm]
Running the K-means algorithm on a data set produces two outputs:
1. K centroids: the center point of each of the K clusters identified in the data set.
2. A complete labeling of the data set, in which each data point is assigned to exactly one of the clusters.
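To make the two outputs concrete, here is a minimal sketch of the iterative procedure in plain Python. It is an illustration under assumptions rather than a reference implementation: the function name `kmeans`, the random choice of initial centroids, the Euclidean distance, and the six sample points are all choices made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain iterative K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Illustrative data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # output 1: the K centroids
print(labels)     # output 2: one cluster label per data point
```

The first three points end up with one label and the last three with the other; which group is called 0 and which is called 1 depends on the random initialization.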
Ten use cases of the K-means algorithm
The K-means algorithm is usually applied to data sets that are low-dimensional, numeric, and continuous. A typical scenario is grouping similar items from a randomly distributed collection of things.
1. Document Classification
Documents are grouped into several categories based on their tags, topics, and content. This is a very standard problem for which K-means is well suited. The documents first need to be preprocessed so that each one is represented as a vector, with term frequency used to identify the common terms that help categorize a document. The document vectors are then clustered to identify groups of similar documents. The original article links to a sample implementation of K-means for document classification.
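The vectorization step can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and relative term frequency; the function name `term_frequency_vectors` and the three sample documents are invented for this example. The resulting vectors are what a K-means implementation would then cluster.

```python
import math
from collections import Counter


def term_frequency_vectors(documents):
    """Represent each document as a vector of relative term frequencies."""
    tokenized = [doc.lower().split() for doc in documents]
    # One vector dimension per distinct term across all documents.
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words) or 1  # avoid dividing by zero for empty documents
        vectors.append([counts[word] / total for word in vocabulary])
    return vocabulary, vectors


# Illustrative documents: two about sport, one about finance.
documents = [
    "the match ended with a late goal",
    "a late goal decided the match",
    "the bank raised interest rates",
]
vocabulary, vectors = term_frequency_vectors(documents)

# Documents on the same topic share terms, so their vectors lie closer together.
print(math.dist(vectors[0], vectors[1]))  # sport vs sport: smaller distance
print(math.dist(vectors[0], vectors[2]))  # sport vs finance: larger distance
```

Real pipelines usually also remove stop words and weight terms (for example with TF-IDF), which this sketch leaves out.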
2. Delivery Optimization
The process of delivering goods by drone (UAV) can be optimized by combining the K-means algorithm, which finds the best launch locations, with a genetic algorithm, which solves the route as a traveling salesman problem. The original article links to a white paper on this project.
3. Identifying Crime Locations
Given crime data for specific areas of a city, analyzing the category of crime, the location of the crime, and the link between the two can give high-quality insight into the crime-prone areas of a city or region. The original article links to a paper based on crime data from Delhi FIRs (First Information Reports).
4. Customer Segmentation
Clustering helps marketers improve their customer base, focus on target areas, and segment customers based on purchase history, interests, or activity monitoring. The original article links to a white paper on how telecom operators can cluster pre-paid customers by their patterns of spending on top-ups, sending text messages, and browsing the web. Segmenting customers helps a company create specific campaigns for specific groups of customers.
5. Player Statistics Analysis
Analyzing player statistics has always been a key element of sports, and as competition gets fiercer, machine learning plays a crucial role in this field. The K-means algorithm is a good choice if you want to build a strong team and identify similar players based on their statistics. The original article links to a piece with specific details and an implementation.
6. Insurance Fraud Detection
Machine learning also plays a critical role in fraud detection and is widely used to detect automobile, healthcare, and insurance fraud. Using historical data on past fraudulent claims, new claims can be singled out based on how close they lie to clusters that indicate fraudulent patterns. Fraud detection is critical because insurance fraud can cost a company millions of dollars. The original article links to a white paper that uses clustering to detect fraud in auto insurance.
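The idea of checking a new claim against fraud-like clusters can be sketched as a nearest-centroid lookup. Everything specific here is hypothetical: the two features, the centroid values, the cluster names, and the decision to treat one cluster as fraud-like are assumptions made for illustration, not taken from the white paper.

```python
import math

# Hypothetical centroids, as if K-means had been run on historical claims
# described by (claim amount in $1000s, days since the policy started).
centroids = {
    "routine": (2.0, 400.0),
    "large_but_established": (15.0, 900.0),
    "early_large_claim": (18.0, 20.0),
}

# Assumption: analysts found that this cluster is dominated by past fraud.
fraud_like_clusters = {"early_large_claim"}


def nearest_cluster(claim):
    """Return the name of the cluster whose centroid is closest to the claim."""
    return min(centroids, key=lambda name: math.dist(claim, centroids[name]))


def needs_review(claim):
    """Flag a claim that falls into a cluster marked as fraud-like."""
    return nearest_cluster(claim) in fraud_like_clusters


print(needs_review((17.0, 35.0)))   # large claim soon after policy start
print(needs_review((1.5, 380.0)))   # small claim on an older policy
```

In practice the features would be scaled to comparable ranges before clustering; otherwise the feature with the largest numeric range (here, days) dominates the distance.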
7. Ride Data Analysis
Uber's publicly available ride data set provides a wealth of valuable data about traffic, transit times, peak pickup locations, and more. Analyzing this data is useful not only for Uber; it also helps us understand a city's traffic patterns and plan the cities of the future. The original article links to a piece that uses a sample data set to walk through the process of analyzing Uber data.
8. Cyber-Profiling of Criminals
Cyber-profiling is the process of collecting data from individuals and groups to identify important relationships between them. The idea originates from criminal profiling, which gives investigators information for classifying the types of criminals present at a crime scene. The original article links to a paper on how to cyber-profile network users in an academic environment based on their data preferences.
9. Call Detail Record Analysis
A call detail record (CDR) is the information that telecommunications companies collect about a customer's telephone calls, text messages, and network activity. Combining call detail records with customer profiles helps telecom companies better anticipate their customers' needs. The original article links to a piece showing how to use the unsupervised K-means clustering algorithm to cluster customer activity over 24 hours, in order to understand how different customer segments use the service hour by hour.
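One plausible way to prepare such data for clustering is to turn each customer's records into a 24-element vector, one entry per hour of the day. The sketch below assumes a simplified record format of (customer ID, hour of day); the function name `hourly_profiles` and the sample records are hypothetical.

```python
from collections import defaultdict


def hourly_profiles(records):
    """Map each customer to a 24-element vector of activity shares by hour."""
    counts = defaultdict(lambda: [0] * 24)
    for customer_id, hour in records:
        counts[customer_id][hour] += 1
    profiles = {}
    for customer_id, per_hour in counts.items():
        total = sum(per_hour)
        # Use shares instead of raw counts so that heavy and light users
        # with the same daily rhythm get similar vectors.
        profiles[customer_id] = [n / total for n in per_hour]
    return profiles


# Illustrative records: one daytime user and one late-night user.
records = [
    ("alice", 9), ("alice", 10), ("alice", 10), ("alice", 14),
    ("bob", 22), ("bob", 23), ("bob", 23), ("bob", 1),
]
profiles = hourly_profiles(records)
print(profiles["alice"][10])  # share of alice's activity in the 10:00 hour
```

Each profile is a point in 24-dimensional space, so customers with similar daily usage patterns lie close together and can be grouped by K-means.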
10. Automated Clustering of IT Alerts
Components of large enterprise IT infrastructure, such as networks, storage, or databases, generate large volumes of alert messages. Because alert messages can point to operational issues, they have to be screened manually so that downstream processes can be prioritized. Clustering the data gives insight into alert categories and mean time to repair, which helps predict future failures.
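Once alerts carry cluster labels, the per-category summary is a simple aggregation. The sketch below assumes the clustering has already been done; the labels, repair times, and the function name `repair_time_by_cluster` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical alerts after clustering: (cluster label, minutes to repair).
labelled_alerts = [(0, 12), (0, 18), (1, 95), (1, 120), (1, 85), (2, 5)]


def repair_time_by_cluster(alerts):
    """Summarize the alert count and mean repair time for each cluster."""
    grouped = defaultdict(list)
    for label, minutes in alerts:
        grouped[label].append(minutes)
    return {
        label: {"alerts": len(times), "mean_repair_minutes": mean(times)}
        for label, times in grouped.items()
    }


summary = repair_time_by_cluster(labelled_alerts)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

A cluster with many alerts and a long mean repair time is a natural candidate for closer monitoring.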
This article was translated by the Alibaba Cloud community.
Original title: "Ten Interesting Use Cases for the K-means Algorithm". Translator: Mags. Reviewer: Roman.