Unsupervised machine learning algorithms receive no guidance from any supervisor. That is why they are often regarded as being closer to what some call true artificial intelligence.
In unsupervised learning, there is no correct answer and no teacher to provide guidance. The algorithm has to discover interesting patterns in the data on its own in order to learn.
What is clustering?
Basically, it is an unsupervised learning method and a common technique for statistical data analysis used in many fields. Clustering is the task of dividing a set of observations into subsets (called clusters) in such a way that observations in the same cluster are similar in some sense and dissimilar to the observations in other clusters. In short, the main goal of clustering is to group data on the basis of similarity and dissimilarity.
For example, similar data points end up in the same cluster, while dissimilar data points fall into different clusters.
Data Clustering Algorithms
Here are some common algorithms for data clustering-
K-means algorithm
The K-means clustering algorithm is one of the well-known data clustering algorithms. It assumes that the number of clusters is already known. This is also known as flat clustering. It is an iterative clustering algorithm. The algorithm follows these steps-
Step 1 - Specify the desired number K of subgroups.
Step 2 - Fix the number of clusters and randomly assign each data point to a cluster. In other words, classify the data according to the number of clusters.
Step 3 - Compute the centroid of each cluster.
Step 4 - Since this is an iterative algorithm, update the positions of the K centroids in every iteration until they stop moving, in other words until the centroids reach their best positions. In general this is a local optimum, and it is not guaranteed to be the global one.
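The steps above can be sketched in plain NumPy. This is only an illustrative sketch under stated assumptions, not the Scikit-learn implementation: the function name kmeans_sketch, its parameters, and the tiny one-dimensional dataset are all made up for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Toy K-means following the steps above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: fix K and randomly assign every point to one of K clusters.
    # A shuffled round-robin keeps every cluster non-empty at the start.
    labels = rng.permutation(len(X)) % k
    centroids = np.zeros((k, X.shape[1]))
    for _ in range(n_iter):
        # Step 3: each centroid is the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
        # Step 4: reassign each point to its nearest centroid and repeat
        # until the assignment no longer changes.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids

# Made-up data: two obvious groups on a line.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
labels, centroids = kmeans_sketch(X, k=2)
print(labels)                      # which label index each group gets may vary
print(np.sort(centroids.ravel()))  # centroids at 1.0 and 10.5
```

Starting from a random assignment, as in Step 2, is the simplest choice. Scikit-learn's KMeans uses a more careful initialization by default, so its results can differ from this sketch on harder data.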
The following code will help implement the K-means clustering algorithm in Python. We will use the Scikit-learn module.
Import the required packages-
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from sklearn.cluster import KMeans
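The following lines of code generate a two-dimensional dataset containing four blobs by using make_blobs from the sklearn.datasets package.
from sklearn.datasets import make_blobs
# n_samples is unreadable in the source; 500 is an assumed value
X, y_true = make_blobs(n_samples = 500, centers = 4, cluster_std = 0.40, random_state = 0)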
You can use the following code to visualize the dataset-
# the marker size s is missing in the source; 50 is an assumed value
plt.scatter(X[:, 0], X[:, 1], s = 50)
plt.show()
The result is a scatter plot of the generated blobs.
Here, kmeans is initialized as a KMeans estimator with the required parameter for the number of clusters (n_clusters).
kmeans = KMeans(n_clusters = 4)
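The K-means model needs to be trained with the input data.
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# the marker size s is an assumed value
plt.scatter(X[:, 0], X[:, 1], c = y_kmeans, s = 50, cmap = 'viridis')
centers = kmeans.cluster_centers_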
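The code given below plots the cluster centers found by the model on top of the data, fitted according to the number of clusters to be found.
# the marker size s is unreadable in the source; 200 is an assumed value
plt.scatter(centers[:, 0], centers[:, 1], c = 'black', s = 200, alpha = 0.5)
plt.show()
The result is the same scatter plot, colored by cluster, with the cluster centers drawn as black markers.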
Mean shift algorithm
It is another popular and powerful clustering algorithm used in unsupervised learning. It makes no assumptions about the number of clusters, so it is a non-parametric algorithm. It is also referred to as mean shift cluster analysis. The basic steps of the algorithm are as follows-
- First, start with each data point assigned to a cluster of its own.
- Next, compute the centroids and update their positions.
- By repeating this process, the centroids move closer to the peak of each cluster, that is, toward the regions of higher density.
- The algorithm stops at the stage where the centroids no longer move.
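The steps above can be sketched in plain NumPy. This is a simplified sketch with a flat (uniform) kernel, not Scikit-learn's MeanShift: the function name mean_shift_sketch, the bandwidth value, the merge threshold of half the bandwidth, and the six sample points are assumptions chosen only for illustration.

```python
import numpy as np

def mean_shift_sketch(X, bandwidth, n_iter=50, tol=1e-4):
    """Toy mean shift with a flat kernel (hypothetical helper)."""
    # Start with every data point as its own candidate centroid.
    centroids = X.astype(float).copy()
    for _ in range(n_iter):
        shifted = np.empty_like(centroids)
        for i, c in enumerate(centroids):
            # Move the centroid to the mean of the points within `bandwidth`.
            near = X[np.linalg.norm(X - c, axis=1) <= bandwidth]
            shifted[i] = near.mean(axis=0)
        moved = np.linalg.norm(shifted - centroids, axis=1).max()
        centroids = shifted
        if moved < tol:  # stop once no centroid moves any more
            break
    # Merge centroids that ended up at (almost) the same density peak.
    modes = []
    labels = np.empty(len(X), dtype=int)
    for i, c in enumerate(centroids):
        for j, m in enumerate(modes):
            if np.linalg.norm(c - m) < bandwidth / 2:
                labels[i] = j
                break
        else:
            modes.append(c)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

# Made-up data: two small groups far apart.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, modes = mean_shift_sketch(X, bandwidth=3.0)
print(labels)  # [0 0 0 1 1 1]
print(modes)   # one mode per group, at the mean of its three points
```

Note that the number of clusters is never passed in. It falls out of how many distinct peaks the centroids converge to, which in turn depends on the bandwidth.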
The following code implements the mean shift clustering algorithm in Python. We will use the Scikit-learn module.
Import the necessary packages-
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
The following code generates a two-dimensional dataset of blobs around three centers by using make_blobs from the sklearn.datasets package.
from sklearn.datasets import make_blobs
You can visualize the dataset with the following code-
centers = [[2,2],[4,5],[3,10]]
# n_samples is unreadable in the source; 500 is an assumed value
X, _ = make_blobs(n_samples = 500, centers = centers, cluster_std = 1)
plt.scatter(X[:,0], X[:,1])
plt.show()
Executing the above code displays a scatter plot of the generated data.
Now we need to train the mean shift clustering model with input data.
ms = MeanShift()
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
The following code prints the cluster centers and the estimated number of clusters according to the input data-
print(cluster_centers)
n_clusters_ = len(np.unique(labels))
print("Estimated clusters:", n_clusters_)
[[ 3.23005036 3.84771893]
 [ 3.02057451 9.88928991]]
Estimated clusters: 2
The code given below plots the data points colored by their cluster label and marks the cluster centers found by the model.
colors = 10*['r.','g.','b.','c.','k.','y.','m.']
for i in range(len(X)):
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize = 10)
# the marker size s is missing in the source; 150 is an assumed value
plt.scatter(cluster_centers[:,0], cluster_centers[:,1], marker = "x", color = 'k', s = 150, linewidths = 5, zorder = 10)
plt.show()
Executing the above code displays the clustered points with each cluster center marked by a black x.
Measuring the Clustering Performance
Real-world data is not naturally organized into a number of distinct clusters. For this reason, it is not easy to visualize the data and draw inferences from it. This is why it is necessary to measure the clustering performance and its quality. It can be done with the help of silhouette analysis.
Silhouette Analysis
This method can be used to check the quality of clustering by measuring the distance between the clusters. Basically, it provides a way to assess parameters such as the number of clusters by giving a silhouette score. This score measures how close each point in one cluster is to the points in the neighboring clusters.
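As a rough illustration of using this score to assess the number of clusters, the sketch below fits K-means for several values of K and compares the scores returned by Scikit-learn's silhouette_score. The blob centers, the sample count, and the range of K values are assumptions chosen for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated blobs (centers chosen for illustration).
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean score over all samples

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("Best number of clusters:", best_k)
```

On data this cleanly separated the highest score is reached at four clusters. On real data the maximum is usually less clear-cut, so the score is best read as one piece of evidence and not as a definitive answer.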
Analysis of the Silhouette Score
The score range is [-1, 1]. The following is an analysis of this score-
- A score of +1: a score close to +1 indicates that the sample is far from the neighboring cluster.
- A score of 0: a score of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters.
- A score of -1: a negative score indicates that the sample may have been assigned to the wrong cluster.
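Calculating the Silhouette Score
In this section, we will learn how to calculate the silhouette score.
The silhouette score of a sample can be calculated by using the following formula-
silhouette score = (p - q) / max(p, q)
Here, p is the mean distance from the sample to the points in the nearest cluster that the sample does not belong to, and q is the mean distance from the sample to all the other points in its own cluster.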
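A minimal sketch of this calculation in NumPy is given below. It assumes the standard definition of the silhouette score, with p as the mean distance from a sample to the nearest cluster it does not belong to and q as the mean distance to the other points in its own cluster. The function name silhouette_sketch and the four sample points are made up for illustration.

```python
import numpy as np

def silhouette_sketch(X, labels):
    """Mean silhouette score from pairwise distances (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # convention: a point alone in its cluster scores 0
        # q: mean distance to the other points in the same cluster
        q = dists[i, same].mean()
        # p: mean distance to the points of the nearest other cluster
        p = min(dists[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (p - q) / max(p, q)
    return scores.mean()

# Made-up data: two tight groups on a line.
X = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette_sketch(X, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_sketch(X, [0, 1, 0, 1])   # deliberately wrong grouping
print(round(good, 3))  # about 0.9
print(round(bad, 3))   # -0.45
```

The sensible grouping scores close to +1, while the deliberately wrong grouping scores below 0, in line with the interpretation of the score range given above.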