Improving the matching results of a dating site using the k-nearest neighbors algorithm

Source: Internet
Author: User


If you use an online dating site to look for a date, you will probably sort the site's users into three categories:

1. People you do not like

2. People with some charm

3. Very charming people

How do you decide which category a user belongs to? Presumably you analyze the user's information to reach a conclusion, for example "frequent flyer miles earned per year", "percentage of time spent playing online games", and "liters of ice cream consumed per week".

The k-nearest neighbors (kNN) algorithm from machine learning can take these three pieces of information about a user and classify the user for you automatically. How convenient!

This article shows how to implement such an automatic classification program.
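Before going through the steps, here is a minimal sketch of the kNN idea itself: measure the distance from a new user to every known user, take the k closest, and let them vote. The function name `knn_classify`, the value k=3, and the sample values are illustrative assumptions, not part of the original program.

```python
import numpy as np
from collections import Counter

def knn_classify(query, features, labels, k=3):
    """Return the majority label among the k training rows nearest to query."""
    # Euclidean distance from the query to every training row
    distances = np.sqrt(((features - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up samples, already scaled to 0..1: [flyer miles, game time, ice cream]
features = np.array([
    [0.9, 0.8, 0.5],
    [0.8, 0.9, 0.6],
    [0.1, 0.2, 0.4],
    [0.2, 0.1, 0.5],
    [0.5, 0.5, 0.9],
])
labels = [3, 3, 1, 1, 2]  # 1 = do not like, 2 = some charm, 3 = very charming

predicted = knn_classify(np.array([0.85, 0.85, 0.55]), features, labels, k=3)
print(predicted)
```

The new user lands closest to the two samples labeled 3, so the vote returns category 3.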

Step one: Collect and prepare the data

First, gather some dating data, as much as you can.

Then store the collected data in a TXT file, with one sample per row.

The three features and the classification result (represented as an integer) mentioned in the introduction each occupy one column, as follows:
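As a rough illustration of that layout, the sketch below writes three made-up samples to a tab-separated file and reads the first row back. The file name `dating_samples.txt` and all values are hypothetical; only the column order (three features, then the integer label) follows the description above.

```python
import os
import tempfile

# Hypothetical samples:
# (flyer miles per year, % of time gaming, liters of ice cream per week, label)
samples = [
    (40000, 8.5, 0.9, 3),
    (15000, 7.0, 1.6, 2),
    (26000, 1.5, 0.8, 1),
]

# Write one sample per row, columns separated by tabs
path = os.path.join(tempfile.mkdtemp(), "dating_samples.txt")
with open(path, "w") as f:
    for row in samples:
        f.write("\t".join(str(value) for value in row) + "\n")

# Read the file back to inspect the raw layout
with open(path) as f:
    lines = f.read().splitlines()
print(lines[0].split("\t"))
```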


A function then reads the data from the file into in-memory data structures:

# Import the NumPy math library
import numpy

# ==============================================
# Input:
#     training set file name (with path)
# Output:
#     feature matrix and label vector
# ==============================================
def file2matrix(filename):
    'Get training set data'

    # Open the training set file
    fr = open(filename)
    # Get the number of lines in the file
    numberOfLines = len(fr.readlines())
    # Reset the file pointer to 0
    fr.seek(0)
    # Initialize the feature matrix
    returnMat = numpy.zeros((numberOfLines, 3))
    # Initialize the label vector
    classLabelVector = []
    # The row number of the feature matrix is also the sample number
    index = 0

    for line in fr:  # traverse all rows in the training set file
        # Remove the line break and surrounding whitespace
        line = line.strip()
        # Split the row on tabs
        listFromLine = line.split('\t')
        # Store the feature part of the row in the feature matrix
        returnMat[index, :] = listFromLine[0:3]
        # Store the label part of the row in the label vector
        classLabelVector.append(int(listFromLine[-1]))
        # Sample number + 1
        index += 1

    # Close the file once all rows are read
    fr.close()
    return returnMat, classLabelVector
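For comparison, the same loading step can be sketched with NumPy's own text reader. This is an alternative, not the article's method: `load_dating_file` is a hypothetical name, the sample file is made up, and the sketch assumes every row has exactly four tab-separated numeric columns.

```python
import os
import tempfile
import numpy as np

def load_dating_file(filename):
    """Return (feature matrix, label list) from a tab-separated file."""
    # ndmin=2 keeps a 2-D result even when the file has a single row
    data = np.loadtxt(filename, delimiter="\t", ndmin=2)
    features = data[:, 0:3]
    labels = data[:, -1].astype(int).tolist()
    return features, labels

# Small made-up file in the layout described above
path = os.path.join(tempfile.mkdtemp(), "dating_samples.txt")
with open(path, "w") as f:
    f.write("40000\t8.5\t0.9\t3\n15000\t7.0\t1.6\t2\n")

features, labels = load_dating_file(path)
print(features.shape, labels)
```

The trade-off is that `numpy.loadtxt` raises an error on any non-numeric field, whereas a hand-written loop can be adapted to skip or repair bad rows.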

Step two: Analyze the data

Once you have the data, you can print it to see what you got, as follows:


Obviously, output like this is hard to read, so you should use Python's Matplotlib library to visualize the data.

If you develop with the PyDev plug-in for Eclipse under Ubuntu, installing Matplotlib has a pitfall.

After installing the package, you also have to add a new library path in the plug-in settings, because Matplotlib, unlike NumPy, is not automatically installed into the Python 2.7 library directory.

The following is the correct library path:
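The exact path depends on the installation. One way to find it is to ask Python where a package lives, as in the sketch below. It assumes Python 3 (`importlib.util` is not available in Python 2.7), and `package_parent_dir` is a hypothetical helper name.

```python
import importlib.util
import os

def package_parent_dir(package_name):
    """Return the directory containing the package, or None if it cannot be found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at <dir>/<package>/__init__.py, so go up two levels
    return os.path.dirname(os.path.dirname(spec.origin))

# Prints None when Matplotlib is not installed for this interpreter
print(package_parent_dir("matplotlib"))
```

The printed directory is the one to add as a library path in the IDE settings.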


You can then write the following code to analyze the data:

# Create a new figure object
fig = plt.figure()
# Set up a plot area of 1 row and 1 column and select the 1st area to display the data
ax = fig.add_subplot(111)
# Use column 1 of the training set (percentage of time spent playing online games) as the x-axis
# and column 2 (liters of ice cream consumed per week) as the y-axis
ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])
# Display the plot
plt.show()

Also remember to import the required Matplotlib library at the top of the code:

# Import the Matplotlib library
import matplotlib.pyplot as plt
import matplotlib

After running the code, the resulting plot is as follows:


There is a problem here: the plot above does not show the classification results.

The plot can be improved so that it does. The relevant part of the code is:

# Create a new figure object
fig = plt.figure()
# Set up a plot area of 1 row and 1 column and select the 1st area to display the data
ax = fig.add_subplot(111)
# Use column 1 of the training set (percentage of time spent playing online games) as the x-axis
# and column 2 (liters of ice cream consumed per week) as the y-axis;
# marker size and color both scale with the class label
ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2], 15.0 * numpy.array(datingLabels), 15.0 * numpy.array(datingLabels))
# Axis limits
ax.axis([-2, 25, -0.2, 2.0])
# Axis labels (configuring Matplotlib to display Chinese is a bit troublesome, so English is used here)
plt.xlabel('Percentage of Time Spent Playing Online Games')
plt.ylabel('Liters of Ice Cream Consumed Per Week')
# Display the plot
plt.show()

This produces the following plot:


You can use the same method to plot "frequent flyer miles earned per year" against "percentage of time spent playing online games":
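A self-contained sketch of that second plot is given here. The random data only stands in for the real training set, and the off-screen `Agg` backend and the output file name are assumptions made so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy

# Made-up stand-in data: columns are flyer miles, game time (%), ice cream (liters)
rng = numpy.random.RandomState(0)
datingDataMat = numpy.column_stack([
    rng.uniform(0, 90000, 30),
    rng.uniform(0, 20, 30),
    rng.uniform(0, 1.7, 30),
])
datingLabels = rng.randint(1, 4, 30)  # class labels 1, 2 or 3

fig = plt.figure()
ax = fig.add_subplot(111)
# Column 0 on the x-axis, column 1 on the y-axis; size and color follow the label
points = ax.scatter(datingDataMat[:, 0], datingDataMat[:, 1],
                    15.0 * numpy.array(datingLabels),
                    15.0 * numpy.array(datingLabels))
plt.xlabel('Frequent Flyer Miles Earned Per Year')
plt.ylabel('Percentage of Time Spent Playing Online Games')

# Save to a temporary file instead of opening a window
out_path = os.path.join(tempfile.mkdtemp(), "flyer_miles_vs_games.png")
fig.savefig(out_path)
```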


Step three: Normalize the data
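The body of this step is missing from the text. Assuming it concerns feature normalization, the sketch below shows one common approach, min-max scaling. The three features sit on different scales (the plot axes above run to about 25 for game time but only about 2 for ice cream), so without rescaling the largest feature would dominate a kNN distance. The name `auto_norm` and the sample values are illustrative.

```python
import numpy as np

def auto_norm(data):
    """Rescale each column of data to the range 0..1.

    Returns (scaled data, column ranges, column minimums).
    """
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Guard against constant columns to avoid dividing by zero
    safe_ranges = np.where(ranges == 0, 1, ranges)
    return (data - min_vals) / safe_ranges, ranges, min_vals

# Made-up samples: [flyer miles, game time (%), ice cream (liters)]
data = np.array([
    [40000.0, 8.0, 1.0],
    [10000.0, 2.0, 0.5],
    [25000.0, 5.0, 0.0],
])
scaled, ranges, min_vals = auto_norm(data)
print(scaled)
```

Returning the ranges and minimums alongside the scaled matrix lets a new sample be rescaled the same way before it is classified.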

