Using machine learning to scientifically prove that I judge by looks

Source: Internet
Author: User

While reading "Machine Learning in Action" recently, I had an idea: crawl some data from the Internet and process it with the methods from the book. That would deepen my understanding of the book and, along the way, might earn a little attention on GitHub. I had just reached the decision tree chapter, and the book's theory and examples made me think that the theory could hardly be a better fit for choosing a partner: after appearance you look at education, and after education you look at income. If you crawl women's profiles from a dating site, label them by hand, and build a decision tree from that data, you can discover your own partner-selection pattern! GitHub project: Huatian-funny. A detailed explanation follows.

Data crawling

I had crawled similar data from Shiji Jiayuan before. My overall impression was that the user profiles there are either mostly empty or obviously fake. Some experienced friends suggested looking at Huatian instead, and the data quality there really is much higher. The only drawback is that the data is hard to crawl: the user search API requires login, and each search shows only a little over 30 users. Fortunately I did not need much data. With very narrow search conditions, each search returns only a small batch, but the total adds up: in the end I collected about 2,000 records for users located in Shanghai and aged 22 to 27. Fill in your user name and password in spider.py and run the file directly to crawl the data. Because the data set is not large, the run finishes quickly, and the records are stored in MongoDB.

The crawler uses requests, and the process is simple: first send a login request to obtain a cookie, then call the search API to get the data. The data comes back as JSON, so it can be stored in MongoDB directly without any conversion, which is very convenient. My only complaint is that the Huatian search API uses the POST method, which is hardly standard practice. A brief note on how to obtain the cookie with requests: create a Session object and use it to send the login request. Later requests made through the same object automatically carry the cookie returned by the login, so it is very easy to use.

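    from requests import Session

    session = Session()
    # Log in first; the session keeps the cookie returned by the login response
    session.post(login_url, data=post_data, headers=post_headers)
    # Later requests on the same session send that cookie automatically
    response = session.get(search_url, headers=get_headers)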
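The narrow-search approach described in this section (many small searches whose results are merged) can be sketched roughly as follows. This is only an illustration of the idea, not the code in spider.py: `search_fn`, the condition names `age` and `education`, and the `uid` field are hypothetical stand-ins, and `fake_search` replaces the real network call so the sketch runs offline.

```python
from itertools import product


def crawl_by_conditions(search_fn, ages, educations):
    """Run one narrow search per (age, education) pair and merge the results.

    search_fn is a hypothetical callable that takes a dict of search
    conditions and returns a list of user dicts. Each user dict is assumed
    to carry a unique "uid" key. Neither reflects the real Huatian API.
    """
    users = {}
    for age, education in product(ages, educations):
        conditions = {"age": age, "education": education}
        for user in search_fn(conditions):
            # The same user can come back from several searches; keep one copy.
            users.setdefault(user["uid"], user)
    return list(users.values())


def fake_search(conditions):
    """Offline stand-in for the search call, capped at 30 results per query."""
    base = conditions["age"] * 100
    # The uid ignores education on purpose, so duplicates appear across searches.
    batch = [{"uid": base + i, "age": conditions["age"]} for i in range(40)]
    return batch[:30]


collected = crawl_by_conditions(fake_search, range(22, 28), ["bachelor", "master"])
print(len(collected), "unique users collected")
```

Keying the merged results by a user identifier means overlapping searches do not inflate the data set, which matters when the conditions are sliced this finely.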

Label users

Because a decision tree is a supervised learning method, it needs labeled data. Each user is therefore given a label based on my judgment of her appearance, age, education, and so on, so that the resulting tree reflects my partner-selection criteria to some extent. The labels are simple and crude: only "satisfied" and "dissatisfied". Interested readers can define more labels to match their real situation, such as excellent, average, backup, and unqualified. Appearance is an essential factor when choosing a partner, so it has to be quantified. Since I had no tool that could score a profile photo, I scored it subjectively, using the currently popular ten-point scale.

To make labeling faster, I wrote a small desktop window for it; just run mark.py to open it. (Tkinter is full of pitfalls. The time I spent debugging the code would have been enough to go through the whole data set several times, but it is a pleasure to use once it actually works.)

Note: At first glance, many people look at only five pieces of information: photo, age, height, salary, and education. The whole labeling process therefore refers only to these five dimensions, and the decision tree below is built on the same five.
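Decision tree code in the style of the book treats every feature as a discrete value, so numeric fields such as age, height, and salary are usually binned first. The sketch below shows one possible way to encode the five dimensions into a training row. The field names, bin edges, and category names are all assumptions made for illustration; the project may encode them differently.

```python
from bisect import bisect_right


def bucket(value, edges, names):
    """Map a number to the name of the interval it falls in.

    edges must be sorted, and each edge is the lower bound of the next
    interval, so len(names) must equal len(edges) + 1.
    """
    return names[bisect_right(edges, value)]


def encode(record):
    """Turn one labeled profile into [feature values..., label].

    Every key and threshold below is hypothetical.
    """
    return [
        bucket(record["appearance"], [4, 7], ["1-3", "4-6", "7-10"]),
        bucket(record["age"], [25], ["22-24", "25-27"]),
        bucket(record["height"], [160, 170], ["<160", "160-169", ">=170"]),
        bucket(record["salary"], [5000, 10000], ["<5k", "5k-10k", ">=10k"]),
        record["education"],  # already categorical
        record["label"],      # "satisfied" or "dissatisfied"
    ]


# A made-up profile, not taken from the crawled data.
sample = {"appearance": 8, "age": 24, "height": 165,
          "salary": 12000, "education": "bachelor", "label": "satisfied"}
print(encode(sample))
```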

Training the decision tree

In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values. Each internal node in the tree tests one attribute, each branch represents a possible value of that attribute, and each leaf node gives the value of the object described by the path from the root to that leaf. A decision tree has only a single output; if you need several outputs, you can build an independent decision tree for each one. The machine learning technique that generates a decision tree from data is called decision tree learning, or informally just "decision tree". Put plainly, it is a prediction tree built on classification: it is trained on known cases and then used to classify and predict future ones.
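To make the description above concrete, here is a minimal ID3-style sketch that picks the attribute with the largest information gain at each node and stores the tree as nested dictionaries. It follows the general approach of the book's decision tree chapter but is not the project's train.py, and the toy rows are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the labels (last column) in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def split(rows, index, value):
    """Rows whose feature `index` equals `value`, with that column removed."""
    return [row[:index] + row[index + 1:] for row in rows if row[index] == value]


def best_feature(rows):
    """Index of the feature with the largest information gain."""
    base = entropy(rows)
    gains = []
    for index in range(len(rows[0]) - 1):
        remainder = 0.0
        for value in {row[index] for row in rows}:
            subset = split(rows, index, value)
            remainder += len(subset) / len(rows) * entropy(subset)
        gains.append(base - remainder)
    return gains.index(max(gains))


def build_tree(rows, names):
    """Build a nested dict of the form {feature: {value: subtree_or_label}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:   # all rows agree: leaf
        return labels[0]
    if len(rows[0]) == 1:       # no features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    index = best_feature(rows)
    rest = names[:index] + names[index + 1:]
    return {names[index]: {value: build_tree(split(rows, index, value), rest)
                           for value in {row[index] for row in rows}}}


def classify(tree, names, sample):
    """Follow the branches for one sample; raises KeyError on unseen values."""
    while isinstance(tree, dict):
        name = next(iter(tree))
        tree = tree[name][sample[names.index(name)]]
    return tree


# Invented toy data: [appearance, education, label].
names = ["appearance", "education"]
rows = [
    ["high", "bachelor", "satisfied"],
    ["high", "master", "satisfied"],
    ["low", "bachelor", "dissatisfied"],
    ["low", "master", "dissatisfied"],
    ["medium", "master", "satisfied"],
    ["medium", "bachelor", "dissatisfied"],
]
tree = build_tree(rows, names)
print(tree)
print(classify(tree, names, ["medium", "master"]))
```

On this toy data the appearance attribute has the larger information gain, so it becomes the root, and education is consulted only for the "medium" branch.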

For the theory, see Chapter 3 of "Machine Learning in Action" or this blog. Both explain the principles in a simple and understandable way, so I will not repeat them here.

Results

The code is based on "Machine Learning in Action", with some adjustments and optimizations for my own situation, so it is not identical to the original. Running train.py displays the resulting decision tree.

The branches in the plot are very crowded, and after a lot of tweaking this was the best layout I could get. Still, the result makes the point very clearly: I judge by looks. A high appearance score passes, a low one is ignored, and anything in between leaves me quite torn. Interested readers can try it for themselves.

PS1: I do not really want to admit that I judge by looks. Someone who is plain-looking yet obsessed with looks is doomed to a lonely life.

PS2: Because the labeling process was somewhat arbitrary, some of the labels are inaccurate.

PS3: I have no dating plans and am not taking requests.
