
Machine Learning Quick Start (3)

Last Update: 2016-03-21 Source: Internet

Author: User


Abstract: This article briefly describes how to use clustering to analyze the actual political leanings of US senators from their voting records.

Statement: (The content of this article is not original; I translated and summarized it myself. Please cite the source when reprinting.)

Source of this article's content: https://www.dataquest.io/mission/60/clustering-basics

The linear regression and classification used in the previous two articles are both supervised machine learning: a model is trained on existing data and then used to predict unknown data. Unsupervised learning does not try to predict anything; instead, it looks for patterns in the data. Clustering is one of the most important unsupervised methods: a clustering algorithm groups together data points that have similar features.
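To make "grouping data with similar features" concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm), the family of method that scikit-learn's KMeans, used later in this article, belongs to. It is an illustration, not scikit-learn's implementation: it assumes NumPy, uses simple random initialization instead of scikit-learn's default k-means++, and runs on made-up 2-D points.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 1):
    """Plain k-means (Lloyd's algorithm): returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it
        # (keep the old center if a cluster happens to be empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: the centers stopped moving
        centers = new_centers
    return labels, centers


# Two obvious groups of made-up 2-D points.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans(pts, k=2)
print(labels)
print(centers)
```

The algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its points; no labels are ever supplied, which is what makes it unsupervised.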

Raw data presentation

When the US Senate wants to pass a law, the senators vote on it. Most senators belong to one of two political parties, the Democrats and the Republicans. The data used here is the voting record of these senators. Each row represents one senator: the first three columns hold the senator's name, party (D stands for Democrat, R for Republican, and I for Independent), and state, and each remaining column holds that senator's vote on one bill, where 1 means yes, 0 means no, and 0.5 means abstain.
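The sketch below builds a tiny hypothetical table with the same layout as 114_congress.csv so the later steps can be tried without the real file. All names, states, bill IDs, and votes are invented for illustration; only the column structure (three descriptive columns, then one column per bill) follows the description above.

```python
import pandas as pd

# Hypothetical mini version of 114_congress.csv: every value below is made up;
# only the column layout mirrors the real data.
votes = pd.DataFrame({
    "name":  ["Senator A", "Senator B", "Senator C", "Senator D"],
    "party": ["D", "R", "R", "I"],
    "state": ["XX", "YY", "ZZ", "WW"],
    "00001": [1.0, 0.0, 0.0, 1.0],   # 1 = yes
    "00004": [0.0, 1.0, 1.0, 0.5],   # 0 = no, 0.5 = abstain
    "00005": [1.0, 0.0, 0.5, 1.0],
})

print(votes["party"].value_counts())   # number of senators per party
bill_votes = votes.iloc[:, 3:]          # drop name/party/state, keep only the vote columns
print(bill_votes.shape)                 # 4 senators x 3 bills
```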

import pandas
votes = pandas.read_csv('114_congress.csv')

print(votes["party"].value_counts())  # number of senators in each party

from sklearn.metrics.pairwise import euclidean_distances
# The first three columns (name, party, state) are not numeric, so they must be excluded.
# iloc[[0], 3:] keeps the first senator as a 2-D row, which euclidean_distances expects.
print(euclidean_distances(votes.iloc[[0], 3:], votes.iloc[:, 3:]))
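The Euclidean distance between two senators is the square root of the sum of the squared differences between their votes on each bill, so identical voting records are at distance 0. A worked example with two hypothetical voting records (invented values, NumPy only):

```python
import numpy as np

# Two hypothetical senators' votes on the same four bills (1 = yes, 0 = no, 0.5 = abstain).
senator_a = np.array([1.0, 0.0, 1.0, 0.5])
senator_b = np.array([1.0, 1.0, 0.0, 0.5])

# Euclidean distance: square root of the summed squared per-bill differences.
diff = senator_a - senator_b            # [0, -1, 1, 0]
distance = np.sqrt(np.sum(diff ** 2))   # sqrt(0 + 1 + 1 + 0) = sqrt(2)
print(distance)
```

The two senators disagree on two bills, each disagreement contributing 1 to the sum. euclidean_distances computes this same quantity for every pair of rows in its two 2-D inputs.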

import pandas as pd
from sklearn.cluster import KMeans
# The n_clusters parameter specifies the number of groups; random_state=1 makes the result reproducible.
kmeans_model = KMeans(n_clusters=2, random_state=1)
# fit_transform() trains the model and returns each senator's distance to every cluster center.
senator_distances = kmeans_model.fit_transform(votes.iloc[:, 3:])

fit_transform() returns a NumPy ndarray with one row per senator. The first column holds that senator's distance to the center of the first cluster, and the second column holds the distance to the center of the second cluster.
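One way to check what fit_transform() returns is to recompute the distances by hand from kmeans_model.cluster_centers_. This sketch assumes scikit-learn and NumPy and uses a small invented vote matrix instead of the real CSV; n_init=10 is set explicitly only so the behavior does not depend on the library version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical vote matrix: 6 senators x 4 bills (1 = yes, 0 = no, 0.5 = abstain).
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0.5],
    [1, 0.5, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0.5],
    [0.5, 0, 1, 1],
], dtype=float)

kmeans_model = KMeans(n_clusters=2, random_state=1, n_init=10)
senator_distances = kmeans_model.fit_transform(X)   # shape (6, 2)

# Recompute the same matrix by hand: distance from every row to every cluster center.
centers = kmeans_model.cluster_centers_              # shape (2, 4)
manual = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
print(np.allclose(senator_distances, manual))

# Each senator's label is the column with the smaller distance.
print(np.array_equal(kmeans_model.labels_, senator_distances.argmin(axis=1)))
```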

labels = kmeans_model.labels_  # cluster assigned to each senator (0 or 1)
print(pd.crosstab(labels, votes["party"]))  # how many senators of each party fall in each cluster

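# Democrats that the model placed in cluster 1; check the crosstab above to see which party dominates that cluster.
democratic_outliers = votes[(labels == 1) & (votes["party"] == "D")]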
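The crosstab and the boolean filter can be tried on a hypothetical six-senator example; the parties and cluster labels below are invented. Note the parentheses around each comparison: in Python, `&` binds more tightly than `==`, so dropping them changes how the expression is parsed.

```python
import numpy as np
import pandas as pd

# Hypothetical parties and cluster labels for six senators (made up for illustration).
votes = pd.DataFrame({"name": list("ABCDEF"),
                      "party": ["D", "D", "D", "R", "R", "I"]})
labels = np.array([0, 0, 1, 1, 1, 0])

# Rows = cluster label, columns = party, cells = number of senators.
crosstab = pd.crosstab(labels, votes["party"])
print(crosstab)

# `==` compares (a single `=` is a syntax error inside an expression);
# each comparison needs its own parentheses because `&` binds tighter than `==`.
democratic_outliers = votes[(labels == 1) & (votes["party"] == "D")]
print(democratic_outliers)   # senator C: a Democrat placed in the mostly Republican cluster 1
```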

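import matplotlib.pyplot as plt
# x = distance to the first cluster center, y = distance to the second; color = assigned cluster.
plt.scatter(x=senator_distances[:, 0], y=senator_distances[:, 1], c=labels)
plt.show()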

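# Cube each distance (amplifying the larger ones) and sum per senator to get an extremism rating.
extremism = (senator_distances ** 3).sum(axis=1)
votes["extremism"] = extremism
votes.sort_values("extremism", inplace=True, ascending=False)  # sort in descending order of extremism
print(votes.head(10))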
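A small worked example suggests one reading of why the distances are cubed rather than simply added: cubing lets a single large distance dominate the rating. The distances below are hypothetical, not taken from the real data. Note also that older code calling DataFrame.sort() will fail on current pandas, where sort_values() is the replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical distances to the two cluster centers for three senators.
senator_distances = np.array([
    [3.0, 3.0],   # moderate distance from both centers
    [1.0, 5.0],   # close to one center, far from the other
    [2.0, 2.0],
])
votes = pd.DataFrame({"name": ["A", "B", "C"]})

# Plain sums are 6, 6 and 4, so A and B would tie. After cubing:
# A -> 27 + 27 = 54, B -> 1 + 125 = 126, C -> 8 + 8 = 16.
extremism = (senator_distances ** 3).sum(axis=1)
votes["extremism"] = extremism

# Sort from most to least extreme.
votes = votes.sort_values("extremism", ascending=False)
print(votes)
```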

Summary

Clustering is a powerful way to discover structure in data. When supervised machine learning methods are not making progress, try an unsupervised method instead. In general, exploring the data with unsupervised learning is a good first step before applying supervised learning.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
