Predicting a Biological Response
Predict the biological response of molecules from their chemical properties.
The objective of the competition is to build as good a model as possible, so that, as optimally as this data allows, molecular information can be related to an actual biological response.
The data is provided in comma-separated values (CSV) format. Each row in the data set represents a molecule. The first column contains experimental data describing the actual biological response: the molecule either elicited the response (1) or did not (0). The remaining columns are molecular descriptors (D1 through D1776), calculated properties that capture some characteristics of the molecule, for example its size, shape, or elemental constitution. The descriptor matrix has been normalized.
In brief: each line of the CSV file represents a molecule. The first column records the actual biological response, either a response (1) or no response (0). Columns 2 through 1777 are the molecule's attributes, such as its size, shape, or elemental composition.
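Before modeling, it helps to confirm the layout described above. The sketch below stands in for the real train.csv with a tiny in-memory CSV; the column names ('Activity', D1...) mirror the competition file but are an assumption here, and the real descriptor matrix has 1776 columns rather than three.

```python
import io
import pandas as pd

# Tiny stand-in for train.csv: header row, then one molecule per line.
# Assumed layout: 'Activity' (0/1 response) followed by descriptors D1..D1776.
csv_text = "Activity,D1,D2,D3\n1,0.1,0.5,0.0\n0,0.9,0.2,0.3\n1,0.4,0.4,0.1\n"
train = pd.read_csv(io.StringIO(csv_text))

labels = train.iloc[:, 0]     # first column: biological response (0 or 1)
features = train.iloc[:, 1:]  # remaining columns: molecular descriptors
print(train.shape)
print(labels.value_counts().to_dict())
```

On the real file the same two slices give you the target vector and the normalized descriptor matrix directly.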
The competition has long since ended, but you can still submit up to five results to see how your score would rank. You only need to submit a result file in CSV format.
Since the labels are 0 and 1, this is a binary classification problem.
For a binary classification problem with this many attributes, logistic regression is a natural first attempt.
The following Python code uses logistic regression for prediction:
#!/usr/bin/env python
# coding: utf-8
'''
Created on August 1
@author: zhaohf
'''
from sklearn.linear_model import LogisticRegression
from numpy import genfromtxt, savetxt

def main():
    # Skip the header row; first column is the response, the rest are descriptors.
    dataset = genfromtxt(open('../Data/train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    test = genfromtxt(open('../Data/test.csv', 'r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    lr = LogisticRegression()
    lr.fit(train, target)
    # Column 1 of predict_proba is the probability of class 1.
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(lr.predict_proba(test))]
    savetxt('../Submissions/lr_benchmark.csv', predicted_probs, delimiter=',',
            fmt='%d,%f', header='MoleculeId,PredictedProbability', comments='')

if __name__ == '__main__':
    main()
Scored by the competition's log-loss metric, this submission earns a public score of 0.59425, a rather poor result that ranks hundreds of places down the leaderboard.
Next, let's try an SVM. The code is very similar.
#!/usr/bin/env python
# coding: utf-8
'''
Created on August 1
@author: zhaohf
'''
from sklearn import svm
from numpy import genfromtxt, savetxt

def main():
    dataset = genfromtxt(open('../Data/train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    test = genfromtxt(open('../Data/test.csv', 'r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    # probability=True enables predict_proba for SVC.
    svc = svm.SVC(probability=True)
    svc.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(svc.predict_proba(test))]
    savetxt('../Submissions/svm_benchmark.csv', predicted_probs, delimiter=',',
            fmt='%d,%f', header='MoleculeId,PredictedProbability', comments='')

if __name__ == '__main__':
    main()
The SVM scores 0.52553, slightly better than logistic regression (for log loss, lower is better).
The top score on the leaderboard is 0.37356; closing that gap would take considerably more effort.
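One common way to push below these linear baselines (a standard next step, not something from the original write-up) is a tree ensemble such as a random forest, which copes well with many correlated descriptors. A minimal sketch on synthetic data, with the same cross-validated log-loss estimate as before:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with a nonlinear response; the real descriptor
# matrix has 1776 columns, so expect longer training times there.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(rf, X, y, cv=5, scoring='neg_log_loss')
print(-scores.mean())  # cross-validated log loss for the ensemble
```

Tuning n_estimators and max_features (or trying gradient boosting) is the usual route from here; whether it reaches 0.37356 on this data set is something only experimentation can tell.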