Predicting a Biological Response (Kaggle)

Source: Internet
Author: User
Tags: svm

Predict the biological response of molecules from their chemical properties.
The objective of the competition is to build the best possible model for relating molecular information to an actual biological response, as far as this data allows.

The data is shared in comma-separated values (CSV) format. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological response: the molecule was seen to elicit this response (1), or not (0). The remaining columns represent molecular descriptors (d1 through d1776). These are calculated properties that can capture some of the characteristics of the molecule, for example size, shape, or elemental constitution. The descriptor matrix has been normalized.

In brief: each row of the CSV file is one molecule. The first column records the observed biological response, 1 if the molecule elicited it and 0 if it did not. Columns 2 through 1777 hold the 1776 molecular descriptors, such as size, shape, or elemental composition.
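To make the layout concrete, here is a minimal sketch of splitting such a file into labels and descriptors. The sample rows, the column names in the header, and the helper name `split_labels_and_descriptors` are made up for illustration; only the "first column is the 0/1 response, the rest are descriptors" layout comes from the description above.

```python
import io
import numpy as np

# Tiny stand-in for train.csv: a header row, then one molecule per row.
# Column names and values are invented for this example.
SAMPLE_CSV = """Activity,D1,D2,D3
1,0.00,0.50,0.10
0,0.25,0.75,0.00
1,0.10,0.30,0.90
"""

def split_labels_and_descriptors(csv_file):
    """Return (labels, descriptors) from a CSV whose first column is the 0/1 response."""
    # skip_header=1 drops the header row instead of parsing it as numbers.
    data = np.genfromtxt(csv_file, delimiter=",", dtype="f8", skip_header=1)
    labels = data[:, 0].astype(int)   # column 1: biological response
    descriptors = data[:, 1:]         # columns 2..end: molecular descriptors
    return labels, descriptors

labels, descriptors = split_labels_and_descriptors(io.StringIO(SAMPLE_CSV))
print(labels.shape, descriptors.shape)
```

With the real training file, the descriptor matrix would have 1776 columns rather than the three used here.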

The competition ended long ago, but you can still submit up to 5 results to see where your score would rank. A submission is a single result file in CSV format.

Because the response takes only the values 0 and 1, this is a binary classification problem.

For a binary classification problem with many attributes, logistic regression is a sensible first model to try.

The following Python code uses logistic regression for prediction:

#!/usr/bin/env python
# coding: utf-8
'''
Created on August 1
@author: zhaohf
'''
from sklearn.linear_model import LogisticRegression
from numpy import genfromtxt, savetxt

def main():
    # Load both files; [1:] drops the header row, which parses as NaN.
    dataset = genfromtxt(open('../Data/train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    test = genfromtxt(open('../Data/test.csv', 'r'), delimiter=',', dtype='f8')[1:]
    # First column is the 0/1 response, the rest are the descriptors.
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    lr = LogisticRegression()
    lr.fit(train, target)
    # One row per test molecule: 1-based id and the probability of class 1.
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(lr.predict_proba(test))]
    savetxt('../Submissions/lr_benchmark.csv', predicted_probs, delimiter=',',
            fmt='%d,%f', header='MoleculeId,PredictedProbability', comments='')

if __name__ == '__main__':
    main()

Scored with the competition's loss function, where lower is better, this submission gets a public score of 0.59425. That is a poor result and ranks several hundred places down the leaderboard.
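To put that number in context, the sketch below assumes the loss function is binary log loss (the text only says "loss function", so treat this as an assumption). Under that assumption, a model that predicts 0.5 for every molecule scores ln 2, about 0.693, so 0.59425 is only a modest improvement over an uninformative guess. The function name and the clipping constant `eps` are choices made for this example.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that log(0) never occurs for overconfident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Always predicting 0.5 gives ln(2) regardless of the labels.
baseline = binary_log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
# Confident, correct predictions drive the loss toward 0.
confident = binary_log_loss([1, 0, 1, 0], [0.9, 0.1, 0.9, 0.1])
print(baseline, confident)
```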

Next, try an SVM. The code is nearly identical.

#!/usr/bin/env python
# coding: utf-8
'''
Created on August 1
@author: zhaohf
'''
from sklearn import svm
from numpy import genfromtxt, savetxt

def main():
    # Load both files; [1:] drops the header row, which parses as NaN.
    dataset = genfromtxt(open('../Data/train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    test = genfromtxt(open('../Data/test.csv', 'r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    # probability=True is required for predict_proba to be available.
    svc = svm.SVC(probability=True)
    svc.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(svc.predict_proba(test))]
    savetxt('../Submissions/svm_benchmark.csv', predicted_probs, delimiter=',',
            fmt='%d,%f', header='MoleculeId,PredictedProbability', comments='')

if __name__ == '__main__':
    main()

The SVM scores 0.52553, a modest improvement over logistic regression.

The best score on the leaderboard is 0.37356, so matching it will take considerably more work.
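Since submissions are limited, one way to iterate is to compare models on a held-out split before submitting. The sketch below is an illustration only: it uses synthetic data from `make_classification` in place of the real descriptors, assumes log loss as the metric, and the split size and model settings are arbitrary choices rather than anything from the original scripts.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the molecule data (the real set has 1776 descriptors).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "lr": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the probability of class 1.
    probs = model.predict_proba(X_valid)[:, 1]
    scores[name] = log_loss(y_valid, probs)
    print(name, round(scores[name], 5))
```

Swapping the synthetic arrays for the training labels and descriptors gives a local estimate of how a model might score, though a single split can be noisy and will not match the leaderboard exactly.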

