CDMC2016 Data Mining Contest Topics: Android Malware Classification

Source: Internet
Author: User

http://www.csmining.org/cdmc2016/

Data Mining Tasks Description

Task 1: 2016 E-News Categorisation

This year, the dataset was sourced from six online news media:

The New Zealand Herald (www.nzherald.co.nz), Reuters (www.reuters.com), The Times (www.timesonline.co.uk), Yahoo News (news.yahoo.com), BBC (www.bbc.co.uk) and The Press (www.stuff.co.nz).

Business, entertainment, sport, technology, and travel are the five selected news categories. Each document in the dataset was labelled manually by skimming the text and determining its category. In the provided data files, each news piece is formatted as one line of plain text with the last character as the class label (for training data); all punctuation and symbols were removed when the data was prepared.
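The line format described above can be read with a few lines of Python. This is only a minimal sketch: the function names `split_label` and `label_counts` are made up for illustration, the sample lines are invented stand-ins for the encrypted text, and the actual label characters used in the contest files are not stated here.

```python
from collections import Counter


def split_label(line):
    """Split one training line into (text, label).

    Assumes the last non-newline character is the class label, as the
    task description states. The label alphabet is not specified.
    """
    stripped = line.rstrip("\r\n")
    if not stripped:
        raise ValueError("empty line")
    return stripped[:-1].rstrip(), stripped[-1]


def label_counts(lines):
    """Count documents per class label, skipping blank lines."""
    return Counter(split_label(line)[1] for line in lines if line.strip())


# Invented lines standing in for the encrypted contest text.
sample = ["qwe rty uio 1\n", "asd fgh 2\n", "zxc vbn 1\n"]
print(label_counts(sample))
```

Comparing the resulting counts with the per-topic totals in the table below is a quick sanity check that a file was read correctly.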

Note that the dataset text is encrypted for fair-play purposes, and this task is not intended as a decryption exercise. Any use of decryption techniques is prohibited and must not form part of the methods used for the competition. The results of any participant found to have engaged in this misconduct will be declared void.

The statistical information of the training dataset is summarised below:

Topic No. of News
Business 361
Entertainment 343
Sport 363
Technology 356
Travel 362

Task 2: UniteCloud Operation Log for Anomaly Detection

UniteCloud is a resilient private cloud infrastructure created at Unitec Institute of Technology, New Zealand, using OpenNebula for cloud orchestration and KVM for virtualization.

This dataset is operational data captured from the live UniteCloud server at a sampling interval of 1 minute. Each sample has 243 features, which correspond to the operational measurements of 243 sensors on the UniteCloud servers. The file is labelled according to the anomalous events and anomaly categories determined from the collected log data. The supplied training dataset contains 57,654 samples, each with 243 sensor operation values; the non-zero labels in the last column indicate the seven anomalous events.

The goal of this task is to identify the various abnormal events accurately from a range of sensor log files without high computational cost.
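The sample layout described above (243 sensor values followed by a label in the last column) could be parsed roughly as follows. This is a sketch under assumptions: the comma delimiter, the function name `read_samples`, and the reading of label 0 as "normal" are not confirmed by the task description, and the demo rows are synthetic.

```python
import csv
import io
from collections import Counter

N_FEATURES = 243  # sensors per sample, per the task description


def read_samples(handle, delimiter=","):
    """Yield (features, label) pairs from an open text file.

    The delimiter is an assumption; adjust it to the real file format.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != N_FEATURES + 1:
            raise ValueError(
                f"expected {N_FEATURES + 1} columns, got {len(row)}"
            )
        features = [float(value) for value in row[:-1]]
        # Assumed convention: 0 = normal, non-zero = one of the seven events.
        label = int(float(row[-1]))
        yield features, label


# Two synthetic rows: one normal sample and one with event label 3.
normal = ",".join(["0.5"] * N_FEATURES + ["0"])
anomaly = ",".join(["1.5"] * N_FEATURES + ["3"])
demo = io.StringIO(normal + "\n" + anomaly + "\n")

counts = Counter(label for _, label in read_samples(demo))
print(counts)
```

Reading row by row keeps memory use low, which fits the task's emphasis on avoiding high computational cost; a real pipeline would probably load the rows into an array library instead.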

The statistical information of this dataset is summarized as:

No. of Samples No. of Features No. of Classes No. of Training No. of Testing
82,363 243 8 57,654 24,709

Task 3: Android Malware Classification

This dataset was created from a set of APK (application package) files collected from the Opera Mobile Store over the period from January to September 2014. Just as Windows (PC) systems use .exe files for installing software, Android uses APK files for installing software on the Android operating system.

The permission system is applied as a measure to restrict access to privileged system resources and is considered the first barrier against malware. Application developers have to explicitly declare the permissions in the AndroidManifest.xml file contained in the APK. All official Android permissions are categorized into four types: normal, dangerous, signature, and signatureOrSystem. Because dangerous permissions have access to restricted resources and can have a negative impact if used incorrectly, they require the user's approval at installation.

To serve as input to a machine-learning algorithm, permissions are commonly coded as binary variables, i.e., an element of the vector can take only two values: 1 for a requested permission and 0 otherwise. The number of possible Android permissions varies with the version of the OS. In this task, for each APK file under consideration, we provide a list of the permissions declared in its AndroidManifest.xml file. The class label of the APK file (+1 if it is regarded as malicious and -1 otherwise) was determined by the detection results of the security appliances hosted by VirusTotal. Note that adware is not counted as malware in our setting. The participants of the CDMC competition are invited to design a classifier that best matches this result.
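The binary coding described above can be sketched as follows. The helper names `build_vocabulary` and `encode` are hypothetical, the two example apps are invented, and the on-disk format of the contest's permission lists is not specified here, so the input is assumed to be already parsed into lists of permission names.

```python
def build_vocabulary(permission_lists):
    """Map each distinct permission name to a column index.

    Names are sorted so the column order is stable across runs.
    """
    names = sorted({p for perms in permission_lists for p in perms})
    return {name: index for index, name in enumerate(names)}


def encode(permissions, vocabulary):
    """Return a 0/1 vector: 1 where the permission was requested."""
    vector = [0] * len(vocabulary)
    for permission in permissions:
        index = vocabulary.get(permission)
        if index is not None:  # names outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Invented example apps, each given as its list of declared permissions.
apps = [
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    ["android.permission.INTERNET"],
]
vocab = build_vocabulary(apps)
X = [encode(app, vocab) for app in apps]
print(X)
```

Building the vocabulary from the training set only and ignoring unseen names at test time is one possible design choice; since the number of permissions varies with the OS version, some test-set permissions may not appear in training.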

The statistical information of the dataset is summarized as:

No. of APK Files No. of Permissions No. of Classes No. of Training No. of Testing
61,730 Up to 583 2 30,920 30,810

The MD5 hashes are also provided in case you need to verify checksums:
Cdmc2016_androidpermissions.train, MD5 (473F64D9E650E82325B1CE0216CC50C9)
Cdmc2016_androidlabels.train, MD5 (784B2CE7DA61FF2935DCA770C4BCBFB3)
Cdmc2016_androidpermissions.test, MD5 (192C70A8489C41FA95F5B95732FCDFB1)
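
A downloaded file can be checked against the hashes listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names (`md5_of_stream`, `md5_of_file`, `checksum_matches`) are made up for illustration, and the file path in the usage comment assumes the file sits in the current directory under the name shown in the list.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def checksum_matches(actual_hex, expected_hex):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


# Self-contained demo on in-memory bytes rather than a contest file.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
print(demo_digest)

# Usage with a contest file (path is an assumption):
# ok = checksum_matches(
#     md5_of_file("Cdmc2016_androidpermissions.train"),
#     "473F64D9E650E82325B1CE0216CC50C9",
# )
```

The comparison is case-insensitive because the published hashes are uppercase while `hexdigest()` returns lowercase.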
