Import Google Cluster Data into MySQL

Source: Internet
Author: User

This essay records how to import the job_events and task_events tables of the Google cluster data (clusterdata-2011-2) into MySQL.

1. Download the data

Download_job_events:

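```python
import os
import urllib.request

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'

# Read the checksum list; each line names one part file of the trace.
with open('C:\\sha256sum') as f:
    lines = f.readlines()

os.makedirs('C:\\job_events', exist_ok=True)
for line in lines:
    if 'job_events' in line:
        fileaddr = line.split()[1][1:]     # path after the hash, leading '*' dropped
        filename = fileaddr.split('/')[1]  # file name without the table directory
        print('Downloading', filename)
        data = urllib.request.urlopen(url + fileaddr).read()
        print('Saving', filename)
        with open('C:\\job_events\\' + filename, 'wb') as filedown:
            filedown.write(data)
```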

Download_task_events:

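```python
import os
import urllib.request

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'

# Read the checksum list; each line names one part file of the trace.
with open('C:\\sha256sum') as f:
    lines = f.readlines()

os.makedirs('C:\\task_events', exist_ok=True)
for line in lines:
    if 'task_events' in line:
        fileaddr = line.split()[1][1:]     # path after the hash, leading '*' dropped
        filename = fileaddr.split('/')[1]  # file name without the table directory
        print('Downloading', filename)
        data = urllib.request.urlopen(url + fileaddr).read()
        print('Saving', filename)
        with open('C:\\task_events\\' + filename, 'wb') as filedown:
            filedown.write(data)
```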
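The sha256sum file that the download scripts read also holds a SHA-256 hash for every part file, so the downloads can be checked before they are decompressed. This is a minimal sketch, assuming the "<hash> *<table>/<file>" line layout implied by the download code's `[1:]` slicing; the function names are hypothetical.

```python
import hashlib
import os


def parse_sha256sum_line(line):
    """Split one checksum line into (hex_digest, relative_path).

    Assumes the '<hash> *<path>' layout; a leading '*' is stripped if present.
    """
    digest, path = line.split()[:2]
    return digest.lower(), path.lstrip('*')


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large parts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(sha_file, table, local_dir):
    """Return names of files for `table` that are missing or whose hash differs."""
    bad = []
    with open(sha_file) as f:
        for line in f:
            if table not in line:
                continue
            digest, rel = parse_sha256sum_line(line)
            name = rel.split('/')[1]
            local = os.path.join(local_dir, name)
            if not os.path.exists(local) or sha256_of(local) != digest:
                bad.append(name)
    return bad


# Hypothetical usage:
#   print(verify_downloads('C:\\sha256sum', 'job_events', 'C:\\job_events'))
```

Any name returned can simply be downloaded again before moving on to decompression.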

Note: The data used this time is clusterdata-2011-2, not the clusterdata-2011-1 used in the previous post on the Google cluster data.

2. Unzip

Because MySQL cannot load the gzip-compressed files directly, they are decompressed first.

Unzip_job_events:

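```python
import gzip
import os
import shutil

src_dir = 'C:\\job_events'
dst_dir = 'C:\\job_events_unzip'
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    print('Now at: ' + name)
    # Decompress <name>.gz to <name> (name[:-3] drops the '.gz' suffix).
    # Binary mode keeps the '\n' line endings that LOAD DATA expects.
    with gzip.open(os.path.join(src_dir, name), 'rb') as fin, \
            open(os.path.join(dst_dir, name[:-3]), 'wb') as fout:
        shutil.copyfileobj(fin, fout)
```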

Unzip_task_events:

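```python
import gzip
import os
import shutil

src_dir = 'C:\\task_events'
dst_dir = 'C:\\task_events_unzip'
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    print('Now at: ' + name)
    # Decompress <name>.gz to <name> (name[:-3] drops the '.gz' suffix).
    # Binary mode keeps the '\n' line endings that LOAD DATA expects.
    with gzip.open(os.path.join(src_dir, name), 'rb') as fin, \
            open(os.path.join(dst_dir, name[:-3]), 'wb') as fout:
        shutil.copyfileobj(fin, fout)
```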

3. Build a Database

Create_job_events:

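```sql
CREATE TABLE job_events (
    time bigint,
    missing_info int,
    job_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    job_name text,
    logical_job_name text
) ENGINE = MyISAM;
```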

Create_task_events:

```sql
CREATE TABLE task_events (
    time bigint,
    missing_info int,
    job_id bigint,
    task_index bigint,
    machine_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    priority int,
    cpu_request float,
    memory_request float,
    disk_space_request float,
    different_machine_restriction boolean
) ENGINE = MyISAM;
```

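Note: Because the data set is large, MyISAM is chosen as the storage engine. It has no transaction logging or row-level locking, which typically makes bulk loading faster than with InnoDB.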
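In non-strict mode, LOAD DATA does not reject a line with the wrong number of fields: missing columns get their default values and extra fields are ignored, each with only a warning. A quick pre-check of the decompressed files can catch such lines before loading. The sketch below is hypothetical (helper names are illustrative); it splits on bare commas the same way `FIELDS TERMINATED BY ','` does, and expects 8 fields per job_events row and 13 per task_events row, matching the two tables above.

```python
import os

# Field counts per row, matching the two CREATE TABLE statements:
# job_events has 8 columns, task_events has 13.
EXPECTED_FIELDS = {'job_events': 8, 'task_events': 13}


def count_bad_rows(path, expected):
    """Return (total_rows, rows_with_wrong_field_count) for one CSV file."""
    total = bad = 0
    with open(path, 'rb') as f:
        for line in f:
            total += 1
            # Split on bare commas, as LOAD DATA ... FIELDS TERMINATED BY ',' does.
            if len(line.rstrip(b'\r\n').split(b',')) != expected:
                bad += 1
    return total, bad


def check_dir(table, directory):
    """Map each file in `directory` that has bad rows to (total_rows, bad_rows)."""
    problems = {}
    for name in sorted(os.listdir(directory)):
        total, bad = count_bad_rows(os.path.join(directory, name),
                                    EXPECTED_FIELDS[table])
        if bad:
            problems[name] = (total, bad)
    return problems


# Hypothetical usage: print(check_dir('job_events', 'C:\\job_events_unzip'))
```

An empty result means every row has exactly the expected number of fields; empty fields still count as fields.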

4. Import data

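Because some fields in the data are empty, MySQL must first be taken out of strict mode; in strict mode, LOAD DATA can reject rows whose numeric fields are empty. Note that in non-strict mode an empty numeric field is stored as 0 (with a warning), not as NULL.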

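To do so, run the following in the MySQL console. A GLOBAL setting takes effect only for connections opened afterwards, such as the ones the import scripts open: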

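```sql
-- This mode list contains no strict mode (e.g. no STRICT_TRANS_TABLES).
-- On MySQL 8.0, which removed NO_AUTO_CREATE_USER, use 'NO_ENGINE_SUBSTITUTION' alone.
SET @@GLOBAL.sql_mode = 'NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
```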
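If real NULLs are wanted instead of 0, one alternative to relying on non-strict conversion is to read every field into a user variable and convert empty strings with NULLIF in the SET clause of LOAD DATA. The helper below is a hypothetical sketch that builds such a statement; the column list is passed in and must match your own table definition.

```python
def build_load_stmt(path, table, columns):
    """Build a LOAD DATA statement that stores empty fields as NULL.

    Each CSV field is read into a user variable @<column>, and the SET clause
    assigns NULLIF(@<column>, ''), which yields NULL for an empty string.
    """
    variables = ', '.join('@' + c for c in columns)
    assignments = ', '.join("%s = NULLIF(@%s, '')" % (c, c) for c in columns)
    return ("LOAD DATA INFILE '%s' INTO TABLE %s "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
            "(%s) SET %s" % (path, table, variables, assignments))
```

For example, `cursor.execute(build_load_stmt('C:/job_events_unzip/' + f, 'job_events', columns))` inside the loading loop, where `columns` lists the table's column names in file order. This trades a longer statement for NULLs that can be told apart from genuine zeros.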

You can then start importing the data.

loadjobevents2mysql.py

```python
import os
import MySQLdb  # on Python 3 this module comes from the mysqlclient package

filenames = os.listdir('C:\\job_events_unzip')
conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                       db='Googleclusterdata', charset='utf8')
cursor = conn.cursor()
cursor.execute('TRUNCATE job_events')  # start from an empty table
for f in filenames:
    print('Now at: ' + f)
    # The doubled backslash makes MySQL receive the escape \n (newline).
    order = ("LOAD DATA INFILE 'C:/job_events_unzip/%s' INTO TABLE job_events "
             "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'" % f)
    print(order)
    cursor.execute(order)
    conn.commit()
conn.close()
```

loadtaskevents2mysql.py

```python
import os
import MySQLdb  # on Python 3 this module comes from the mysqlclient package

filenames = os.listdir('C:\\task_events_unzip')
conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                       db='Googleclusterdata', charset='utf8')
cursor = conn.cursor()
cursor.execute('TRUNCATE task_events')  # start from an empty table
for f in filenames:
    print('Now at: ' + f)
    # The doubled backslash makes MySQL receive the escape \n (newline).
    order = ("LOAD DATA INFILE 'C:/task_events_unzip/%s' INTO TABLE task_events "
             "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'" % f)
    print(order)
    cursor.execute(order)
    conn.commit()
conn.close()
```

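Note: Change the password (passwd) and database name (db) to match your own MySQL setup. Because LOAD DATA INFILE reads the files on the server side, the MySQL account also needs the FILE privilege, and if secure_file_priv is set, the files must be placed in that directory.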
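Since non-strict mode reports problems as warnings rather than errors, it is worth comparing each table's row count with the number of lines in the decompressed files. This is a minimal sketch, assuming every row (including the last) ends with a newline; the helper name `count_lines` is hypothetical.

```python
import os


def count_lines(directory):
    """Count newline-terminated rows across all files in `directory`.

    Reads in 1 MiB binary chunks and counts b'\\n' bytes, so each file's
    size does not matter.
    """
    total = 0
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                total += chunk.count(b'\n')
    return total


# Hypothetical usage with the cursor from the loading script:
#   cursor.execute('SELECT COUNT(*) FROM job_events')
#   print(cursor.fetchone()[0], count_lines('C:\\job_events_unzip'))
```

If the two numbers differ, `SHOW WARNINGS` right after a LOAD DATA statement lists what MySQL adjusted or skipped.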

