This post records how to import the job_events and task_events tables of the Google cluster data (clusterdata-2011-2) into MySQL.
1. Download the data
Download_job_events:
import urllib2

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'
f = open('C:\\sha256sum')
lines = f.readlines()
f.close()
# download every job_events file listed in the sha256sum index
for i in lines:
    if i.count('job_events') > 0:
        fileaddr = i.split()[1][1:]
        filename = fileaddr.split('/')[1]
        print 'Downloading', filename
        data = urllib2.urlopen(url + fileaddr).read()
        print 'Saving', filename
        filedown = open('c:\\job_events\\' + filename, 'wb')
        filedown.write(data)
        filedown.close()
Download_task_events:
import urllib2

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'
f = open('C:\\sha256sum')
lines = f.readlines()
f.close()
# download every task_events file listed in the sha256sum index
for i in lines:
    if i.count('task_events') > 0:
        fileaddr = i.split()[1][1:]
        filename = fileaddr.split('/')[1]
        print 'Downloading', filename
        data = urllib2.urlopen(url + fileaddr).read()
        print 'Saving', filename
        filedown = open('c:\\task_events\\' + filename, 'wb')
        filedown.write(data)
        filedown.close()
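Optionally, the downloaded files can be checked against the hashes in the same index. This step is not part of the original workflow; it is only a minimal verification sketch that reuses the C:\sha256sum file and the c:\job_events directory from the script above.

import hashlib

f = open('C:\\sha256sum')
lines = f.readlines()
f.close()
for i in lines:
    if i.count('job_events') > 0:
        checksum = i.split()[0]
        filename = i.split()[1][1:].split('/')[1]
        # compare the recorded SHA-256 hash with the hash of the downloaded file
        data = open('c:\\job_events\\' + filename, 'rb').read()
        if hashlib.sha256(data).hexdigest() != checksum:
            print 'Checksum mismatch:', filename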
Note: the data used this time is clusterdata-2011-2, not the clusterdata-2011-1 trace fetched in the earlier googleclusterdata post.
2. Unzip
The data in the compressed archives cannot be imported into MySQL directly, so decompress the files first.
Unzip_job_events:
import gzip
import os

filenames = os.listdir('c:\\job_events')
for l in filenames:
    print 'Now at :' + l
    f = gzip.open('c:\\job_events\\' + l)
    # strip the .gz suffix; write in binary mode so Windows does not insert \r
    fout = open('c:\\job_events_unzip\\' + l[:-3], 'wb')
    content = f.read()
    fout.write(content)
    f.close()
    fout.close()
Unzip_task_events:
import gzip
import os

filenames = os.listdir('c:\\task_events')
for l in filenames:
    print 'Now at :' + l
    f = gzip.open('c:\\task_events\\' + l)
    # strip the .gz suffix; write in binary mode so Windows does not insert \r
    fout = open('c:\\task_events_unzip\\' + l[:-3], 'wb')
    content = f.read()
    fout.write(content)
    f.close()
    fout.close()
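The two scripts above read each archive fully into memory before writing it back out. If memory is tight, the same unzip can be done in streaming fashion; the sketch below (shown for task_events, using the same directories as above) copies fixed-size chunks via shutil.copyfileobj instead.

import gzip
import os
import shutil

filenames = os.listdir('c:\\task_events')
for l in filenames:
    print 'Now at :' + l
    fin = gzip.open('c:\\task_events\\' + l)
    fout = open('c:\\task_events_unzip\\' + l[:-3], 'wb')
    # stream the decompressed data instead of holding the whole file in memory
    shutil.copyfileobj(fin, fout)
    fin.close()
    fout.close()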
3. Build a Database
Create_job_events:
create table job_events (
    time bigint,
    missing_info int,
    job_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    job_name text,
    logical_job_name text
) engine=MyISAM;
Create_task_events:
create table task_events (
    time bigint,
    missing_info int,
    job_id bigint,
    task_index bigint,
    machine_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    priority int,
    cpu_request float,
    memory_request float,
    disk_space_request float,
    difference_machine_restriction boolean
) engine=MyISAM;
Note: Due to the large amount of data, it is important to select MyISAM as the engine.
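To double-check which engine the two tables actually ended up with, a quick query against information_schema works. This is only a sketch; it assumes the googleclusterdata database and the connection settings used by the import scripts below.

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
# information_schema reports the storage engine per table
cursor.execute("select table_name, engine from information_schema.tables "
               "where table_schema = 'googleclusterdata'")
for name, engine in cursor.fetchall():
    print name, engine
conn.close()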
4. Import data
Because some fields in the data are empty, MySQL must first be configured so that these empty values can be imported as NULL.
To do this, enter the following in the MySQL console:
SET @@GLOBAL.sql_mode = "NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";
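The same setting can also be applied from Python right before the load, instead of typing it into the console. This is a sketch using the same connection parameters as the import scripts below; note that changing a GLOBAL variable normally requires the SUPER privilege.

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
# same sql_mode value as the console command above
cursor.execute('SET @@GLOBAL.sql_mode = "NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"')
conn.close()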
You can then start importing the data.
loadjobevents2mysql.py
import os
import MySQLdb

filenames = os.listdir('C:\\job_events_unzip')
conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
cursor.execute('truncate job_events')   # start from an empty table
for f in filenames:
    print 'Now at :' + f
    order = "load data infile 'c:/job_events_unzip/%s' into table job_events fields terminated by ',' lines terminated by '\\n'" % f
    print order
    cursor.execute(order)
    conn.commit()
loadtaskevents2mysql.py
import os
import MySQLdb

filenames = os.listdir('C:\\task_events_unzip')
conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
cursor.execute('truncate task_events')   # start from an empty table
for f in filenames:
    print 'Now at :' + f
    order = "load data infile 'c:/task_events_unzip/%s' into table task_events fields terminated by ',' lines terminated by '\\n'" % f
    print order
    cursor.execute(order)
    conn.commit()
Note: change the password (passwd) and database name (db) here to match your own setup.
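Once both load scripts have finished, a quick row count confirms that the data actually landed in the tables. This is a minimal sketch with the same connection settings; the exact counts depend on the trace files you downloaded.

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
for table in ('job_events', 'task_events'):
    # count the rows loaded into each table
    cursor.execute('select count(*) from ' + table)
    print table, cursor.fetchone()[0]
conn.close()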