Importing GoogleClusterData into MySQL


This post records how to import the job_events and task_events tables of google-cluster-data-2011-1-2 into MySQL.

1. Download the data

download_job_events:

# Python 2 (urllib2, print statement)
import urllib2

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'

# SHA256SUM is the checksum list that names every file in the data set
f = open('C:\\SHA256SUM')
l = f.readlines()
f.close()

for i in l:
    if i.count('job_events') > 0:
        # second field is the path; drop its leading marker character
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split('/')[1]
        print 'downloading', fileName
        data = urllib2.urlopen(url + fileAddr).read()
        print 'saving', fileName
        fileDown = open('C:\\job_events\\' + fileName, 'wb')
        fileDown.write(data)
        fileDown.close()

download_task_events:

# Python 2 (urllib2, print statement)
import urllib2

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-2/'

# SHA256SUM is the checksum list that names every file in the data set
f = open('C:\\SHA256SUM')
l = f.readlines()
f.close()

for i in l:
    if i.count('task_events') > 0:
        # second field is the path; drop its leading marker character
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split('/')[1]
        print 'downloading', fileName
        data = urllib2.urlopen(url + fileAddr).read()
        print 'saving', fileName
        fileDown = open('C:\\task_events\\' + fileName, 'wb')
        fileDown.write(data)
        fileDown.close()

Note: the data used this time is clusterdata-2011-2, not the clusterdata-2011-1 used in the earlier post "Redrawing GoogleClusterData".
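Since the download scripts already read SHA256SUM to find the file names, the same file could also be used to check each download. The following is a minimal Python 3 sketch of that idea, not part of the original scripts. It assumes each line of SHA256SUM has the form "<hex digest> *<directory>/<file>"; the helper names select_entries and matches_digest are made up for illustration.

```python
import hashlib


def select_entries(manifest_lines, table):
    """Return (sha256_hex, relative_path) pairs whose path lies under table + '/'."""
    entries = []
    for line in manifest_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, path = parts
        path = path.lstrip('*')  # assumed marker in front of the path
        if path.startswith(table + '/'):
            entries.append((digest, path))
    return entries


def matches_digest(data, expected_hex):
    """True if the SHA-256 of the downloaded bytes equals the listed digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

In a download loop, matches_digest(data, digest) would be called on the bytes returned by the HTTP request before they are written to disk, so a truncated download is noticed immediately rather than at import time.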

2. Decompress

MySQL cannot import data directly from the compressed archives, so decompress them first.

unzip_job_events:

# Python 2
import gzip
import os

fileNames = os.listdir('C:\\job_events')
for l in fileNames:
    print 'now at: ' + l
    f = gzip.open('C:\\job_events\\' + l)
    # l[:-3] strips the '.gz' suffix; 'wb' keeps line endings unchanged on Windows
    fOut = open('C:\\job_events_unzip\\' + l[:-3], 'wb')
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()

unzip_task_events:

# Python 2
import gzip
import os

fileNames = os.listdir('C:\\task_events')
for l in fileNames:
    print 'now at: ' + l
    f = gzip.open('C:\\task_events\\' + l)
    # l[:-3] strips the '.gz' suffix; 'wb' keeps line endings unchanged on Windows
    fOut = open('C:\\task_events_unzip\\' + l[:-3], 'wb')
    content = f.read()
    fOut.write(content)
    f.close()
    fOut.close()
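The two scripts above read each archive fully into memory before writing it out. If memory is tight, the copy can be streamed instead. Here is a rough Python 3 sketch of one way to do that; the function name gunzip_dir is hypothetical, and the directories are passed in rather than hard-coded.

```python
import gzip
import os
import shutil


def gunzip_dir(src_dir, dst_dir):
    """Decompress every *.gz file in src_dir into dst_dir; return the output names."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith('.gz'):
            continue  # ignore anything that is not a gzip archive
        out_name = name[:-3]  # strip the '.gz' suffix
        src_path = os.path.join(src_dir, name)
        dst_path = os.path.join(dst_dir, out_name)
        # Binary mode on both sides, so the bytes reach MySQL unchanged.
        with gzip.open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
            shutil.copyfileobj(fin, fout)  # copies chunk by chunk
        written.append(out_name)
    return written
```

With the paths from this post, the call would look like gunzip_dir('C:\\job_events', 'C:\\job_events_unzip').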

3. Create the database tables

create_job_events:

create table job_events(
    time bigint,
    missing_info int,
    job_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    job_name text,
    logical_job_name text
) engine = myisam;

create_task_events:

create table task_events(
    time bigint,
    missing_info int,
    job_id bigint,
    task_index bigint,
    machine_id bigint,
    event_type int,
    user text,
    scheduling_class int,
    priority int,
    cpu_request float,
    memory_request float,
    disk_space_request float,
    difference_machine_restriction boolean
) engine = myisam;

Note: because the data volume is very large, be sure to choose MyISAM as the engine here.
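Before loading, it can be worth confirming that the decompressed CSV rows really have as many fields as the tables have columns (8 for job_events, 13 for task_events, counted from the definitions above). The following Python 3 sketch does that check; the function name bad_rows and the sample rows in the usage note are invented for illustration.

```python
import csv

JOB_EVENTS_COLUMNS = 8    # columns in the job_events definition
TASK_EVENTS_COLUMNS = 13  # columns in the task_events definition


def bad_rows(lines, expected_columns):
    """Return (line_number, field_count) for every row with an unexpected field count."""
    problems = []
    for number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_columns:
            problems.append((number, len(row)))
    return problems
```

The lines argument can be an open text file, for example bad_rows(open(path, newline=''), JOB_EVENTS_COLUMNS). An empty result means every row matches the table width.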

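4. Import the data

Some fields in the data are empty, so MySQL must first be configured to accept empty values. The statement below sets an SQL mode that does not include strict mode, so an empty numeric field is stored as 0 with a warning instead of aborting the load.

Enter the following in the MySQL console: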

SET @@GLOBAL.sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";

After that, the import can begin.
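The SET statement above replaces the whole SQL mode. Two things to keep in mind: a GLOBAL change only applies to connections opened afterwards, and NO_AUTO_CREATE_USER no longer exists in MySQL 8.0, so it has to be left out there. A gentler variant is to read the current value with SELECT @@GLOBAL.sql_mode and remove only the strict flags. The Python 3 sketch below builds such a statement from a mode string; the function names are hypothetical.

```python
STRICT_MODES = {'STRICT_TRANS_TABLES', 'STRICT_ALL_TABLES'}


def without_strict(sql_mode):
    """Remove the strict-mode flags from a comma-separated sql_mode string."""
    kept = [m for m in sql_mode.split(',')
            if m and m.upper() not in STRICT_MODES]
    return ','.join(kept)


def set_mode_statement(sql_mode):
    """Build a SET statement that keeps every flag except the strict ones."""
    return "SET @@GLOBAL.sql_mode='%s'" % without_strict(sql_mode)
```

For example, passing the string returned by SELECT @@GLOBAL.sql_mode to set_mode_statement gives a statement that can be run in the MySQL console in place of the one above.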

loadJobEvents2MySQL.py

# Python 2; needs the MySQLdb package
import os
import MySQLdb

fileNames = os.listdir('C:\\job_events_unzip')
conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
# empty the table so a re-run does not load the rows twice
cursor.execute('truncate job_events')
for f in fileNames:
    print 'now at: ' + f
    order = "load data infile 'C:/job_events_unzip/%s' into table job_events fields terminated by ',' lines terminated by '\n'" % f
    print order
    cursor.execute(order)
    conn.commit()

loadTaskEvents2MySQL.py

# Python 2; needs the MySQLdb package
import os
import MySQLdb

fileNames = os.listdir('C:\\task_events_unzip')
conn = MySQLdb.connect(host="localhost", user="root", passwd="123456",
                       db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
# empty the table so a re-run does not load the rows twice
cursor.execute('truncate task_events')
for f in fileNames:
    print 'now at: ' + f
    order = "load data infile 'C:/task_events_unzip/%s' into table task_events fields terminated by ',' lines terminated by '\n'" % f
    print order
    cursor.execute(order)
    conn.commit()

Note: change the password and the database name (db) to match your own setup.
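With strict mode off, empty numeric fields end up as 0, which cannot be told apart from a real 0. If that matters, one alternative (not what the scripts above do) is to let LOAD DATA read each field into a user variable and store NULL for empty strings via NULLIF. The Python 3 sketch below only builds the statement text; load_statement and the example file name are made up, and the path is inserted without escaping, so it should only be used with trusted local paths.

```python
JOB_EVENTS_COLUMNS = ['time', 'missing_info', 'job_id', 'event_type', 'user',
                      'scheduling_class', 'job_name', 'logical_job_name']


def load_statement(path, table, columns):
    """Build a LOAD DATA statement that stores empty fields as NULL."""
    # Each field is read into a user variable named after its column ...
    variables = ', '.join('@' + c for c in columns)
    # ... and the SET clause turns '' into NULL before it reaches the column.
    assignments = ', '.join("`%s` = NULLIF(@%s, '')" % (c, c) for c in columns)
    return ("LOAD DATA INFILE '%s' INTO TABLE %s "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
            "(%s) SET %s" % (path, table, variables, assignments))
```

The returned string can be passed to cursor.execute in place of the order string, for example load_statement('C:/job_events_unzip/example.csv', 'job_events', JOB_EVENTS_COLUMNS).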
