Step by Step, Learn Hadoop with Me (7)----Connecting Hadoop to a MySQL Database for Reading and Writing Data


To let MapReduce access relational databases (MySQL, Oracle) directly, Hadoop provides two classes: DBInputFormat and DBOutputFormat. DBInputFormat reads database table data into HDFS, and DBOutputFormat writes the result set produced by MapReduce into a database table. If running MapReduce fails with the error java.io.IOException: com.mysql.jdbc.Driver, it is generally because the program cannot find the MySQL driver package. The workaround is to make sure every TaskTracker can locate the driver package when it runs the MapReduce program.

There are two ways to add the driver package:

(1) Copy the package into ${HADOOP_HOME}/lib on every node and restart the cluster. This is generally considered the more primitive approach.

(2) a) Upload the package to the cluster: hadoop fs -put mysql-connector-java-5.1.0-bin.jar /hdfspath/

b) Before the MR program submits the job, add the statement: DistributedCache.addFileToClassPath(new Path("/hdfspath/mysql-connector-java-5.1.0-bin.jar"), conf);
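Put together, the second approach fits into the driver class roughly as follows. This is only a minimal sketch, assuming the JAR was already uploaded to /hdfspath/ as above; the class name and the omitted job configuration are illustrative, not part of the original source.

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class DriverClasspathSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DriverClasspathSketch.class);
        // Add the MySQL driver JAR (already uploaded to HDFS) to the task classpath.
        // This must run before the job is submitted, so every TaskTracker can load
        // com.mysql.jdbc.Driver when it executes the map/reduce tasks.
        DistributedCache.addFileToClassPath(
                new Path("/hdfspath/mysql-connector-java-5.1.0-bin.jar"), conf);
        // ... input/output formats, mapper, reducer, and DB settings go here ...
        JobClient.runJob(conf);
    }
}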

Storing MySQL data into Hadoop HDFS

MySQL table creation and data initialization
DROP TABLE IF EXISTS `wu_testhadoop`;

CREATE TABLE `wu_testhadoop` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) DEFAULT NULL,
  `content` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of wu_testhadoop
-- ----------------------------
INSERT INTO `wu_testhadoop` VALUES ('1', '123', '122312');
INSERT INTO `wu_testhadoop` VALUES ('2', '123', '123456');
Defining Hadoop data Access

After the MySQL table is created, we need to define the rules by which Hadoop accesses MySQL.

Hadoop provides the org.apache.hadoop.io.Writable interface, which implements a simple and efficient serialization protocol based on DataInput and DataOutput.

For database access, Hadoop also provides the org.apache.hadoop.mapred.lib.db.DBWritable interface, whose write method sets values on a PreparedStatement object and whose readFields method binds the column values of a row read from the database.

The two interfaces are used as follows (the examples are taken from the Hadoop source documentation):

Writable
public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}

DBWritable
public class MyWritable implements Writable, DBWritable {
    // Some data
    private int counter;
    private long timestamp;

    // Writable#write() implementation
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Writable#readFields() implementation
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public void write(PreparedStatement statement) throws SQLException {
        statement.setInt(1, counter);
        statement.setLong(2, timestamp);
    }

    public void readFields(ResultSet resultSet) throws SQLException {
        counter = resultSet.getInt(1);
        timestamp = resultSet.getLong(2);
    }
}
The implementation class corresponding to the database table
package com.wyg.hadoop.mysql.bean;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBRecord implements Writable, DBWritable {
    private int id;
    private String title;
    private String content;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    @Override
    public void readFields(ResultSet set) throws SQLException {
        this.id = set.getInt("id");
        this.title = set.getString("title");
        this.content = set.getString("content");
    }

    @Override
    public void write(PreparedStatement pst) throws SQLException {
        pst.setInt(1, id);
        pst.setString(2, title);
        pst.setString(3, content);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.title = Text.readString(in);
        this.content = Text.readString(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.title);
        Text.writeString(out, this.content);
    }

    @Override
    public String toString() {
        return this.id + " " + this.title + " " + this.content;
    }
}

Implementing Map/Reduce

package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

@SuppressWarnings("deprecation")
public class DBRecordMapper extends MapReduceBase implements Mapper<LongWritable, DBRecord, LongWritable, Text> {

    @Override
    public void map(LongWritable key, DBRecord value,
            OutputCollector<LongWritable, Text> collector, Reporter reporter)
            throws IOException {
        collector.collect(new LongWritable(value.getId()), new Text(value.toString()));
    }
}

Testing: connect Hadoop to MySQL and store the data in HDFS

package com.wyg.hadoop.mysql.db;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.DBRecordMapper;

public class DBAccess {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(DBAccess.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(DBInputFormat.class);

        Path path = new Path("hdfs://192.168.44.129:9000/user/root/dbout");
        FileOutputFormat.setOutputPath(conf, path);

        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://your ip:3306/database name", "username", "password");
        String[] fields = {"id", "title", "content"};
        DBInputFormat.setInput(conf, DBRecord.class, "wu_testhadoop", null, "id", fields);

        conf.setMapperClass(DBRecordMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        JobClient.runJob(conf);
    }
}

Executing the program produces the following output:

15/08/11 16:46:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 16:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 16:46:18 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 16:46:19 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 16:46:19 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 16:46:19 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 16:46:19 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 16:46:19 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 16:46:19 INFO mapred.MapTask: Starting flush of map output
15/08/11 16:46:19 INFO mapred.MapTask: Finished spill 0
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 16:46:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 48 bytes
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
15/08/11 16:46:19 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.44.129:9000/user/root/dbout
15/08/11 16:46:19 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 16:46:20 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 16:46:20 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 16:46:20 INFO mapred.JobClient: Counters: 14
15/08/11 16:46:20 INFO mapred.JobClient:   FileSystemCounters
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_READ=34606
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=69844
15/08/11 16:46:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
15/08/11 16:46:20 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine output records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map input records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Spilled Records=4
15/08/11 16:46:20 INFO mapred.JobClient:     Map output bytes=42
15/08/11 16:46:20 INFO mapred.JobClient:     Map input bytes=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine input records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input records=2


At the same time, a dbout directory now appears in the HDFS file system; the file inside it holds the corresponding data from the database table, with the following content:

1	1 123 122312
2	2 123 123456

Importing HDFS data into MySQL

Storing the HDFS file back into MySQL also relies on the DBRecord class above, since the database is accessed through DBInputFormat and DBOutputFormat.

First, define the map and reduce implementations (the map parses the HDFS file, and the reduce takes the map output and writes it to the database).

package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

public class WriteDB {

    // Map process: parse each line of the HDFS output file produced earlier,
    // e.g. "1<TAB>1 123 122312" (key, then "id title content").
    public static class Map extends MapReduceBase implements Mapper<Object, Text, Text, DBRecord> {

        private final static DBRecord one = new DBRecord();
        private Text word = new Text();

        @Override
        public void map(Object key, Text value,
                OutputCollector<Text, DBRecord> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String[] infos = line.split(" ");
            String id = infos[0].split("\t")[1];
            one.setId(new Integer(id));
            one.setTitle(infos[1]);
            one.setContent(infos[2]);
            word.set(id);
            output.collect(word, one);
        }
    }

    // Reduce process: take the record for each key and emit it to the database.
    public static class Reduce extends MapReduceBase implements Reducer<Text, DBRecord, DBRecord, Text> {

        @Override
        public void reduce(Text key, Iterator<DBRecord> values,
                OutputCollector<DBRecord, Text> collector, Reporter reporter)
                throws IOException {
            DBRecord record = values.next();
            collector.collect(record, new Text());
        }
    }
}
Testing: importing HDFS data into the database

package com.wyg.hadoop.mysql.db;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.WriteDB;

public class DBInsert {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WriteDB.class);

        // Set the input and output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(DBOutputFormat.class);

        // These two lines are required here, although the examples found online omit them:
        // the map output types are Text and DBRecord
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(DBRecord.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(DBRecord.class);

        // Set the map and reduce classes
        conf.setMapperClass(WriteDB.Map.class);
        conf.setReducerClass(WriteDB.Reduce.class);

        // Set the input directory (the dbout directory produced by the previous job)
        FileInputFormat.setInputPaths(conf, new Path("hdfs://192.168.44.129:9000/user/root/dbout"));

        // Establish the database connection
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://database ip:3306/database name", "username", "password");

        String[] fields = {"id", "title", "content"};
        DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);

        JobClient.runJob(conf);
    }
}

The test results are as follows

15/08/11 18:10:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 18:10:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 18:10:15 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 18:10:15 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 18:10:15 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 18:10:15 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 18:10:15 INFO mapred.MapTask: Starting flush of map output
15/08/11 18:10:16 INFO mapred.MapTask: Finished spill 0
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: hdfs://192.168.44.129:9000/user/root/dbout/part-00000:0+30
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 18:10:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 40 bytes
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 18:10:16 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 18:10:16 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 18:10:16 INFO mapred.JobClient: Counters: 14
15/08/11 18:10:16 INFO mapred.JobClient:   FileSystemCounters
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_READ=34932
15/08/11 18:10:16 INFO mapred.JobClient:     HDFS_BYTES_READ=60
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70694
15/08/11 18:10:16 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 18:10:16 INFO mapred.JobClient:     Combine output records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map input records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Spilled Records=4
15/08/11 18:10:16 INFO mapred.JobClient:     Map output bytes=34
15/08/11 18:10:16 INFO mapred.JobClient:     Map input bytes=30
15/08/11 18:10:16 INFO mapred.JobClient:     Combine input records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input records=2

Before the test I emptied the original table; you can see that after the job runs, two rows have been added to the database.

Running the job a second time produces an error, which is the expected behavior: we assign the id explicitly when importing the data, so inserting the same ids again conflicts with the primary key. If the id were left out, the rows could be inserted repeatedly.
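As a hedged sketch (not part of the original code), one way to let MySQL assign the id through AUTO_INCREMENT would be to configure the output with only the non-key columns and adjust DBRecord to match:

// Sketch only: in DBInsert, configure DBOutputFormat without the id column,
// so MySQL's AUTO_INCREMENT assigns it on each insert.
String[] fields = {"title", "content"};
DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);

// In DBRecord, write(PreparedStatement) would then bind only those two columns:
@Override
public void write(PreparedStatement pst) throws SQLException {
    pst.setString(1, title);
    pst.setString(2, content);
}

With this variant the job can be re-run without primary-key conflicts, at the cost of no longer controlling the id values.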

Source

The source code has been uploaded and is available at download.csdn.net/detail/wuyinggui10000/8974585


