To give MapReduce direct access to relational databases (MySQL, Oracle), Hadoop offers two classes: DBInputFormat and DBOutputFormat. With DBInputFormat, database table data is read into HDFS; with DBOutputFormat, the result set produced by MapReduce is written back into a database table.
A common error when executing the MapReduce job is java.io.IOException: com.mysql.jdbc.Driver, which usually means the program cannot find the MySQL driver package. The workaround is to make sure every TaskTracker can find the driver package when it runs the MapReduce program.
There are two ways to add a package:
(1) Copy the package into ${HADOOP_HOME}/lib on every node and restart the cluster. This is the more primitive method.
(2) a) Upload the package to the cluster: hadoop fs -put mysql-connector-java-5.1.0-bin.jar /hdfspath/
b) Before the MR program submits the job, add the statement: DistributedCache.addFileToClassPath(new Path("/hdfspath/mysql-connector-java-5.1.0-bin.jar"), conf);
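Putting the two steps of option (2) together, a minimal sketch of what goes into the job driver's main() might look like this (the HDFS path simply mirrors the command above, and DBAccess is the driver class defined later in this article):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    // Inside the driver's main(), before the job is submitted:
    JobConf conf = new JobConf(DBAccess.class);
    // Put the driver jar (already uploaded to HDFS with "hadoop fs -put") on every task's classpath
    DistributedCache.addFileToClassPath(
            new Path("/hdfspath/mysql-connector-java-5.1.0-bin.jar"), conf);
    // ... configure the rest of the job as shown below, then call JobClient.runJob(conf);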
MySQL database storage to Hadoop HDFS
MySQL table creation and data initialization
DROP TABLE IF EXISTS `wu_testhadoop`;
CREATE TABLE `wu_testhadoop` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) DEFAULT NULL,
  `content` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of wu_testhadoop
-- ----------------------------
INSERT INTO `wu_testhadoop` VALUES ('1', '123', '122312');
INSERT INTO `wu_testhadoop` VALUES ('2', '123', '123456');
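Before wiring this table into Hadoop, it can be worth confirming that the JDBC driver and connection URL work on their own. The following is only a minimal sketch; the class name JdbcSmokeTest, the host, the database name and the credentials are placeholders, not values from this article:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JdbcSmokeTest {
        public static void main(String[] args) throws Exception {
            // Same driver class the MapReduce job will use
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://your-ip:3306/your-database", "username", "password");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT id, title, content FROM wu_testhadoop");
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("title") + " " + rs.getString("content"));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }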
Define Hadoop data access
Once MySQL table creation is complete, we need to define the rules for Hadoop access to MySQL.
Hadoop provides the org.apache.hadoop.io.Writable interface, a simple and efficient serialization protocol whose related methods are based on DataInput and DataOutput.
For database access, Hadoop also provides the org.apache.hadoop.mapred.lib.db.DBWritable interface, in which the write method sets values on a PreparedStatement object and the readFields method binds the column values read from the database to the object's fields.
Example usage of these two interfaces follows (the content is taken from the source code):
Writable
public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}
DBWritable
public class MyWritable implements Writable, DBWritable {
    // Some data
    private int counter;
    private long timestamp;

    // Writable#write() implementation
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Writable#readFields() implementation
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public void write(PreparedStatement statement) throws SQLException {
        statement.setInt(1, counter);
        statement.setLong(2, timestamp);
    }

    public void readFields(ResultSet resultSet) throws SQLException {
        counter = resultSet.getInt(1);
        timestamp = resultSet.getLong(2);
    }
}
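Taken together, the two interfaces divide the work: DBInputFormat calls readFields(ResultSet) to populate a record for each row it reads from the table, DBOutputFormat calls write(PreparedStatement) for each record it inserts, and the DataInput/DataOutput methods from Writable are used when Hadoop serializes the record between map and reduce.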
The corresponding implementation for our database table:
package com.wyg.hadoop.mysql.bean;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBRecord implements Writable, DBWritable {

    private int id;
    private String title;
    private String content;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    @Override
    public void readFields(ResultSet set) throws SQLException {
        this.id = set.getInt("id");
        this.title = set.getString("title");
        this.content = set.getString("content");
    }

    @Override
    public void write(PreparedStatement pst) throws SQLException {
        pst.setInt(1, id);
        pst.setString(2, title);
        pst.setString(3, content);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.title = Text.readString(in);
        this.content = Text.readString(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.title);
        Text.writeString(out, this.content);
    }

    @Override
    public String toString() {
        return this.id + " " + this.title + " " + this.content;
    }
}
Implement Map/Reduce
package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

@SuppressWarnings("deprecation")
public class DBRecordMapper extends MapReduceBase
        implements Mapper<LongWritable, DBRecord, LongWritable, Text> {

    @Override
    public void map(LongWritable key, DBRecord value,
            OutputCollector<LongWritable, Text> collector, Reporter reporter)
            throws IOException {
        collector.collect(new LongWritable(value.getId()), new Text(value.toString()));
    }
}
Test Hadoop connecting to MySQL and storing the data in HDFS
package com.wyg.hadoop.mysql.db;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.DBRecordMapper;

public class DBAccess {

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(DBAccess.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(DBInputFormat.class);

        // Output goes to HDFS
        Path path = new Path("hdfs://192.168.44.129:9000/user/root/dbout");
        FileOutputFormat.setOutputPath(conf, path);

        // Input comes from MySQL (replace ip, database name, username and password with your own)
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://your ip:3306/database name", "username", "password");
        String[] fields = {"id", "title", "content"};
        // null = no WHERE conditions, "id" = column used to order the rows
        DBInputFormat.setInput(conf, DBRecord.class, "wu_testhadoop", null, "id", fields);

        conf.setMapperClass(DBRecordMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        JobClient.runJob(conf);
    }
}
Running the program produces output such as the following:
15/08/11 16:46:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 16:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 16:46:18 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 16:46:19 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 16:46:19 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 16:46:19 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 16:46:19 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 16:46:19 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 16:46:19 INFO mapred.MapTask: Starting flush of map output
15/08/11 16:46:19 INFO mapred.MapTask: Finished spill 0
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 16:46:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 48 bytes
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
15/08/11 16:46:19 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.44.129:9000/user/root/dbout
15/08/11 16:46:19 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 16:46:20 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 16:46:20 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 16:46:20 INFO mapred.JobClient: Counters: 14
15/08/11 16:46:20 INFO mapred.JobClient:   FileSystemCounters
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_READ=34606
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=69844
15/08/11 16:46:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
15/08/11 16:46:20 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine output records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map input records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Spilled Records=4
15/08/11 16:46:20 INFO mapred.JobClient:     Map output bytes=42
15/08/11 16:46:20 INFO mapred.JobClient:     Map input bytes=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine input records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input records=2
At the same time, you can see that the HDFS file system now contains a dbout folder; the file inside holds the data from the database table, saved as follows:
1	1 123 122312
2	2 123 123456
HDFS data import to MySQL
Storing the HDFS file back into MySQL also relies on the DBRecord class above, since the database operations go through DBInputFormat and DBOutputFormat.
The first step is to define the Map and Reduce implementations (the map parses the HDFS file, and the reduce collects the map output and emits it for the database).
package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

public class WriteDB {

    // Map process
    public static class Map extends MapReduceBase implements Mapper<Object, Text, Text, DBRecord> {

        private final static DBRecord one = new DBRecord();
        private Text word = new Text();

        @Override
        public void map(Object key, Text value,
                OutputCollector<Text, DBRecord> output, Reporter reporter)
                throws IOException {
            // Each input line from the previous job looks like "<key>\t<id> <title> <content>",
            // e.g. "1\t1 123 122312"
            String line = value.toString();
            String[] infos = line.split(" ");
            String id = infos[0].split("\t")[1];
            one.setId(new Integer(id));
            one.setTitle(infos[1]);
            one.setContent(infos[2]);
            word.set(id);
            output.collect(word, one);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, DBRecord, DBRecord, Text> {

        @Override
        public void reduce(Text key, Iterator<DBRecord> values,
                OutputCollector<DBRecord, Text> collector, Reporter reporter)
                throws IOException {
            DBRecord record = values.next();
            collector.collect(record, new Text());
        }
    }
}
Test importing HDFS data into the database
package com.wyg.hadoop.mysql.db;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.WriteDB;

public class DBInsert {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WriteDB.class);

        // Set input and output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(DBOutputFormat.class);

        // Declare the map output types (Text, DBRecord); the examples given online
        // do not include these two lines
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(DBRecord.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(DBRecord.class);

        // Set Map and Reduce classes
        conf.setMapperClass(WriteDB.Map.class);
        conf.setReducerClass(WriteDB.Reduce.class);

        // The input folder is the HDFS output produced by the previous job
        FileInputFormat.setInputPaths(conf, new Path("hdfs://192.168.44.129:9000/user/root/dbout"));

        // Establish the database connection (replace ip, database name, username and password with your own)
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://database ip:3306/database name", "username", "password");
        String[] fields = {"id", "title", "content"};
        DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);

        JobClient.runJob(conf);
    }
}
The test results are as follows:
15/08/11 18:10:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 18:10:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 18:10:15 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 18:10:15 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 18:10:15 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 18:10:15 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 18:10:15 INFO mapred.MapTask: Starting flush of map output
15/08/11 18:10:16 INFO mapred.MapTask: Finished spill 0
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: hdfs://192.168.44.129:9000/user/root/dbout/part-00000:0+30
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 18:10:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 40 bytes
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 18:10:16 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 18:10:16 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 18:10:16 INFO mapred.JobClient: Counters: 14
15/08/11 18:10:16 INFO mapred.JobClient:   FileSystemCounters
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_READ=34932
15/08/11 18:10:16 INFO mapred.JobClient:     HDFS_BYTES_READ=60
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70694
15/08/11 18:10:16 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 18:10:16 INFO mapred.JobClient:     Combine output records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map input records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Spilled Records=4
15/08/11 18:10:16 INFO mapred.JobClient:     Map output bytes=34
15/08/11 18:10:16 INFO mapred.JobClient:     Map input bytes=30
15/08/11 18:10:16 INFO mapred.JobClient:     Combine input records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input records=2
Before the test I emptied the original table, and after running the job you can see that two records were added to the database.
If you run the job again it will report an error; this is normal, because we assign the id explicitly when importing the data. If the id were left out, you could keep inserting new rows, as sketched below.
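One way to allow repeated runs (not part of the original code) is to let MySQL's AUTO_INCREMENT assign the id: register only the title and content columns with DBOutputFormat and adjust DBRecord.write(PreparedStatement) to match. A minimal sketch of that hypothetical variant:

    // Hypothetical variant of DBRecord.write(PreparedStatement): the id is omitted,
    // so MySQL assigns it via AUTO_INCREMENT and re-runs no longer collide on the primary key.
    @Override
    public void write(PreparedStatement pst) throws SQLException {
        pst.setString(1, title);
        pst.setString(2, content);
    }

    // The driver would then register only these two columns:
    // String[] fields = {"title", "content"};
    // DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);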
Source
The source code has been uploaded and can be downloaded from download.csdn.net/detail/wuyinggui10000/8974585