Hadoop core learning notes (1): writing and reading Writable data in a SequenceFile

This blog post is an original article. If you reproduce it, please cite the source: http://guoyunsky.iteye.com/blogs/1265944

When I first came into contact with Hadoop, SequenceFile and Writable seemed vaguely related to each other and rather mysterious. Later I learned that they are simply part of the I/O protocols Hadoop uses for input and output. This section describes how to write Writable data to a SequenceFile and read it back.

A Writable is the data that gets transmitted. In Java terms, a Writable is an object. Because it is passed around inside Hadoop, it needs a protocol for transmission and conversion. That is why the interface declares the methods public void write(DataOutput out) throws IOException and public void readFields(DataInput in) throws IOException: one writes the object's data and the other reads it back, so that these objects can be used anywhere in the Hadoop cluster. I had read some introductions to Hadoop serialization before, but I did not understand what it was for until later.
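To make the two methods concrete, here is a minimal sketch of the same write/readFields pattern that runs with the JDK alone. The names WritableRoundTrip, User, toBytes and fromBytes are made up for illustration, and User deliberately does not implement org.apache.hadoop.io.Writable so that no Hadoop dependency is needed. In real Hadoop code the class would implement that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableRoundTrip {

    // Hypothetical stand-in for a Writable: it has the same two methods,
    // but does not implement the Hadoop interface.
    public static class User {
        private long userId;
        private String userName;
        private int userAge;

        public User() {
        }

        public User(long userId, String userName, int userAge) {
            this.userId = userId;
            this.userName = userName;
            this.userAge = userAge;
        }

        // Serialize the fields in a fixed order.
        public void write(DataOutput out) throws IOException {
            out.writeLong(userId);
            out.writeUTF(userName);
            out.writeInt(userAge);
        }

        // Deserialize the fields in exactly the order write() used.
        public void readFields(DataInput in) throws IOException {
            userId = in.readLong();
            userName = in.readUTF();
            userAge = in.readInt();
        }

        @Override
        public String toString() {
            return userId + "\t" + userName + "\t" + userAge;
        }
    }

    // Turn a User into bytes by calling write() on an in-memory stream.
    public static byte[] toBytes(User user) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            user.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a User by calling readFields() on an empty instance.
    public static User fromBytes(byte[] bytes) {
        User user = new User();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            user.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return user;
    }

    public static void main(String[] args) {
        User original = new User(42L, "alice", 30);
        User copy = fromBytes(toBytes(original));
        System.out.println(copy);
    }
}
```

The point to notice is that readFields() fills in an existing, empty object rather than returning a new one, which is why a Writable needs a no-argument constructor.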

In this example, we construct Writable objects, write them to a SequenceFile, and read them back. Finally, we compare the data that was read with the original data to check that it is correct. For details, see the code:

package com.guoyun.hadoop.io.study;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileStudy {

    public static class UserWritable implements Writable, Comparable<UserWritable> {
        private long userId;
        private String userName;
        private int userAge;

        public long getUserId() {
            return userId;
        }

        public void setUserId(long userId) {
            this.userId = userId;
        }

        public String getUserName() {
            return userName;
        }

        public void setUserName(String userName) {
            this.userName = userName;
        }

        public int getUserAge() {
            return userAge;
        }

        public void setUserAge(int userAge) {
            this.userAge = userAge;
        }

        public UserWritable(long userId, String userName, int userAge) {
            super();
            this.userId = userId;
            this.userName = userName;
            this.userAge = userAge;
        }

        // A no-argument constructor is required so that an empty instance
        // can be created and then filled by readFields().
        public UserWritable() {
            super();
        }

        // Serialize the fields in a fixed order.
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(this.userId);
            out.writeUTF(this.userName);
            out.writeInt(this.userAge);
        }

        // Deserialize the fields in the same order as write().
        @Override
        public void readFields(DataInput in) throws IOException {
            this.userId = in.readLong();
            this.userName = in.readUTF();
            this.userAge = in.readInt();
        }

        @Override
        public String toString() {
            return this.userId + "\t" + this.userName + "\t" + this.userAge;
        }

        /**
         * Compares only userId.
         */
        @Override
        public boolean equals(Object obj) {
            if (obj instanceof UserWritable) {
                UserWritable u1 = (UserWritable) obj;
                return u1.getUserId() == this.getUserId();
            }
            return false;
        }

        /**
         * Compares only userId.
         */
        @Override
        public int compareTo(UserWritable other) {
            if (this.userId > other.userId) {
                return 1;
            } else if (this.userId == other.userId) {
                // Equal ids must compare as 0 to stay consistent with equals().
                return 0;
            }
            return -1;
        }

        @Override
        public int hashCode() {
            return (int) this.userId & Integer.MAX_VALUE;
        }
    }

    /**
     * Write to a SequenceFile.
     *
     * @param filePath
     * @param conf
     * @param datas
     */
    public static void write2SequenceFile(String filePath, Configuration conf,
            Collection<UserWritable> datas) {
        FileSystem fs = null;
        SequenceFile.Writer writer = null;
        Path path = null;
        LongWritable idKey = new LongWritable(0);

        try {
            fs = FileSystem.get(conf);
            path = new Path(filePath);
            writer = SequenceFile.createWriter(fs, conf, path,
                    LongWritable.class, UserWritable.class);

            for (UserWritable user : datas) {
                idKey.set(user.getUserId()); // userId is the key
                writer.append(idKey, user);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(writer);
        }
    }

    /**
     * Read data from a SequenceFile.
     *
     * @param sequenceFilePath
     * @param conf
     * @return
     */
    public static List<UserWritable> readSequenceFile(String sequenceFilePath,
            Configuration conf) {
        List<UserWritable> result = null;
        FileSystem fs = null;
        SequenceFile.Reader reader = null;
        Path path = null;
        Writable key = null;
        UserWritable value = new UserWritable();

        try {
            fs = FileSystem.get(conf);
            result = new ArrayList<UserWritable>();
            path = new Path(sequenceFilePath);
            reader = new SequenceFile.Reader(fs, path, conf);
            // Create an instance of the key class; the key is the userId.
            key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);

            while (reader.next(key, value)) {
                result.add(value);
                // next() fills the object passed in, so use a new one per record.
                value = new UserWritable();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(reader);
        }
        return result;
    }

    private static Configuration getDefaultConf() {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.default.name", "file:///");
        // conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzoCodec");
        return conf;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String filePath = "data/user.sequence"; // file path
        Set<UserWritable> users = new HashSet<UserWritable>();
        UserWritable user = null;

        // generate data
        for (int i = 1; i <= 10; i++) {
            user = new UserWritable(i + (int) (Math.random() * 100000),
                    "name-" + (i + 1), (int) (Math.random() * 50) + 10);
            users.add(user);
        }

        // write to the SequenceFile
        write2SequenceFile(filePath, getDefaultConf(), users);

        // read it back
        List<UserWritable> readDatas = readSequenceFile(filePath, getDefaultConf());

        // check whether the data is correct and print it
        for (UserWritable u : readDatas) {
            if (users.contains(u)) {
                System.out.println(u.toString());
            } else {
                System.err.println("error data:" + u.toString());
            }
        }
    }
}
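One detail in the read loop above deserves a note: after each result.add(value), a new UserWritable is created. SequenceFile.Reader.next(key, value) fills the object passed to it instead of returning a new one, so adding the same instance repeatedly would leave every list entry pointing at the last record. The sketch below reproduces that effect with the JDK alone. FakeReader and Holder are hypothetical stand-ins invented for this illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ValueReuseDemo {

    // Mutable value object, standing in for a Writable value.
    public static class Holder {
        public String name;

        @Override
        public String toString() {
            return name;
        }
    }

    // Hypothetical reader that fills the object it is given,
    // in the same style as next(key, value).
    public static class FakeReader {
        private final Iterator<String> records;

        public FakeReader(List<String> records) {
            this.records = records.iterator();
        }

        public boolean next(Holder value) {
            if (!records.hasNext()) {
                return false;
            }
            value.name = records.next();
            return true;
        }
    }

    // Problematic: one instance is added for every record, so all
    // entries are the same object holding the last record read.
    public static List<Holder> readReusing(List<String> data) {
        List<Holder> result = new ArrayList<Holder>();
        FakeReader reader = new FakeReader(data);
        Holder value = new Holder();
        while (reader.next(value)) {
            result.add(value);
        }
        return result;
    }

    // Safe: a fresh instance is allocated after each record is stored.
    public static List<Holder> readFresh(List<String> data) {
        List<Holder> result = new ArrayList<Holder>();
        FakeReader reader = new FakeReader(data);
        Holder value = new Holder();
        while (reader.next(value)) {
            result.add(value);
            value = new Holder();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(readReusing(data)); // prints [c, c, c]
        System.out.println(readFresh(data));   // prints [a, b, c]
    }
}
```

Reusing a single instance is fine when each record is processed and discarded immediately. A fresh object is only needed when the values are kept, as they are in the list returned by readSequenceFile.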

 
