Brief introduction
In Hadoop, the Writable interface has a large family of implementation classes. Here we briefly describe some of the ones most often used for serialization.
Java primitive types
Except for char, every Java primitive type has a corresponding Writable class, and the wrapped value is accessed through get and set methods.
IntWritable and LongWritable also have variable-length counterparts, VIntWritable and VLongWritable.
Choosing between fixed-length and variable-length encoding is similar to choosing between CHAR and VARCHAR in a database, so we will not go into it here.
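The size trade-off can be illustrated with a small JDK-only sketch. It assumes the variable-length format uses a single byte for values between -112 and 127, and otherwise one length byte followed by the minimal number of magnitude bytes, which is how we understand Hadoop's WritableUtils to encode these values. The class and method names are hypothetical and not part of Hadoop.

```java
public class VarLengthSizeSketch {
    // A fixed-length int always occupies 4 bytes when serialized.
    static final int FIXED_INT_SIZE = Integer.BYTES;

    // Assumed size rule for the variable-length format:
    // one byte for small values, otherwise a length byte plus magnitude bytes.
    static int variableSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1;
        }
        if (value < 0) {
            value = ~value; // negative values are stored as their one's complement
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1;
    }

    public static void main(String[] args) {
        long[] samples = {1, 127, 128, 65_536, Integer.MAX_VALUE};
        for (long v : samples) {
            System.out.println(v + ": fixed=" + FIXED_INT_SIZE
                    + " bytes, variable=" + variableSize(v) + " bytes");
        }
    }
}
```

Under these assumptions, small values cost one byte instead of four, while values near the top of the int range cost five, so the variable-length form pays off mainly when most values are small.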
Text type
The Text type stores its length as a variable-length int, so the maximum size of a Text value is 2 GB.
Text uses standard UTF-8 encoding, so it interoperates well with other text tools, but note that this makes it quite different from Java's String type.
Differences in indexing
The charAt method of Text takes a position in the encoded byte sequence and returns an int holding a Unicode code point, rather than a char (a UTF-16 code unit) as String does.
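// Requires org.apache.hadoop.io.Text, org.junit.Assert and org.junit.Test.
@Test
public void testTextIndex() {
    Text text = new Text("Hadoop");
    // getLength() is the number of valid UTF-8 bytes.
    Assert.assertEquals(6, text.getLength());
    // getBytes() exposes the backing array; only the first getLength() bytes are valid.
    Assert.assertEquals(6, text.getBytes().length);
    Assert.assertEquals((int) 'd', text.charAt(2));
    // Any position past the end (100 here) returns -1.
    Assert.assertEquals("Out of bounds", -1, text.charAt(100));
}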
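To make byte-position indexing concrete, here is a JDK-only sketch of the idea. It is not Hadoop's implementation: the class and method names are hypothetical, and returning -1 for a position that lands on a continuation byte is a choice made for this sketch only.

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetCharAtSketch {
    // Returns the code point whose UTF-8 encoding starts at byteOffset,
    // or -1 if the offset is out of range or not at the start of a character.
    static int codePointAtByte(byte[] utf8, int byteOffset) {
        if (byteOffset < 0 || byteOffset >= utf8.length) {
            return -1;
        }
        int lead = utf8[byteOffset] & 0xFF;
        int len;
        if (lead < 0x80) {
            len = 1;                      // single-byte (ASCII) character
        } else if ((lead & 0xE0) == 0xC0) {
            len = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4;
        } else {
            return -1;                    // continuation byte or invalid lead byte
        }
        if (byteOffset + len > utf8.length) {
            return -1;                    // truncated sequence
        }
        return new String(utf8, byteOffset, len, StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        byte[] ascii = "Hadoop".getBytes(StandardCharsets.UTF_8);
        System.out.println(codePointAtByte(ascii, 2) == 'd');
        System.out.println(codePointAtByte(ascii, 100));

        // "n" takes one byte and U+00E9 takes two, so byte offset 1 starts the second character.
        byte[] accented = "n\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(codePointAtByte(accented, 1)));
    }
}
```

For pure ASCII text, byte positions and char positions coincide, which is why the test above behaves just like String.charAt; the two diverge as soon as a multi-byte character appears.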
Text also has a find method, similar to the indexOf method of String, except that it returns a byte offset into the encoded text.
// Requires org.apache.hadoop.io.Text, org.junit.Assert and org.junit.Test.
@Test
public void testTextFind() {
    Text text = new Text("Hadoop");
    Assert.assertEquals("Find a substring", 2, text.find("do"));
    Assert.assertEquals("Find the first 'o'", 3, text.find("o"));
    Assert.assertEquals("Find 'o' at position 4 or later", 4, text.find("o", 4));
    Assert.assertEquals("No match", -1, text.find("pig"));
}
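The difference between a byte offset and a char index only shows up once a multi-byte character precedes the match. The following JDK-only sketch illustrates this with a naive byte search; it is not how Hadoop implements find, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetFindSketch {
    // Returns the byte offset of the first occurrence of needle in the UTF-8
    // encoding of haystack at or after start, or -1 if there is no match.
    static int findBytes(String haystack, String needle, int start) {
        byte[] h = haystack.getBytes(StandardCharsets.UTF_8);
        byte[] n = needle.getBytes(StandardCharsets.UTF_8);
        for (int i = Math.max(start, 0); i + n.length <= h.length; i++) {
            int j = 0;
            while (j < n.length && h[i + j] == n[j]) {
                j++;
            }
            if (j == n.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // ASCII only: byte offsets and char indexes agree.
        System.out.println(findBytes("Hadoop", "do", 0) + " vs " + "Hadoop".indexOf("do"));

        // U+00E9 takes two bytes in UTF-8, so everything after it shifts by one byte.
        String s = "caf\u00E9 Hadoop";
        System.out.println(findBytes(s, "Hadoop", 0) + " vs " + s.indexOf("Hadoop"));
    }
}
```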
Differences with Unicode
Once characters whose UTF-8 encoding takes more than one byte are involved, the difference between Text and String becomes clearer, because String counts in Java char units (UTF-16 code units) while Text counts in UTF-8 bytes.
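The three ways of counting can be compared using only the JDK. In this sketch, the UTF-8 byte count corresponds to what Text measures and String.length() to what String measures; the class and method names are hypothetical, and the sample characters are chosen here purely for illustration.

```java
import java.nio.charset.StandardCharsets;

public class LengthUnitsSketch {
    // Length as String sees it: UTF-16 code units (Java chars).
    static int utf16Units(String s) {
        return s.length();
    }

    // Length as a UTF-8 byte sequence, the unit Text works in.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points, independent of the encoding.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0041",        // U+0041, 1 byte in UTF-8
            "\u00DF",        // U+00DF, 2 bytes in UTF-8
            "\u6771",        // U+6771, 3 bytes in UTF-8
            "\uD801\uDC00"   // U+10400, 4 bytes in UTF-8, a surrogate pair in UTF-16
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": chars=" + utf16Units(s)
                    + ", utf8Bytes=" + utf8Bytes(s)
                    + ", codePoints=" + codePoints(s));
        }
    }
}
```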
Let's look at several Unicode characters whose UTF-8 encodings take 1 to 4 bytes.