Reading Hive ORC Files with the Java API
Reprinted from: http://lxw1234.com/archives/2015/08/462.htm
ORC is a columnar file format developed for Hive that offers a very high compression ratio and high read efficiency, so it has quickly replaced the earlier RCFile and become a very common file format in Hive.
In real business scenarios, you may need to use the Java API or MapReduce to read and write ORC files.
This article first describes how to read Hive ORC files using the Java API.
There is already a table lxw1234 stored in ORC format in Hive.
The table has four fields: url, word, freq, weight, all of type string.
It contains only 5 rows of data.
The following code uses the API to read the ORC files directly from the HDFS path of table lxw1234:
package com.lxw1234.test;

import java.util.List;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/**
 * lxw's big data field - http://lxw1234.com
 * @author lxw.com
 */
public class TestOrcReader {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        Path testFilePath = new Path(args[0]);

        // Describe the table schema to the SerDe: column names and column types.
        Properties p = new Properties();
        OrcSerde serde = new OrcSerde();
        p.setProperty("columns", "url,word,freq,weight");
        p.setProperty("columns.types", "string:string:string:string");
        serde.initialize(conf, p);
        StructObjectInspector inspector = (StructObjectInspector) serde.getObjectInspector();

        // Compute the input splits for the table's HDFS path.
        InputFormat in = new OrcInputFormat();
        FileInputFormat.setInputPaths(conf, testFilePath.toString());
        InputSplit[] splits = in.getSplits(conf, 1);
        System.out.println("splits.length==" + splits.length);

        // Column-pruning hint: tells the ORC reader which column ids to materialize.
        conf.set("hive.io.file.readcolumn.ids", "1");

        RecordReader reader = in.getRecordReader(splits[0], conf, Reporter.NULL);
        Object key = reader.createKey();
        Object value = reader.createValue();
        List<? extends StructField> fields = inspector.getAllStructFieldRefs();
        long offset = reader.getPos();
        // Iterate over the rows of the first split and print each field.
        while (reader.next(key, value)) {
            Object url = inspector.getStructFieldData(value, fields.get(0));
            Object word = inspector.getStructFieldData(value, fields.get(1));
            Object freq = inspector.getStructFieldData(value, fields.get(2));
            Object weight = inspector.getStructFieldData(value, fields.get(3));
            offset = reader.getPos();
            System.out.println(url + "|" + word + "|" + freq + "|" + weight);
        }
        reader.close();
    }
}
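The program takes the table's HDFS path as its only argument; it first prints the number of input splits, then each row of the first split with the fields separated by "|".

Since Hive 0.13, an ORC file can also be read without going through the InputFormat, by using the ORC Reader API directly; because ORC stores the schema in the file footer, no column names or types need to be declared by hand. The following is a minimal sketch under that assumption (the class name TestOrcFileReader is illustrative, and args[0] is assumed to point at a single ORC file rather than the table directory):

package com.lxw1234.test;

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class TestOrcFileReader {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: args[0] is the path of a single ORC file, not a directory.
        Path path = new Path(args[0]);

        // The Reader picks the schema up from the ORC file footer,
        // so no "columns"/"columns.types" properties are needed here.
        Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
        StructObjectInspector inspector =
                (StructObjectInspector) reader.getObjectInspector();
        List<? extends StructField> fields = inspector.getAllStructFieldRefs();

        // Iterate over all rows in the file and print each field.
        RecordReader rows = reader.rows();
        Object row = null;
        while (rows.hasNext()) {
            row = rows.next(row);
            StringBuilder line = new StringBuilder();
            for (StructField field : fields) {
                line.append(inspector.getStructFieldData(row, field)).append("|");
            }
            System.out.println(line);
        }
        rows.close();
    }
}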