Reading Hive ORC Files with the Java API


Reprinted from: http://lxw1234.com/archives/2015/08/462.htm

ORC is a Hive-specific file format with a very high compression ratio and read efficiency, so it quickly replaced the earlier RCFile and has become a very common file format in Hive.

In a real business scenario, you may need to use the Java API or MapReduce to read and write ORC files.

This article first describes how to use the Java API to read Hive ORC files.

There is already a table named lxw1234 in Hive, stored in ORC format.

The table has four fields: url, word, freq, and weight, all of type string.

It contains only five rows of data.

The following code uses the API to read the ORC files directly from the HDFS path of the table lxw1234:

package com.lxw1234.test;

import java.util.List;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/**
 * lxw big data - http://lxw1234.com
 * @author lxw.com
 */
public class TestOrcReader {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // HDFS path of the table's ORC data, passed on the command line
        Path testFilePath = new Path(args[0]);

        // Tell OrcSerde about the table's columns and their types
        Properties p = new Properties();
        OrcSerde serde = new OrcSerde();
        p.setProperty("columns", "url,word,freq,weight");
        p.setProperty("columns.types", "string:string:string:string");
        serde.initialize(conf, p);
        StructObjectInspector inspector =
                (StructObjectInspector) serde.getObjectInspector();

        // Use the mapred OrcInputFormat to compute the splits of the input
        InputFormat in = new OrcInputFormat();
        FileInputFormat.setInputPaths(conf, testFilePath.toString());
        InputSplit[] splits = in.getSplits(conf, 1);
        System.out.println("splits.length==" + splits.length);

        conf.set("hive.io.file.readcolumn.ids", "1");
        RecordReader reader = in.getRecordReader(splits[0], conf, Reporter.NULL);
        Object key = reader.createKey();
        Object value = reader.createValue();
        List<? extends StructField> fields = inspector.getAllStructFieldRefs();
        long offset = reader.getPos();

        // Iterate over the rows in the first split and print each row's columns
        // (the loop body was cut off in the reprint; this is a reconstruction)
        while (reader.next(key, value)) {
            Object url = inspector.getStructFieldData(value, fields.get(0));
            Object word = inspector.getStructFieldData(value, fields.get(1));
            Object freq = inspector.getStructFieldData(value, fields.get(2));
            Object weight = inspector.getStructFieldData(value, fields.get(3));
            offset = reader.getPos();
            System.out.println(offset + "  " + url + "|" + word + "|" + freq + "|" + weight);
        }
        reader.close();
    }
}
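On a cluster, a class like the one above is typically packaged into a jar and launched with hadoop jar, passing the HDFS path of the table lxw1234's data directory as args[0]; it prints the number of splits and then each row's offset and column values separated by "|".

As a side note (not part of the original article), the same hive-exec library also provides a Reader that opens an ORC file directly and takes the schema from the file footer, so the columns/columns.types properties do not have to be configured by hand. The following is a minimal sketch assuming a Hive 1.x classpath; the class name TestOrcFileReader and the single-file path argument are illustrative:

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class TestOrcFileReader {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Path of a single ORC file, e.g. one data file under the table's HDFS directory
        Path orcFilePath = new Path(args[0]);

        // Open the ORC file directly; the schema is read from the file footer
        Reader reader = OrcFile.createReader(orcFilePath, OrcFile.readerOptions(conf));
        StructObjectInspector inspector =
                (StructObjectInspector) reader.getObjectInspector();

        System.out.println("rows in file: " + reader.getNumberOfRows());

        // Iterate over all rows and print every column value of each row
        RecordReader rows = reader.rows();
        Object row = null;
        while (rows.hasNext()) {
            row = rows.next(row);
            List<Object> cols = inspector.getStructFieldsDataAsList(row);
            System.out.println(cols);
        }
        rows.close();
    }
}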