Get the offset of a line in a file (get the byte offset of each line)

Source: Internet
Author: User

Scenario: the byte offset of every line is obtained in a single pass over the file.

The file system offers no way to jump directly to a given line: even if you know the line number, you have to read the file line by line until you reach the one you want. When the file is very large, it pays to preprocess it once and save the offset of each line, so that later processing can seek straight to a specific line by its offset, which is much faster. So we need to implement a mapping from line to byte offset.
A Google search did not turn up any usable code, but the idea is straightforward: read the file sequentially, byte by byte, and do the corresponding bookkeeping each time a line delimiter is read.

Read line by line

The first idea is to read one line at a time and add the length of each line to a running offset, as follows:

    // Needs: java.io.BufferedReader, java.io.File, java.io.FileReader
    public void init() throws Exception {
        // Scan the file to construct the data structure for later use
        File file = new File(path);
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line = null;
            entries = new Entry[this.lines()];
            int k = 0;
            int offset = 0;
            while ((line = reader.readLine()) != null) {
                line = line.replace("\"", "");
                int len = line.length();
                String cells[] = line.split(",");
                int startIP = Ipv4util.ip2int(cells[0]);
                int endIP = Ipv4util.ip2int(cells[1]);
                // System.out.println("==" + startIP + "-" + endIP);
                Entry entry = new Entry(startIP, endIP, offset);
                entries[k++] = entry;
                // NB: there is a problem here. The offset is byte-level,
                // but the length of a String depends on the encoding.
                offset += len;
            }
        }
    }

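The problem is that the length of a string is not the number of bytes the line occupies in the file: line.length() counts characters, and how many bytes each character takes depends on the encoding. In the code above the length is also measured after the quotes have been removed and without the line terminator, which readLine() strips. Tracking the offset at the byte level is therefore the most general approach.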
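To make the mismatch concrete, here is a minimal sketch that compares the character count of a string with the number of bytes it occupies in UTF-8. The class and method names are hypothetical, and UTF-8 is an assumption; the original does not say which encoding the file uses.

```java
import java.nio.charset.StandardCharsets;

class CharVsByteLength {
    // What String.length() reports: the number of UTF-16 code units.
    static int charLength(String s) {
        return s.length();
    }

    // The number of bytes the same text occupies when encoded as UTF-8
    // (UTF-8 is an assumption made for this illustration).
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "1.0.0.0,1.0.0.255";
        // Two CJK characters followed by ASCII text; each CJK character
        // here takes 3 bytes in UTF-8 but counts as 1 in length().
        String mixed = "\u5317\u4eac,1.0.0.0";
        System.out.println(charLength(ascii) + " chars, " + utf8ByteLength(ascii) + " bytes");
        System.out.println(charLength(mixed) + " chars, " + utf8ByteLength(mixed) + " bytes");
    }
}
```

For pure ASCII lines the two numbers agree, which is why the line-based version can appear to work until the file contains a multi-byte character.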

Read byte by byte

Each time a byte is read, we increment the count of bytes read and use it to advance the offset; a buffer is needed to hold the bytes of the current line.

    // Needs: java.io.RandomAccessFile, java.util.ArrayList
    public void init2() throws Exception {
        entries = new ArrayList<Entry>();
        // Scan the file to construct the data structure for later use
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte buffer[] = new byte[1024]; // holds the current line; longer lines would overflow
            int b;
            int offset = 0;      // byte offset at which the current line starts
            int bytesRead = 0;   // bytes read so far in the current line
            int bufferIndex = 0;
            while ((b = raf.read()) != -1) {
                // Buffer it first
                buffer[bufferIndex++] = (byte) b; // int to byte
                bytesRead++;
                if (b == '\n') {
                    // Decode only the bytes of this line, not the whole buffer
                    String line = new String(buffer, 0, bufferIndex);
                    line = line.replace("\"", "");
                    String cells[] = line.split(",");
                    int startIP = Ipv4util.ip2int(cells[0]);
                    int endIP = Ipv4util.ip2int(cells[1]);
                    Entry entry = new Entry(startIP, endIP, offset);
                    entries.add(entry);
                    // Reset
                    bufferIndex = 0;
                    offset += bytesRead; // prepare offset for the next entry
                    bytesRead = 0;
                }
            }
        }
    }
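The same byte-counting idea can be sketched as a self-contained program that does not depend on Entry or Ipv4util. Everything here is an assumption made for illustration rather than the author's code: the class and method names are hypothetical, lines are assumed to end with '\n' (with an optional preceding '\r'), the text is assumed to be UTF-8, and a BufferedInputStream is swapped in for RandomAccessFile during the scan so that each read() does not go to the underlying file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineOffsetIndex {
    // Returns the byte offset at which each line starts, in file order.
    static List<Long> buildIndex(String path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;               // number of bytes consumed so far
            boolean atLineStart = true; // true until the first byte of a line is seen
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);   // this byte begins a new line
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true;
                }
            }
        }
        return offsets;
    }

    // Seeks to a stored offset and decodes that one line, without its terminator.
    static String readLineAt(String path, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int b;
            while ((b = raf.read()) != -1 && b != '\n') {
                buf.write(b);
            }
            String line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            // Drop a trailing '\r' left over from a "\r\n" terminator.
            return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
        }
    }
}
```

A caller would run buildIndex(path) once, keep the returned list, and later call readLineAt(path, offsets.get(n)) to fetch line n directly. Unlike the loop above, this sketch also records a final line that has no trailing newline.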

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
