Get the offset of a line in a file (get byte offset of lines)
Scenario: obtain the byte offset of every line in a single pass over the file.
Existing file systems cannot jump directly to a given line: even if you know the line number, you have to read the file line by line until you reach the one you want. When the file is very large, it pays to preprocess it once and save the byte offset of every line. Later lookups can then seek straight to the right offset, which is much faster. So what we need to implement is a mapping from line number to byte offset.
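Once the offsets are stored, reaching a line costs one seek instead of a scan. The following is a minimal sketch, assuming UTF-8 text with '\n' line endings; LineSeeker and readLineAt are hypothetical names, not part of the original code. It collects the raw bytes itself because RandomAccessFile.readLine() converts each byte to one char and therefore mangles multi-byte characters.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: reads one line given its precomputed byte offset.
class LineSeeker {
    // Assumes UTF-8 text and '\n' line endings.
    static String readLineAt(RandomAccessFile raf, long byteOffset) throws IOException {
        raf.seek(byteOffset); // jump straight to the line; earlier lines are never read
        byte[] buf = new byte[128];
        int n = 0;
        int b;
        // Collect raw bytes up to the '\n' (or end of file)
        while ((b = raf.read()) != -1 && b != '\n') {
            if (n == buf.length) {
                buf = Arrays.copyOf(buf, n * 2); // grow the buffer for long lines
            }
            buf[n++] = (byte) b;
        }
        // Decode the whole line at once so multi-byte characters stay intact
        return new String(buf, 0, n, StandardCharsets.UTF_8);
    }
}
```

For a file containing "first\nsecond\nthird\n", the stored offsets would be 0, 6 and 13, and readLineAt(raf, 6) returns "second".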
A Google search turned up no ready-made code, but the idea is obvious and fairly simple: read the file byte by byte, and whenever a line delimiter is read, handle the line that has just ended.
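When the bytes are already in memory, the idea fits in a few lines. This is a sketch under that assumption (LineOffsets.lineOffsets is a hypothetical helper). It treats '\n' as the delimiter, so with "\r\n" endings the '\r' simply stays part of the preceding line and the offsets remain correct.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the idea: walk the bytes in order and record where each line starts.
class LineOffsets {
    // Returns the byte offset of the first byte of every line in data.
    static List<Long> lineOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        if (data.length == 0) {
            return offsets; // an empty file has no lines
        }
        offsets.add(0L); // the first line starts at byte 0
        for (int i = 0; i < data.length; i++) {
            // A '\n' that is not the last byte means another line starts right after it
            if (data[i] == '\n' && i + 1 < data.length) {
                offsets.add((long) (i + 1));
            }
        }
        return offsets;
    }
}
```

Because the scan counts bytes rather than characters, a UTF-8 line such as "é" (two bytes) correctly pushes the next offset to 3 rather than 2.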
Read line by line
The first idea is to read one line at a time and advance the offset by the length of each line, as follows:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public void init() throws Exception {
    // Scan the file to build the lookup structure
    File file = new File(path);
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        entries = new Entry[this.lines()];
        int k = 0;
        int offset = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // NB: this is where it goes wrong. length() counts characters, not bytes,
            // so the offset only matches the file for single-byte encodings; readLine()
            // also strips the line terminator, whose bytes are never counted.
            int len = line.length();
            line = line.replace("\"", "");
            String[] cells = line.split(",");
            int startIp = Ipv4util.ip2int(cells[0]);
            int endIp = Ipv4util.ip2int(cells[1]);
            entries[k++] = new Entry(startIp, endIp, offset);
            offset += len;
        }
    }
}
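The problem is that a string's length depends on the character encoding: length() counts characters, not the bytes they occupy in the file, and readLine() also discards the line terminator. Counting bytes directly is the most general approach, because it works regardless of the encoding.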
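A quick check of the mismatch, assuming the file is stored as UTF-8 (the class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares a string's character count with its UTF-8 byte count.
class CharsVsBytes {
    // String.length() counts UTF-16 code units, which is what the line-by-line code adds to offset
    static int charLength(String s) {
        return s.length();
    }

    // Bytes the same text occupies in a UTF-8 file, which is what the offset must count
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For the two-character string "\u4e2d\u56fd" (中国), charLength returns 2 while utf8Length returns 6, because each of those characters takes three bytes in UTF-8. Summing length() over such lines drifts further from the true offset with every non-ASCII character.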
Read byte by byte
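Here the file is read one byte at a time. The bytes of the current line are kept in a buffer and counted; when the '\n' delimiter is reached, the line is parsed and the count is added to the offset, which then points at the start of the next line.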
import java.io.RandomAccessFile;
import java.util.ArrayList;

public void init2() throws Exception {
    entries = new ArrayList<Entry>();
    // Scan the file to build the lookup structure
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
        byte[] buffer = new byte[1024]; // bytes of the current line; assumes no line exceeds 1024 bytes
        int b;
        int offset = 0;    // byte offset of the current line
        int bytesRead = 0; // bytes read so far for the current line
        // Note: raf.read() is unbuffered and fetches a single byte from the OS per call
        while ((b = raf.read()) != -1) {
            buffer[bytesRead++] = (byte) b; // int to byte
            if (b == '\n') {
                // Decode only this line's bytes, excluding the '\n' (assumes '\n' line endings)
                String line = new String(buffer, 0, bytesRead - 1).replace("\"", "");
                String[] cells = line.split(",");
                int startIp = Ipv4util.ip2int(cells[0]);
                int endIp = Ipv4util.ip2int(cells[1]);
                entries.add(new Entry(startIp, endIp, offset));
                offset += bytesRead; // prepare offset for the next entry
                bytesRead = 0;       // reset the line buffer
            }
        }
        // A final line with no trailing '\n' is not indexed by this loop
    }
}
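Reading one byte per call is the main cost of the byte-by-byte approach. A possible refinement, sketched here with the hypothetical class OffsetIndexer, reads 64 KB chunks and records only the offsets, leaving the parsing of each line to a later pass. It also indexes a final line that lacks a trailing '\n'.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Sketch: same byte-counting idea, but reading in bulk and keeping only offsets.
class OffsetIndexer {
    // Returns the starting byte offset of every line read from in.
    static List<Long> index(InputStream in) throws IOException {
        List<Long> offsets = new ArrayList<>();
        byte[] chunk = new byte[64 * 1024];
        long pos = 0;               // absolute byte position of chunk[0]
        boolean atLineStart = true; // the next byte read begins a new line
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                if (atLineStart) {
                    offsets.add(pos + i); // record the line when its first byte arrives
                    atLineStart = false;
                }
                if (chunk[i] == '\n') {
                    atLineStart = true;
                }
            }
            pos += n;
        }
        return offsets;
    }

    // Convenience overload for a file path.
    static List<Long> index(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return index(in);
        }
    }
}
```

Because a line's offset is recorded when its first byte is seen, lines that straddle chunk boundaries need no special handling, and empty lines get their own offset.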
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.