A colleague came to me today with a complaint: she had exported a CSV from SQL Server with 10 million rows, but opening it in Excel showed only about 1 million, as if 90% of the data had vanished. She suspected the file really did contain the full 10 million rows and that Excel was the culprit (Excel caps a worksheet at 1,048,576 rows). The trouble was, the file was over 2 GB and Notepad could not open it, so how do you prove the CSV is not actually missing data?
Well, in the spirit of building my own wheel when a problem comes up rather than hunting for an existing one, I decided to write a simple console program to read the file. The code is as follows:
using System;
using System.Text;

namespace bigtextreader
{
    class Program
    {
        static void Main(string[] args)
        {
            // Keep asking until the user supplies a path that exists.
            string path = "";
            do
            {
                Console.WriteLine("Please input the file path:");
                path = Console.ReadLine();
            } while (!System.IO.File.Exists(path));

            var fileStream = System.IO.File.OpenRead(path);
            while (true)
            {
                Console.WriteLine("Please input the start position:");
                var position = Int64.Parse(Console.ReadLine());
                if (position == -1)
                {
                    Console.WriteLine("Finish");
                    return;
                }

                // Jump straight to the requested byte offset and
                // read a fixed 1000-byte chunk for display.
                fileStream.Position = position;
                var byts = new byte[1000];
                fileStream.Read(byts, 0, 1000);
                var str = Encoding.UTF8.GetString(byts);
                Console.WriteLine(str);
            }
        }
    }
}
OK, the program works as shown: first, enter the absolute path of the file, e.g. D:\a.csv; second, enter a byte position in the file, e.g. 100000, and the program reads and displays 1000 bytes starting from that position. Entering -1 exits the program.
That is the first prototype of a basic large-text reader. Multiplying the estimated bytes per row by 2,000,000 gave a position well past Excel's row limit, and sure enough, valid data came back, perfectly confirming my colleague's conjecture: the file was complete and Excel was to blame. Better yet, each read took only about 100 ms.
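The spot check can be sketched as a standalone snippet. The path and the 200-byte average row size below are assumptions for illustration (estimate your own from a sample of rows); note that seeking to an arbitrary byte offset may land mid-character or mid-row, so the first decoded characters can look garbled:

```csharp
using System;
using System.IO;
using System.Text;

class SpotCheck
{
    // Approximate byte offset of a row, given an average row size.
    public static long EstimateOffset(long avgBytesPerRow, long targetRow)
        => avgBytesPerRow * targetRow;

    static void Main()
    {
        // Hypothetical values: adjust the path and the average row
        // size (here 200 bytes) to match your own file.
        var path = @"D:\a.csv";
        long offset = EstimateOffset(200, 2_000_000); // aim past Excel's 1,048,576-row cap

        if (!File.Exists(path)) return;
        using var fs = File.OpenRead(path);
        if (offset >= fs.Length) return; // estimate overshot the end of the file

        fs.Position = offset;
        var buf = new byte[1000];
        int read = fs.Read(buf, 0, buf.Length);
        // Readable CSV text here shows rows beyond Excel's limit exist.
        Console.WriteLine(Encoding.UTF8.GetString(buf, 0, read));
    }
}
```

Because only one seek and one small read happen per check, the cost stays in the tens of milliseconds even on a multi-gigabyte file.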
PS: Personally, the encoding and the number of bytes to read could be made configurable, but that would have dragged out the exercise. Also, calling Int64.Parse directly was pure laziness on my part (bad input will throw an unhandled exception), so please don't imitate it.
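For what it's worth, the two fixes the PS mentions only cost a few lines. This is a minimal sketch, not the original program: the class and method names are mine, the encoding and chunk size become parameters, and Int64.TryParse replaces the bare Int64.Parse:

```csharp
using System;
using System.IO;
using System.Text;

class ConfigurableReader
{
    // Read up to `count` bytes at `position` and decode them with the
    // caller-supplied encoding, instead of hard-coding UTF-8 and 1000.
    public static string ReadChunk(string path, long position, int count, Encoding encoding)
    {
        using var fs = File.OpenRead(path);
        fs.Position = position;
        var buf = new byte[count];
        int read = fs.Read(buf, 0, count);
        return encoding.GetString(buf, 0, read);
    }

    static void Main()
    {
        Console.WriteLine("Please input the start position:");
        // TryParse returns false on malformed input rather than throwing
        // the unhandled exception a bare Int64.Parse would.
        if (!Int64.TryParse(Console.ReadLine(), out long position) || position < 0)
        {
            Console.WriteLine("Invalid position");
            return;
        }
        Console.WriteLine(ReadChunk(@"D:\a.csv", position, 1000, Encoding.UTF8));
    }
}
```

Opening the stream inside ReadChunk keeps the helper self-contained; for an interactive loop like the original program's, holding one open FileStream across reads avoids reopening the file each time.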