I heard that someone working on text classification had to process a 100 GB text file and, remarkably, did it without any big-data tooling: they used the shell's split command to cut the file into a number of smaller files.
Split command
NAME
    split - split a file into pieces

SYNOPSIS
    split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
    Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; the default size is 1000 lines, and the default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.

    Mandatory arguments to long options are mandatory for short options too.

    -a, --suffix-length=N     use suffixes of length N (default 2)
    -b, --bytes=SIZE          put SIZE bytes per output file
    -C, --line-bytes=SIZE     put at most SIZE bytes of lines per output file
    -d, --numeric-suffixes    use numeric suffixes instead of alphabetic
    -l, --lines=NUMBER        put NUMBER lines per output file
    --verbose                 print a diagnostic to standard error just before each output file is opened
    --help                    display this help and exit
    --version                 output version information and exit

    SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
-l splits the file by number of lines
-b splits the file by the specified size; the size accepts the suffixes b, k, and m
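To make the difference between the two options concrete, here is a minimal sketch on a throwaway file. The file name sample.txt and the prefixes by_lines_ and by_bytes_ are made up for illustration, and the script assumes seq and mktemp are available, as they are on typical Linux systems.

```shell
#!/bin/sh
# Build a small sample file (2500 short lines) in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt

# -l: at most 1000 lines per piece -> by_lines_aa, by_lines_ab, by_lines_ac
split -l 1000 sample.txt by_lines_

# -b: 4 KiB per piece, with the last piece holding the remainder.
# A piece boundary may fall in the middle of a line.
split -b 4k sample.txt by_bytes_

# Inspect the result: line counts for the -l pieces, sizes for the -b pieces.
wc -l by_lines_*
ls -l by_bytes_*
```

With -l every piece ends on a line boundary, while -b only guarantees the byte size of each piece.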
Example:
split -b 256m Result_guid_active_train_all small
ll -lh
-rw-rw-r-- 1 256M June 20:29 smallaa
-rw-rw-r-- 1 256M June 20:29 smallab
-rw-rw-r-- 1 256M June 20:29 smallac
-rw-rw-r-- 1 256M June 20:29 smallad
-rw-rw-r-- 1 256M June 20:29 smallae
-rw-rw-r-- 1 256M June 20:29 smallaf
-rw-rw-r-- 1 256M June 20:29 smallag
-rw-rw-r-- 1 256M June 20:29 smallah
-rw-rw-r-- 1 256M June 20:29 smallai
-rw-rw-r-- 1 256M June 20:29 smallaj
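One caveat for text data: -b cuts at exact byte offsets, so a line can end up divided between two pieces. The -C option from the man page above avoids this by filling each piece with whole lines only. The following is a rough sketch rather than the original author's procedure. It assumes GNU split (for -C and -d), and sample.txt, part_ and rebuilt.txt are hypothetical names. It also shows that concatenating the pieces gives back the original file.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2500 > sample.txt   # stand-in for the real corpus

# -C 4k: put at most 4 KiB of whole lines in each piece, so no line is cut in two.
# -d -a 3: numeric suffixes of length 3 (part_000, part_001, ...).
split -C 4k -d -a 3 sample.txt part_

# The shell expands part_* in sorted order, so cat restores the original order.
cat part_* > rebuilt.txt
cmp sample.txt rebuilt.txt && echo "pieces reassemble to the original"
```

Because every piece ends on a line boundary, each one can be fed to a line-oriented tool on its own.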