The entire process of converting the encoding of a 3 GB text file in Linux
Source: Internet
Author: User
This article describes the entire process of converting the encoding of a 3 GB text file on Linux. The Linux commands involved are: split, iconv, cat
Problem: There is a temporary file a.txt encoded in GBK. It needs to be converted to UTF-8.
Difficulty: iconv performs the conversion in memory, so a 3 GB text file cannot be converted directly.
Idea: use split to split the file into smaller files, convert each small file with iconv, and then merge the results with cat.
The procedure is as follows:
1) ll -h a.txt: Check the file size, 2.9 GB
2) wc -l a.txt: Check the number of lines in the file, 2 million rows
3) split -l 20000000 a.txt chunk: Split the file into five files of at most 20,000,000 lines each (the value passed to -l); the output files are named with the prefix chunk.
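The idea of splitting, converting each piece, and merging can be sketched as one small shell function. This is only a minimal sketch under stated assumptions: the function name convert_gbk_to_utf8, the output file name, the temporary directory next to the output file, and the .utf8 suffix are all hypothetical and not part of the original steps. Splitting by lines should be safe for GBK input, because the newline byte never occurs inside a GBK multi-byte character.

```shell
#!/bin/sh
# Sketch of the split / iconv / cat approach.
# convert_gbk_to_utf8 is a hypothetical helper name (an assumption for illustration).
convert_gbk_to_utf8() {
    src=$1                  # GBK-encoded input file, e.g. a.txt
    dst=$2                  # UTF-8 output file (name chosen by the caller)
    lines=${3:-20000000}    # lines per piece, same value as in the split step

    # Keep the pieces next to the output file so they use the same disk.
    workdir=$(mktemp -d "$dst.parts.XXXXXX") || return 1

    # Split on line boundaries; pieces are named chunkaa, chunkab, ...
    split -l "$lines" "$src" "$workdir/chunk" || return 1

    # Convert each piece separately. The glob is expanded once, before the
    # loop starts, so the new .utf8 files are not picked up as input.
    for piece in "$workdir"/chunk*; do
        iconv -f GBK -t UTF-8 "$piece" > "$piece.utf8" || return 1
        rm -f "$piece"      # free disk space as soon as a piece is done
    done

    # Merge the converted pieces; the glob expands in sorted order,
    # which matches the order in which split wrote them.
    cat "$workdir"/chunk*.utf8 > "$dst" || return 1

    rm -rf "$workdir"
}
```

A call such as `convert_gbk_to_utf8 a.txt a.utf8.txt` would then leave the converted text in a.utf8.txt (an assumed name). Note that this needs free disk space for roughly one extra copy of the input in addition to the output file.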