A simple way in Python to split a large file into multiple small files by paragraph

Source: Internet
Author: User

Today I helped a classmate process a small corpus.

The corpus is fairly large, with paragraphs marked by two consecutive newline characters, and he wanted to split it into multiple small files by paragraph, so that every 3 paragraphs form a new file. I had never done anything like this before, and the approaches I found online all seemed a bit complicated.
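As a small illustration of the input format, the sketch below splits a made-up sample string on two consecutive newlines. The sample text and the variable names are assumptions for demonstration only and are not taken from the actual corpus.

```python
import re

# Hypothetical sample corpus: four paragraphs separated by blank lines.
sample = "para one\n\npara two\n\npara three\n\npara four"

# Split on two consecutive newlines, the paragraph mark described above.
paragraphs = re.split(r"\n\n", sample)

print(len(paragraphs))  # 4
print(paragraphs[0])    # para one
```

With groups of 3, this sample would end up as two files: one holding the first three paragraphs and one holding the fourth.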

So after some experimenting, I wrote a piece of code myself, which solved the problem.
The basic idea is to read the content of the original file and use a regular expression to split it on the paragraph separator. The result is a list in which each element holds one slice. The script then creates a write handle for the first output file and iterates over the list, writing the current slice and checking whether 3 paragraphs have been written. If not, it goes on to write the next slice. If so, it closes the current write handle, opens a new one with a different file name, and continues the loop with the next slice.
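The grouping step on its own can be sketched without any file I/O. This is a different formulation from the handle-swapping loop described above (it slices the list instead), and the helper name `group_slices` is hypothetical.

```python
def group_slices(slices, size=3):
    """Return consecutive groups of `size` slices; the last group may be shorter."""
    return [slices[i:i + size] for i in range(0, len(slices), size)]


# Made-up slice list standing in for the paragraphs of a corpus.
groups = group_slices(["p1", "p2", "p3", "p4", "p5", "p6", "p7"])

# The group number plays the role of the output file number (0.txt, 1.txt, ...).
for number, group in enumerate(groups):
    print(number, group)
```

Seven slices give three groups, so the last output file would hold only one paragraph.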

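    # -*- coding: utf8 -*-
    import re

    p = re.compile('\n\n')  # two consecutive newlines mark a paragraph break
    fileContent = open('files/office.txt', 'r', encoding='utf8').read()  # read the file content
    paraList = p.split(fileContent)  # slice the text on the paragraph separator
    fileWriter = open('files/0.txt', 'a', encoding='utf8')  # create a write handle
    for paraIndex in range(len(paraList)):  # iterate over the sliced text
        # write the current slice, restoring the separator removed by split()
        fileWriter.write(paraList[paraIndex] + '\n\n')
        if (paraIndex + 1) % 3 == 0:  # have 3 slices been written?
            fileWriter.close()  # close the current handle
            # open a new handle and wait for the next slice to be written
            fileWriter = open('files/' + str((paraIndex + 1) // 3) + '.txt', 'a', encoding='utf8')
    fileWriter.close()  # close the last write handle that was created
    print('finished')

Note how the file names are built here: the first file is 0.txt, and each new file is numbered (paraIndex + 1) // 3. Integer division is needed in Python 3, where / would produce names such as 1.0.txt.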
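For a corpus too large to read into memory comfortably, a streaming variant is one possible alternative. The sketch below is not the author's script: it reads line by line, treats blank lines as paragraph breaks, opens output files with `with` blocks in write mode, and never creates a trailing empty file. The names `iter_paragraphs` and `split_file` and their parameters are hypothetical.

```python
import os


def iter_paragraphs(lines):
    """Yield paragraphs from an iterable of lines; blank lines separate them."""
    buffer = []
    for line in lines:
        if line.strip():
            buffer.append(line.rstrip("\n"))
        elif buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        yield "\n".join(buffer)


def split_file(src_path, out_dir, per_file=3):
    """Write every `per_file` paragraphs to out_dir/0.txt, 1.txt, ...

    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    group = []

    def flush():
        # Write the collected paragraphs to the next numbered file.
        nonlocal count
        out_path = os.path.join(out_dir, f"{count}.txt")
        with open(out_path, "w", encoding="utf8") as out:
            out.write("\n\n".join(group) + "\n")
        count += 1
        group.clear()

    with open(src_path, "r", encoding="utf8") as src:
        for paragraph in iter_paragraphs(src):
            group.append(paragraph)
            if len(group) == per_file:
                flush()
    if group:  # leftover paragraphs that did not fill a whole group
        flush()
    return count
```

A call might look like `split_file("corpus.txt", "out")`, where both paths are placeholders. Because the files are opened in write mode, running it twice overwrites the earlier output instead of appending to it.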
