A simple way in Python to split a large file into multiple small files by paragraph

Source: Internet
Author: User

Today I helped a classmate process a small corpus. The corpus is fairly large, its paragraphs are marked by two consecutive newline characters, and he wanted to divide it into multiple small files, with every 3 paragraphs forming a new file. I had never done this kind of operation before, and the similar methods I found online looked somewhat complicated. So I tried writing a piece of code myself, and it solved the problem.
The basic idea is to read the content of the original file and slice it with a regular expression on \n. The result is a list in which each element holds one slice of the content. Then create a file handle for writing and traverse the list of slices, writing the current slice each time. After each write, check whether 3 paragraphs have been written. If not, continue with the next slice. If so, close the current write handle, create a new write handle with a different file name, and continue the loop with the next slice.
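To make the slicing step concrete, here is a small sketch of how re.split behaves on text whose paragraphs are separated by two consecutive newlines. The sample string is made up for illustration, and the \n\n pattern is an assumption based on the description of the corpus above, not something taken from the original code.

```python
import re

# Made-up sample text: three paragraphs separated by two newlines.
sample = "first paragraph\n\nsecond paragraph\n\nthird paragraph"

# Splitting on a single newline leaves an empty string between paragraphs.
by_single_newline = re.split(r"\n", sample)

# Splitting on two consecutive newlines yields one element per paragraph.
by_blank_line = re.split(r"\n\n", sample)

print(by_single_newline)
print(by_blank_line)
```

Which pattern fits depends on the corpus: if empty slices end up in the list, counting "3 slices" no longer means counting 3 paragraphs.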

# -*- coding: utf8 -*-
import re

p = re.compile('\n', re.S)
fileContent = open('files/office-TXT', 'r', encoding='utf8').read()  # read the file content
paraList = p.split(fileContent)  # slice the text on the newline character
fileWriter = open('files/0.txt', 'a', encoding='utf8')  # create a handle for writing a file
for paraIndex in range(len(paraList)):  # traverse the list of slices
    fileWriter.write(paraList[paraIndex])  # write the current slice to the file
    if (paraIndex + 1) % 3 == 0:  # check whether 3 slices have been written
        fileWriter.close()  # if so, close the current handle
        # Create a new handle that waits for the next slice to be written.
        # Note how the file name is built: integer division (//) gives 1.txt, 2.txt, ...
        # whereas / in Python 3 would give 1.0.txt, 2.0.txt, ...
        fileWriter = open('files/' + str((paraIndex + 1) // 3) + '.txt', 'a', encoding='utf8')
fileWriter.close()  # close the last write handle that was created
print('finished')
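The same idea can also be sketched as a reusable function. This is only an illustrative variant, not the author's code: the function name split_by_paragraphs and its parameters are hypothetical, it assumes a paragraph break is two or more consecutive newlines, and it opens each output file in "w" mode instead of "a" so that rerunning it does not append to earlier output.

```python
import os
import re


def split_by_paragraphs(source_path, output_dir, paragraphs_per_file=3):
    """Write every `paragraphs_per_file` paragraphs of source_path to a new file.

    Returns the list of output file paths that were created.
    """
    with open(source_path, "r", encoding="utf8") as source:
        content = source.read()

    # Assumption: a paragraph break is two or more consecutive newlines.
    # Whitespace-only slices are dropped so they are not counted as paragraphs.
    paragraphs = [p for p in re.split(r"\n{2,}", content) if p.strip()]

    os.makedirs(output_dir, exist_ok=True)
    created = []
    for start in range(0, len(paragraphs), paragraphs_per_file):
        chunk = paragraphs[start:start + paragraphs_per_file]
        # Integer division keeps the file names as 0.txt, 1.txt, 2.txt, ...
        out_path = os.path.join(output_dir, f"{start // paragraphs_per_file}.txt")
        with open(out_path, "w", encoding="utf8") as out:
            # Rejoin with a blank line so paragraph boundaries survive in the output.
            out.write("\n\n".join(chunk) + "\n")
        created.append(out_path)
    return created
```

A call such as split_by_paragraphs('corpus.txt', 'files') (both paths are placeholders) would write files/0.txt, files/1.txt and so on. Because the with statement closes each handle, no trailing empty file is left behind when the number of paragraphs is a multiple of 3.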

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
