Python: splitting a large file into multiple small files by paragraph

Source: Internet
Author: User

Today I wanted to help a student process some corpus data. The corpus file is fairly large and uses two consecutive linefeeds to mark a paragraph boundary. He wanted to split it into multiple small files by paragraph, with every three paragraphs forming a new file. I had never done this kind of operation before, and the similar methods I found online all looked somewhat complicated, so after some experimenting I wrote a short piece of code that solves the problem.
The basic idea is as follows. First, read the content of the original file and split it with a regular expression on "\n\n" (two consecutive linefeeds). The result is a list in which each element holds one paragraph. Then create a write handle for the first output file. Next, traverse the list, write the current paragraph to the file, and check whether three paragraphs have been written. If not, continue with the next paragraph; if so, close the current write handle and open a new one with a different file name, ready for the next paragraph. When the loop ends, close the last write handle.
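The grouping step described above can also be expressed without a running counter. The following is a minimal sketch, not the author's code: the hypothetical helper group_paragraphs splits the text on "\n\n" and rejoins every per_file paragraphs into one chunk by slicing the list in strides, which makes the logic easy to check on a small string before touching any files.

```python
import re


def group_paragraphs(text, per_file=3):
    """Split text on blank lines and join every `per_file` paragraphs into one chunk.

    Hypothetical helper for illustration; `per_file` defaults to 3 as in the article.
    """
    # Two consecutive linefeeds mark a paragraph boundary, as in the corpus described above.
    paragraphs = re.split(r"\n\n", text)
    # Step through the list in strides of per_file and rejoin each group
    # with the separator that split() removed.
    return [
        "\n\n".join(paragraphs[i:i + per_file])
        for i in range(0, len(paragraphs), per_file)
    ]


sample = "p1\n\np2\n\np3\n\np4\n\np5"
print(group_paragraphs(sample))
# ['p1\n\np2\n\np3', 'p4\n\np5']
```

Each element of the returned list is the content of one output file; the last chunk simply holds whatever paragraphs are left over.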

# -*- coding: utf-8 -*-
import re

p = re.compile(r'\n\n')  # two consecutive linefeeds mark a paragraph boundary
fileContent = open('files/office.txt', 'r', encoding='utf8').read()  # read the file content
paraList = p.split(fileContent)  # split the text into a list of paragraphs
# Create the first write handle; mode 'w' overwrites output left over from earlier runs
fileWriter = open('files/0.txt', 'w', encoding='utf8')
for paraIndex in range(len(paraList)):  # traverse the list of paragraphs
    # Write the current paragraph, restoring the separator that split() removed
    fileWriter.write(paraList[paraIndex] + '\n\n')
    if (paraIndex + 1) % 3 == 0:  # have three paragraphs been written?
        fileWriter.close()  # close the current handle
        # Open a new handle for the next paragraphs. The file name is the number of
        # completed groups, so the files are 0.txt, 1.txt, 2.txt, ...
        # (if the paragraph count is a multiple of 3, the last file stays empty)
        fileWriter = open('files/' + str((paraIndex + 1) // 3) + '.txt', 'w', encoding='utf8')
fileWriter.close()  # close the last write handle
print('finished')
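A variant of the script above, offered as a sketch rather than a drop-in replacement: the hypothetical function split_file (its name, parameters, and defaults are illustrative assumptions) uses pathlib and with blocks so every file is closed automatically. It also treats a line containing only whitespace as a paragraph break (an assumption about the corpus), skips empty pieces, and creates a file only when there are paragraphs to put in it, so no empty trailing file appears when the paragraph count is a multiple of three. Reading in text mode already converts Windows "\r\n" line endings to "\n".

```python
import re
from pathlib import Path

# A newline, optional whitespace (including extra blank lines), then another newline.
# Assumption: whitespace-only lines should also separate paragraphs.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_file(src, out_dir, per_file=3, encoding="utf-8"):
    """Write every `per_file` paragraphs of `src` to out_dir/0.txt, 1.txt, ...

    Hypothetical helper; returns the list of paths that were written.
    """
    text = Path(src).read_text(encoding=encoding)
    # Drop empty pieces, e.g. from blank lines at the start of the file.
    paragraphs = [p for p in PARAGRAPH_BREAK.split(text) if p.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for n, start in enumerate(range(0, len(paragraphs), per_file)):
        target = out / f"{n}.txt"
        # "with" closes the file automatically; mode "w" overwrites earlier runs.
        with target.open("w", encoding=encoding) as f:
            f.write("\n\n".join(paragraphs[start:start + per_file]))
        written.append(target)
    return written
```

For the corpus in this article, the call would be something like split_file('files/office.txt', 'files'), which writes files/0.txt, files/1.txt, and so on.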

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
