Lessons learned from a PHP program that splits a TXT file by chapter

Source: Internet
Author: User

I recently worked on automatically splitting a TXT novel: the program takes one whole TXT file, divides it by chapter, saves each chapter as a small .txt file, and reports the number of chapters and the name of each chapter.

My initial idea was:

① First open the file with fopen, then use a while loop with fgets to read the TXT file line by line.

② While reading, use a regular expression to check whether the line contains a string of the form "Chapter **" or "Section **" (in Chinese, "第**章" or "第**节"). If it does, save it in an array.

③ After the loop finishes, use count to get the size of the array, then loop over the array with foreach and concatenate the chapter names into one delimited string (such as "#chapter name#") to store in the database, so that the names can later be split apart again with a function such as explode.
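Step ③ can be sketched roughly as follows. This is only an illustration of the join-and-split round trip: the function names and the exact "#name1#name2#" layout are my own assumptions, and the sketch breaks if a chapter name itself contains "#".

```php
<?php
// Hypothetical helpers; the article does not show this code.

// Join chapter names into one '#'-delimited string for storage.
function joinChapterNames(array $names): string
{
    return '#' . implode('#', $names) . '#';
}

// Split the stored string back into an array of chapter names.
function splitChapterNames(string $stored): array
{
    $trimmed = trim($stored, '#');
    return $trimmed === '' ? [] : explode('#', $trimmed);
}

// Example data (assumes this source file is saved as UTF-8).
$names = ['第一章 开端', '第二章 相遇'];
$stored = joinChapterNames($names);

echo count($names), "\n";              // number of chapters
echo $stored, "\n";                    // the string that would go into the database
print_r(splitChapterNames($stored));   // the names recovered with explode
```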

With this plan in mind, I started writing the code. But the first problem came up quickly:

----How do you match Chinese characters with a regular expression?

I had only ever used regular expressions to match English characters and the like, and I assumed a regular expression could match Chinese characters directly, so I wrote the following code

if (preg_match("/第[0-9一二三四五六七八九十百千]*[章节]/i", $hangdata, $matches)) {    }

I really thought it would be that easy... but nothing matched.

Very often the first idea does not work. I went over this simple code again but could not find anything wrong with it, so I turned to Baidu. Baidu really does turn up a lot of useful material: I soon found a post from another user about matching Chinese characters, which said to write the Chinese characters in the pattern as Unicode code points. The following is the modified code

if (preg_match("/(\x{7b2c})(\s*)([\x{4e00}\x{4e8c}\x{4e09}\x{56db}\x{4e94}\x{516d}\x{4e03}\x{516b}\x{4e5d}\x{5341}\x{767e}\x{5343}0-9]+)(\s*)([\x{7ae0}\x{8282}]+)/u", $hangdata, $matches)) {}

I thought there would be no problem this time and happily refreshed the page... but still nothing came out. Was the post wrong? I checked the code carefully again and still found nothing, so I looked up the online Chinese PHP manual. It turns out that for preg_match to match Chinese this way (with the u modifier), the subject string must be UTF-8 encoded, whereas text read from a TXT file like this one is generally GBK encoded.

So I added a line to convert the character encoding

$hangdata = mb_convert_encoding($hangdata, "UTF-8", "GBK");
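To see the effect of the conversion in isolation, here is a minimal sketch. It assumes the mbstring extension is available and that this source file is saved as UTF-8. The constant and function names are hypothetical, and returning the whole trimmed line as the chapter name is my own assumption. The pattern is written in a single-quoted string so that PHP passes the \x{...} escapes through to PCRE unchanged.

```php
<?php
// Same code points as the pattern in the article; the trailing "u" turns on UTF-8 mode.
const CHAPTER_PATTERN = '/\x{7b2c}\s*[\x{4e00}\x{4e8c}\x{4e09}\x{56db}\x{4e94}\x{516d}\x{4e03}\x{516b}\x{4e5d}\x{5341}\x{767e}\x{5343}0-9]+\s*[\x{7ae0}\x{8282}]+/u';

// Hypothetical helper: convert one GBK line to UTF-8 and return it (trimmed)
// if it looks like a chapter title, or null otherwise.
function matchChapterTitle(string $gbkLine): ?string
{
    $utf8Line = mb_convert_encoding($gbkLine, 'UTF-8', 'GBK');
    if (preg_match(CHAPTER_PATTERN, $utf8Line) === 1) {
        return trim($utf8Line);
    }
    return null;
}

// Simulate a line read from a GBK-encoded TXT file.
$gbkLine = mb_convert_encoding("第一章 开端\r\n", 'GBK', 'UTF-8');

// Matching the raw GBK bytes does not succeed: they are not valid UTF-8.
var_dump(preg_match(CHAPTER_PATTERN, $gbkLine) === 1);

// After conversion the title is recognised.
var_dump(matchChapterTitle($gbkLine));
```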

This time it ran fine, and the matching worked.

However, scrolling down the page, I found that the same chapter name had been matched two or more times. This is a serious error: content that the author wrote as one chapter would be split into many chapters.

So each matched chapter name has to be compared with the most recently recorded chapter name to see whether they are the same.

To do that, I defined an empty string variable at the very beginning.

On each loop iteration, the current chapter name is compared with this variable. If they are the same, the current chapter name is not recorded again; if they differ, it is recorded and assigned to the variable.
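The comparison described above might look something like this. It is a sketch, not the original code: the function name is hypothetical, and it only filters out a name that repeats the most recently recorded one, which is how I read the description.

```php
<?php
// Hypothetical helper: keep a chapter name only if it differs from the
// most recently recorded one.
function collectChapterNames(iterable $matchedNames): array
{
    $lastName = '';   // the empty string variable defined at the very beginning
    $chapters = [];

    foreach ($matchedNames as $name) {
        if ($name === $lastName) {
            continue; // same as the last recorded name: do not record it again
        }
        $chapters[] = $name;
        $lastName = $name;
    }

    return $chapters;
}

// Example data (assumes this source file is saved as UTF-8).
print_r(collectChapterNames(['第一章 开端', '第一章 开端', '第二章 相遇']));
```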

There is still some room for improvement, but the main functionality is basically in place.
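For reference, the pieces described in this post could be combined along the following lines. This is a rough sketch under several assumptions of mine: the function name, the "001.txt" style file names, writing the chapter files as UTF-8, skipping any text before the first chapter title, and using the whole trimmed line as the chapter name are not taken from the original program. It also assumes the mbstring extension and an existing output directory.

```php
<?php
// Hypothetical function: splits a GBK-encoded TXT file into one file per
// chapter and returns the list of chapter names.
function splitByChapter(string $sourcePath, string $outputDir): array
{
    $pattern = '/\x{7b2c}\s*[\x{4e00}\x{4e8c}\x{4e09}\x{56db}\x{4e94}\x{516d}\x{4e03}\x{516b}\x{4e5d}\x{5341}\x{767e}\x{5343}0-9]+\s*[\x{7ae0}\x{8282}]+/u';

    $in = fopen($sourcePath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $sourcePath");
    }

    $names = [];      // chapter names in order of appearance
    $lastName = '';   // most recently recorded chapter name
    $out = null;      // handle of the chapter file currently being written

    while (($line = fgets($in)) !== false) {
        $line = mb_convert_encoding($line, 'UTF-8', 'GBK');
        $name = trim($line);

        // Start a new chapter only for a title that differs from the last one.
        if ($name !== $lastName && preg_match($pattern, $line) === 1) {
            if ($out !== null) {
                fclose($out);
            }
            $names[] = $name;
            $lastName = $name;
            $outPath = sprintf('%s/%03d.txt', $outputDir, count($names));
            $out = fopen($outPath, 'wb');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot create $outPath");
            }
        }

        // Lines before the first chapter title are skipped.
        if ($out !== null) {
            fwrite($out, $line);
        }
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $names;
}
```

A call such as splitByChapter('novel.txt', 'chapters') (both paths are placeholders) would then return the chapter names, and count() on the result gives the number of chapters.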
