Features:
1. Currently supports Tianya (forum), Sina Forum, and others. The program provides an extension framework for adding support for new forums.
2. Provides automatic typesetting.
3. Provides simple statistics.
See the sections below for usage. Baidu Pan link:
http://pan.baidu.com/s/1ntwkwOD
[Screenshots: downloading a post; automatic processing; statistics]
The following are usage notes. Beginners only need to pay attention to the text marked in brown.
tz2txt is a tool that saves <the original poster's (OP's) posts in a forum thread> as a <plain TXT file>.
This tool (including its source code) has been uploaded to GitHub, where the latest version is available:
https://github.com/animalize/tz2txt
"Install Python"
This tool is written in Python and requires the Python runtime (version 3.4 or later). Download it from the Python website:
https://www.python.org/downloads/
You can also download it directly using the links below. If you are not sure whether your system is 32-bit or 64-bit, download the 32-bit version:
Windows 32-bit Python installer:
https://www.python.org/ftp/python/3.4.3/python-3.4.3.msi
Windows 64-bit Python installer:
https://www.python.org/ftp/python/3.4.3/python-3.4.3.amd64.msi
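To confirm which Python version is installed, you can run the command python --version in a command prompt, or use a small check like the one below. This is a minimal sketch and is not part of tz2txt. The function name python_is_supported is illustrative, and the 3.4 minimum comes from the requirement stated above.

```python
import sys

# Minimum version stated in the tz2txt notes above.
MIN_VERSION = (3, 4)


def python_is_supported(version_info=sys.version_info):
    """Return True if version_info (major, minor, ...) is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "- supported:", python_is_supported())
```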
After installing Python, run the command pip install colorama so that some messages are displayed in color. The tool still works without it, but nothing will be shown in color.
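The optional-color behavior described above follows a common pattern: import colorama if it is installed, and fall back to plain text otherwise. The sketch below illustrates that pattern only. It is not tz2txt's own code, and the helper name highlight is hypothetical. colorama.init(), colorama.Fore and colorama.Style are part of colorama's public API.

```python
# Illustration of an optional colorama dependency, not tz2txt's actual code.
try:
    import colorama
    colorama.init()  # lets ANSI color codes work in Windows consoles
    HAVE_COLOR = True
except ImportError:
    HAVE_COLOR = False


def highlight(text):
    """Return text wrapped in color codes if colorama is available, else unchanged."""
    if HAVE_COLOR:
        return colorama.Fore.YELLOW + text + colorama.Style.RESET_ALL
    return text


if __name__ == "__main__":
    print(highlight("Download finished"))
```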
"Overall Workflow"
I. Download.
Download part or all of a thread to generate an <orchestration text> that contains only the OP's posts.
II. Edit and typeset.
This step can be done automatically or by hand.
In the <orchestration text>, each reply is followed by a retention tag, for example:
<mark>══════ retention Tag: █
To discard a reply, delete the black square at the end of its tag.
A reply consists of the lines between a line starting with <time> and the next line starting with <mark>. As long as every <time> line stays paired with its <mark> line, you can freely edit the content of the reply.
III. Compile.
Compile the <orchestration text>, whether processed or not, into <plain text>.
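The pairing rule above can be expressed as a small parser. This is a hypothetical sketch and not tz2txt's implementation. It assumes each reply starts with a line beginning with <time> and ends with the next line beginning with <mark>, and that a reply is kept only if its <mark> line ends with the black square █. The function name split_replies is illustrative.

```python
KEEP_SYMBOL = "█"  # the black square at the end of a retention tag


def split_replies(lines):
    """Split orchestration-text lines into (kept, discarded) lists of replies.

    Assumed format (inferred from the description, not from tz2txt's code):
    a reply runs from a "<time>" line to the next "<mark>" line. Each reply is
    returned as a list of its body lines, without the two tag lines.
    Lines outside any reply are ignored.
    """
    kept, discarded = [], []
    body = None  # None means we are currently outside a reply
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("<time>"):
            if body is not None:
                raise ValueError("<time> line without a matching <mark> line")
            body = []
        elif line.startswith("<mark>"):
            if body is None:
                raise ValueError("<mark> line without a matching <time> line")
            target = kept if line.rstrip().endswith(KEEP_SYMBOL) else discarded
            target.append(body)
            body = None
        elif body is not None:
            body.append(line)
    if body is not None:
        raise ValueError("unterminated reply at end of input")
    return kept, discarded
```

Feeding it the lines of a saved file (opened with encoding="gb18030", the encoding tz2txt uses for saved files) would separate the retained replies from the ones whose black square was removed.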
"How To"
I. Simple usage:
Double-click "_a fully automatic.bat" to generate auto.txt automatically. This does not keep the downloaded file or the automatically processed orchestration file.
II. Step-by-step usage:
Double-click "_1 download post.bat", "_2 process orchestration.bat" and "_3 compile final.bat" in turn.
In this process, the downloaded thread is saved as dl.txt and the automatically processed text is saved as bp.txt.
final.txt is the compiled plain text, and ~discard.txt contains the content discarded by automatic processing.
(Note: automatic processing has its limitations, so you can also edit the file by hand.)
III. To view statistics for the orchestration file bp.txt, double-click "_b statistics orchestration.bat".
"Tips"
☆ Editing the <orchestration text> with Notepad is not recommended. Use the free, open-source text editor Notepad++ (http://notepad-plus-plus.org/) instead.
☆ If a thread is very long, you can download it in parts (e.g. 50 pages at a time).
☆ After compiling, it is a good idea to keep the originally downloaded <orchestration text> and the processed <orchestration text> for a while.
☆ If your network is slow, you can increase the timeout (in seconds) for a single download in the fetcher.py file. The default is Open_timeout = 60.
☆ Saved files use GB18030 encoding (compatible with GB2312/GBK).
☆ The program leaves room for extension, so support for new forums can be added. See the description in the Sites folder.
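Because saved files are GB18030-encoded, tools that expect UTF-8 may show garbled characters. The minimal sketch below uses only Python's standard codecs to convert such a file to UTF-8. The function name gb18030_to_utf8 and the output file name are illustrative and are not part of tz2txt.

```python
import os
import tempfile


def gb18030_to_utf8(src_path, dst_path):
    """Re-encode a GB18030 text file as UTF-8 and return the number of characters."""
    with open(src_path, encoding="gb18030") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return len(text)


if __name__ == "__main__":
    # Self-check on a temporary file; replace with real paths such as final.txt.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "final.txt")
        dst = os.path.join(tmp, "final-utf8.txt")
        with open(src, "w", encoding="gb18030") as f:
            f.write("楼主的回复\n")
        gb18030_to_utf8(src, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```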
"Appendix: Program Parameters"
1. d command: downloads a thread (OP's posts only) and saves it as an <orchestration text>. Parameters:
tz2txt.py d [-u url] [-t pages] [-o filename]
-u url: URL of any page of the thread (not necessarily the first page)
-t pages: total number of pages to download; -1 means download through to the last page (use -1 with caution if the thread is long)
-o filename: file name of the output <orchestration text>
Example: tz2txt.py d -u http://bbs.sample.com/thread-12345.html -t 10 -o download.txt
Downloads 10 pages starting from the given page and saves the <orchestration text> to download.txt.
2. p command: automatically processes an <orchestration text>, for example by removing duplicate replies and fixing quoted-reply formatting. Parameters:
tz2txt.py p [-i filename] [-o filename]
-i filename: input <orchestration text>
-o filename: output <orchestration text>
Example: tz2txt.py p -i download.txt -o bp.txt
Automatically processes download.txt and saves the result as bp.txt.
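The exact rule the p command uses to detect duplicate replies is not documented here. As an illustration only, exact duplicates can be removed in order by keeping a set of texts already seen. Treating two replies as duplicates when their stripped text matches is an assumption of this sketch, and remove_duplicate_replies is a hypothetical name.

```python
def remove_duplicate_replies(replies):
    """Return replies with duplicates removed, keeping the first occurrence.

    Assumption for this sketch: two replies are duplicates when their text
    is identical after stripping surrounding whitespace.
    """
    seen = set()
    unique = []
    for reply in replies:
        key = reply.strip()
        if key in seen:
            continue  # already kept an identical reply earlier
        seen.add(key)
        unique.append(reply)
    return unique
```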
3. s command: shows statistics for an <orchestration text>. Parameters:
tz2txt.py s [-i filename]
-i filename: input <orchestration text>
Example: tz2txt.py s -i bp.txt
Computes and displays statistics for bp.txt.
4. c command: compiles an <orchestration text> into <plain text>. Parameters:
tz2txt.py c [-i filename] [-o filename] [-d filename]
-i filename: input <orchestration text>
-o filename: output <plain text>
-d filename: file in which to save the replies discarded during compilation
Example: tz2txt.py c -i bp.txt -o final.txt -d ~discard.txt
Compiles the <orchestration text> bp.txt into the <plain text> final.txt,
and saves the replies discarded during compilation to ~discard.txt.
5. a command: generates <plain text> fully automatically. Parameters:
tz2txt.py a [-u url] [-t pages] [-o filename] [-d filename]
-u url: URL of any page of the thread (not necessarily the first page)
-t pages: total number of pages to download; -1 means download through to the last page (use -1 with caution if the thread is long)
-o filename: file name of the output <plain text>
-d filename: file in which to save the replies discarded during compilation
Example: tz2txt.py a -u http://bbs.sample.com/thread-12345.html -t 10 -o auto.txt -d ~discard.txt
Downloads 10 pages starting from the given page, writes the <plain text> to auto.txt,
and saves the replies discarded during compilation to ~discard.txt.
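The batch files mentioned under "How To" presumably wrap commands like these, but their contents are not shown here. The following is therefore a hypothetical Python equivalent of the step-by-step workflow (d, then p, then c), using the file names given above (dl.txt, bp.txt, final.txt, ~discard.txt). It assumes tz2txt.py is in the current directory. It uses subprocess.check_call, which is available in Python 3.4. No shell is involved, so "~discard.txt" is passed literally.

```python
import subprocess
import sys

# Hypothetical wrapper: dl.txt -> bp.txt -> final.txt, discarded replies in ~discard.txt.


def build_pipeline(url, pages, script="tz2txt.py"):
    """Return the d, p and c command lines in the order they should run."""
    py = sys.executable  # the interpreter running this script
    return [
        [py, script, "d", "-u", url, "-t", str(pages), "-o", "dl.txt"],
        [py, script, "p", "-i", "dl.txt", "-o", "bp.txt"],
        [py, script, "c", "-i", "bp.txt", "-o", "final.txt", "-d", "~discard.txt"],
    ]


def run_pipeline(url, pages, script="tz2txt.py"):
    """Run each step in turn; check_call raises CalledProcessError if a step fails."""
    for cmd in build_pipeline(url, pages, script):
        subprocess.check_call(cmd)


if __name__ == "__main__":
    run_pipeline("http://bbs.sample.com/thread-12345.html", 10)
```

Splitting the run into separate steps leaves room to edit bp.txt by hand between the p and c steps, which the single a command does not allow.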
A small tool that saves the original poster's replies in a forum thread as a TXT file.