Converting HTML to text with Perl (html2txt), including Chinese content
use HTML::TreeBuilder;
use HTML::FormatText;

# Open the output file for writing
open OUTPUT, ">output.txt" or die "Cannot open output.txt: $!";
$file = "C:/test.html";
# Parse the HTML file into a tree
$tree = HTML::TreeBuilder->new->parse_file($file);
# Format the tree as plain text with the given margins
$formatter = HTML::FormatText->new(leftmargin => 3, rightmargin => 100);
print OUTPUT $formatter->format($tree);
close OUTPUT;
1. When the script is run, Perl may report: Can't locate HTML/FormatText.pm in @INC (@INC contains: C:/perl/lib C:/perl/site/lib .)
Solution:
1) Install the HTML::FormatText module. Once the installation succeeds, run the script again.
2) If the error persists after installation, check whether FormatText.pm exists in the C:/perl/site/lib/HTML or C:/perl/lib/HTML directory. If it does not, copy FormatText.pm from the HTML-Format-2.10\lib\HTML directory into C:/perl/site/lib/HTML or C:/perl/lib/HTML, then run the script again.
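The check in step 2) can also be done from Perl itself. The following is a minimal sketch, not part of the original post: locate_module is a hypothetical helper that walks @INC the same way the error message describes and reports where a module file would be found.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first path under @INC that holds the
# given module, or undef if no directory in @INC contains it.
sub locate_module {
    my ($module) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;   # HTML::FormatText -> HTML/FormatText.pm
    for my $dir (@INC) {
        next if ref $dir;                          # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $relative);
        return $path if -f $path;
    }
    return;
}

# Report on the two modules the script needs.
for my $module ('HTML::TreeBuilder', 'HTML::FormatText') {
    my $path = locate_module($module);
    print $module, ': ', (defined $path ? $path : 'not found in @INC'), "\n";
}
```

If a module is reported as not found, the directories listed in the error message are the places where FormatText.pm (inside an HTML subdirectory) needs to exist.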
2. If the Chinese characters in the generated text are garbled, comment out the following statement in C:\Perl\site\lib\HTML\FormatText.pm:
# $text =~ tr/\xA0\xAD//d;   # Deletes the characters with values A0 and AD (replaces them with nothing).
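Why that statement can garble Chinese text is easiest to see on raw bytes. The sketch below is an illustration only and assumes the HTML was read as undecoded UTF-8 bytes, which the original post does not state. The character U+4E2D ("中") is encoded in UTF-8 as the bytes E4 B8 AD, so deleting every AD byte leaves an incomplete sequence. The variable names and the hex_bytes helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: show a byte string as space-separated hex values.
sub hex_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $string;
}

# UTF-8 bytes of U+4E2D; the last byte happens to be AD.
my $bytes = "\xE4\xB8\xAD";

# Apply the same deletion as the statement quoted above.
(my $stripped = $bytes) =~ tr/\xA0\xAD//d;

print 'before: ', hex_bytes($bytes),    "\n";   # E4 B8 AD
print 'after:  ', hex_bytes($stripped), "\n";   # E4 B8 (no longer valid UTF-8)
```

Commenting out the statement avoids this byte loss. Under the same assumption, decoding the input to characters before parsing would be another way to keep multi-byte characters intact, though the post does not cover that route.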