Question about HTML Tidy programming in GNU C

Source: Internet
Author: User
Tags: tidy

Recently I have been taking an online lab course on text extraction. The teacher asked us to use the DOM tree for extraction, which means taking HTML and converting it into XML. I am not a GNU C programmer by training, and at first I wanted to write the converter myself, but converting HTML to XML is clearly not a job of one or two days. So I went looking for an existing tool (the technology is quite frustrating) and came across a gem: HTML Tidy. This software fixes common problems in HTML source (missing end tags and so on) and converts HTML into standard XHTML, which is exactly what I needed. I then searched the Internet for a tidy API package for GNU C and found almost nothing about using the tidy API from GNU C; nearly everything was about JTidy and the like (it seems few people do this in GNU C, but I had to try). Later I found that on Linux tidy is close to natively supported, although it is not part of the default installation.

1. First, install the tidy software by running the following command:

  sudo apt-get install tidy

Alternatively, install tidy from the graphical package manager, for example:

 

Figure 1-1

Figure 1-2

2. After the installation is complete, you still cannot use the API when writing a program, because the tidy package does not include the development files. We need to go back to the package manager and install the development package. All this really does is add the tidy header files under /usr/include; once they are there, we can include them in our program and use the tidy API. The package to install is shown below:

Figure 2-1

In fact, the libtidy-dev package only adds a few .h header files, such as tidy.h and buffio.h.

3. After the installation is complete, you can write the program. Note where the header files were added, for example under /usr/include/tidy; in that case include them in the form #include <tidy/tidy.h>. Here is a simple program (this is the example program from the tidy official website):

  #include <tidy/tidy.h>
  #include <tidy/buffio.h>
  #include <stdio.h>
  #include <errno.h>

  int main(int argc, char **argv)
  {
    const char *input = "<title>Foo</title><p>Foo!";
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    int ok;

    TidyDoc tdoc = tidyCreate();                     // Initialize "document"
    printf("Tidying:\t%s\n", input);

    ok = tidyOptSetBool(tdoc, TidyXhtmlOut, yes);    // Convert to XHTML
    if (ok)
      rc = tidySetErrorBuffer(tdoc, &errbuf);        // Capture diagnostics
    if (rc >= 0)
      rc = tidyParseString(tdoc, input);             // Parse the input
    if (rc >= 0)
      rc = tidyCleanAndRepair(tdoc);                 // Tidy it up!
    if (rc >= 0)
      rc = tidyRunDiagnostics(tdoc);                 // Kvetch
    if (rc > 1)                                      // If error, force output.
      rc = (tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1);
    if (rc >= 0)
      rc = tidySaveBuffer(tdoc, &output);            // Pretty print

    if (rc >= 0)
    {
      if (rc > 0)
        printf("\nDiagnostics:\n%s", errbuf.bp);
      printf("\nAnd here is the result:\n%s", output.bp);
    }
    else
      printf("A severe error (%d) occurred.\n", rc);

    tidyBufFree(&output);
    tidyBufFree(&errbuf);
    tidyRelease(tdoc);
    return rc;
  }
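
The program above branches several times on the integer that each tidy call returns: a negative value is treated as a severe error, a value greater than 1 triggers forced output, and any positive value causes the diagnostics to be printed. The sketch below spells out that convention, assuming the usual libtidy meaning of 0 (no problems), 1 (warnings) and 2 (errors). Both function names are hypothetical helpers written for illustration and are not part of the tidy API.

```c
/* Hypothetical helper (not part of libtidy): describe a tidy return code.
 * Assumed convention: < 0 severe error, 0 no problems, 1 warnings, 2 errors. */
const char *tidy_status_text(int rc)
{
    if (rc < 0)
        return "severe error";
    if (rc == 0)
        return "ok";
    if (rc == 1)
        return "warnings";
    return "errors";
}

/* Mirrors the `rc > 1` test in the example program:
 * output only has to be forced when tidy reported errors. */
int needs_force_output(int rc)
{
    return rc > 1;
}

/* Mirrors the `rc > 0` test: diagnostics are worth printing
 * for both warnings and errors. */
int has_diagnostics(int rc)
{
    return rc > 0;
}
```

For example, printing tidy_status_text(rc) right after the tidyRunDiagnostics call would show which of the branches the program is about to take.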

4. Because the API package installs only the header files, compiling directly reports an "undefined reference to" error: the headers contain only declarations, with no implementation. This is where the libtidy-0.99-0 package we installed earlier becomes useful, because it is tidy's dynamic link library. It installs the file libtidy.so under /usr/lib, so we need to link against this library when compiling; then the declarations have an implementation behind them and the error goes away. The compile command is as follows:

 
  gcc filename.c -L/usr/lib -ltidy
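
The "undefined reference" error in step 4 comes down to the difference between a declaration and a definition. Here is a minimal sketch of that difference in a single file. The names demo_count_tags and demo_has_markup are made up for illustration and have nothing to do with the real tidy API: the declaration plays the role of a header such as tidy.h, and the definition at the bottom plays the role of libtidy.so.

```c
#include <stddef.h>

/* Role of the header: a declaration only. It is enough for the
 * compiler to accept calls to the function, but it contains no code. */
size_t demo_count_tags(const char *html);

/* A caller compiles fine against the declaration alone. */
int demo_has_markup(const char *html)
{
    return demo_count_tags(html) > 0;
}

/* Role of the library: the definition the linker needs.
 * If this function were missing from everything passed to the linker,
 * the build would stop with an "undefined reference" error, just as it
 * does when -ltidy is left off the gcc command line. */
size_t demo_count_tags(const char *html)
{
    size_t n = 0;
    for (; *html != '\0'; ++html) {
        if (*html == '<')
            n++;            /* crude count: one per opening angle bracket */
    }
    return n;
}
```

In the tidy case the two halves live in different places: the declarations in the headers from libtidy-dev, and the definitions in libtidy.so, which the -ltidy option tells the linker to use.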

 

In this way, you can use the tidy API in GNU C.

 

To sum up, we only need two things: the four header files from libtidy-dev and the libtidy.so dynamic link library, which we link in when compiling the program. It is not as difficult as you might think and takes only a few steps; it almost feels like native support. Perhaps the reason is that few people do this in GNU C while many do it in Java, so there are many more articles for Java. I wrote this article to mark the half day I wasted on this, and to help later students who do not know how to use tidy in GNU C avoid wasting as much time.

 

 
