Recently, at my instructor's request, I wrote a small program that parses XML files, fetches HTML web pages, and parses those pages. It took a whole week, but I learned a lot: practice is the sole criterion for testing truth.
1. My first choice was C++ (the programming language I know best), only to find that open-source C++ libraries are scarce.
2. There are still a few C++ libraries for parsing XML files. I started with tinyxml, which is very small and easy to use, but its biggest drawback is that it does not support the wchar_t type, so in the end I had to give it up. I finally settled on rapidxml, which is built on C++ templates and has an interface similar to tinyxml's. Its biggest highlight (in my opinion) is that it supports wchar_t.
3. I could not find an open-source HTML parser library for C++ on Windows, which was very disappointing. Fortunately, I only needed to extract a few pieces of content and did not need to parse the entire HTML file into a DOM tree, so the find function of wstring (string) was enough to get the job done. (Admittedly, it is a crude solution.)
4. My biggest takeaway is that I need to learn Java, precisely because it has so many open-source libraries.
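
As a rough illustration of the find-based approach in item 3, here is a minimal sketch using only the standard library. The helper name extract_between and the idea of passing the opening and closing markers as parameters are assumptions made for illustration, not taken from the original program. It is deliberately crude: it matches the markers literally, so it will miss tags that carry attributes, use different letter case, or are nested.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original program): copies the text found
// between the first occurrence of `open` at or after position `from` and the
// next occurrence of `close` into `out`.
// Returns false, leaving `out` untouched, if either marker is missing.
bool extract_between(const std::wstring& html,
                     const std::wstring& open,
                     const std::wstring& close,
                     std::wstring& out,
                     std::size_t from = 0) {
    // Locate the opening marker.
    const std::size_t start = html.find(open, from);
    if (start == std::wstring::npos) {
        return false;
    }

    // The content begins right after the opening marker.
    const std::size_t content = start + open.size();

    // Locate the closing marker, searching only after the content start.
    const std::size_t end = html.find(close, content);
    if (end == std::wstring::npos) {
        return false;
    }

    out = html.substr(content, end - content);
    return true;
}
```

For example, calling it with L"<title>" and L"</title>" on a fetched page should yield the page title, provided the tags appear in exactly that form. Anything beyond this kind of spot extraction would call for a real HTML parser.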