I need to automatically collect data from a website: the list of articles, and the actual content of each article in that list. The list gives me the id of each article, and every article is served by the same interface (pass the article id as a parameter and it returns the corresponding JSON). I need to collect part of that data and then analyze it.
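To make the workflow concrete, here is a rough sketch of the fetch step as I picture it, using only the Python standard library. The URL, the `id` parameter name, and all function names are placeholders I made up for illustration; the real interface may differ. It fetches each id on a thread pool, retries failures a few times, and reports the ids that still failed so a long run is not stopped by one bad request.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical endpoint: the real site and parameter name are assumptions.
ARTICLE_URL = "https://example.com/api/article?id={article_id}"


def fetch_article(article_id, timeout=10):
    """Download and decode the JSON for one article id."""
    url = ARTICLE_URL.format(article_id=article_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_with_retry(article_id, fetch=fetch_article, retries=3, delay=1.0):
    """Call fetch, retrying on any exception with a growing pause."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(article_id)
        except Exception:
            if attempt == retries:
                raise  # give up; the caller records this id as failed
            time.sleep(delay * attempt)


def collect(article_ids, fetch=fetch_article, workers=8, retries=3, delay=1.0):
    """Fetch all ids concurrently; return (results_by_id, failed_ids)."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(fetch_with_retry, i, fetch, retries, delay): i
            for i in article_ids
        }
        for fut in as_completed(futures):
            article_id = futures[fut]
            try:
                results[article_id] = fut.result()
            except Exception:
                failed.append(article_id)
    return results, failed
```

The `fetch` argument can be swapped for a stub, which makes it possible to try the concurrency and retry logic without touching the real site.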
Is there a mature framework or existing library that meets these needs? It has to support multithreading and run stably 24/7, because the volume to collect is huge.
I would also like to ask how to store the collected content (millions to tens of millions of records). Some of the fields are numeric and need statistical analysis. Can I use MySQL, or is there a more mature and easier-to-use option?
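For the storage side, this is roughly the table layout I imagine: the numeric fields that need statistics in their own columns, plus the raw JSON kept as text. The sketch uses Python's built-in `sqlite3` purely as a stand-in so it runs without a server; it is not a claim that SQLite is the right choice at this scale. The column name `views` and the function names are hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the articles table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY,   -- article id from the list
               views INTEGER,            -- hypothetical numeric field
               raw_json TEXT NOT NULL    -- full response, kept for later use
           )"""
    )
    return conn


def save_articles(conn, articles):
    """Insert a batch in one transaction; re-collected ids overwrite old rows."""
    rows = [(a["id"], a.get("views"), json.dumps(a)) for a in articles]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO articles (id, views, raw_json) VALUES (?, ?, ?)",
            rows,
        )


def view_stats(conn):
    """Return (row count, average views, maximum views)."""
    return conn.execute(
        "SELECT COUNT(*), AVG(views), MAX(views) FROM articles"
    ).fetchone()
```

Keying the table on the article id means a crawler that restarts and fetches the same article again does not create duplicate rows.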