Java, PHP, or Python for data collection and analysis: which has the more mature framework?

Source: Internet
Author: User
I need to automate data collection from a website: the list of articles and the actual content of each article in that list. The list gives me the ID of each article, and each article is available through a unified interface (pass the article ID as a parameter to get the corresponding JSON). Part of that data needs to be collected and then analyzed.

Is there a mature framework or existing library (a "wheel") that meets my needs? It has to be multi-threaded and able to run 24x7, because the number of items to collect is huge.

One more question: how should I store the collected content (millions to tens of millions of records)? Some of the data is numeric and needs statistical analysis. Would MySQL work, or is there some other mature and simple tool I could use?

Reply content:

If it is data analysis:
MapReduce works for log analysis.
Dpark can handle PV and UV analysis.
Spark is good, too.
Production data reports can be analyzed and displayed with pandas.
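
As a rough illustration of the map-reduce idea applied to PV (page views) and UV (unique visitors), here is a minimal single-process sketch in plain Python. The log records and the function names (`map_record`, `reduce_pairs`, `pv_uv`) are made up for this example; a real job on Dpark or Spark would distribute the same two steps across machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical access-log records: (page, visitor_id). Not from the original post.
LOG = [
    ("/a", "u1"),
    ("/a", "u2"),
    ("/a", "u1"),
    ("/b", "u3"),
]

def map_record(record):
    """Map step: turn one log record into (page, (view_count, {visitor}))."""
    page, visitor = record
    return page, (1, {visitor})

def reduce_pairs(acc, pair):
    """Reduce step: sum the views and union the visitor sets for each page."""
    page, (views, visitors) = pair
    total, seen = acc[page]
    acc[page] = (total + views, seen | visitors)
    return acc

def pv_uv(records):
    """Return {page: (pv, uv)} for an iterable of (page, visitor_id) records."""
    acc = defaultdict(lambda: (0, set()))
    merged = reduce(reduce_pairs, map(map_record, records), acc)
    return {page: (pv, len(users)) for page, (pv, users) in merged.items()}

stats = pv_uv(LOG)
print(stats)
```

Keeping a full visitor set per page is fine for a sketch, but at large scale an approximate counter is usually used for UV instead.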

If it is data collection, there are a lot of tools.

It sounds to me like you are building something like a search engine? The data volume is fairly large, so I would recommend something distributed.
Using MySQL is not very realistic ...

Isn't what you need just a web crawler?

    1. Crawler framework: Scrapy

    2. Database selection: MySQL with indexes can handle this level of data (millions of rows).

You can also try MongoDB.
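
To make the "relational table plus index" suggestion concrete, here is a small sketch of storing collected records and running a simple statistic over them. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a database server; the table name, columns, and sample rows are assumptions for illustration, and MySQL would need its own driver and slightly different SQL (for example, a different upsert syntax).

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) articles table and index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " article_id INTEGER PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " views INTEGER NOT NULL)"
    )
    # Index the column used for grouping and filtering in the analysis queries.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_articles_category ON articles (category)"
    )
    return conn

def save_articles(conn, rows):
    """Insert rows in one transaction; re-collected IDs overwrite the old row."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO articles (article_id, category, views)"
            " VALUES (?, ?, ?)",
            rows,
        )

def views_by_category(conn):
    """Return (category, article_count, average_views) per category."""
    cur = conn.execute(
        "SELECT category, COUNT(*), AVG(views) FROM articles"
        " GROUP BY category ORDER BY category"
    )
    return cur.fetchall()

conn = open_store()
save_articles(conn, [(1, "tech", 100), (2, "tech", 300), (3, "news", 50)])
report = views_by_category(conn)
print(report)
```

Using the article ID as the primary key means a crawler that runs continuously can re-fetch an article without creating duplicate rows.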

You didn't say which language or environment you are using. For multi-threaded collection, Node.js and Python are the common choices these days. Both can store data in something like MySQL. Millions of rows is never a problem.
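
For the Python side, a minimal sketch of multi-threaded collection might look like the following. It assumes each article's JSON is fetched by ID from a single endpoint, as described in the question; the URL in `API_URL` is a placeholder, and `collect` and `fake_fetch` are illustrative names. The demo at the bottom uses a fake fetcher so it runs without network access.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

# Placeholder endpoint (an assumption): the real site's interface goes here.
API_URL = "https://example.com/api/article?id={id}"

def fetch_json(article_id):
    """Fetch and decode the JSON document for one article ID."""
    with urlopen(API_URL.format(id=article_id), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

def collect(article_ids, fetch=fetch_json, workers=8):
    """Fetch many articles concurrently.

    Returns (results, failed): results maps ID to parsed JSON, and failed
    lists the IDs whose fetch raised, so they can be retried later.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, aid): aid for aid in article_ids}
        for future in as_completed(futures):
            aid = futures[future]
            try:
                results[aid] = future.result()
            except Exception:
                # One bad article should not stop a long-running job.
                failed.append(aid)
    return results, failed

def fake_fetch(article_id):
    """Offline stand-in for fetch_json, used only for this demo."""
    if article_id == 3:
        raise OSError("simulated network error")
    return {"id": article_id, "views": article_id * 10}

results, failed = collect([1, 2, 3, 4], fetch=fake_fetch)
print(sorted(results), failed)
```

Threads suit this job because the work is dominated by waiting on the network. For a job that runs around the clock, the failed IDs would typically be queued for retry, with a delay between requests to avoid overloading the site.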

Have you tried Python with Selenium + PhantomJS?

This scrapy of the Python language is still
