I. Overview
This product is a distributed, fast, and stable enterprise-grade data acquisition platform with a wide collection range. It is suited to enterprises with large collection volumes (tens of millions of records collected daily, with total data volume in the billions) and strict timeliness requirements, such as public opinion monitoring companies, big data analysis companies, and real-time data monitoring companies.
II. Detailed Description
1. Distributed
The distributed architecture consists of a dispatch server and multiple acquisition nodes. The dispatch server manages all nodes centrally: for example, it can restart several nodes at the same time, publish rules to all of them at once, and show the running status of every node in a unified interface. An early warning mechanism for acquisition nodes is also provided. The acquisition nodes work cooperatively, which effectively prevents different nodes from collecting duplicate data.
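One common way for cooperating nodes to avoid duplicate collection is for the dispatcher to skip URLs it has already seen and to map each new URL to exactly one node by hashing it. The sketch below illustrates that idea only; the brochure does not say how the product does it, and the names `Dispatcher`, `assign_node`, and the node IDs are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Set


def assign_node(url: str, node_ids: Sequence[str]) -> str:
    """Map a URL to exactly one node using a stable hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(node_ids)
    return node_ids[index]


class Dispatcher:
    """Hypothetical dispatch server: one work queue per acquisition node."""

    def __init__(self, node_ids: Sequence[str]) -> None:
        if not node_ids:
            raise ValueError("at least one acquisition node is required")
        self.node_ids: List[str] = list(node_ids)
        self.seen: Set[str] = set()
        self.queues: Dict[str, List[str]] = {n: [] for n in self.node_ids}

    def submit(self, url: str) -> Optional[str]:
        """Queue a URL on its node; return None if it was already submitted."""
        if url in self.seen:
            return None
        self.seen.add(url)
        node = assign_node(url, self.node_ids)
        self.queues[node].append(url)
        return node


dispatcher = Dispatcher(["node-a", "node-b", "node-c"])
first = dispatcher.submit("https://example.com/page/1")
second = dispatcher.submit("https://example.com/page/1")  # duplicate, skipped
```

Because the assignment depends only on the URL and the node list, the same URL always lands on the same node, so no second node ever fetches it.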
2. Fast
Unlike other crawler software on the market, this product runs purely as a background process. It does not render a graphical interface; instead it parses the message format directly, and it is roughly 30 to 100 times faster than other products.
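To illustrate what parsing the message format directly can look like, the sketch below reads fields straight out of a JSON response body without any page rendering. The sample body, its field names, and the `extract_titles` helper are assumptions made for illustration, not the product's actual message format or code.

```python
import json
from typing import List

# Hypothetical response body, standing in for what a server would return.
SAMPLE_BODY = (
    '{"items": [{"id": 1, "title": "First post"},'
    ' {"id": 2, "title": "Second post"}], "next_page": 2}'
)


def extract_titles(body: str) -> List[str]:
    """Parse the raw response text and pull out the title of each item."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("items", [])]


titles = extract_titles(SAMPLE_BODY)
```

No browser or layout engine is involved here: the only work done is decoding the response text, which is why this style of collection tends to be much cheaper than rendering a page.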
3. Stable
It runs stably and can run continuously for up to five hours. Customers have used our product for nearly 1 year, and it still runs well.
4. Wide range of collection
This product can collect data in any format and form. For example, it can collect Baidu map data, gold map data, and mobile phone APP data, and it can capture the complete data of a given website. Other acquisition software on the market does not offer these capabilities.
5. Wide range of data formats
It can collect HTML, XML, JSON, image files, video files, Word files, PDF files, Excel files, and other formats.
6. Effective handling of anti-collection mechanisms
Multiple built-in methods and solutions for getting past anti-collection mechanisms effectively widen the acquisition range.
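A collector that handles many formats typically has to decide what kind of file a response is before storing or parsing it. The sketch below does this from the HTTP `Content-Type` header value. It is a minimal illustration under assumptions: the `classify` function and the category labels are hypothetical, and the brochure does not describe how the product detects formats.

```python
# Standard MIME types mapped to the format names listed above.
DOCUMENT_TYPES = {
    "text/html": "html",
    "application/xml": "xml",
    "text/xml": "xml",
    "application/json": "json",
    "application/pdf": "pdf",
    "application/msword": "word",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
    "application/vnd.ms-excel": "excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
}


def classify(content_type: str) -> str:
    """Return a format label for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in DOCUMENT_TYPES:
        return DOCUMENT_TYPES[mime]
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("video/"):
        return "video"
    return "other"


label = classify("text/html; charset=utf-8")
```

Unknown types fall through to "other" rather than raising an error, so a new format does not stop collection.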
In short, our product targets big data enterprises with large acquisition volumes and strict timeliness requirements. It is an enterprise-class product in the true sense, unlike the acquisition software generally available on the market, which handles only small-scale collection over a limited range. Our product can save a company more than half of its crawler engineering staff. Data acquisition looks simple, but achieving large-volume acquisition and stable collection of complete data is very difficult. Crawler engineers are currently scarce, most lack sufficient experience, and even an experienced crawler engineer cannot solve every crawling problem. Judging from the current market, demand for our product is very large, and it will keep growing as big data rises.
Digital bring together distributed acquisition Platform Trial report