Step-by-step tutorial: collecting merchant information and phone numbers from Public Comments

Source: Internet
Author: User

This article explains how to use the smart mode of the Houyi collector to collect, for free, the address, per-capita spend, rating, phone number, and other information of merchants listed on Public Comments.

Collection tool introduction:

The Houyi collector is a web page collector based on artificial intelligence technology. You only need to enter a URL; the software automatically identifies the data on the page and completes the collection without any configuration. It is the first web crawler software in the industry that supports three operating systems (Windows, Mac, and Linux).

The software is free to use and places no restrictions on exporting collection results. Users with no programming background can easily meet their data collection needs.

Collection object introduction:

Public Comments (Dianping, 大众点评) is China's leading local lifestyle information and transaction platform, and also the world's first independent third-party consumer review website. It provides not only information services such as merchant information, consumer reviews, and discounts, but also O2O transaction services such as group buying, restaurant reservations, takeout, and electronic membership cards.

Collection fields:

Merchant name, merchant link, address, comment count, per-capita spend, taste, environment, service, group-buying count, display image, phone number

Feature directory:

How to collect list + detail page type web pages

How to collect data from mobile pages

How to download images

Preview collection results:

Export to an Excel table:

Image exported to the local device:

Next, we describe in detail how to collect Public Comments merchant data for free, using buffet merchants in Hangzhou as an example. The specific steps are as follows:

Step 1: Download and install the Houyi collector, then register and log in

1. Click here to open the official website of the Houyi collector, then download and install the collector software.

2. Click the register/log-in option, register a new account, and then log in to the Houyi collector.

[Tips] You can use the crawler without registering, but tasks created under an anonymous account are lost when you switch to a registered account, so we recommend registering first.

The Houyi collector is a product of Arms Cloud. If you are already an Arms Cloud user, you can log in directly.

Step 2: Create a collection task

1. Copy the URL of the Public Comments buffet merchant page (use the URL of the search results page, not the homepage URL).

Click here to learn how to enter the URL correctly.

2. Create a smart mode collection task

You can directly create a collection task on the software, or create a task by importing rules.

Click here to learn how to import and export collection rules.

Step 3: Configure collection rules

1. Set data extraction fields

In smart mode, after you enter the URL, the software automatically identifies the data on the page and generates the collection result. Each type of data corresponds to one collection field. Right-click a field to configure it: you can rename fields, add or remove fields, and process the data.

Click here to learn how to configure collection fields.
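For readers who post-process exported data outside the collector, the same three field operations (renaming, removing, and processing) can be sketched in Python. This is only an illustration: the auto-generated field names, the sample rows, and the helper names `to_int` and `clean_row` are hypothetical and do not come from the Houyi collector.

```python
# Hypothetical raw rows, shaped like what an automatic list-page extraction might yield.
raw_rows = [
    {"field1": " Example Buffet ", "field2": "128 reviews", "field3": "¥98", "field4": "ad"},
    {"field1": "Sample Grill", "field2": "56 reviews", "field3": "¥120", "field4": "ad"},
]

# Assumed mapping from auto-generated names to readable field names.
RENAME = {"field1": "merchant_name", "field2": "comment_count", "field3": "per_capita"}
# Assumed set of fields we do not want to keep.
DROP = {"field4"}


def to_int(text):
    """Keep only ASCII digits and convert to int; return None if there are none."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None


def clean_row(row):
    """Drop unwanted fields, rename the rest, and normalize the numeric fields."""
    cleaned = {}
    for key, value in row.items():
        if key in DROP:
            continue
        cleaned[RENAME.get(key, key)] = value.strip()
    cleaned["comment_count"] = to_int(cleaned.get("comment_count", ""))
    cleaned["per_capita"] = to_int(cleaned.get("per_capita", ""))
    return cleaned


rows = [clean_row(r) for r in raw_rows]
print(rows)
```

In the collector itself these operations are done through the right-click menu; the sketch only shows what they amount to on the resulting rows.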

On the list page, we need to collect each merchant's name, link, address, comment count, per-capita spend, taste, environment, service, group-buying count, display image, and other content. Because the star rating element is special, this field cannot be collected in version v2.1.22; support will be added in a later version. The field setting effect is as follows:

2. Use the in-depth collection function to extract detail page data

Only part of each buffet merchant's information is displayed on the list page. To collect the merchant's phone number, right-click the merchant link and use the in-depth collection function to go to the details page for collection.

Click here to learn more about how to collect a list + details page type webpage.
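The list + details pattern described above can be sketched as a two-level crawl: parse the list page for merchant links, then visit each link and read one extra field. The sketch below uses only the Python standard library and in-memory pages so that it runs offline. The HTML structure, class names, URLs, and phone numbers are all made up for illustration and do not reflect the real markup of Public Comments or the collector's internals.

```python
from html.parser import HTMLParser

# Hypothetical in-memory pages standing in for a list page and its detail pages.
PAGES = {
    "/search/buffet": '<a class="shop" href="/shop/1">Shop One</a>'
                      '<a class="shop" href="/shop/2">Shop Two</a>',
    "/shop/1": '<span class="phone">0571-0000001</span>',
    "/shop/2": '<span class="phone">0571-0000002</span>',
}


class ShopLinkParser(HTMLParser):
    """Collects (name, href) pairs from <a class="shop"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "shop":
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.links.append((data.strip(), self._href))
            self._href = None


class PhoneParser(HTMLParser):
    """Reads the text of the first <span class="phone"> element."""

    def __init__(self):
        super().__init__()
        self.phone = None
        self._in_phone = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "phone":
            self._in_phone = True

    def handle_data(self, data):
        if self._in_phone:
            self.phone = data.strip()
            self._in_phone = False


def collect(list_url, fetch):
    """Parse the list page, then follow each merchant link to its details page."""
    list_parser = ShopLinkParser()
    list_parser.feed(fetch(list_url))
    results = []
    for name, href in list_parser.links:
        detail = PhoneParser()
        detail.feed(fetch(href))
        results.append({"merchant_name": name, "link": href, "phone": detail.phone})
    return results


# fetch is injected; here it simply looks pages up in the dictionary above.
records = collect("/search/buffet", PAGES.__getitem__)
print(records)
```

Passing `fetch` as a parameter keeps the parsing logic separate from how pages are retrieved, so the same function could be pointed at a real HTTP client.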

On the details page, we can see the merchant's phone number. Click "add field" and then click the merchant's phone number on the page.

The added field collects placeholder characters rather than the actual phone number. This is because, in PC browser mode, Public Comments applies special handling to the phone number element, so copying it yields characters instead of the real number.

The same page can display different content in different browser modes. On Public Comments, the merchant's phone number is shown as real text in mobile browser mode, so we can extract the phone number field by switching the browser mode.

Click here to learn more about switching the browser mode.

Click here to learn how to collect the content of the Mobile Page.
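Outside the collector, one common way to request the mobile version of a page is to send a mobile browser User-Agent header. The sketch below builds such a request with Python's standard library. It is an assumption that this resembles what the collector's mobile mode does; sites may use other signals as well. The User-Agent strings and the example.com URL are placeholders.

```python
import urllib.request

# Example User-Agent strings (placeholders; any current browser string would do).
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
)


def build_request(url, mobile=False):
    """Return a Request whose User-Agent identifies a desktop or mobile browser."""
    user_agent = MOBILE_UA if mobile else DESKTOP_UA
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com/shop/1", mobile=True)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_header("User-agent"))
```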

Step 4: Set and start the collection task

1. Set collection tasks

After adding the fields to collect, we can start the collection task. Before starting, we should configure the task to improve collection stability and success rate.

Click the "Settings" button. On the running settings page that appears, we can configure the running settings and the anti-blocking settings. Here, we select "Skip continue collection", set the request wait time to 2 seconds, select "Do not load web page images", leave the anti-blocking settings at the system defaults, and then click Save.

Click here to learn more about how to configure collection tasks.
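The effect of a fixed request wait time combined with a skip-and-continue policy can be sketched as a simple loop. This assumes that "Skip continue collection" means skipping a failed page and moving on to the next one, which the source does not spell out. The function names and the simulated failure are hypothetical.

```python
import time


def run_tasks(urls, fetch, wait_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and skipping failures."""
    results, skipped = [], []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(wait_seconds)  # wait between consecutive requests
        try:
            results.append(fetch(url))
        except Exception:
            skipped.append(url)  # record the failure and continue with the next URL
    return results, skipped


def fake_fetch(url):
    """Stand-in for a real page fetch; fails for URLs ending in /bad."""
    if url.endswith("/bad"):
        raise IOError("simulated failure")
    return "content of " + url


# Record the requested waits instead of actually sleeping, so the demo runs instantly.
waits = []
results, skipped = run_tasks(
    ["/shop/1", "/shop/bad", "/shop/2"], fake_fetch, sleep=waits.append
)
print(results, skipped, waits)
```

A longer wait lowers the request rate, which generally reduces the chance of being blocked at the cost of a slower run.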

2. Start the collection task

Click "Save and start". On the page that appears, you can configure advanced settings, including scheduled start, automatic database import, and image downloading. This example does not use scheduled collection or automatic database import. After selecting the option to download images to the local device, click "Start" to run the crawler.

Click here to learn more about Scheduled Collection.

Click here to learn more about automatic database import.

Click here to learn more about how to download images.

[Tips] The free version supports non-periodic scheduled collection, and image downloading is free of charge. Advanced scheduling and automatic database import are available in the Personal Professional Edition and higher editions.

3. Run the task to extract data

Once started, the task collects data automatically. We can watch the running process and the collection results on the interface, and a notification appears when collection is complete.

Step 5: Export and view data

After data collection is complete, we can view and export the data. The Houyi collector supports multiple export methods (manual export to a local file, manual export to a database, automatic publishing to a database, and automatic publishing to a website) and multiple file formats (Excel, CSV, HTML, and TXT). Select the desired method and file type, and click "Confirm export".

Click here to learn more about how to view and clear collected data.

Click here to learn more about how to export the collection results.
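To show what a CSV export of collected records looks like, here is a minimal sketch using Python's standard `csv` module. The sample records and the `to_csv` helper are hypothetical; the collector's own export is done through its interface, not through this code.

```python
import csv
import io

# Hypothetical collected records; real field names depend on the task configuration.
records = [
    {"merchant_name": "Example Buffet", "address": "1 Example Road", "phone": "0571-0000001"},
    {"merchant_name": "Sample Grill, Hangzhou", "address": "2 Sample Street", "phone": "0571-0000002"},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records, ["merchant_name", "address", "phone"])
print(csv_text)
```

Values that contain commas, such as the second merchant name, are quoted automatically by the `csv` module, so the file can be read back without losing field boundaries.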

[Tips] All manual export functions are free of charge. Publishing to a website is available in the Personal Professional Edition and higher editions.
