For work, I needed a complete zip code database. The data floating around the Internet is old and sparse. I really wished the post office offered the zip codes as a free download, or at least exposed an interface. In the end the search paid off, and I settled on http://www.cpdc.com.cn/ (China Post Group Corporation Name and Address Information Center). The data there is very complete. But that raises a new problem: the site offers no way to download the database.
Although the site provides neither a downloadable database nor an interface, the HTML returned by a query is itself data. Once the HTML source of the result page is analyzed, the records on each page can be extracted. The site does put one small obstacle in the way: a verification code (CAPTCHA), which is required every time you click the next page. But there is always a way around it...
Most readers will be familiar with HttpWatch; if you are not, a quick Google search will fill you in. With HttpWatch we can see exactly what the browser does when we click Query or Next page, and analyze it from there.
Step 1: enter the zip code for Beijing, leave the other fields empty, and click Query. The HttpWatch capture is as follows:
Step 2: click "next page", as shown below:
From the two captures above, we already know what we need:
1. When a query runs, the data is submitted to the "http://***/action.do" page;
2. The POST data includes dist_cd_addr (the top-level zip code to query), pageNo (the page number), and reqCode, whose value getPostcdByDistcdAddr is presumably the action to run.
With this information, we can submit the same data to that page through an XMLHTTP object and collect everything we need by looping over the page numbers. If you submit dist_cd_addr as an empty value, you will find that the result set is the data for the whole country, more than 1.7 million records in total.
The C# code snippet is as follows:
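// Requires a COM reference to Microsoft XML (MSXML2).
MSXML2.XMLHTTP xmlhttp = new MSXML2.XMLHTTP();
// Same form fields the browser sends; an empty dist_cd_addr returns nationwide data.
string PostData = "addr_name=&cityselect=&ctyselect=&dist_cd_addr=&prvselect=&reqCode=getPostcdByDistcdAddr&pageNo=1&type=1&pageSize=1000";
// Third argument false = synchronous request.
xmlhttp.open("POST", "http://***/postcdQueryAction.do", false, null, null);
xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
xmlhttp.setRequestHeader("Content-Length", PostData.Length.ToString());
xmlhttp.send(PostData);
// The HTML of the result page, ready to be parsed.
string s = xmlhttp.responseText;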
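To make the "loop over the page numbers" part concrete, here is a minimal sketch of how the same request could be repeated page by page. It swaps MSXML2.XMLHTTP for System.Net.Http.HttpClient, which is my substitution and not what the original snippet uses. The names PostcodeFetcher, BuildForm, and FetchPagesAsync, the pageCount argument, and the pause between requests are all hypothetical; the form fields are the ones from the captured request, and the real action URL (elided above) has to be passed in by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static class PostcodeFetcher
{
    // Builds the form fields for one page. Field names and the reqCode value
    // come from the captured request; pageSize = 1000 mirrors the snippet above.
    public static Dictionary<string, string> BuildForm(string distCdAddr, int pageNo, int pageSize = 1000)
    {
        return new Dictionary<string, string>
        {
            ["addr_name"] = "",
            ["cityselect"] = "",
            ["ctyselect"] = "",
            ["dist_cd_addr"] = distCdAddr,
            ["prvselect"] = "",
            ["reqCode"] = "getPostcdByDistcdAddr",
            ["pageNo"] = pageNo.ToString(),
            ["type"] = "1",
            ["pageSize"] = pageSize.ToString(),
        };
    }

    // Fetches pages 1..pageCount one after another (no parallelism) and
    // waits for `pause` after each request to keep the load on the server low.
    // pageCount is an assumption: the caller must know or estimate it.
    public static async Task<List<string>> FetchPagesAsync(
        string actionUrl, string distCdAddr, int pageCount, TimeSpan pause)
    {
        var pages = new List<string>();
        using (var client = new HttpClient())
        {
            for (int pageNo = 1; pageNo <= pageCount; pageNo++)
            {
                using (var content = new FormUrlEncodedContent(BuildForm(distCdAddr, pageNo)))
                using (var response = await client.PostAsync(actionUrl, content))
                {
                    response.EnsureSuccessStatusCode();
                    pages.Add(await response.Content.ReadAsStringAsync());
                }
                await Task.Delay(pause);
            }
        }
        return pages;
    }
}
```

FormUrlEncodedContent sets the Content-Type header and encodes the fields itself, so the manual header lines are not needed in this variant. Whether the server still answers this way, and which character encoding the response uses, are things I have not verified.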
In this way we get the data back and only need to parse the contents of the string s. Because the data volume is so large, fetching everything takes a while. I advise against using multithreading to speed it up: that could make the server unstable and affect its normal use by others. The point of this post is simply to introduce the method: monitor the traffic with HttpWatch, then use a program to simulate the page submission and collect the data. Really, a public database like this ought to be available as a free download!
Here is a database someone prepared earlier: http://download.csdn.net/source/2153604
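Parsing the returned string depends entirely on the page's markup, which is not shown here. As a rough sketch only, assuming the results come back as an ordinary HTML table with one record per <tr> and one field per <td>, the cells could be pulled out like this. The PostcodeParser class and ParseRows method are hypothetical names, and the regular expressions would need adjusting to the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class PostcodeParser
{
    // Assumed layout: one <tr> per record, one <td> per field.
    static readonly Regex RowPattern =
        new Regex(@"<tr[^>]*>(.*?)</tr>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    static readonly Regex CellPattern =
        new Regex(@"<td[^>]*>(.*?)</td>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Returns one string array per table row that contains at least one <td>.
    // Rows made only of <th> cells (typical header rows) are skipped.
    public static List<string[]> ParseRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match row in RowPattern.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups[1].Value))
            {
                // Drop nested tags, decode entities such as &amp;, trim whitespace.
                string text = TagPattern.Replace(cell.Groups[1].Value, "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            if (cells.Count > 0) rows.Add(cells.ToArray());
        }
        return rows;
    }
}
```

Regular expressions are fragile against nested tables or malformed markup; for a one-off extraction of a regular result page they are usually enough, but a proper HTML parser would be the safer choice if the layout turns out to be more complicated.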