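Tags: China Unicom, number selection, scraping, regex
Beijing Unicom recently released numbers in the 185 range. I had never had a Beijing number, and while picking one in the China Unicom online store I got the idea of choosing a "lucky" (easy-to-remember) number. So I dug into the site's structure: the number list is fetched through a long query URL with a random component.
I immediately started writing a program to fetch phone numbers and then filter them for lucky ones (regex matching with back-references and zero-width assertions).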
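Here are the two regex features named above in isolation: a back-reference (`\1` repeats whatever digit group 1 captured) and a zero-width lookahead (`(?=1)` checks the next digit without consuming it). This is a minimal sketch; the class and method names are hypothetical and the sample numbers are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFeatures {

    // (\d) captures one digit; \1{3} demands three more copies of that same
    // digit, so this finds four identical digits in a row (the AAAA pattern).
    static final Pattern FOUR_SAME = Pattern.compile("(\\d)\\1{3}");

    // Each alternative consumes one digit only if the NEXT digit is its
    // successor; the lookahead (?=...) inspects that digit without consuming it.
    // Three such steps followed by one more digit = four ascending digits (ABCD).
    static final Pattern ASCENDING_FOUR = Pattern.compile(
            "(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9)){3}\\d");

    // Returns the leftmost match in s, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(FOUR_SAME, "18566668123"));      // 6666
        System.out.println(firstMatch(ASCENDING_FOUR, "18501234987")); // 0123
    }
}
```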
Seed URL:
http://num.10010.com/NumApp/GoodsDetail/queryMoreNums?callback=jsonp_queryMoreNums&province=11&cityCode=110&rankMoney=76&q_p=${page}&net=01&preFeeSel=0&Show4GNum=TRUE&_=${random}
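`${page}` and `${random}` in the seed URL are placeholders that the program replaces before every request. A minimal sketch of that substitution follows, assuming (as the program does) that `${random}` receives the current time in milliseconds, which presumably keeps each URL unique so no cached response is returned. The class name `SeedUrl` is hypothetical:

```java
final class SeedUrl {

    static final String SEED = "http://num.10010.com/NumApp/GoodsDetail/queryMoreNums"
            + "?callback=jsonp_queryMoreNums&province=11&cityCode=110&rankMoney=76"
            + "&q_p=${page}&net=01&preFeeSel=0&Show4GNum=TRUE&_=${random}";

    // Fills in the page number (q_p) and the cache-busting value (_).
    static String build(int page, long random) {
        return SEED.replace("${page}", String.valueOf(page))
                   .replace("${random}", String.valueOf(random));
    }

    public static void main(String[] args) {
        System.out.println(build(1, System.currentTimeMillis()));
    }
}
```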
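The HTTP requests are made with HttpClient. The response is JSON (parsed with jsel expressions, which I find really slick); strictly speaking it is JSONP, i.e. the JSON wrapped in a jsonp_queryMoreNums(...) call, which the program strips off before parsing. The phone numbers are stored in the moreNumArray property, which is an array.
Fetch the data, then sort and filter it, picking out numbers of the types AAAA, AAA, AABB, ABCD, ABC, DCBA and CBA (for example, AAAA is four identical digits in a row and ABCD is four consecutive ascending digits), and save them to local files. It's that simple. The program: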
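The two steps between the raw response and the list of numbers can be tried out with the JDK alone: strip the JSONP wrapper, then read `moreNumArray` as a flat list in which each number occupies seven slots with the number itself first (that layout is inferred from how the program indexes the array, not from any documentation). Parsing the JSON itself is left to a library such as jsel; the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsonpUtil {

    // Captures the JSON between "jsonp_queryMoreNums(" and the closing ")"
    // (optionally followed by ";"). DOTALL lets the body span several lines.
    private static final Pattern JSONP = Pattern.compile(
            "\\s*jsonp_queryMoreNums\\((.*)\\);?\\s*", Pattern.DOTALL);

    static String unwrap(String body) {
        Matcher m = JSONP.matcher(body);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a jsonp_queryMoreNums response");
        }
        return m.group(1);
    }

    // Takes the first slot of every complete group of `stride` elements;
    // a trailing incomplete group is ignored (stride = 7 in the program).
    static List<String> numbersFrom(List<?> moreNumArray, int stride) {
        List<String> numbers = new ArrayList<>();
        for (int i = 0; i + stride <= moreNumArray.size(); i += stride) {
            numbers.add(String.valueOf(moreNumArray.get(i)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(unwrap("jsonp_queryMoreNums({\"moreNumArray\":[]});")); // {"moreNumArray":[]}
    }
}
```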
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
// JSONDecoder comes from the jsel library; import it from the package your jsel version uses.

public class PhoneNumber {
    // one sorted, de-duplicated set per pattern
    private static final Set<String> NO4 = new TreeSet<String>();   // numbers without the digit 4
    private static final Set<String> AAAA = new TreeSet<String>();
    private static final Set<String> AAA = new TreeSet<String>();
    private static final Set<String> AABB = new TreeSet<String>();
    private static final Set<String> ABCD = new TreeSet<String>();
    private static final Set<String> DCBA = new TreeSet<String>();
    private static final Set<String> ABC = new TreeSet<String>();
    private static final Set<String> CBA = new TreeSet<String>();
    // total numbers fetched, duplicates included
    private static final AtomicLong phoneNumberSize = new AtomicLong(0);

    public static void main(String[] args) throws IOException {
        String seed = "http://num.10010.com/NumApp/GoodsDetail/queryMoreNums?callback=jsonp_queryMoreNums&province=11&cityCode=110&rankMoney=76&q_p=${page}&net=01&preFeeSel=0&Show4GNum=TRUE&_=${random}";
        BasicCookieStore cookieStore = new BasicCookieStore();
        CloseableHttpClient httpClient = HttpClients.custom().setDefaultCookieStore(cookieStore).build();
        try {
            for (int i = 0; i < 100; i++) {
                // every request asks for page 1; the timestamp in ${random} makes each URL unique
                HttpGet httpget = new HttpGet(seed.replace("${page}", "1")
                        .replace("${random}", String.valueOf(System.currentTimeMillis())));
                request(httpClient, httpget);
            }
            print();
            report();
        } finally {
            httpClient.close();
        }
    }

    private static void report() throws IOException {
        writer("AAAA", AAAA);
        writer("AAA", AAA);
        writer("AABB", AABB);
        writer("ABCD", ABCD);
        writer("ABC", ABC);
        writer("DCBA", DCBA);
        writer("CBA", CBA);
    }

    // Writes one category to ./<name>-<yyyy-MM-dd-HH-mm-ss>.phone
    private static void writer(String name, Set<String> set) throws IOException {
        String time = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss").format(new Date());
        PrintWriter writer = new PrintWriter(new File("./" + name + "-" + time + ".phone"));
        try {
            writer.println("report:");
            writer.println("=================================");
            writer.println("size : " + set.size());
            for (String phoneNo : set) {
                writer.println(phoneNo);
            }
        } finally {
            writer.close();
        }
    }

    private static void request(CloseableHttpClient httpClient, HttpGet httpget) throws IOException {
        CloseableHttpResponse response = httpClient.execute(httpget);
        try {
            HttpEntity entity = response.getEntity();
            // assumes the response body is UTF-8
            String text = getText(entity.getContent(), Charset.forName("UTF-8"));
            // strip the JSONP wrapper jsonp_queryMoreNums(...); to get plain JSON
            String json = text.replaceAll("jsonp_queryMoreNums\\((.*)\\);", "$1");
            Map<String, Object> decode = JSONDecoder.decode(json);
            List<?> moreNumArray = (List<?>) decode.get("moreNumArray");
            if (moreNumArray == null) {
                return; // no numbers in this response
            }
            // each number occupies 7 consecutive slots; the phone number is the first
            int size = moreNumArray.size() / 7;
            phoneNumberSize.addAndGet(size);
            for (int i = 0; i < size; i++) {
                String phoneNo = moreNumArray.get(i * 7).toString();
                // first match wins, so stronger patterns are tested before weaker ones
                if (phoneNo.matches("\\d*(\\d)\\1{3,}\\d*")) {                       // AAAA
                    AAAA.add(phoneNo);
                } else if (phoneNo.matches("\\d*(\\d)\\1{2,}\\d*")) {                // AAA
                    AAA.add(phoneNo);
                } else if (phoneNo.matches("\\d*(\\d)\\1(\\d)\\2\\d*")) {            // AABB
                    AABB.add(phoneNo);
                } else if (phoneNo.matches("\\d*(?:(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9)){3,})\\d*")) { // ABCD
                    ABCD.add(phoneNo);
                } else if (phoneNo.matches("\\d*(?:9(?=8)|8(?=7)|7(?=6)|6(?=5)|5(?=4)|4(?=3)|3(?=2)|2(?=1)|1(?=0)){3,}\\d*")) {     // DCBA
                    DCBA.add(phoneNo);
                } else if (phoneNo.matches("\\d*(?:(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9)){2,})\\d*")) { // ABC
                    ABC.add(phoneNo);
                } else if (phoneNo.matches("\\d*(?:9(?=8)|8(?=7)|7(?=6)|6(?=5)|5(?=4)|4(?=3)|3(?=2)|2(?=1)|1(?=0)){2,}\\d*")) {     // CBA
                    CBA.add(phoneNo);
                } else if (!phoneNo.matches("\\d*4\\d*")) {                          // NO4
                    NO4.add(phoneNo);
                }
            }
        } finally {
            response.close();
        }
    }

    private static void print() {
        System.out.println("report:");
        System.out.println("=================================");
        System.out.println("size : " + phoneNumberSize);
        System.out.println("\tAAAA : " + AAAA.size());
        System.out.println("\tAAA : " + AAA.size());
        System.out.println("\tAABB : " + AABB.size());
        System.out.println("\tABCD : " + ABCD.size());
        System.out.println("\tABC : " + ABC.size());
        System.out.println("\tDCBA : " + DCBA.size());
        System.out.println("\tCBA : " + CBA.size());
    }

    // Reads the whole stream into one string (line breaks dropped) and closes it.
    public static String getText(InputStream inputStream, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try {
            BufferedReader read = new BufferedReader(new InputStreamReader(inputStream, charset));
            String line;
            while ((line = read.readLine()) != null) {
                text.append(line);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
        return text.toString();
    }
}
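Because the if/else chain runs in a fixed order, each number lands in exactly one set: a number containing 6666 also contains 666, but it is counted only as AAAA. The hypothetical refactoring below pulls that chain into a pure function with the same regexes in the same order, so the classification can be tested without touching the network:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class LuckyClassifier {

    // One step of an ascending / descending run, checked with a lookahead.
    private static final String ASC =
            "(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9))";
    private static final String DESC =
            "(?:9(?=8)|8(?=7)|7(?=6)|6(?=5)|5(?=4)|4(?=3)|3(?=2)|2(?=1)|1(?=0))";

    // Tried in insertion order; the first rule whose pattern matches wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("AAAA", Pattern.compile("\\d*(\\d)\\1{3,}\\d*"));
        RULES.put("AAA",  Pattern.compile("\\d*(\\d)\\1{2,}\\d*"));
        RULES.put("AABB", Pattern.compile("\\d*(\\d)\\1(\\d)\\2\\d*"));
        RULES.put("ABCD", Pattern.compile("\\d*" + ASC + "{3,}\\d*"));
        RULES.put("DCBA", Pattern.compile("\\d*" + DESC + "{3,}\\d*"));
        RULES.put("ABC",  Pattern.compile("\\d*" + ASC + "{2,}\\d*"));
        RULES.put("CBA",  Pattern.compile("\\d*" + DESC + "{2,}\\d*"));
    }

    // Returns the first matching category, "NO4" if no rule matches and the
    // number has no digit 4, or null otherwise.
    static String classify(String phoneNo) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(phoneNo).matches()) {
                return rule.getKey();
            }
        }
        return phoneNo.indexOf('4') < 0 ? "NO4" : null;
    }

    public static void main(String[] args) {
        System.out.println(classify("18566668123")); // AAAA
    }
}
```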
I ran 100 requests; they finished quickly and the results were saved to local files:
[Screenshot: the saved report files]
Opening a file shows the numbers sorted into their category.
[Screenshot: the contents of a report file]
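Thrilled, I went to search for them in the China Unicom online store, only to find that none of the numbers I had scraped showed up; in other words, they had been filtered out (argh!). I then looked at how the page is implemented and found an element that records the number information, including a numid. I also tried submitting an order, but the submission validates the number ID. I'm sleepy and done fiddling with this, so off to bed!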
<p style="display:none;" id="numInfo" numid="numIdVal18515291341" num="185 1529 1341" price="<span>¥0</span> " monfee="0" nummemo="號碼要求月承諾消費0元" numprefee="0" numisnicerule="0" numlevel="0" montime="0"></p>
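To read the number information out of a hidden element like this one, a regular expression over the attribute list is enough for a quick experiment. The sketch below only reads attributes; it has nothing to do with the server-side ID check. The class name is hypothetical. In this single sample, numid looks like "numIdVal" followed by the digits of num, but that is an observation from one element, not a documented format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumInfoParser {

    // Returns the value of attr="..." in the HTML fragment, or null if absent.
    // The leading \s keeps "num" from matching inside "numid" or "nummemo".
    static String attribute(String html, String attr) {
        Matcher m = Pattern.compile("\\s" + Pattern.quote(attr) + "=\"([^\"]*)\"").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p style=\"display:none;\" id=\"numInfo\" numid=\"numIdVal18515291341\""
                + " num=\"185 1529 1341\" monfee=\"0\" numprefee=\"0\"></p>";
        System.out.println(attribute(html, "numid")); // numIdVal18515291341
    }
}
```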
This article is from the "以銅為鏡" blog. Please keep this source attribution: http://tangoo.blog.51cto.com/9130178/1534024