PHP: how to precisely extract all hyperlinks from a website?

Source: Internet
Uploaded by: User
I want to collect all hyperlinks on a website, using the PHP Snoopy class:

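    <?php
    include 'Snoopy.class.php';        // the Snoopy library file
    $snoopy = new Snoopy;
    $sourceURL = $url;                 // $url: the page to crawl
    $snoopy->fetchlinks($sourceURL);   // fetch the page and extract its links
    $content = $snoopy->results;       // array of link URLs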

The result looks like this:

    array (size=627)
      0 => string 'http://www.alibaba.com/https://login.alibaba.com/' (length=49)
      1 => string 'http://sh.vip.alibaba.com?tracelog=nav_ma' (length=41)
      2 => string 'http://message.alibaba.com/feedback/default.htm?routeto=inbox&tracelog=nav_ma_mc' (length=80)
      3 => string 'http://www.alibaba.com//hz-favorite.alibaba.com/favorite/favorite_home.htm?tracelog=nav_ma_fav' (length=94)
      4 => string 'http://rfq.alibaba.com/form.htm?tracelog=header_myalibaba' (length=57)
      5 => string 'http://hz.sourcing.alibaba.com/rfq/request/rfq_manage_list.htm?tracelog=nav_ma_mana_rfq' (length=87)
      6 => string 'http://biz.alibaba.com/generalorders/list_orders.htm?tracelog=ma_mana_orders' (length=76)
      7 => string 'http://sh.vip.alibaba.com/product/post_product_interface.htm?tracelog=newschp_nav_madp' (length=86)
      8 => string 'http://sh.vip.alibaba.com/product/manage_products.htm?tracelog=newschp_nav_mamng' (length=80)
      9 => string 'http://hz.sourcing.alibaba.com/rfq/quotation/rfq_not_quoted_manage_list.htm?nav_ma_rec_rfqs' (length=91)
      10 => string 'http://www.alibaba.com/javascript:;' (length=35)
      11 => string 'http://www.alibaba.com/Products?tracelog=beacon_cate_140704' (length=59)
      12 => string 'http://rfq.alibaba.com/form.htm?tracelog=header_forbuyers' (length=57)
      13 => string 'http://globalexpo.alibaba.com?tracelog=beacon_expo_150820' (length=57)
      14 => string 'http://wholesale.alibaba.com?tracelog=nav_ws' (length=44)
      15 => string 'http://buyer.alibaba.com/bizid_buyer?tracelog=nav_bi' (length=52)
      16 => string 'http://tradeassurance.alibaba.com/bao/buyer_advertise.htm?tracelog=from_home_menu' (length=81)
      17 => string 'http://activities.alibaba.com/alibaba/secure-payment.php?tracelog=beacon_payment_150114' (length=87)
      18 => string 'http://ecredit.alibaba.com/ecl/buyer.htm?tracelog=beacon_credit_140704' (length=70)
      19 => string 'http://inspection.alibaba.com/?tracelog=beacon_is_140704' (length=56)
      20 => string 'http://buyer.alibaba.com/intelligence?tracelog=beacon_ti_140704' (length=63)
      21 => string 'http://buyer.alibaba.com/forum?tracelog=beacon_df_140704' (length=56)
      22 => string 'http://ask.alibaba.com/?tracelog=beacon_ta_140704' (length=49)
      23 => string 'http://www.alibaba.com/javascript:;' (length=35)
      24 => string 'http://seller.alibaba.com/memberships/index.html?tracelog=seller_channel_member_hp_header' (length=89)
      25 => string 'http://seller.alibaba.com/learningcenter?tracelog=seller_channel_lc_hp_header' (length=77)
      26 => string 'http://seller.alibaba.com/training.htm?tracelog=seller_channel_training_hp_header' (length=81)
      27 => string 'http://sourcing.alibaba.com/?tracelog=newschp_nav_narfq' (length=55)
      28 => string 'http://www.alibaba.com/javascript:;' (length=35)

How can I remove URLs like "http://www.alibaba.com/javascript:;"?
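Judging from the output, Snoopy's `fetchlinks()` appears to prepend the page's base URL to every `href` that does not begin with `http://`, which would explain how `javascript:;`, `https://login.alibaba.com/` and the protocol-relative `//hz-favorite.alibaba.com/...` all end up glued to `http://www.alibaba.com/`. Below is a minimal post-filter sketch under that assumption. `clean_links()` is a hypothetical helper, not part of Snoopy, and `$root` is assumed to be the scheme + host you crawled (here `http://www.alibaba.com`). It undoes the prefix for absolute and protocol-relative links, drops pseudo-schemes such as `javascript:`, keeps only `http`/`https` URLs and removes duplicates.

```php
<?php
// Hypothetical helper, not part of Snoopy. $root is ASSUMED to be the
// scheme + host that Snoopy prefixed to links, e.g. 'http://www.alibaba.com'.
function clean_links(array $links, string $root): array
{
    $prefix = rtrim($root, '/') . '/';
    $scheme = parse_url($root, PHP_URL_SCHEME) ?: 'http';
    $clean = [];
    foreach ($links as $link) {
        if (strncasecmp($link, $prefix, strlen($prefix)) === 0) {
            $rest = substr($link, strlen($prefix));
            if ($rest !== '' && $rest[0] === '/') {
                // Was protocol-relative: '//host/path' -> 'http://host/path'
                $link = $scheme . ':/' . $rest;
            } elseif (preg_match('#^https?://#i', $rest)) {
                // Was already absolute, e.g. 'https://login.alibaba.com/'
                $link = $rest;
            } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rest)) {
                // Pseudo-scheme such as 'javascript:;' or 'tel:' -> drop it
                continue;
            }
        }
        // Keep only http/https URLs that have a host
        $s = strtolower((string) parse_url($link, PHP_URL_SCHEME));
        if (!in_array($s, ['http', 'https'], true) || !parse_url($link, PHP_URL_HOST)) {
            continue;
        }
        $clean[$link] = true; // array keys remove duplicates, first-seen order kept
    }
    return array_keys($clean);
}

// Example with a few entries from the output above
print_r(clean_links([
    'http://www.alibaba.com/https://login.alibaba.com/',
    'http://sh.vip.alibaba.com?tracelog=nav_ma',
    'http://www.alibaba.com//hz-favorite.alibaba.com/favorite/favorite_home.htm?tracelog=nav_ma_fav',
    'http://www.alibaba.com/javascript:;',
    'http://www.alibaba.com/Products?tracelog=beacon_cate_140704',
], 'http://www.alibaba.com'));
```

In the question's code this would be `$content = clean_links($snoopy->results, 'http://www.alibaba.com');`. If you only need to discard the `javascript:` entries, `array_filter($snoopy->results, fn($u) => stripos($u, 'javascript:') === false)` (PHP 7.4+ for `fn`) is enough; the prefix repair only matters if you also want to keep the mis-resolved absolute links.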

Replies:


QueryList

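    // … ['img','src']])->data;   (beginning of this first statement is missing in the source)
    // Print the result
    print_r($data);
    // Collect all hyperlinks on a page
    $data = QueryList::Query('http://cms.querylist.cc/google/list_1.html', ['link' => ['a', 'href']])->data;
    // Print the result
    print_r($data);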

http://git.oschina.net/jae/QueryList
Take a look at this: it is somewhat more powerful than Snoopy and supports jQuery selector syntax.
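QueryList is one option. As an alternative that needs no third-party library, here is a sketch using PHP's built-in DOM extension (`DOMDocument` + `DOMXPath`), a swapped-in technique rather than what either answer uses. It reads each `href` attribute directly and resolves it against the page URL itself instead of relying on Snoopy's prefixing, so `javascript:` links can be dropped by scheme. `extract_links()` is a hypothetical helper that takes HTML you have already fetched; its relative-URL handling is deliberately simple (no `../` normalisation, no `<base href>` support).

```php
<?php
// Sketch using PHP's built-in DOM extension (neither Snoopy nor QueryList).
// extract_links() is a hypothetical helper: it parses already-fetched HTML
// and resolves each href against $pageUrl itself.
function extract_links(string $html, string $pageUrl): array
{
    if ($html === '') {
        return [];                              // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);           // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $p = parse_url($pageUrl);
    $scheme = $p['scheme'] ?? 'http';
    $root = $scheme . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $dir = preg_replace('#/[^/]*$#', '/', $p['path'] ?? '/');   // directory of the page

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue;                           // empty or in-page anchor
        }
        if (strncmp($href, '//', 2) === 0) {
            $href = $scheme . ':' . $href;      // protocol-relative
        } elseif ($href[0] === '/') {
            $href = $root . $href;              // root-relative
        } elseif (!preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            $href = $root . $dir . $href;       // path-relative; '../' is not normalised
        }
        $s = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        if ($s === 'http' || $s === 'https') {  // drops javascript:, mailto:, tel: ...
            $links[$href] = true;               // keys de-duplicate
        }
    }
    return array_keys($links);
}

// Example on a small hand-written fragment
$html = '<a href="javascript:;">menu</a><a href="//hz-favorite.alibaba.com/f.htm">fav</a>';
print_r(extract_links($html, 'http://www.alibaba.com/'));
```

Feed it the page body, for example from `file_get_contents($url)` or from `$snoopy->results` after `$snoopy->fetch($url)`. For full relative-reference resolution (`../`, `<base href>`), a dedicated URL library would likely be more robust than this hand-rolled version.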
