iOS Practical Network Techniques (OC) & Web Crawler: Crawling Network Data with Java


Web crawler: crawling network data with Java

Prerequisites: familiarity with Java syntax (being able to read it is enough)

    • Prep phase: get the HTML code from a web page
    • Hands-on phase: parse that HTML code in Java, and finally save the result to a plist file

The previous article introduced two ways to crawl network data and roughly showed how to capture data with regular expressions.

Since I spent some time learning Java and Android-related technologies, today I will cover how to crawl network data with Java. As for Python, I will share it when I get the chance to study it properly. In practice, knowing one approach is enough, unless you have a strong need for more or simply want to show off.

One: Preparation stage: get the HTML code

1: Open the page whose data you want in Firefox (because it has an extension called Firebug; I won't go into Firebug here, but for web development it is a must-have tool). Here we use the hero introductions on the DotA home page.

Let's start by looking at the data we need.

2: Web pages are developed module by module, so each area of the page corresponds to its own block of HTML code. We pick one small module of the page as an exercise.

Find the corresponding module, right-click it, and inspect the element in Firebug.

(Make sure Firebug is installed; if it is not, search for it under Add-ons in the Tools menu of the Firefox toolbar, then download and install it.)

The page now shows the HTML code of that module. Find the HTML block that holds the data we want, right-click it, and copy the HTML.

3: Simply create a new HTML file (the code later loads it from the desktop as icocos.html) and paste the copied HTML code into it. This needs a little HTML knowledge: supplement the missing parts around the pasted code so that it becomes a complete HTML file.

Note the encoding here: UTF-8 is the format commonly used in development.
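Step 3 can also be done with a few lines of Java instead of by hand. The following is only a minimal sketch, assuming the copied fragment is available as a string; the class name HtmlWrapper, its method names, and the placeholder page title are hypothetical and not part of the original workflow.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: turns a copied HTML fragment into a complete HTML file. */
final class HtmlWrapper {

    /** Surrounds the fragment with the minimal tags of a complete HTML document. */
    static String wrap(String fragment) {
        return "<!DOCTYPE html>\n"
                + "<html>\n"
                + "<head><meta charset=\"UTF-8\"><title>fragment</title></head>\n"
                + "<body>\n"
                + fragment + "\n"
                + "</body>\n"
                + "</html>\n";
    }

    /** Writes the completed document to disk, encoded as UTF-8. */
    static void save(String fragment, Path target) throws IOException {
        Files.write(target, wrap(fragment).getBytes(StandardCharsets.UTF_8));
    }
}
```

Declaring UTF-8 in the meta tag and writing the bytes as UTF-8 keeps the file consistent with the "UTF-8" argument passed when the page is loaded later.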

Two: Hands-on stage: crawl the HTML (web data)

Now we formally begin. What follows is the focus; everything before it was foolproof. (From here on a little Java or Android background helps, but it does not matter if you lack it: I will walk through the whole process.)

1: Open Eclipse

Create a new Java project, then click src in the project and create a new class dedicated to data parsing.

2: With the project created, we need a Java jar package designed for crawling and parsing web data (the code in this article uses jsoup). The download link will be given later.

Import the downloaded jar package into the Java project.

Then add it to the build path. (A common-sense point: a jar that has not been added to the build path cannot be used. After it is added, its coffee icon changes to a bottle icon.)

Once this succeeds, the jar is shown as part of the project's build path.
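If you want to confirm from code that the jar really is on the build path, one option is to try loading a class from it. This is only a sketch: the helper name ClasspathCheck is hypothetical, and org.jsoup.Jsoup is assumed to be the entry class of the imported jar because the parsing code in this article calls it.

```java
/** Hypothetical helper: checks whether a class can be loaded from the build path. */
final class ClasspathCheck {

    /** Returns true if the named class is visible to the current class loader. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it depends on) could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the jar imported above is jsoup.
        System.out.println("jsoup on build path: " + isAvailable("org.jsoup.Jsoup"));
    }
}
```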

Now we formally use Java to crawl and parse the HTML (web page) data.

Write the HTML parsing code in the Java project as follows (note the steps in the comments):

Java Core Code:

    try {
        // File path
        String path = "/users/icocos/desktop/icocos.html";

        // Load the web page
        Document doc = Jsoup.parse(new File(path), "UTF-8");

        // Parse the page: select every li element
        Elements lis = doc.select("li");

        // Iterate over the selected elements
        for (int i = 0; i < lis.size(); i++) {
            // Get the element at index i
            Element li = lis.get(i);

            // Get the image element
            Element img = li.select("img").get(0);
            System.out.println(img.attr("src"));

            // Get the image name
            String imgname = img.attr("src");

            // Get the hero name from the p element
            Element p = li.select("p").get(0);
            String personname = p.text();

            // Print the data
            System.out.println(imgname + "," + personname);
        }
    } catch (Exception e) {
        // Error (exception) handling
        e.printStackTrace();
    }

Click Run, and Eclipse prints the corresponding information according to the code.

However, this output cannot be used directly, so the Java code needs some changes so that the printed data can be copied as-is and written into a plist; in effect it becomes array or dictionary data.

In iOS development, reading data from a plist file is the most common approach. Other methods exist, but none is simpler.

I'll make some adjustments to the Java code below.

1: Before the for loop, add this line to start stitching the data together

    • System.out.println("NSArray *apps = @[");

2: After the for loop, add the line that closes the stitched array

    • System.out.println("]");

3: At the end of the for loop body, stitch together the data of each iteration, with a comma separating the entries

    • System.out.println("@{@\"name\" : @\"" + personname + "\", @\"icon\" : @\"" + imgname + "\"},");
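The three stitching steps above can also be collected into one small helper. This is a sketch rather than the author's code: the class NSArrayLiteral and its methods are hypothetical, quote escaping is an addition for names that contain a double quote or backslash, and the closing line is written as "];" (an assumption, so that the pasted output is a complete statement).

```java
import java.util.Map;

/** Hypothetical helper: builds the Objective-C array literal from name -> icon pairs. */
final class NSArrayLiteral {

    /** Escapes backslashes and double quotes so the value stays a valid string literal. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Formats one dictionary entry, followed by the separating comma. */
    static String entry(String name, String icon) {
        return "@{@\"name\" : @\"" + escape(name) + "\", @\"icon\" : @\"" + escape(icon) + "\"},";
    }

    /** Stitches the opening line, one entry per pair, and the closing bracket. */
    static String build(Map<String, String> nameToIcon) {
        StringBuilder out = new StringBuilder("NSArray *apps = @[\n");
        for (Map.Entry<String, String> e : nameToIcon.entrySet()) {
            out.append(entry(e.getKey(), e.getValue())).append('\n');
        }
        return out.append("];").toString();
    }
}
```

Passing a LinkedHashMap keeps the entries in the order they were found on the page.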

The final complete Java code for loading and parsing the HTML data is as follows:

    import java.io.File;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Icocos {

        public static void main(String[] args) {
            try {
                // File path
                String path = "/users/icocos/desktop/icocos.html";

                // Load the web page
                Document doc = Jsoup.parse(new File(path), "UTF-8");

                // Parse the page: select every li element
                Elements lis = doc.select("li");

                // Opening line of the Objective-C array
                System.out.println("NSArray *apps = @[");

                // Iterate over the selected elements
                for (int i = 0; i < lis.size(); i++) {
                    // Get the element at index i
                    Element li = lis.get(i);

                    // Get the image element
                    Element img = li.select("img").get(0);
                    // System.out.println(img.attr("src"));

                    // Get the image name
                    String imgname = img.attr("src");

                    // Get the hero name from the p element
                    Element p = li.select("p").get(0);
                    String personname = p.text();

                    // Print the data
                    // System.out.println(imgname + "," + personname);
                    System.out.println("@{@\"name\" : @\"" + personname + "\", @\"icon\" : @\"" + imgname + "\"},");
                }

                // Closing line of the Objective-C array
                System.out.println("]");
            } catch (Exception e) {
                // Error (exception) handling
                e.printStackTrace();
            }
        }
    }

Running the program this time prints Objective-C code that starts with NSArray.

Next comes the Xcode part: convert the printed data, everything from NSArray onward, into plist data. You could also skip the conversion and, with a few changes, parse the data with JSON parsing instead, but that is not the best way.
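For the JSON route mentioned above, the Java side would print JSON instead of an Objective-C literal. The following is a rough sketch using only the JDK; the class JsonOutput, its method names, and the choice of a {name, icon} string pair per hero are all assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical helper: prints the crawled pairs as a JSON array instead of an NSArray literal. */
final class JsonOutput {

    /** Wraps a value in double quotes and escapes the characters JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as \\uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Each element of heroes is a two-item array: {name, icon}. */
    static String toJson(List<String[]> heroes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < heroes.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String[] h = heroes.get(i);
            sb.append("{\"name\":").append(quote(h[0]))
              .append(",\"icon\":").append(quote(h[1])).append('}');
        }
        return sb.append(']').toString();
    }
}
```

On the iOS side such output would be parsed with a JSON parser rather than pasted into viewDidLoad, which is the extra work the paragraph above alludes to.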

Create a new project in Xcode and paste the copied code into viewDidLoad. It should look very familiar: this is the kind of array data we commonly use in development.

I'll then write the NSArray data to the plist file; the loop traversal comes further down.

    - (void)viewDidLoad
    {
        [super viewDidLoad];

        NSArray *apps = @[
            @{@"name" : @"enemy Mage", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/dfss.jpg"},
            @{@"name" : @"Musket", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/arjj.jpg"},
            @{@"name" : @"Druid", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/dlyy.jpg"},
            @{@"name" : @"Month Ride", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Yzqs.jpg"},
            @{@"name" : @"Variant Sprite", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/btjl.jpg"},
            @{@"name" : @"Naga Demon", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/njhy.gif"},
            @{@"name" : @"Monkey", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/hycm.jpg"},
            @{@"name" : @"White tiger", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/yzjs.jpg"},
            @{@"name" : @"invisible Assassin", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Yxck.jpg"},
            @{@"name" : @"troll", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/jmzj.jpg"},
            @{@"name" : @"Helicopter", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/arzs.jpg"},
            @{@"name" : @"bounty hunter", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Naka.gif"},
            @{@"name" : @"Skeleton shooter", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/KLSS.gif"},
            @{@"name" : @"female spider", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/ymzz.gif"},
            @{@"name" : @"blood Demon", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/xm.gif"},
            @{@"name" : @"Dark Ranger", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Nbrn.gif"},
            @{@"name" : @"void Mask", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/EC45.gif"},
            @{@"name" : @"snake hair Banshee", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/H00V.gif"},
            @{@"name" : @"bu", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/H00I.gif"},
            @{@"name" : @"crypt Assassin", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/dxck.gif"},
            @{@"name" : @"Ant", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/dxbz.gif"},
            @{@"name" : @"phantom Assassin", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Hyck.gif"},
            @{@"name" : @"Lightning Ghost", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/E002.gif"},
            @{@"name" : @"Shadow", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/YM01.gif"},
            @{@"name" : @"small fish man", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/yryx.gif"},
            @{@"name" : @"ghost", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/yg1.gif"},
            @{@"name" : @"Templar Assassin", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/E01y.gif"},
            @{@"name" : @"Soul Guard", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/LHSW.gif"},
            @{@"name" : @"Bear Warrior", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/Huth.gif"},
            @{@"name" : @"Poison Warlock", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/JDSS.gif"},
            @{@"name" : @"underworld dragon", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/mjyl.gif"},
            @{@"name" : @"Soul of revenge", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/hvwd.jpg"},
            @{@"name" : @"Sword Saint", @"icon" : @"http://dotadb.uuu9.com/UploadFiles/Dota/Hero/jsjs.jpg"}
        ];

        // Write the array to a plist file
        [apps writeToFile:@"/users/icocos/desktop/apps.plist" atomically:YES];

        // viewDidLoad continues in the snippets that follow

At this point the plist file has been generated; open it to see the data.

The final step is to download the images.

A simple version:

    // for (NSDictionary *dict in apps) {
    //     NSString *icon = dict[@"icon"];
    //
    //     // Create the URL of the network image
    //     NSURL *url = [NSURL URLWithString:icon];
    //
    //     // Download the binary data of the image
    //     NSData *data = [NSData dataWithContentsOfURL:url];
    //
    //     // Build the local path from the last component of the link
    //     NSString *filename = [icon lastPathComponent];
    //     NSString *iconpath = [NSString stringWithFormat:@"/users/icocos/desktop/icons/%@", filename];
    //
    //     [data writeToFile:iconpath atomically:YES];
    // }

Since the plist should store only the image name, that is, the last path component of each link, we cannot write the array as it stands above; it needs some processing before it is really usable.

        NSMutableArray *newapps = [NSMutableArray array];

        for (NSDictionary *dict in apps) {
            NSMutableDictionary *newdict = [NSMutableDictionary dictionary];
            newdict[@"name"] = dict[@"name"];
            // Keep only the file name of the image link
            newdict[@"icon"] = [dict[@"icon"] lastPathComponent];
            [newapps addObject:newdict];
        }

        [newapps writeToFile:@"/users/icocos/desktop/apps.plist" atomically:YES];
    }
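Because the crawler itself is written in Java, the same trimming could instead happen before the data is printed. A minimal sketch, with a hypothetical class and method name; note that unlike NSString's lastPathComponent, this version returns an empty string when the link ends with a slash.

```java
/** Hypothetical helper: Java counterpart of taking the last path component of a link. */
final class IconNames {

    /** Returns the text after the last '/', or the whole string if there is no '/'. */
    static String lastPathComponent(String url) {
        // lastIndexOf returns -1 when no '/' is present, so substring(0) keeps everything.
        return url.substring(url.lastIndexOf('/') + 1);
    }
}
```

For example, the first link in the array would be reduced to dfss.jpg.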

After the pictures have been downloaded, you will quickly see many pictures in the corresponding folder.

At this point we have plist data that matches the web page fairly completely. What remains is to display the plist data in the interface, which I will not cover here; for details, see the article on reading plist files.

To summarize: if you later need to crawl network data, that is, to implement a web crawler, the first method that comes to mind is basically Java. Of course, companies generally have no such requirement; except in special cases they obtain data through their own server APIs.

You can also use regular expressions or Python. Regular expressions are relatively difficult, mainly because there are many details. I have not studied Python yet and will try it when I get the chance. If you have a better approach, feel free to contact me so we can learn from each other and discuss.
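To give a feel for the regular-expression route and its details, here is a rough sketch using only java.util.regex. It assumes every li contains an img with a double-quoted src attribute followed by a p element; the real page structure is not shown in this article, and the class name RegexHeroes is hypothetical. If an li lacks one of these parts, the lazy match can run on into the next li, which is exactly the kind of detail that makes this approach fragile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based alternative to the selector-based parsing. */
final class RegexHeroes {

    // One li holding an img (group 1: src value) followed by a p (group 2: text).
    private static final Pattern LI = Pattern.compile(
            "<li\\b[^>]*>.*?<img\\b[^>]*\\bsrc=\"([^\"]*)\"[^>]*>.*?<p\\b[^>]*>(.*?)</p>.*?</li>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {name, icon} pairs in page order. */
    static List<String[]> extract(String html) {
        List<String[]> result = new ArrayList<>();
        Matcher m = LI.matcher(html);
        while (m.find()) {
            result.add(new String[] {m.group(2).trim(), m.group(1)});
        }
        return result;
    }
}
```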

For other pages we can basically follow the approach of this article with only small changes. Roughly:

    • 1: In the preparation phase, different data needs produce different HTML
    • 2: With different HTML, the HTML structure changes accordingly, so you need to adapt the Java core code; once you understand that code this is no problem. This is the most important part.
    • 3: Write the corresponding NSArray data to the plist file. This is a common technique in iOS development, so I won't say more.
