Recently, I have been giving a machine test to student candidates during on-campus recruiting.


My test question examines coding ability and the ability to learn new material on the spot.

The candidates are mostly master's students from science and engineering universities in Beijing.

 

Machine test question:

 

James likes variety shows very much. He noticed that a Youku user called "Qiqi Variety Shows" uploads many variety shows, so he wants to monitor that user's updates. His requirements are:

1. Collect information about all the videos uploaded by "Qiqi Variety Shows".

2. Whenever "Qiqi Variety Shows" releases a new video, James must be notified as soon as possible (a polling sketch follows the problem statement).

3. The information for each video has two parts: the video name and the playback address.

For example, the playback address of the video "Seoul Cool Travel 20101031" is:

http://v.youku.com/v_show/id_XMjE5MTQ1NzE2.html

 

Write this program for James.
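As an aside on requirement 2 ("notify James as soon as possible"): it maps naturally to a periodic polling loop. Below is a minimal Java sketch, where checkForUpdates() is a hypothetical placeholder for one crawl pass and the five-minute interval is an arbitrary choice, not part of the task.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class Monitor {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Poll at a fixed rate; each pass re-crawls the upload list and
            // reports any videos not seen before.
            scheduler.scheduleAtFixedRate(Monitor::checkForUpdates, 0, 5, TimeUnit.MINUTES);
        }

        // Hypothetical placeholder: a real pass would crawl the user's upload
        // pages, diff against the records in movies.txt, and notify James of
        // any new arrivals.
        static void checkForUpdates() {
            System.out.println("Checking for new uploads...");
        }
    }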

 

Specific requirements:

1. Due to limited time, you only need to capture the first 100 videos to be considered to have completed the "collect all uploaded videos" task.

2. Write the captured video information to the result file (movies.txt), one video per line, containing the video name and the playback address separated by a tab (\t); see the sketch after this list.

3. Use a newline as the end marker of each line.
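For requirement 2 of this list, here is a minimal sketch of the output step, assuming the crawl has already produced name/address pairs; the writeMovies method and the List<String[]> representation are my own choices, not part of the task.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class MovieWriter {
        // Each element is a {name, playbackAddress} pair collected by the crawler.
        static void writeMovies(List<String[]> videos) throws IOException {
            try (BufferedWriter out = Files.newBufferedWriter(
                    Paths.get("movies.txt"), StandardCharsets.UTF_8)) {
                for (String[] v : videos) {
                    out.write(v[0] + "\t" + v[1]); // name, tab, playback address
                    out.newLine();                 // one video per line
                }
            }
        }
    }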

 

TIPS:

1. You can.

2. You have up to 4 hours of coding time, but finishing ahead of time will increase your final score.

3. You may search the Internet for any information you need, but discussing the problem with others or posting it on forums for help is strictly prohibited.

Current situation:

Overall, the students did not perform well on the test. The typical problems were as follows:

The biggest issue was weak code logic, followed by slow acquisition of relevant knowledge and on-the-spot learning (for example, regular expressions and web crawling), poor exception handling (robustness), and careless reading of the problem statement.
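On the regular-expression point specifically: a single pattern with two capture groups can pull both the video name and the playback address out of a page in one pass. Below is a minimal Java sketch against a made-up HTML fragment; the real Youku markup of the time would need to be inspected first, so the pattern here is only illustrative.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexDemo {
        public static void main(String[] args) {
            // Hypothetical markup; the actual page structure must be inspected first.
            String html = "<a href=\"http://v.youku.com/v_show/id_XMjE5MTQ1NzE2.html\""
                    + " title=\"Seoul Cool Travel 20101031\">Seoul Cool Travel 20101031</a>";
            // Group 1 captures the playback address, group 2 the video name.
            Pattern p = Pattern.compile(
                    "<a href=\"(http://v\\.youku\\.com/v_show/[^\"]+)\" title=\"([^\"]+)\">");
            Matcher m = p.matcher(html);
            while (m.find()) {
                System.out.println(m.group(2) + "\t" + m.group(1));
            }
        }
    }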

 

 

For example, the following is part of the code written by one student. Comments beginning with // < are mine.

 

    package main;

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Calendar;
    import util.ContentUtil;

    public class Task implements Runnable {
        String httpUrl;
        FileHandler fh;

        Task(String httpUrl, FileHandler fh) {
            this.httpUrl = httpUrl;
            this.fh = fh;
        }

        public void run() {
            System.out.println(Calendar.getInstance().getTime());
            getHtmlReadLine(httpUrl, fh, true); // <why must the page be turned here?
        }

        /**
         * @param httpUrl
         * @param fh
         * @param searchNextPage search the videos at other pages or not
         */
        public int getHtmlReadLine(String httpUrl, FileHandler fh, boolean searchNextPage) {
            // <Review of this function:
            // <1. A regular expression is used, but only for basic matching. A single
            // <   regex could extract all the data directly: the video name, the video
            // <   URL, and the next-page address.
            // <2. The logic for whether to keep crawling (turning pages) is wrong. The
            // <   task should end only once duplicate data has been captured; otherwise
            // <   the next page should be crawled. Turning just one page is not enough
            // <   (the user may upload n videos at a time, spanning several pages, which
            // <   loses information).
            // <3. Page fetching is coupled with the business logic. Fetch the whole page
            // <   first, then run the logic over it; do not couple line-by-line reading
            // <   with page analysis.
            String currentLine = "";
            InputStream urlStream;
            int newArrival = 0;
            try {
                URL url = new URL(httpUrl);
                HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                connection.connect();
                urlStream = connection.getInputStream();
                BufferedReader reader = new BufferedReader(new InputStreamReader(urlStream, "UTF-8"));
                Video video; // <no need to declare it here; follow the proximity principle
                boolean alreadyFindVideo = false; // only search the other pages after the videos have been analyzed

                // <parse the content line by line
                while ((currentLine = reader.readLine()) != null) {
                    currentLine = currentLine.trim();
                    if (ContentUtil.matchTitle(currentLine)) { // <match a video row
                        alreadyFindVideo = true;
                        video = ContentUtil.analizeVideo(currentLine); // <parse the HTML
                        if (!fh.existUrl(video)) { // <not yet in the file or cache
                            System.out.println("New Arrival:");
                            System.out.print(video);
                            fh.add(video);
                            newArrival++;
                        }
                    } else if (alreadyFindVideo && searchNextPage && ContentUtil.matchPages(currentLine)) {
                        String nextPage = ContentUtil.analizePage(currentLine); // <resolve the next-page address
                        // System.out.println("Search next page at: " + nextPage);
                        newArrival += getHtmlReadLine(nextPage, fh, false);
                        // <page turning: why must this be false?
                        // <And recursion? The stack frames are not released this way;
                        // <better to schedule the next page one layer up.
                    }
                }
            } catch (Exception e) {
                e.printStackTrace(); // <the exception is not actually handled
                // <What exceptions can occur here? The most likely one is a page-fetch
                // <timeout, in which case the fetch should simply be retried.
            }
            if (searchNextPage && newArrival == 0) // <if nothing new was found, stop crawling?
                System.out.println("No new arrival!");
            return newArrival;
        }
    }
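To make review points 2 and 3 concrete, here is a minimal sketch of the structure those comments suggest: fetch the whole page into a string first, retrying on timeout, and keep all parsing logic separate from the I/O. The timeout and retry values are my own assumptions, not part of the original task.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.SocketTimeoutException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PageFetcher {
        // Fetch the whole page as one string so parsing is decoupled from I/O.
        static String fetch(String httpUrl) throws Exception {
            int retries = 3; // assumed retry budget for timeouts
            while (true) {
                try {
                    HttpURLConnection conn = (HttpURLConnection) new URL(httpUrl).openConnection();
                    conn.setConnectTimeout(5000); // assumed 5-second timeouts
                    conn.setReadTimeout(5000);
                    StringBuilder sb = new StringBuilder();
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            sb.append(line).append('\n');
                        }
                    }
                    return sb.toString();
                } catch (SocketTimeoutException e) {
                    // The most likely failure is a fetch timeout: retry instead of giving up.
                    if (--retries == 0) throw e;
                }
            }
        }
    }

The caller can then run the regex over the returned string and keep requesting next pages until it sees a video it has already recorded, which addresses the duplicate-data stopping condition from point 2.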

 

 
