A whimsical idea for the blog Park (cnblogs) job board


I recently left my job. In my spare time, besides getting some proper rest, I've been wondering what to do next. I remembered that the blog Park has a job board sub-site, so I went to have a look. The problem showed up quickly: as you might guess, there are far too many job postings. Although Dudu (the site admin) provides plenty of tags for filtering by category, the filtered list is still huge. On top of that, I hadn't paid attention to job postings for more than two years, so I wasn't clear about what employers are asking for these days, or which direction I'd prefer myself.

Why not write a tool to summarize the information? That was the original starting point of this article. I thought it would be interesting, so I just went ahead and did it. Let's first think through roughly what the program should look like.

The core process is like this:

Capture the page data → convert the pages into raw data (the first transformation, producing PickItem objects) → traverse the converted data and build more specific data objects (the second transformation, producing ParseItem objects) → filter and count the twice-converted data → display the results.

The core business objects include:

A Picker that captures the web pages, a Parser that parses them, a Filter that filters the data, and a Counter that produces the final statistics.

You could of course design in more detail and factor out a few more business objects to make the design fully object-oriented; system stability and scalability could also be considered in depth. But, haha, if you keep thinking like that the program never gets written, so let's start with these core pieces first.

Here I will take Picker as an example and briefly go through the code:

/// <summary>
/// Captures web pages
/// </summary>
public class Picker
{
    public IEnumerable<PickItem> PickPage(PickRule rule)
    {
        return InnerGet(rule).SelectMany(p =>
        {
            var items = rule.DoPick(p);
            return items == null ? Enumerable.Empty<PickItem>() : items;
        }).ToList();
    }

    // Lazily fetch pages one by one, following the rule's paging logic.
    private IEnumerable<HtmlDocument> InnerGet(PickRule rule)
    {
        var currentUrl = rule.StartUrl;
        do
        {
            yield return HtmlHelper.GetHtmlDocument(currentUrl, rule.PageEncode);
            currentUrl = rule.CalcCurrentUrl(currentUrl);
        } while (currentUrl != string.Empty);
    }
}

The code is very simple. The actual fetching is delegated to the HtmlHelper object (a thin wrapper around the third-party library HtmlAgilityPack). The concrete crawling rules are encapsulated in an object called PickRule.

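The PickRule listing was collapsed in the original post and is not recoverable, so the following is only a minimal sketch reconstructed from the members that Picker actually calls above (StartUrl, PageEncode, DoPick, CalcCurrentUrl); the author's real implementation certainly differs in detail:

```csharp
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

// Hypothetical reconstruction: only the members referenced by Picker are sketched.
public abstract class PickRule
{
    // First page to fetch.
    public string StartUrl { get; protected set; }

    // Encoding passed to HtmlHelper when downloading a page.
    public Encoding PageEncode { get; protected set; }

    // Extract raw PickItem objects from one fetched page.
    public abstract IEnumerable<PickItem> DoPick(HtmlDocument page);

    // Compute the next page URL; returning string.Empty ends the loop in Picker.InnerGet.
    public abstract string CalcCurrentUrl(string currentUrl);
}
```

A site-specific subclass such as PickRule_Cnblogs (seen in the driver code below) would then hard-code the job board's URL pattern and the XPath for each posting.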

Most of the other parts of the program follow the same design idea. Finally, let's look at the "main" function that drives everything:

protected void Run_Click(object sender, EventArgs e)
{
    Picker picker = new Picker();
    PickRule pickRule = new PickRule_Cnblogs();
    var pages = picker.PickPage(pickRule);

    Parser parser = new Parser();
    ParseRule parseRule = new ParseRule_Cnblogs();
    var parsedItems = parser.ParsePage(pages, parseRule, 500);

    Filter filter = new Filter(p => p != null && p.PositionCategory.ToLower() == ".net programmer");
    var jobs = filter.Filting(parsedItems).ToList();

    Counter counter = new Counter();
    var result = counter.Counting(jobs);

    // Print report
    foreach (var item in result.OrderByDescending(p => p.Item2).Take(10))
    {
        pieData.Append(string.Format("['{0}', {1}],", item.Item1, item.Item2));
    }
}

Here you can clearly see the workflow described above :)

Now let's look at the actual results. First, the overall statistics for .NET programmer positions:

Next, let's see whether the skill requirements for .NET programmers differ between Shanghai and Chengdu:

Pretty intuitive, isn't it? :)

 

PS:

1. Simple as it is, the program ran into plenty of trouble during actual implementation. For example, in the second extraction/transformation step, the PositionRequire attribute of the ParseItem object holds the "job requirements". The initial design was to use a word-segmentation component to segment that text and extract the valid key terms to assign to PositionRequire. In code it looked like this:

parseItem1.PositionRequire = new Pangu().Segment("job-requirement text scraped from the HTML page");

After processing, PositionRequire might contain entries like ".net", "asp.net", "mvc", and so on. However, this approach depends heavily on the quality of the word segmenter (after reading the relevant documentation, it looked like I would need quite a bit of auxiliary code to meet my needs), so in the end I implemented a simple dictionary list plus a synonym mapping table, matched by traversing them in order.
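The dictionary-plus-synonym-table approach can be sketched roughly as follows. Note that SkillExtractor, the sample terms, and the synonym entries are all illustrative stand-ins, not the author's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkillExtractor
{
    // Illustrative dictionary: canonical skill names to look for, in traversal order.
    private static readonly string[] Dictionary = { ".net", "asp.net", "mvc", "sql server", "wcf" };

    // Illustrative synonym map: variant spellings normalized to a canonical term first.
    private static readonly Dictionary<string, string> Synonyms = new Dictionary<string, string>
    {
        { "asp.net mvc", "mvc" },
        { "mssql", "sql server" },
        { "dotnet", ".net" }
    };

    // Normalize the requirement text, then collect every dictionary term it contains.
    public static List<string> Extract(string requireText)
    {
        var text = requireText.ToLower();
        foreach (var pair in Synonyms)
            text = text.Replace(pair.Key, pair.Value);
        return Dictionary.Where(term => text.Contains(term)).ToList();
    }
}
```

Unlike a general-purpose segmenter, this only finds terms you anticipated, but for counting a known, small skill vocabulary that is exactly the behavior you want.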

2. I think Dudu could consider adding this kind of statistical analysis to the job board itself. Solving it at the source would be better and more elegant: for example, job-related information could be defined as structured, formatted data up front, so that further analysis and processing could be run over it later.
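For instance (purely illustrative, not an actual cnblogs format), a structured posting might carry the fields this tool currently has to scrape and guess at:

```json
{
  "positionCategory": ".NET programmer",
  "city": "Shanghai",
  "positionRequire": [".net", "asp.net", "mvc", "sql server"],
  "salaryRange": null,
  "postedDate": "2012-05-01"
}
```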

3. Well... judging by the statistics, I should go brush up on my ASP.NET MVC knowledge :)

 

 
