A whimsical idea for the blog Park (cnblogs) job board


I recently left my job. In my spare time, besides getting some proper rest, I've been wondering what to do next. I remembered that the blog Park has a job board sub-site, so I went to have a look. The problem showed up quickly: as you might guess, there are far too many job postings. Although Dudu (the site admin) provides plenty of tags for filtering by category, the filtered list is still huge. On top of that, I hadn't paid attention to job postings for more than two years, so I wasn't clear about what employers are asking for these days, or which direction I'd prefer myself.

Why not write a tool to summarize the information? That was the original starting point of this article. I thought it would be interesting, so I just went ahead and did it. Let's first think through roughly what the program should look like.

The core process is like this:

Capture the page data → convert the pages into raw data (the first transformation, producing PickItem objects) → traverse the converted data and build more specific data objects (the second transformation, producing ParseItem objects) → filter and count the twice-converted data → display the results.

The core business objects include:

A Picker that captures the web pages, a Parser that parses them, a Filter that filters the data, and a Counter that produces the final statistics.

You could of course design in more detail and factor out a few more business objects to make the design fully object-oriented; system stability and scalability could also be considered in depth. But, haha, if you keep thinking like that the program never gets written, so let's start with these core pieces first.

Here I will take Picker as an example and briefly go through the code:

/// <summary>
/// Captures web pages
/// </summary>
public class Picker
{
    public IEnumerable<PickItem> PickPage(PickRule rule)
    {
        return InnerGet(rule).SelectMany(p =>
        {
            var items = rule.DoPick(p);
            return items == null ? Enumerable.Empty<PickItem>() : items;
        }).ToList();
    }

    // Lazily fetch pages one by one, following the rule's paging logic.
    private IEnumerable<HtmlDocument> InnerGet(PickRule rule)
    {
        var currentUrl = rule.StartUrl;
        do
        {
            yield return HtmlHelper.GetHtmlDocument(currentUrl, rule.PageEncode);
            currentUrl = rule.CalcCurrentUrl(currentUrl);
        } while (currentUrl != string.Empty);
    }
}

The code is very simple. The actual fetching is delegated to the HtmlHelper object (a thin wrapper around the third-party library HtmlAgilityPack). The concrete crawling rules are encapsulated in an object called PickRule.

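The PickRule listing was collapsed in the original post and is not recoverable, so the following is only a minimal sketch reconstructed from the members that Picker actually calls above (StartUrl, PageEncode, DoPick, CalcCurrentUrl); the author's real implementation certainly differs in detail:

```csharp
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

// Hypothetical reconstruction: only the members referenced by Picker are sketched.
public abstract class PickRule
{
    // First page to fetch.
    public string StartUrl { get; protected set; }

    // Encoding passed to HtmlHelper when downloading a page.
    public Encoding PageEncode { get; protected set; }

    // Extract raw PickItem objects from one fetched page.
    public abstract IEnumerable<PickItem> DoPick(HtmlDocument page);

    // Compute the next page URL; returning string.Empty ends the loop in Picker.InnerGet.
    public abstract string CalcCurrentUrl(string currentUrl);
}
```

A site-specific subclass such as PickRule_Cnblogs (seen in the driver code below) would then hard-code the job board's URL pattern and the XPath for each posting.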

Most of the other parts of the program follow the same design idea. Finally, let's look at the "main" function that drives everything:

protected void Run_Click(object sender, EventArgs e)
{
    Picker picker = new Picker();
    PickRule pickRule = new PickRule_Cnblogs();
    var pages = picker.PickPage(pickRule);

    Parser parser = new Parser();
    ParseRule parseRule = new ParseRule_Cnblogs();
    var parsedItems = parser.ParsePage(pages, parseRule, 500);

    Filter filter = new Filter(p => p != null && p.PositionCategory.ToLower() == ".net programmer");
    var jobs = filter.Filting(parsedItems).ToList();

    Counter counter = new Counter();
    var result = counter.Counting(jobs);

    // Print report
    foreach (var item in result.OrderByDescending(p => p.Item2).Take(10))
    {
        pieData.Append(string.Format("['{0}', {1}],", item.Item1, item.Item2));
    }
}

Here you can clearly see the workflow described above :)

Now let's look at the actual results. First, the overall statistics for .NET programmer positions:

Next, let's see whether the skill requirements for .NET programmers differ between Shanghai and Chengdu:

Pretty intuitive, isn't it? :)

 

PS:

1. Simple as it is, the program ran into plenty of trouble during actual implementation. For example, in the second extraction/transformation step, the PositionRequire attribute of the ParseItem object holds the "job requirements". The initial design was to use a word-segmentation component to segment that text and extract the valid key terms to assign to PositionRequire. In code it looked like this:

parseItem1.PositionRequire = new Pangu().Segment("job-requirement text scraped from the HTML page");

After processing, PositionRequire might contain entries like ".net", "asp.net", "mvc", and so on. However, this approach depends heavily on the quality of the word segmenter (after reading the relevant documentation, it looked like I would need quite a bit of auxiliary code to meet my needs), so in the end I implemented a simple dictionary list plus a synonym mapping table, matched by traversing them in order.
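The dictionary-plus-synonym-table approach can be sketched roughly as follows. Note that SkillExtractor, the sample terms, and the synonym entries are all illustrative stand-ins, not the author's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkillExtractor
{
    // Illustrative dictionary: canonical skill names to look for, in traversal order.
    private static readonly string[] Dictionary = { ".net", "asp.net", "mvc", "sql server", "wcf" };

    // Illustrative synonym map: variant spellings normalized to a canonical term first.
    private static readonly Dictionary<string, string> Synonyms = new Dictionary<string, string>
    {
        { "asp.net mvc", "mvc" },
        { "mssql", "sql server" },
        { "dotnet", ".net" }
    };

    // Normalize the requirement text, then collect every dictionary term it contains.
    public static List<string> Extract(string requireText)
    {
        var text = requireText.ToLower();
        foreach (var pair in Synonyms)
            text = text.Replace(pair.Key, pair.Value);
        return Dictionary.Where(term => text.Contains(term)).ToList();
    }
}
```

Unlike a general-purpose segmenter, this only finds terms you anticipated, but for counting a known, small skill vocabulary that is exactly the behavior you want.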

2. I think Dudu could consider adding this kind of statistical analysis to the job board itself. Solving it at the source would be better and more elegant: for example, job-related information could be defined as structured, formatted data up front, so that further analysis and processing could be run over it later.
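For instance (purely illustrative, not an actual cnblogs format), a structured posting might carry the fields this tool currently has to scrape and guess at:

```json
{
  "positionCategory": ".NET programmer",
  "city": "Shanghai",
  "positionRequire": [".net", "asp.net", "mvc", "sql server"],
  "salaryRange": null,
  "postedDate": "2012-05-01"
}
```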

3. Well... judging by the statistics, I should go brush up on my ASP.NET MVC knowledge :)

 

 
