Python crawler code example for naming a baby

Source: Internet
Author: User
Everyone runs into one thing in life that they give no thought to beforehand, but that suddenly becomes extremely important once it arrives and demands a major decision within a short time: naming a newborn baby. This article describes how to use a Python crawler to help pick a good name for a child.

Preface

I believe every parent has been through this: the child's name has to be settled within two weeks of birth, because it is needed for the birth certificate. I suspect many people were as lost as I was at first. I had assumed that plenty of Chinese characters would work in a name, but it turned out not to be something to do casually. So I searched everywhere: the dictionary, the Internet, Tang and Song poetry, the Book of Songs, even martial arts novels. Yet a name found after long effort was often vetoed by family members, for example because it was awkward to say, or because it duplicated or clashed with a relative's name.

So we went back to searching online and found many articles with titles like "good names for baby boys". These articles list hundreds or thousands of names at once, which is dazzling and of little practical use. Many name-testing websites and apps will rate a name you enter, giving an eight-character (bazi) score or a five-grid (wuge) score. This is quite helpful as a reference, but either the names have to be entered one at a time, or the site or app offers very few names, or it cannot handle requirements such as a restricted character, or it starts charging fees. In the end I could not find one that worked well.

So I decided to write a program with the following features:

  1. Its main function is to provide names in batches for reference, with each name scored according to the baby's birth date and time;

  2. You can extend the name dictionary yourself. For example, if you find a batch of good names from the Book of Songs online and want to see how they score, you can add them;

  3. You can restrict the characters used in the name. For example, some families fix one character for each generation. If the current generation's character is "country", every name must contain the character "country";

  4. The name list includes scores, so you can browse the names from the highest score to the lowest.

In this way you obtain a list of names that fits the child's birth date and time (the eight characters), the family tree's restrictions, and your own preferences. The list includes scores for reference, and from it you can pick out the names you like. If you have a new idea, you can add new names to the dictionary at any time and recalculate.

Code structure of the program

Program configuration entry

The program configuration is as follows:

    # coding: GB18030
    """Write the configuration here"""

    setting = {}

    # Restricted character. If this value is set, the single-character
    # dictionary is used; otherwise the double-character dictionary is used.
    setting["limit_world"] = "country"
    # Surname
    setting["name_prefix"] = "li"
    # Gender: "male" or "female"
    setting["sex"] = "male"
    # Province
    setting["area_province"] = "Beijing"
    # City
    setting["area_region"] = "Haidian"
    # Year of birth (Gregorian calendar)
    setting["year"] = "2017"
    # Month of birth (Gregorian calendar)
    setting["month"] = "1"
    # Day of birth (Gregorian calendar)
    setting["day"] = "11"
    # Hour of birth (Gregorian calendar)
    setting["hour"] = "11"
    # Minute of birth (Gregorian calendar)
    setting["minute"] = "11"
    # Name of the result output file
    setting["output_fname"] = "names_girls_source_xxx.txt"

The program uses the configuration item setting["limit_world"] to decide automatically whether to use the single-character or the double-character dictionary:

  1. If this item is set, for example to "country", the program combines that character with every character in the single-character dictionary, placing it in both the first and the second position, so both orderings are scored;

  2. If this item is not set and is left as an empty string, the program reads only the double-character dictionaries, that is, the *_double.txt files.
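The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function name build_candidates, its arguments, and the pinyin placeholders standing in for Chinese characters are all assumptions made for the example.

```python
def build_candidates(limit_word, single_chars, double_names):
    """Return given-name candidates according to the limit_world rule.

    limit_word   -- the restricted character, or "" when there is none
    single_chars -- entries of the single-character dictionary (hypothetical input)
    double_names -- entries of the double-character dictionary (hypothetical input)
    """
    if not limit_word:
        # No restriction: use the ready-made two-character names as they are.
        return list(double_names)

    candidates = []
    seen = set()
    for ch in single_chars:
        # Try the restricted character in the first and in the second position.
        for name in (limit_word + ch, ch + limit_word):
            if name not in seen:
                seen.add(name)
                candidates.append(name)
    return candidates


# Pinyin syllables stand in for single Chinese characters here.
print(build_candidates("guo", ["jin", "ming"], ["zixuan"]))
print(build_candidates("", ["jin", "ming"], ["zixuan"]))
```

Keeping the two orderings next to each other in the output makes it easy to compare how the same pair of characters scores in each position.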

Program principle

This is a simple crawler. You can open the website's entry page (the address is given under Reminder) to see the form that the crawler fills in.

Getting a score takes two steps. First, the crawler submits the form automatically and fetches the result page. Second, it extracts the scores from that page.

The first step can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py):

    import urllib
    import urllib2

    post_data = urllib.urlencode(params)
    req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
    content = req.read()

Here params is a dict of form parameters. The call submits a POST request carrying that data, and content holds the returned page.
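The snippet above targets Python 2. On Python 3, urllib2 no longer exists and its functionality lives in urllib.request and urllib.parse, so an equivalent might look like the sketch below. The helper names, the placeholder URL, and the choice of GB18030 for encoding the form values (guessed from the encoding used elsewhere in this article) are assumptions, not part of the original program.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder for illustration only; the article reads the real address
# from sys_config.REQUEST_URL.
EXAMPLE_URL = "http://example.com/score"


def build_post_request(url, params, encoding="gb18030"):
    """Build (but do not send) a form-encoded POST request.

    The encoding default is an assumption based on the page encoding.
    """
    # urlencode percent-encodes the values, so the result is plain ASCII.
    body = urlencode(params, encoding=encoding).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(url, data=body)


def fetch(url, params, timeout=10):
    """Send the POST request and return the raw response bytes."""
    with urlopen(build_post_request(url, params), timeout=timeout) as resp:
        return resp.read()


req = build_post_request(EXAMPLE_URL, {"sex": "1", "hour": "11"})
print(req.get_method(), req.data)
```

Separating request construction from sending keeps the form encoding easy to inspect without touching the network.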

The params dict is filled in as follows:

    params = {}

    # Date type: 0 indicates the Gregorian calendar, 1 indicates the lunar calendar
    params['data_type'] = "0"
    params['year'] = "%s" % str(user_config.setting["year"])
    params['month'] = "%s" % str(user_config.setting["month"])
    params['day'] = "%s" % str(user_config.setting["day"])
    params['hour'] = "%s" % str(user_config.setting["hour"])
    params['minute'] = "%s" % str(user_config.setting["minute"])
    params['pid'] = "%s" % str(user_config.setting["area_province"])
    params['cid'] = "%s" % str(user_config.setting["area_region"])
    # Preferred Five Elements: 0 indicates automatic analysis,
    # 1 indicates a manually specified preference
    params['wxxy'] = "0"
    params['x'] = "%s" % (user_config.setting["name_prefix"])
    params['ming'] = name_postfix
    # 0 indicates female, 1 indicates male
    if user_config.setting["sex"] == "male":
        params['sex'] = "1"
    else:
        params['sex'] = "0"
    params['acs'] = "submit"
    params['isyz'] = "1"

The second step is to extract the scores from the web page. BeautifulSoup4 can do this, and its syntax is simple:

    import re
    from bs4 import BeautifulSoup

    # The label strings below are shown in translation; the live page is in Chinese.
    soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
    full_name = get_full_name(name_postfix)

    # print soup.find(string=re.compile(u"name five-grid score"))
    for node in soup.find_all("p", class_="chaxun_b"):
        node_cont = node.get_text()
        # Five-grid (wuge) score
        if u'name five-grid score' in node_cont:
            name_wuge = node.find(string=re.compile(u"name five-grid score"))
            result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()
        # Eight-character (bazi) score
        if u'name eight-character score' in node_cont:
            name_wuge = node.find(string=re.compile(u"name eight-character score"))
            result_data['bazi_score'] = name_wuge.next_sibling.b.get_text()

In this way the HTML is parsed and the eight-character (bazi) and five-grid (wuge) scores are extracted.
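To try the extraction step without network access or third-party packages, the sketch below applies the same idea to a made-up HTML fragment. It deliberately swaps BeautifulSoup for a regular expression from the standard library, which is more brittle on real pages. The fragment's structure is only a guess based on the tags the code above looks for, and the English labels are stand-ins for the labels on the real page.

```python
import re

# Hypothetical fragment: a <p class="chaxun_b"> holding a label, followed by
# an element whose <b> child contains the score.
SAMPLE_HTML = """
<p class="chaxun_b">name eight-character score: <font><b>61.5</b> points</font></p>
<p class="chaxun_b">name five-grid score: <font><b>78.6</b> points</font></p>
<p class="other">unrelated text <b>99</b></p>
"""

# Result key -> label text to look for (stand-in labels, see above).
LABELS = {
    "bazi_score": "name eight-character score",
    "wuge_score": "name five-grid score",
}


def extract_scores(html, labels=LABELS):
    """Return {key: float} for each label found in a chaxun_b paragraph."""
    scores = {}
    paragraphs = re.findall(r'<p class="chaxun_b">(.*?)</p>', html, flags=re.S)
    for para in paragraphs:
        for key, label in labels.items():
            if label in para:
                # The score is the first number wrapped in <b>...</b>.
                match = re.search(r"<b>\s*([0-9]+(?:\.[0-9]+)?)\s*</b>", para)
                if match:
                    scores[key] = float(match.group(1))
    return scores


print(extract_scores(SAMPLE_HTML))
```

For the live site, an HTML parser such as BeautifulSoup remains the more robust choice, since regular expressions break easily when attributes or nesting change.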

Running result example

    1/1287 Li Guojin    eight-character score = 61.5   five-grid score = 78.6   total score = 140.1
    2/1287 Li Guotie    eight-character score = 61     five-grid score = 89.7   total score = 150.7
    3/1287 Li Guojing   eight-character score = 21     five-grid score = 81.6   total score = 102.6
    4/1287 Li Mingguo   eight-character score = 21     five-grid score = 90.3   total score = 111.3
    5/1287 Li Rouguo    eight-character score = 64     five-grid score = 78.3   total score = 142.3
    6/1287 Li Guojing   eight-character score = 21     five-grid score = 89.8   total score = 110.8
    7/1287 Li Guoti     eight-character score = 22     five-grid score = 87.2   total score = 109.2
    8/1287 Li Guodeng   eight-character score = 21     five-grid score = 81.6   total score = 102.6
    9/1287 Li Luoguo    eight-character score = 21     five-grid score = 83.7   total score = 104.7
    10/1287 Li Guotian  eight-character score = 21     five-grid score = 81.6   total score = 102.6
    11/1287 Li Guotian  eight-character score = 22     five-grid score = 83.7   total score = 105.7
    12/1287 Li Guotian  eight-character score = 22     five-grid score = 93.7   total score = 115.7

With these scores, the names can be sorted and used as a practical reference.
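A sorting step could look like the sketch below. It assumes that the total is simply the sum of the two scores, which is what the sample output suggests, and that each result is a dict. The keys name and total_score and the function rank_names are hypothetical; the three sample rows are taken from the output above.

```python
def rank_names(results):
    """Return copies of the results with a total score, highest total first."""
    ranked = []
    for item in results:
        # Assumption: total = eight-character score + five-grid score.
        total = round(item["bazi_score"] + item["wuge_score"], 1)
        ranked.append(dict(item, total_score=total))
    ranked.sort(key=lambda item: item["total_score"], reverse=True)
    return ranked


results = [
    {"name": "Li Guojin", "bazi_score": 61.5, "wuge_score": 78.6},
    {"name": "Li Guotie", "bazi_score": 61.0, "wuge_score": 89.7},
    {"name": "Li Mingguo", "bazi_score": 21.0, "wuge_score": 90.3},
]

for row in rank_names(results):
    print(row["name"], row["total_score"])
```

Rounding to one decimal place keeps floating-point noise out of the totals, so they print the same way as in the sample output.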

Reminder

  1. The scores depend on many factors, such as the time of birth, the restricted character, and the number of strokes in that character. These conditions mean that some combinations can never score highly, so do not let that discourage you; just look for the names that score relatively well;

  2. At present the program crawls only one website, at http://life.httpcn.com/xingming.asp;

  3. The list is for reference only. I have read that many famous figures in history had names with very low eight-character scores and still achieved great things. A name may have some influence, but sometimes the name that is easiest to say is the best one;

  4. After choosing a name from the list, search for it on Baidu, Renren, and similar sites, to make sure it is not shared with someone notorious and is not too common;

  5. The eight-character score comes from Chinese tradition, while the five-grid score was invented in Japan in modern times. You could also try the Western zodiac approach to naming. Oddly, the eight-character and five-grid scores for the same name can differ greatly from one website to another, which again shows that they are only a reference.

The code in this article has been uploaded to GitHub.
