Use Python crawlers to give your child a good name

Source: Internet
Author: User
Preface

I believe every parent has been through this: a baby has to be given a name within two weeks of birth (to apply for the birth certificate). I suspect many people are like me and feel lost at first. There are so many Chinese characters that it seems you could just pick one or two for a name, but you soon find that naming is no casual matter and nothing you find seems right. So you leaf through dictionaries, search online, and read Tang poems, Song lyrics, classical poetry, and even martial-arts novels. Yet a name you spent a long time on is often met with opinions and objections from the family, for example that it is not easy to pronounce or that it sounds the same as a relative's name. You get caught in a cycle of searching and rejecting, and things become more and more confusing.

So we go back to searching online and find many articles like "A Complete Collection of Beautiful Boys' Names", which list hundreds or thousands of names at once; there are so many that they are dazzling and of little practical use. There are also many naming websites and apps that give a Bazi (eight-character) or Wuge (five-grid) score for a name you enter. That feature is a helpful reference, but either you must already have the names you want to test, or the site's or app's name list is very small, or it cannot meet needs such as a required character; otherwise it starts charging, or in the end you still cannot find a good name.

So I wanted to write a program that does the following:

    1. Its main function is to generate names in batches for reference, scored against the Bazi (eight characters) derived from the baby's birth date and time;

    2. I can extend the name library: for example, if I find some good names in poems online and want to see how they score, I just add them;

    3. It can restrict the characters used in a name: for example, some family genealogies prescribe a generation character, and if the current generation character is "国" (guo, "state" or "country"), every name must contain "国";

    4. It gives every name in the list a score, so the names can be sorted from the highest score to the lowest;

In this way you get a list of names that fits your child's birth time, your family genealogy, and your preferences, with a score for each name as a reference, which makes it easier to settle on the right name. Of course, whenever you have new ideas, you can add new names to the dictionary files and run the calculation again.

Code Structure of the program

Code Description:

  • /chinese-name-score code root directory

  • /chinese-name-score/main code directory

  • /chinese-name-score/main/dicts dictionary file directory

  • /chinese-name-score/main/dicts/names_boys_double.txt dictionary file of two-character names for boys

  • /chinese-name-score/main/dicts/names_boys_single.txt dictionary file of single characters for boys' names

  • /chinese-name-score/main/dicts/names_girls_single.txt dictionary file of single characters for girls' names

  • /chinese-name-score/main/dicts/names_grils_double.txt dictionary file of two-character names for girls

  • /chinese-name-score/main/outputs output data directory

  • /chinese-name-score/main/outputs/names_girls_source_wxy.txt sample output file

  • /chinese-name-score/main/scripts scripts that preprocess the dictionary files

  • /chinese-name-score/main/scripts/unique_file_lines.py removes duplicate names and blank lines from a dictionary file

  • /chinese-name-score/main/sys_config.py system configuration of the program, including the target URL to crawl and the dictionary file paths

  • /chinese-name-score/main/user_config.py user configuration of the program, including the baby's birth date and time, gender, and other settings

  • /chinese-name-score/main/get_name_score.py entry point of the program
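The clean-up that scripts/unique_file_lines.py performs on a dictionary file can be sketched as below. This is a minimal Python 3 sketch, not the script from the repository: the function names unique_lines and dedupe_file are hypothetical, keeping the first occurrence of each name is an assumption, and so is the GB18030 file encoding (suggested by the `# coding:gb18030` header of the project's configuration file).

```python
from pathlib import Path
from typing import Iterable, List


def unique_lines(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank lines and repeated names,
    keeping the first occurrence of each name in its original order."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


def dedupe_file(path: str, encoding: str = "gb18030") -> None:
    """Rewrite a dictionary file in place with one unique name per line.
    The default encoding is an assumption, not taken from the repository."""
    p = Path(path)
    names = unique_lines(p.read_text(encoding=encoding).splitlines())
    p.write_text("".join(name + "\n" for name in names), encoding=encoding)
```

Keeping the first occurrence (rather than sorting) preserves the order in which names were added, so the names you appended most recently stay at the end of the file.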

How to use the code:

    1. If there is no required character, open the dictionary file names_boys_double.txt or names_grils_double.txt and append the names you have collected at the end, one name per line;

    2. If there is a required character, open the dictionary file names_boys_single.txt or names_girls_single.txt and append the single characters you like at the end, one per line;

    3. Open user_config.py and set the configuration items (see the next section);

    4. Run the script get_name_score.py;

    5. View the output file in the outputs directory; you can copy it into Excel to sort it or process it further.

Configuration items of the program

The program is configured as follows:

    # coding:gb18030
    """Write the configuration here"""

    setting = {}

    # Required character: if set, the single-character dictionary is used,
    # otherwise the two-character dictionary
    setting["limit_world"] = "国"
    # Surname
    setting["name_prefix"] = "李"
    # Gender: "男" (male) or "女" (female)
    setting["sex"] = "男"
    # Province
    setting["area_province"] = "北京"
    # City or district
    setting["area_region"] = "海淀"
    # Year of birth (Gregorian)
    setting["year"] = "2017"
    # Month of birth (Gregorian)
    setting["month"] = "1"
    # Day of birth (Gregorian)
    setting["day"] = "11"
    # Hour of birth
    setting["hour"] = "11"
    # Minute of birth
    setting["minute"] = "11"
    # Name of the result output file
    setting["output_fname"] = "names_girls_source_xxx.txt"

Depending on whether setting["limit_world"] is set, the program automatically chooses the single-character dictionary or the two-character dictionary:

    1. If the item is set, for example to "国" (Guo), the program combines it with every character in the *_single.txt dictionary in both orders and scores the results; for example, both Guohao and Haoguo are scored;

    2. If you do not set the item (leave it as an empty string), the program reads only the *_double.txt two-character dictionary.
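The dictionary choice described above can be sketched as a small function. This is a hedged Python 3 sketch based only on that description, not the program's code; the function name candidate_given_names is hypothetical, and removing duplicate combinations (for example when the required character also appears in the single-character dictionary) is an assumption.

```python
from typing import List


def candidate_given_names(limit_char: str, singles: List[str],
                          doubles: List[str]) -> List[str]:
    """Return the given names (without surname) to be scored.

    With a required character, pair it with every single character in
    both orders; without one, use the two-character names unchanged.
    """
    if not limit_char:
        return list(doubles)
    names: List[str] = []
    for ch in singles:
        names.append(limit_char + ch)  # e.g. Guohao
        names.append(ch + limit_char)  # e.g. Haoguo
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(names))
```

For example, `candidate_given_names("国", ["浩", "栋"], [])` yields four candidates, while an empty first argument simply returns the two-character list.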

Principles of the program

This is a simple crawler. Open http://life.httpcn.com/xingming.asp and you will see a form that is submitted via POST: fill in the required parameters and click Submit, and a results page opens showing the name's Bazi score and Wuge score.

To get the scores, the program has to do two things: first, submit the form automatically and fetch the results page; second, extract the scores from that page.

The first part is simple and can be done with urllib2 (code in /chinese-name-score/main/get_name_score.py):

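    import urllib
    import urllib2

    import sys_config

    # Encode the form fields; passing data makes urlopen send a POST
    post_data = urllib.urlencode(params)
    req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
    content = req.read()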

Here params is a dict of form fields. Passing the URL-encoded data to urlopen submits it as a POST request, and the HTML of the results page is read into content.

The params parameters are set as follows:

    import user_config

    params = {}
    # Date type: 0 for Gregorian, 1 for lunar
    params['data_type'] = "0"
    params['year'] = str(user_config.setting["year"])
    params['month'] = str(user_config.setting["month"])
    params['day'] = str(user_config.setting["day"])
    params['hour'] = str(user_config.setting["hour"])
    params['minute'] = str(user_config.setting["minute"])
    params['pid'] = str(user_config.setting["area_province"])
    params['cid'] = str(user_config.setting["area_region"])
    # Favorable five elements: 0 for automatic analysis, 1 for custom
    params['wxxy'] = "0"
    params['xing'] = str(user_config.setting["name_prefix"])
    # name_postfix: the given name currently being scored (set by the surrounding loop)
    params['ming'] = name_postfix
    # 0 means female, 1 means male
    if user_config.setting["sex"] == "男":
        params['sex'] = "1"
    else:
        params['sex'] = "0"
    params['act'] = "submit"
    params['isbz'] = "1"
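urllib2 exists only in Python 2. As a sketch, the same form submission under Python 3 could use urllib.request as below; the request is only built here, not sent, so it can be inspected offline. The assumptions: the form posts back to http://life.httpcn.com/xingming.asp, the site expects GB18030-encoded form values, and build_request and fetch_result_page are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed POST target (the page named in this article)
REQUEST_URL = "http://life.httpcn.com/xingming.asp"


def build_request(params: dict, url: str = REQUEST_URL) -> Request:
    """Build a form POST; supplying data makes urllib use POST."""
    # Percent-encode Chinese values as GB18030 bytes (an assumption
    # about what the site expects)
    data = urlencode(params, encoding="gb18030").encode("ascii")
    return Request(url, data=data, headers={
        "Content-Type": "application/x-www-form-urlencoded"})


def fetch_result_page(params: dict) -> bytes:
    """Send the request and return the raw HTML bytes (needs network)."""
    with urlopen(build_request(params), timeout=10) as resp:
        return resp.read()
```

Returning raw bytes leaves decoding to the HTML parser, which mirrors the Python 2 code passing `from_encoding="GB18030"` to BeautifulSoup.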

The second part, extracting the scores from the web page, can be done with BeautifulSoup4, whose syntax is simple:

    import re

    from bs4 import BeautifulSoup

    result_data = {}
    soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
    # get_full_name is a helper defined in get_name_score.py
    full_name = get_full_name(name_postfix)
    for node in soup.find_all("p", class_="chaxun_b"):
        node_cont = node.get_text()
        # Wuge (five-grid) score
        if u'姓名五格评分' in node_cont:
            name_wuge = node.find(string=re.compile(u"姓名五格评分"))
            result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()
        # Bazi (eight-character) score
        if u'姓名八字评分' in node_cont:
            name_bazi = node.find(string=re.compile(u"姓名八字评分"))
            result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()

This parses the HTML and extracts the Bazi and Wuge scores.
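If BeautifulSoup is not available, the same extraction can be approximated with the standard library's html.parser plus a regular expression; note that this swaps in a different technique from the one the program uses. The sketch collects the text of each `<p class="chaxun_b">` paragraph and reads the number that follows each label. The Chinese label strings, the class name ScoreParagraphParser, and the sample markup are assumptions modelled on the BeautifulSoup code above, not copied from the live page.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# Label texts assumed to appear on the result page
LABELS = {"bazi_score": "姓名八字评分", "wuge_score": "姓名五格评分"}


class ScoreParagraphParser(HTMLParser):
    """Collect the text of every <p class="chaxun_b"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "chaxun_b" in classes:
            self._inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.paragraphs[-1] += data


def extract_scores(html: str) -> Dict[str, float]:
    parser = ScoreParagraphParser()
    parser.feed(html)
    parser.close()
    scores: Dict[str, float] = {}
    for text in parser.paragraphs:
        for key, label in LABELS.items():
            # The label, then any non-digits (e.g. a colon), then the number
            match = re.search(re.escape(label) + r"\D*?(\d+(?:\.\d+)?)", text)
            if match:
                scores[key] = float(match.group(1))
    return scores


# Hypothetical markup, not captured from the real site
SAMPLE_HTML = (
    '<p class="chaxun_b">姓名八字评分：<font><b>61.5</b>分</font></p>'
    '<p class="chaxun_b">姓名五格评分：<font><b>78.6</b>分</font></p>'
)
```

Matching on the flattened paragraph text is more tolerant of small markup changes than walking `next_sibling`, at the cost of relying on the label being followed closely by its number.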

Sample results

    1/1287 Alain Li Jin  Bazi score=61.5  Wuge score=78.6  Total=140.1
    2/1287 Alain Li Iron  Bazi score=61  Wuge score=89.7  Total=150.7
    3/1287 Li Guojing  Bazi score=21  Wuge score=81.6  Total=102.6
    4/1287 Li Chuang  Bazi score=21  Wuge score=90.3  Total=111.3
    5/1287 Li Yuoguo  Bazi score=64  Wuge score=78.3  Total=142.3
    6/1287 Alain Li by  Bazi score=21  Wuge score=89.8  Total=110.8
    7/1287 Li Guoti  Bazi score=22  Wuge score=87.2  Total=109.2
    8/1287 Alain Li  Bazi score=21  Wuge score=81.6  Total=102.6
    9/1287 Young country  Bazi score=21  Wuge score=83.7  Total=104.7
    10/1287 Alain Li add  Bazi score=21  Wuge score=81.6  Total=102.6
    11/1287 Alain Li Day  Bazi score=22  Wuge score=83.7  Total=105.7
    12/1287 Li Guodian  Bazi score=22  Wuge score=93.7  Total=115.7

With these scores, the names can be sorted, which gives a useful reference.
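Instead of copying the output into Excel, the ranking can also be done in a few lines of Python 3. This sketch assumes each result line has the form `i/n name  Bazi score=X  Wuge score=Y  Total=Z`; the real output file may be formatted differently, and rank_results and NameScore are hypothetical names. The sample lines reuse three results from the run above.

```python
import re
from typing import List, NamedTuple


class NameScore(NamedTuple):
    name: str
    bazi: float
    wuge: float
    total: float


NUM = r"\d+(?:\.\d+)?"
# Assumed line format: "<i>/<n> <name>  Bazi score=<x>  Wuge score=<y>  Total=<z>"
LINE_RE = re.compile(
    rf"^\d+/\d+\s+(?P<name>.+?)\s+Bazi score=(?P<bazi>{NUM})\s+"
    rf"Wuge score=(?P<wuge>{NUM})\s+Total=(?P<total>{NUM})\s*$"
)


def rank_results(lines: List[str]) -> List[NameScore]:
    """Parse result lines, skip lines that don't match, sort by total (high to low)."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            rows.append(NameScore(m.group("name"), float(m.group("bazi")),
                                  float(m.group("wuge")), float(m.group("total"))))
    return sorted(rows, key=lambda r: r.total, reverse=True)


SAMPLE = [
    "3/1287 Li Guojing  Bazi score=21  Wuge score=81.6  Total=102.6",
    "4/1287 Li Chuang  Bazi score=21  Wuge score=90.3  Total=111.3",
    "12/1287 Li Guodian  Bazi score=22  Wuge score=93.7  Total=115.7",
]

if __name__ == "__main__":
    for row in rank_results(SAMPLE):
        print(f"{row.name}\t{row.total}")
```

Skipping non-matching lines lets the same function read a whole output file that may also contain progress or error messages.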

Friendly Tips

    1. Scores depend on many factors, such as the birth time, the characters that are already fixed (the surname and any required character), and stroke counts. These constraints mean some names can never score highly; don't let that bother you, just pick names with relatively high scores;

    2. Currently the program crawls only one website: http://life.httpcn.com/xingming.asp;

    3. The list is for reference only. I have read a number of articles pointing out that many historical figures had names with very low Bazi scores yet still made great contributions. A name does have some influence, but sometimes a name that is easy to say is best;

    4. After choosing a name from the list, search for it on Baidu, Renren, and similar sites, in case it belongs to a notorious person or is already far too common;

    5. The Bazi score comes from traditional Chinese practice, while the Wuge score is a modern Japanese invention; you can also try Western zodiac-based naming methods. Oddly, Bazi and Wuge scores for the same name differ greatly from site to site, which again shows that they are for reference only.

The code for this article has been uploaded to GitHub
