Renaming History of Chinese Universities (Python)

Source: Internet
Author: User

Last week I took on a new task: compiling statistics on how colleges and universities in China have been renamed over the years. It turned out to be quite interesting, so here is how I went about it.

I. Data collection

Data requirements: the current name of each university, and the history of names each one has carried.

Data sources: I first tried crawling the ipin university rankings site and the China Education and Research Network, but neither list of names was complete (709 schools from the former, 1,394 from the latter). I eventually found a list from the Ministry of Education, the "2015 National University list". Since information from the Ministry of Education is the more authoritative and reliable, I decided to collect renaming information by crawling the letters the Ministry has issued.

Renaming data source: the Ministry of Education's comprehensive information search. Because the Ministry has issued a large number of documents and the letters follow recognizable formats, I crawled them in several categories by search keyword: merger, renaming, establishment, "on the basis of XX", and "to set up". The keyword is visible in the URL that the search redirects to, so replacing the keyword part of that URL and querying again returns a new set of results; following the page links from there retrieves all the relevant notices. The implementation details are rather tedious; interested readers can see the code on GitHub.

The final data set is as follows:

1. University names: 2,553 regular colleges and universities.
2. Renaming history: about 665 letters issued by the Ministry of Education from 1995 to 2015, plus 431 records of university mergers published by the Ministry covering 1990 to May 2006.

II. Data analysis

Next comes the data cleaning and rule-based processing. No real algorithm is involved, but it was painful all the same. Most of it should be easy to follow; see the project for the detailed code. The overall flow is roughly as follows:

    import re
    import string
    import json

    sch = {}         # school renaming-history dictionary
    remain_sch = {}  # schools added during processing

    # Function bodies are omitted in the original post;
    # `pass` only keeps this outline importable.


    # Process university merger notices
    def dealCombineRp(rp_file):
        pass


    # Process university founding notices; the original notices are
    # incomplete, so the data was completed from the notification file
    def dealFoundRp(rp_file):
        pass


    # Process university renaming notices
    def dealRenameRp(rp_file):
        pass


    # Process university transfer notices
    def dealSetupRp(rp_file):
        pass


    # Process university establishment notices
    def dealUpgradeRp(rp_file):
        pass


    # Process the Ministry of Education merger list: 1990 to 2006-05-15
    def dealCombineFile(combine_file):
        pass


    # Deduplicate: remove repeated renaming (merger) entries
    def removeDuplicate():
        pass


    # Load the school names
    def loadSchoolName(sch_file):
        pass


    # Save the results in JSON format
    def showResult():
        pass


    def main():
        global sch
        global remain_sch
        sch_file = "./data/sch_name/sch_name_gov.txt"
        rp_rename_file = "./data/reports/reports_rename.txt"
        rp_upgrade_file = "./data/reports/reports_upgrade.txt"
        rp_setup_file = "./data/reports/reports_setup.txt"
        rp_found_file = "./data/reports/reports_found.txt"
        rp_combine_file = "./data/reports/reports_combine.txt"
        school_combine_since1990 = "./data/reports/school_combine_since1990.txt"
        sch = loadSchoolName(sch_file)
        # print("Before:", len(sch))
        dealSetupRp(rp_setup_file)
        dealCombineFile(school_combine_since1990)
        dealCombineRp(rp_combine_file)
        dealFoundRp(rp_found_file)
        dealRenameRp(rp_rename_file)
        dealUpgradeRp(rp_upgrade_file)
        # print("After:", len(sch))
        removeDuplicate()
        showResult()


    if __name__ == '__main__':
        main()
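The outline above does not show what the rule processing looks like. As a rough illustration only, here is a minimal sketch of one such rule for renaming notices, followed by a deduplication pass. It assumes each notice title follows the pattern "...关于同意<old name>更名为<new name>的函"; that pattern, the helper names `parse_rename_title`, `record_rename` and `remove_duplicates`, and the sample titles are all assumptions made for this example and are not taken from the project's code.

```python
import re

# Assumed title pattern: "...关于同意<old name>更名为<new name>的函" (or 的通知).
RENAME_PATTERN = re.compile(r"关于同意(?P<old>.+?)更名为(?P<new>.+?)的(?:函|通知)")


def parse_rename_title(title):
    """Return (old_name, new_name) if the title matches the assumed pattern, else None."""
    match = RENAME_PATTERN.search(title)
    if match is None:
        return None
    return match.group("old"), match.group("new")


def record_rename(history, old_name, new_name):
    """Add old_name to the list of former names kept under new_name."""
    former_names = history.setdefault(new_name, [])
    if old_name not in former_names:
        former_names.append(old_name)


def remove_duplicates(history):
    """Drop repeated former names while keeping first-seen order."""
    return {name: list(dict.fromkeys(former)) for name, former in history.items()}


# Hypothetical sample input: one school with no history yet, three titles.
history = {"广东财经大学": []}
titles = [
    "教育部关于同意广东商学院更名为广东财经大学的函",
    "教育部关于同意广东商学院更名为广东财经大学的函",  # repeated notice
    "教育部关于印发某文件的通知",  # unrelated title, ignored
]
for title in titles:
    parsed = parse_rename_title(title)
    if parsed is not None:
        record_rename(history, *parsed)
history = remove_duplicates(history)
print(history)
```

Keying the dictionary by the current name and storing former names as a list matches the shape of the JSON results shown in the next section. Real notices vary much more in wording than this single pattern allows, which is presumably why the project handles each category of notice separately.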

III. Analysis results

Tagging results: the original list contained 2,554 schools and grew to 2,690 after processing; 828 schools in total were tagged with a renaming history. Quite a few schools were added along the way. Laid out as a list, the result is quite a sight; here is part of it:

    {
        "Guangdong Ocean University": [
            "Zhanjiang Agricultural College",
            "Zhanjiang Ocean University",
            "Zhanjiang Aquatic College"
        ],
        "Guangdong Ocean University-jin College": [],
        "Guangdong Environmental Protection Engineering Vocational college": [],
        "Guangdong Polytechnic College": [
            "Zhaoqing Science and technology vocational and technical College"
        ],
        "Guangdong Polytechnic Vocational College": [],
        "Guangdong Ecological Engineering Vocational College": [],
        "Guangdong Baiyun College": [],
        "Guangdong Foreign Language Arts vocational college": [],
        "Guangdong Institute of Petroleum and Chemical engineering": [
            "Maoming College",
            "Guangdong College of Petroleum and Chemical engineering",
            "Guangdong Province Maoming Institute of Education",
            "Maoming Petroleum Industry company Staff University"
        ],
        "Guangdong Country Garden Vocational college": [],
        "Guangdong Science and Technology Vocational college": [],
        "Guangdong Institute of Science and Technology": [
            "Dongguan South Bo Vocational and Technical College"
        ],
        "Guangdong Kemao Vocational college": [],
        "Guangdong Second Teacher's college": [
            "Guangdong Institute of Education"
        ],
        "Guangdong Vocational and Technical College": [],
        "Guangdong Dance and drama vocational College": [],
        "Guangdong Institute of Pharmacy": [],
        "Guangdong Administrative Vocational college": [],
        "Guangdong Police Academy": [
            "Guangdong Public Security College"
        ],
        "Guangdong University of Finance and Economics": [
            "Guangdong Business School"
        ],
        "China Business College of Guangdong University of Finance and Economics": [],
        "Guangdong Light Industry vocational and Technical College": [
            "Guangzhou Light Industry School"
        ],
        "Guangdong Post and Telecommunications vocational and technical college": [],
        "Guangdong Institute of Finance": [
            "Guangzhou Finance College"
        ]
    }
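A result in this shape is easy to query. The following is a minimal sketch, assuming the output is a JSON object that maps each current name to a list of former names; the three entries are copied from the excerpt above, while the variable names `tagged` and `former_to_current` are made up for this example.

```python
import json

# A three-entry excerpt of the result, inlined so the example is self-contained.
result_json = """
{
    "Guangdong Ocean University": [
        "Zhanjiang Agricultural College",
        "Zhanjiang Ocean University",
        "Zhanjiang Aquatic College"
    ],
    "Guangdong Baiyun College": [],
    "Guangdong Police Academy": ["Guangdong Public Security College"]
}
"""

history = json.loads(result_json)

# Schools that have at least one recorded former name ("tagged" schools).
tagged = {name: former for name, former in history.items() if former}
print(len(history), "schools,", len(tagged), "with a recorded former name")

# Reverse index: look up the current name from any former name.
former_to_current = {
    old: name for name, former in tagged.items() for old in former
}
print(former_to_current["Guangdong Public Security College"])
```

The same filter, applied to the full file, is one way to arrive at a count like the 828 tagged schools mentioned earlier, although the post does not say how that figure was computed.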

That covers the basic task. The complete project is Schoolcard; if you have questions, feel free to get in touch.

Resources:

1. ipin university rankings: http://www.ipin.com/school/ranking.do

2. China Education and Research Network: http://ziyuan.eol.cn/list.php?listid=128

3. Ministry of Education: http://www.moe.gov.cn/jyb_sy/
