Python "8" - Parsing a JSON file

Source: Internet
Author: User
Tags: time zones, timezones

I. Background knowledge for this section

1. Reading a file line by line

for line in open(r'E:\Demo\python\json.txt'):
    print(line)
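Iterating over the file object directly keeps each line's trailing newline and relies on the interpreter to close the file later. As a minimal sketch, assuming Python 3 and a hypothetical temporary file standing in for json.txt, the same loop can be wrapped in a `with` block:

```python
import os
import tempfile

# Hypothetical sample file standing in for json.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

lines = []
# The with block closes the file as soon as the loop finishes
with open(path) as f:
    for line in f:
        # rstrip('\n') drops the newline that each line carries
        lines.append(line.rstrip('\n'))

os.remove(path)
print(lines)
```

The file is still read one line at a time, so this form should also work for files too large to hold in memory.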

2. Parsing JSON strings

Python has built-in modules that make it easy to convert a JSON string into a Python object. For example, the json.loads() function in the json module parses a JSON string into the corresponding dictionary.

import json
s = r'{"a": "GoogleMaps\/RochesterNY", "c": "US", "nk": 0, "tz": "America\/Denver", "gr": "UT", "g": "mwszkS", "h": "mwszkS", "l": "bitly", "hh": "1.usa.gov", "r": "http:\/\/www.AwareMap.com\/", "u": "http:\/\/www.monroecounty.gov\/etc\/911\/rss.php", "t": 1331926741, "hc": 1308262393, "cy": "Provo", "ll": [40.218102, -111.613297]}'
o = json.loads(s)
print(o)

Output (Python 2):

{u'a': u'GoogleMaps/RochesterNY', u'c': u'US', u'nk': 0, u'tz': u'America/Denver', u'gr': u'UT', u'g': u'mwszkS', u'h': u'mwszkS', u'cy': u'Provo', u'l': u'bitly', u'hh': u'1.usa.gov', u'r': u'http://www.AwareMap.com/', u'u': u'http://www.monroecounty.gov/etc/911/rss.php', u't': 1331926741, u'hc': 1308262393, u'll': [40.218102, -111.613297]}
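The result is an ordinary dictionary, so its fields can be read with normal indexing. The sketch below, written for Python 3 with a shortened version of the record above, shows how JSON types map to Python types. The helper `parse_or_none` is a hypothetical name introduced here only to illustrate handling invalid input:

```python
import json

# A shortened version of the record above
s = r'{"tz": "America\/Denver", "nk": 0, "ll": [40.218102, -111.613297]}'
record = json.loads(s)

# JSON objects become dicts, arrays become lists, numbers become int/float
tz = record['tz']          # the escaped "\/" is decoded to "/"
lat, lon = record['ll']

def parse_or_none(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

print(tz, lat, lon, parse_or_none('{not json}'))
```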

3. List comprehensions

See: http://www.cnblogs.com/janes/p/5530979.html
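For readers who do not follow the link, here is a small sketch of the idea using made-up records: a list comprehension builds a list from an expression, a `for` clause, and an optional `if` filter, and is equivalent to the explicit loop shown first.

```python
# Made-up records for illustration; the second one has no 'tz' key
records = [{'tz': 'America/Denver'}, {'c': 'US'}, {'tz': ''}]

# Explicit loop form
zones_loop = []
for item in records:
    if 'tz' in item:
        zones_loop.append(item['tz'])

# List comprehension: expression, for clause, optional if filter
zones = [item['tz'] for item in records if 'tz' in item]
print(zones)
```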

II. Parsing the JSON file into a list of dictionaries

To parse the JSON file, we read it line by line, convert each line into the corresponding dictionary object, and collect the results in a list.

import json
# Read the file and parse it into a list of dictionaries
diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
# Print the first dictionary element
print(diclist[0])
# Print the time zone of the first element
print(diclist[0]['tz'])

Output:

{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11', u'c': u'US', u'nk': 1, u'tz': u'America/New_York', u'gr': u'MA', u'g': u'A6qOVH', u'h': u'wfLQtf', u'cy': u'Danvers', u'l': u'orofrog', u'al': u'en-US,en;q=0.8', u'hh': u'1.usa.gov', u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf', u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991', u't': 1331923247, u'hc': 1331822918, u'll': [42.576698, -70.954903]}

America/New_York
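The one-line comprehension assumes every line of the file is a valid JSON object. As a hedged sketch for Python 3, the hypothetical helper `load_json_lines` below does the same job but closes the file explicitly and skips blank lines, which `json.loads` would otherwise reject. The temporary file and its two records are made up for the demonstration.

```python
import json
import os
import tempfile

def load_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, which json.loads would reject
                records.append(json.loads(line))
    return records

# Made-up two-record file with a blank line in between
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('{"tz": "America/New_York", "c": "US"}\n\n{"c": "US"}\n')
    path = tmp.name

diclist = load_json_lines(path)
os.remove(path)
print(len(diclist), diclist[0]['tz'])
```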

III. Using the Python standard library to count time zone data in the JSON file

1. First, put all the time zone values in a list

# Get all time zone values
timezones = [item['tz'] for item in diclist if 'tz' in item]
# Test: print the first five
print(timezones[0:5])

Output:

[u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York']

2. Then convert the time zone list into a count dictionary whose keys are time zone names and whose values are the number of occurrences.

# Custom function: count time zone occurrences
def countzone(timezones):
    count_zone = {}
    for tz in timezones:
        if tz in count_zone:
            count_zone[tz] += 1
        else:
            count_zone[tz] = 1
    return count_zone

# Custom function: return the top n as (count, time zone) pairs
def counttop(diccount, n):
    valuekeyitems = [(value, key) for key, value in diccount.items()]
    valuekeyitems.sort()
    return valuekeyitems[-n:]

# Test: print the 5 most frequent time zones
count = countzone(timezones)
print(counttop(count, 5))

Output:

[(191, u'America/Denver'), (382, u'America/Los_Angeles'), (400, u'America/Chicago'), (521, u''), (1251, u'America/New_York')]
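Note that counttop sorts (count, name) tuples in ascending order, so the most frequent zone comes last. As an alternative sketch, not the article's method, `heapq.nlargest` from the standard library returns the largest counts first without sorting the whole list. The name `counttop_desc` is hypothetical, and the counts are copied from the results reported in this article.

```python
import heapq

def counttop_desc(diccount, n):
    """Return the n (key, count) pairs with the largest counts, biggest first."""
    return heapq.nlargest(n, diccount.items(), key=lambda kv: kv[1])

# Counts copied from the results reported in this article
count = {'America/Denver': 191, 'America/Los_Angeles': 382,
         'America/Chicago': 400, '': 521, 'America/New_York': 1251}
top2 = counttop_desc(count, 2)
print(top2)
```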

3. Simplifying the countzone function with defaultdict

The collections module in the Python standard library provides several data structures that are more convenient to use. One of them, defaultdict, supplies a default value for keys that are not yet in the dictionary.

from collections import defaultdict, Counter

def countzone(timezones):
    count_zone = defaultdict(int)
    for tz in timezones:
        count_zone[tz] += 1
    return count_zone
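To see why the `if`/`else` branch is no longer needed, the following small sketch shows what `defaultdict(int)` does on a missing key, next to the equivalent idiom for a plain dict:

```python
from collections import defaultdict

count_zone = defaultdict(int)
# Reading a missing key calls int(), stores the result 0, and returns it
first_read = count_zone['America/Denver']
count_zone['America/Denver'] += 1
count_zone['America/Denver'] += 1

# A plain dict needs an explicit default, for example via get()
plain = {}
plain['America/Denver'] = plain.get('America/Denver', 0) + 1
print(first_read, dict(count_zone), plain)
```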

4. Simplifying the counttop function with collections.Counter

from collections import Counter

def counttop(diccount, n):
    return Counter(diccount).most_common(n)
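Counter can also tally an iterable directly, which would make a separate counting function optional. A brief sketch, using the first five time zones printed earlier as sample data:

```python
from collections import Counter

# The first five time zones printed earlier, used here as sample data
timezones = ['America/New_York', 'America/Denver', 'America/New_York',
             'America/Sao_Paulo', 'America/New_York']

# Counter accepts any iterable and tallies its elements
counts = Counter(timezones)
top = counts.most_common(1)
print(top)
```

Unlike the first counttop version, most_common returns (key, count) pairs with the largest count first.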

5. Complete code
# -*- coding: utf-8 -*-
import json
from collections import defaultdict, Counter

# 1. Read the file and parse it into a list of dictionaries
diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]

# 2. Count time zones
# Get all time zone values
timezones = [item['tz'] for item in diclist if 'tz' in item]

# Count time zone occurrences
def countzone(timezones):
    count_zone = defaultdict(int)
    for tz in timezones:
        count_zone[tz] += 1
    return count_zone

# Return the top n
def counttop(diccount, n):
    return Counter(diccount).most_common(n)

# Test: print the 5 most frequent time zones
count = countzone(timezones)
print(counttop(count, 5))

# Output: [(u'America/New_York', 1251), (u'', 521), (u'America/Chicago', 400), (u'America/Los_Angeles', 382), (u'America/Denver', 191)]

IV. Using pandas to count time zone data in the JSON file

1. Counting time zone data with DataFrame

① DataFrame is one of the most commonly used data structures in pandas. It turns the data into a table-like structure.

# -*- coding: utf-8 -*-
import json
from pandas import DataFrame

diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
frame = DataFrame(diclist)
# Test: print the first 5 elements of the time zone column
print(frame['tz'][:5])

Output:

0    America/New_York
1    America/Denver
2    America/New_York
3    America/Sao_Paulo
4    America/New_York
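When a DataFrame is built from a list of dictionaries, each distinct key becomes a column, and records that lack a key get a missing value (NaN) in that column. A minimal sketch with made-up records, assuming pandas is installed:

```python
import pandas as pd

# Made-up records; the second one has no 'tz' key
diclist = [{'tz': 'America/New_York', 'c': 'US'},
           {'c': 'US'},
           {'tz': '', 'c': 'BR'}]
frame = pd.DataFrame(diclist)

# One column per key seen in any record; absent values are marked as missing
print(frame.columns.tolist())
print(frame['tz'].isna().tolist())
```

An empty string is a real value, not a missing one, which is why the two cases are handled separately later in this section.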

② frame['tz'] has a value_counts() method that returns the counts directly.

# Print the 5 most frequent time zones
print(frame['tz'].value_counts()[:5])

Output:

America/New_York       1251
                        521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191

③ Assign a default value to records whose time zone field is missing or is an empty string.

The fillna() method fills in missing values, and empty strings can be replaced through Boolean indexing.

tzlist = frame['tz'].fillna('Missing')
tzlist[tzlist == ''] = 'Unknown'
print(tzlist.value_counts()[:5])

Output:

America/New_York       1251
Unknown                 521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
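The two cleaning steps can be checked on a tiny made-up column. This sketch assumes pandas is installed and uses `head(5)` in place of the `[:5]` slice:

```python
import pandas as pd

# Made-up column: one missing value (None) and one empty string
tz = pd.Series(['America/New_York', None, '', 'America/New_York'])

clean = tz.fillna('Missing')      # missing values become 'Missing'; returns a new Series
clean[clean == ''] = 'Unknown'    # the Boolean mask selects the empty strings

counts = clean.value_counts()
print(counts.head(5))
```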

This completes the same task we did with the Python standard library. The complete code is as follows:

# -*- coding: utf-8 -*-
import json
from pandas import DataFrame

diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
frame = DataFrame(diclist)
# Print the 5 most frequent time zones
print(frame['tz'].value_counts()[:5])
# Fill in time zones that are missing or empty
tzlist = frame['tz'].fillna('Missing')
tzlist[tzlist == ''] = 'Unknown'
print(tzlist.value_counts()[:5])

2. Drawing a vertical bar chart with the plot method

Reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html

tzlist.value_counts()[:5].plot(kind='bar', rot=0)

To run it, we can start IPython and use the %paste command to paste in the code and execute it.

Command line:

ipython
%pylab
%paste

Output: a vertical bar chart of the five most frequent time zones.

JSON file used in this article: click here to download

Reference: "Python for Data Analysis"

If you want to reprint, please indicate the source: http://www.cnblogs.com/janes/p/5546673.html
