I. Background knowledge for this section
1. Reading a file line by line
for line in open(r'E:\Demo\python\json.txt'):
    print(line)
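As a sketch of the same idea in a slightly more defensive form, the snippet below uses a with block so the file is closed automatically and strips the trailing newline from each line. It is written for Python 3, and the temporary file and its contents are made up for illustration so the snippet can run anywhere.

```python
import os
import tempfile

# Hypothetical sample file standing in for json.txt
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write('first line\nsecond line\n')
tmp.close()

lines = []
with open(tmp.name) as f:      # the file is closed when the block exits
    for line in f:             # iterating a file object yields one line at a time
        lines.append(line.rstrip('\n'))

os.remove(tmp.name)            # clean up the sample file
print(lines)
```

Iterating over the file object reads one line at a time, so this pattern also works for files too large to load into memory at once.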
2. Parsing JSON strings
Python has a built-in module that makes it easy to convert a JSON string into a Python object. For example, the json.loads() function in the json module parses a JSON string into the corresponding dictionary.
import json
s = r'{"a": "GoogleMaps\/RochesterNY", "c": "US", "nk": 0, "tz": "America\/Denver", "gr": "UT", "g": "mwszkS", "h": "mwszkS", "l": "bitly", "hh": "1.usa.gov", "r": "http:\/\/www.AwareMap.com\/", "u": "http:\/\/www.monroecounty.gov\/etc\/911\/rss.php", "t": 1331926741, "hc": 1308262393, "cy": "Provo", "ll": [40.218102, -111.613297]}'
o = json.loads(s)
print(o)
Output:
{u'a': u'GoogleMaps/RochesterNY', u'c': u'US', u'nk': 0, u'tz': u'America/Denver', u'gr': u'UT', u'g': u'mwszkS', u'h': u'mwszkS', u'cy': u'Provo', u'l': u'bitly', u'hh': u'1.usa.gov', u'r': u'http://www.AwareMap.com/', u'u': u'http://www.monroecounty.gov/etc/911/rss.php', u't': 1331926741, u'hc': 1308262393, u'll': [40.218102, -111.613297]}
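To make the type mapping concrete, here is a minimal sketch (Python 3) using a shortened, made-up record. It shows that a JSON object becomes a dict, a JSON array becomes a list, and the "\/" escape is decoded to a plain slash.

```python
import json

# Made-up record for illustration; "\/" is a legal JSON escape for "/"
s = r'{"tz": "America\/Denver", "nk": 0, "ll": [40.218102, -111.613297]}'
o = json.loads(s)

print(type(o).__name__)   # JSON object -> dict
print(o['tz'])            # the escaped slash is decoded to a plain "/"
print(o['ll'][0])         # JSON array -> list, JSON number -> float
```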
3. List comprehensions
See: http://www.cnblogs.com/janes/p/5530979.html
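For readers who do not follow the link, a minimal sketch of the syntax (Python 3, with made-up numbers): a list comprehension builds a list from an iterable, optionally filtering with an if clause, and is equivalent to an explicit loop with append.

```python
numbers = [1, 2, 3, 4, 5, 6]

# General shape: [expression for item in iterable if condition]
even_squares = [n * n for n in numbers if n % 2 == 0]

# Equivalent explicit loop
expected = []
for n in numbers:
    if n % 2 == 0:
        expected.append(n * n)

print(even_squares)
```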
II. Parsing the JSON file into a list of dictionaries
To parse the JSON file, we read it line by line, convert each line into the corresponding dictionary object, and collect the results in a list.
import json
# Read the file and parse it into a list of dictionaries
diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
# Print the first dictionary element
print(diclist[0])
# Print the time zone of the first element
print(diclist[0]['tz'])
Output:
{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11', u'c': u'US', u'nk': 1, u'tz': u'America/New_York', u'gr': u'MA', u'g': u'A6qOVH', u'h': u'wfLQtf', u'cy': u'Danvers', u'l': u'orofrog', u'al': u'en-US,en;q=0.8', u'hh': u'1.usa.gov', u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf', u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991', u't': 1331923247, u'hc': 1331822918, u'll': [42.576698, -70.954903]}
America/New_York
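The same parsing step can be tried without the real file. The sketch below (Python 3) uses io.StringIO as a stand-in for the opened file, with made-up records, and additionally skips blank lines, which json.loads would otherwise reject. Whether the real json.txt contains blank lines is an assumption, not something the article states.

```python
import io
import json

# Stand-in for the real file: one JSON object per line (made-up records)
fake_file = io.StringIO(
    '{"tz": "America/New_York", "c": "US"}\n'
    '\n'
    '{"c": "US"}\n'
)

# Skip blank lines, which json.loads would reject
diclist = [json.loads(line) for line in fake_file if line.strip()]

print(len(diclist))
print(diclist[0]['tz'])
print(diclist[1].get('tz'))  # .get returns None when the key is absent
```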
III. Using the Python standard library to count time zone data in the JSON file
1. First, put all the time zone data in a list
# Get all time zone data
timezones = [item['tz'] for item in diclist if 'tz' in item]
# Test: print the first five
print(timezones[0:5])
Output:
[u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York']
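The if 'tz' in item filter matters because not every record has a tz key. A small sketch (Python 3, made-up records) shows what happens with and without it:

```python
# Made-up records: the second one has no 'tz' key at all
records = [{'tz': 'America/New_York'}, {'c': 'US'}, {'tz': ''}]

# With the filter, records lacking 'tz' are skipped
timezones = [item['tz'] for item in records if 'tz' in item]

# Without the filter, the lookup fails on the second record
try:
    [item['tz'] for item in records]
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(timezones)
print(missing_key)
```

Note that the record with an empty string still passes the filter, which is why an empty time zone shows up in the counts later on.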
2. Then convert the time zone list into a time zone count dictionary, where each key is a time zone name and each value is its number of occurrences.
# Custom function: count time zone occurrences
def countzone(timezones):
    count_zone = {}
    for tz in timezones:
        if tz in count_zone:
            count_zone[tz] += 1
        else:
            count_zone[tz] = 1
    return count_zone

# Custom function: return the top n
def counttop(diccount, n):
    valuekeyitems = [(value, key) for key, value in diccount.items()]
    valuekeyitems.sort()
    return valuekeyitems[-n:]

# Test: print the 5 most frequent time zones
count = countzone(timezones)
print(counttop(count, 5))
Output:
[(191, u'America/Denver'), (382, u'America/Los_Angeles'), (400, u'America/Chicago'), (521, u''), (1251, u'America/New_York')]
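The function above returns (count, name) pairs in ascending order. One possible alternative, sketched here with the hypothetical name counttop_sorted and made-up counts, is to sort the items by value in descending order so that the most frequent zone comes first and the pairs keep their (name, count) shape.

```python
def counttop_sorted(diccount, n):
    # Sort (key, value) pairs by value, largest first, and keep the first n
    return sorted(diccount.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Made-up counts for illustration
count = {'America/New_York': 3, 'America/Denver': 1, '': 2}
top2 = counttop_sorted(count, 2)
print(top2)
```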
3. Simplifying the countzone function with defaultdict
The collections module in the Python standard library provides several data structures that are more convenient to use; among them, defaultdict lets you give dictionary entries a default value.
from collections import defaultdict, Counter

def countzone(timezones):
    count_zone = defaultdict(int)
    for tz in timezones:
        count_zone[tz] += 1
    return count_zone
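A short sketch of why defaultdict(int) removes the need for the membership check (Python 3; the zone names are only sample values): a missing key is created on first access with the value int(), which is 0.

```python
from collections import defaultdict

count_zone = defaultdict(int)      # int() returns 0, the default for new keys
for tz in ['America/Denver', 'America/New_York', 'America/Denver']:
    count_zone[tz] += 1            # no membership check needed

before = 'Europe/London' in count_zone
value = count_zone['Europe/London']    # reading a missing key returns 0 ...
after = 'Europe/London' in count_zone  # ... and also inserts the key

print(dict(count_zone))
```

The last three lines show a side effect worth knowing: merely reading a missing key adds it to the dictionary.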
4. Simplifying the counttop function with collections.Counter
from collections import Counter

def counttop(diccount, n):
    return Counter(diccount).most_common(n)
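As a further simplification (a sketch, not what the article itself does), Counter can also count a list directly, which would make a separate countzone step unnecessary. The sample list below is made up.

```python
from collections import Counter

# Made-up sample of time zone values, including empty strings
timezones = ['America/New_York', 'America/Denver', 'America/New_York',
             '', 'America/New_York', '']

# Counter counts the elements of any iterable; most_common(n) returns
# the n (element, count) pairs with the highest counts, largest first
top2 = Counter(timezones).most_common(2)
print(top2)
```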
5. Complete code
# -*- coding: utf-8 -*-
import json
from collections import defaultdict, Counter

# 1. Read the file and parse it into a list of dictionaries
diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]

# 2. Count the time zones
# Get all time zone data
timezones = [item['tz'] for item in diclist if 'tz' in item]

# Count time zone occurrences
def countzone(timezones):
    count_zone = defaultdict(int)
    for tz in timezones:
        count_zone[tz] += 1
    return count_zone

# Return the top n
def counttop(diccount, n):
    return Counter(diccount).most_common(n)

# Test: print the 5 most frequent time zones
count = countzone(timezones)
print(counttop(count, 5))
# Output: [(u'America/New_York', 1251), (u'', 521), (u'America/Chicago', 400), (u'America/Los_Angeles', 382), (u'America/Denver', 191)]
IV. Using pandas to count time zone data in the JSON file
1. Counting time zone data with a DataFrame
① DataFrame is one of the most commonly used data structures in pandas; it represents data as a table-like structure.
# -*- coding: utf-8 -*-
import json
from pandas import DataFrame
diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
frame = DataFrame(diclist)
# Test: print the first 5 elements of the time zone column
print(frame['tz'][:5])
Output:
0     America/New_York
1       America/Denver
2     America/New_York
3    America/Sao_Paulo
4     America/New_York
② frame['tz'] has a value_counts() method that returns the count of each value directly.
# Print the 5 most frequent time zones
print(frame['tz'].value_counts()[:5])
Output:
America/New_York       1251
                        521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
③ Some records have no time zone field at all, and in others the time zone is an empty string.
The fillna() method can fill in the missing values, and the empty strings can be replaced using Boolean indexing.
tzlist = frame['tz'].fillna('Missing')
tzlist[tzlist == ''] = 'Unknown'
print(tzlist.value_counts()[:5])
Output:
America/New_York       1251
Unknown                 521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
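The two clean-up steps can be seen separately on a tiny, made-up data set. This sketch assumes pandas is installed and uses Python 3. A record with no tz key becomes NaN in the column and is caught by fillna(), while an empty string is a real value and has to be replaced through a Boolean mask.

```python
import pandas as pd

# Made-up records: one has an empty time zone, one lacks 'tz' entirely
records = [
    {'tz': 'America/New_York'},
    {'tz': 'America/New_York'},
    {'tz': ''},
    {'c': 'US'},
]
frame = pd.DataFrame(records)

tzlist = frame['tz'].fillna('Missing')   # NaN (field absent) -> 'Missing'
tzlist[tzlist == ''] = 'Unknown'         # Boolean mask selects the empty strings
counts = tzlist.value_counts()
print(counts)
```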
This completes the same task we did with the Python standard library. The complete code is as follows:
# -*- coding: utf-8 -*-
import json
from pandas import DataFrame

diclist = [json.loads(line) for line in open(r'E:\Demo\python\json.txt')]
frame = DataFrame(diclist)

# Print the 5 most frequent time zones
print(frame['tz'].value_counts()[:5])

# Fill in time zones that are missing or empty
tzlist = frame['tz'].fillna('Missing')
tzlist[tzlist == ''] = 'Unknown'
print(tzlist.value_counts()[:5])
2. Drawing a vertical bar chart with the plot method
Reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
tzlist.value_counts()[:5].plot(kind='bar', rot=0)
How to run: we can use the %paste command to paste the code into IPython and run it.
Command line:
ipython
%pylab
%paste
Output:
[Figure: bar chart of the 5 most frequent time zones]
JSON file used in this article: click here to download
Reference: "Python for Data Analysis"
If you want to reprint, please indicate the source: http://www.cnblogs.com/janes/p/5546673.html
Python "8"-Parse JSON file