Python: loading JSON files into SQL, and problems reading large files with codecs

Source: Internet
Author: User

Preface: I recently helped a senior labmate process some JSON files. The data had to be loaded into a database so that it could later be read back from there. The data comes from Yelp: https://github.com/Yelp/dataset-examples and http://www.yelp.com/dataset_challenge/. The work involved a few JSON and SQL issues, which are documented below.

I. Installing MySQL support for Python

Python ships with the lightweight sqlite3 database, but that was not enough here: MySQL was required. Installing the MySQL driver with pip failed, and easy_install failed too, which made no sense. With a colleague's help, installing it with conda finally worked, for reasons that were not obvious. So I looked it up: conda is the package manager that comes with the Anaconda Python distribution.

```shell
# These attempts failed:
pip install mysqldb
easy_install mysqldb
pip install mysql
easy_install mysql
# Checking the interpreter:
ipython
which python
# Searching for and installing the package with conda worked:
sudo conda search mysql
conda search mysql
conda install mysql-python
```
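Because conda installs packages into its own environment, a driver can be "installed" yet invisible to the Python you actually run, which is what `which python` helps diagnose. A minimal sketch of doing the same check from inside Python; the helper name `module_available` is hypothetical, and it only assumes the driver's import name is `MySQLdb`, as used in the code later in this post:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this code."""
    # find_spec() locates a top-level module without importing it; None means not found
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # After a conda install, this path should point into the conda environment
    print("interpreter:", sys.executable)
    print("MySQLdb importable:", module_available("MySQLdb"))
```

If the interpreter path is not the conda one, the driver will not be found even though the conda install succeeded.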

II. Processing the JSON data

Python comes with a json package for parsing JSON, much as there are packages such as BeautifulSoup for parsing HTML and others for parsing XML. Use json.loads() to parse the data; the following few lines are enough.

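```python
import json
import codecs

# file_name: path to one of the Yelp JSON files (one JSON object per line)
with codecs.open(file_name, encoding="utf-8") as f:
    for line in f:  # iterating over the file reads one line at a time
        line = line.strip("\n")
        line_dict = json.loads(line)  # each line parses to a dict
```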
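The loop above can be wrapped in a reusable generator. This is a minimal sketch, not the author's code: the names `parse_json_lines` and `iter_json_records` are hypothetical, and it assumes one JSON object per line, as in the Yelp files. It also swaps `codecs.open()` for the built-in `open()` with an encoding: `codecs` stream readers split lines on extra Unicode separators such as U+2028, which may legally appear unescaped inside a JSON string, while `open()` splits only on `\n`, `\r` and `\r\n`.

```python
import json


def parse_json_lines(lines):
    """Yield one parsed object per non-blank line from any iterable of text lines."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except ValueError as exc:
            # json.JSONDecodeError is a subclass of ValueError
            raise ValueError("bad JSON on line %d: %s" % (line_number, exc)) from exc


def iter_json_records(file_name, encoding="utf-8"):
    """Lazily yield records from a JSON Lines file, holding one line in memory at a time."""
    with open(file_name, encoding=encoding) as f:
        yield from parse_json_lines(f)
```

Usage: `for record in iter_json_records(file_name): ...`. Splitting the parsing from the file handling makes the parser easy to test with a plain list of strings.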
It is important to note that:

1. I read the file with codecs. At first I wrote it like this:

```python
import codecs

with codecs.open(file_name, encoding="utf-8") as f:
    text = f.readlines()  # reads the WHOLE file into a list of lines
```

I thought readlines() read one line at a time, but it actually reads the entire file into a list of lines, so memory ran out on a 1.4 GB JSON file. Iterating over the file object directly, as in the code above, avoids this because it reads one line at a time.

2. json.loads() takes a JSON string as its argument. I read the file one line at a time and pass each line (one JSON string) to json.loads(), which parses it into a dictionary. What to do with it next depends on your own analysis needs.
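As one example of that next step, each parsed dictionary can be turned into a tuple of column values for the check-in table used later in this post. A minimal sketch; `checkin_values` is a hypothetical helper, and it assumes the JSON keys are named like the table columns (`type`, `business_id`, `checkin_info`):

```python
import json

# Column order must match the INSERT statement used for the table
CHECKIN_COLUMNS = ("type", "business_id", "checkin_info")


def checkin_values(record):
    """Turn one parsed check-in record (a dict) into a tuple of column values."""
    values = []
    for key in CHECKIN_COLUMNS:
        value = record.get(key)  # a missing key becomes None, i.e. SQL NULL
        if isinstance(value, (dict, list)):
            # nested data is stored as JSON text so it can be parsed again later
            value = json.dumps(value, sort_keys=True)
        values.append(value)
    return tuple(values)
```

Serializing nested values with `json.dumps()` rather than `str()` keeps them valid JSON, so they can be read back with `json.loads()` after coming out of the database.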

#============================

Method 2: pass the whole JSON file to json.load():

```python
import json

f = open(file_name)  # the original used Python 2's file(file_name)
s = json.load(f)
```

But this raises "ValueError: Extra data". According to what I found, the cause is that the file contains multiple JSON objects. Of course it does: a file like this obviously holds many JSON objects, one per line, while json.load() expects the whole input to be a single JSON value. The Stack Overflow answer explains this in detail: http://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data.

```python
>>> json.loads('{}')
{}
>>> json.loads('{}{}')  # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)
>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]
```

I didn't end up using Method 2, so I didn't dig into it further.
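For a file whose content is several JSON values written back to back (which json.load() rejects with "Extra data"), the standard library can still decode them one at a time with `json.JSONDecoder.raw_decode()`, which parses one value and returns it together with the index where it ended. A minimal sketch; `iter_concatenated_json` is a hypothetical name, and unlike the line-by-line approach it needs the whole text in memory, so it is not a fix for the 1.4 GB case:

```python
import json

_decoder = json.JSONDecoder()
_JSON_WHITESPACE = " \t\n\r"


def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string holding several of them."""
    pos = 0
    end = len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so skip it here
        while pos < end and text[pos] in _JSON_WHITESPACE:
            pos += 1
        if pos == end:
            return
        # returns the decoded value and the index just past it
        obj, pos = _decoder.raw_decode(text, pos)
        yield obj
```

For example, `list(iter_concatenated_json('{}{}'))` gives `[{}, {}]`, the same two objects that made `json.loads('{}{}')` fail in the transcript above.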

III. Saving to SQL

I didn't know how to do this at first, but after finding a blog post, writing the code myself turned out to be easier than I expected. Here is the code, with comments.

```python
import MySQLdb as mdb

# The database yelp_dataset_challenge_academic_daaset must be created first
conn = mdb.connect(host='XXX.XX.XX.XX', user='XXX', passwd='XXX',  # host/user/password redacted
                   db='yelp_dataset_challenge_academic_daaset')
cur = conn.cursor()  # initialize the cursor
# conn.set_character_set("utf8")
cur.execute('SET NAMES utf8;')
cur.execute('SET CHARACTER SET utf8;')
cur.execute('SET character_set_connection=utf8;')

# =============== Clear the table: delete the existing records first.
# DELETE removes the rows but keeps the table; DROP would remove the table itself.
table_name = "yelp_academic_dataset_checkin"
delete_table = "DELETE FROM " + table_name
cur.execute(delete_table)

# The table yelp_academic_dataset_checkin, with its fields and field types,
# must already exist in the database (a CREATE TABLE statement works too).
insert_sql = ("INSERT INTO yelp_academic_dataset_checkin (type, business_id, checkin_info) "
              "VALUES (%s, %s, %s)")

# ===== The steps that extract temp_values from the JSON are omitted. =====
values_tuple = (str(temp_values[0]), str(temp_values[1]), str(temp_values[2]))
cur.execute(insert_sql, values_tuple)

# When finished, commit and close the connection
conn.commit()
conn.close()
```
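The code above executes one INSERT per record. For a large file, rows can instead be sent in batches with `executemany()`, which DB-API 2.0 cursors (including MySQLdb's) provide. A minimal sketch that uses the standard library's sqlite3 as a stand-in so it runs without a MySQL server; the table schema and the helper names `open_demo_db` and `insert_in_batches` are assumptions for illustration. Note the placeholder difference: sqlite3 uses `?`, while MySQLdb uses `%s` as in the code above.

```python
import sqlite3
from itertools import islice

# sqlite3 uses "?" placeholders; with MySQLdb the same statement uses "%s"
INSERT_SQL = ("INSERT INTO yelp_academic_dataset_checkin "
              "(type, business_id, checkin_info) VALUES (?, ?, ?)")


def open_demo_db():
    """In-memory SQLite stand-in for the MySQL table (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yelp_academic_dataset_checkin "
                 "(type TEXT, business_id TEXT, checkin_info TEXT)")
    return conn


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert row tuples with executemany(), committing once per batch."""
    rows = iter(rows)  # works for lists and lazy generators alike
    cur = conn.cursor()
    total = 0
    while True:
        batch = list(islice(rows, batch_size))  # at most batch_size rows in memory
        if not batch:
            break
        cur.executemany(INSERT_SQL, batch)
        conn.commit()  # a later failure does not undo batches already committed
        total += len(batch)
    cur.close()
    return total
```

Because `rows` can be a generator (for example, one tuple per parsed JSON line), only one batch is held in memory at a time, which keeps the whole pipeline streaming.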

It also seems possible to go JSON ---> DataFrame ---> SQL with pandas (see pandas.io.json). I haven't tried it; I'll try it when I get the chance.
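The JSON ---> DataFrame ---> SQL route mentioned above was not tried by the author; this is only a minimal sketch of what it might look like. It uses `pandas.read_json(..., lines=True, chunksize=...)` to read a JSON Lines file in chunks and `DataFrame.to_sql()` to write each chunk. The function name `json_lines_to_sql` is hypothetical. It targets a sqlite3 connection because that is the only raw DB-API connection `to_sql()` supports; for MySQL it would need a SQLAlchemy engine or connection instead.

```python
import json

import pandas as pd


def json_lines_to_sql(source, table_name, conn, chunksize=10000):
    """Load a JSON Lines file (path or file-like object) into a SQL table, chunk by chunk."""
    total = 0
    # lines=True: one JSON object per line; chunksize makes read_json return an iterator
    reader = pd.read_json(source, lines=True, chunksize=chunksize)
    for chunk in reader:
        # SQL columns cannot store dicts or lists, so turn nested values back into JSON text
        for col in chunk.columns:
            if chunk[col].map(lambda v: isinstance(v, (dict, list))).any():
                chunk[col] = chunk[col].map(json.dumps)
        # if_exists="append" creates the table on the first chunk, then appends to it
        chunk.to_sql(table_name, conn, if_exists="append", index=False)
        total += len(chunk)
    return total
```

Usage, assuming a local SQLite file: `json_lines_to_sql(file_name, "yelp_academic_dataset_checkin", sqlite3.connect("yelp.db"))`. Reading in chunks keeps memory bounded in the same way the line-by-line loop does.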

References:

1. https://github.com/yelp/dataset-examples

2. http://www.yelp.com/dataset_challenge/

3. http://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
