Chapter 9

```python
import pandas as pd
import sqlite3
```
9.1 Reading from SQL

So far we've only talked about reading data from CSV files. That's a pretty common way to store data, but there are many others! Pandas can read from HTML, JSON, SQL, Excel (!!!), HDF5, Stata, and a few other things. In this chapter we'll talk about reading data from a SQL database.
You can read data from a SQL database using the pd.read_sql function. read_sql will automatically convert SQL column names to DataFrame column names.

read_sql takes 2 arguments: a SELECT statement, and a database connection object. This is great because it means you can read from any kind of SQL database, whether it's MySQL, SQLite, PostgreSQL, or something else.

This example reads from a SQLite database, but any other database would work the same way.
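The snippets below assume a `../data/weather_2012.sqlite` file exists. As a quick self-contained illustration of the same call (the in-memory table and its values here are made up for the sketch), read_sql picks its column names straight from the SQL table:

```python
import sqlite3
import pandas as pd

# An in-memory database stands in for ../data/weather_2012.sqlite here
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weather_2012 (id INTEGER, date_time TEXT, temp REAL)")
con.execute("INSERT INTO weather_2012 VALUES (1, '2012-01-01 00:00:00', -1.8)")

df = pd.read_sql("SELECT * FROM weather_2012", con)
# The SQL column names become the DataFrame's column names
print(list(df.columns))  # ['id', 'date_time', 'temp']
```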
```python
con = sqlite3.connect("../data/weather_2012.sqlite")
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con)
df
```
|   | id | date_time           | temp |
|---|----|---------------------|------|
| 0 | 1  | 2012-01-01 00:00:00 | -1.8 |
| 1 | 2  | 2012-01-01 01:00:00 | -1.8 |
| 2 | 3  | 2012-01-01 02:00:00 | -1.8 |
read_sql doesn't automatically set the primary key (id) as the DataFrame index. You can make it do that by adding an index_col argument to read_sql.

If you've been using read_csv a lot, you may have seen that it has an index_col argument as well. This behavior is the same.
```python
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con, index_col='id')
df
```
| id | date_time           | temp |
|----|---------------------|------|
| 1  | 2012-01-01 00:00:00 | -1.8 |
| 2  | 2012-01-01 01:00:00 | -1.8 |
| 3  | 2012-01-01 02:00:00 | -1.8 |
If you want your DataFrame to be indexed by more than one column, you can give a list of columns to index_col:
```python
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con,
                 index_col=['id', 'date_time'])
df
```
| id | date_time           | temp |
|----|---------------------|------|
| 1  | 2012-01-01 00:00:00 | -1.8 |
| 2  | 2012-01-01 01:00:00 | -1.8 |
| 3  | 2012-01-01 02:00:00 | -1.8 |
9.2 Write to SQLite database
Pandas has a to_sql method (in older versions, a write_frame function in pd.io.sql) that creates a database table from a DataFrame. Right now this only works with SQLite databases. Let's use it to convert our 2012 weather data to SQL.

There are a lot of useful functions in pd.io for reading and writing various types of data, and it's worth spending some time exploring them. (See the documentation.)
```python
weather_df = pd.read_csv('../data/weather_2012.csv')
con = sqlite3.connect("../data/test_db.sqlite")
con.execute("DROP TABLE IF EXISTS weather_2012")
weather_df.to_sql("weather_2012", con)
```
We can now read the data back from the weather_2012 table in test_db.sqlite, and we can see that we get the same data back:
```python
con = sqlite3.connect("../data/test_db.sqlite")
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con)
df
```
|   | index | Date/Time           | Temp (C) | Dew Point Temp (C) | Rel Hum (%) | Wind Spd (km/h) | Visibility (km) | Stn Press (kPa) | Weather |
|---|-------|---------------------|----------|--------------------|-------------|-----------------|-----------------|-----------------|---------|
| 0 | 0     | 2012-01-01 00:00:00 | -1.8     | -3.9               | 86          | 4               | 8               | 101.24          |         |
| 1 | 1     | 2012-01-01 01:00:00 | -1.8     | -3.7               | 87          | 4               | 8               | 101.24          |         |
| 2 | 2     | 2012-01-01 02:00:00 | -1.8     | -3.4               | 89          | 7               | 4               | 101.26          |         |
The advantage of saving our data in a database is that we can do arbitrary SQL queries. This is really cool, especially if you're more comfortable with SQL. Here's an example of sorting on the Weather column:
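The query that produced the result below isn't shown in the text; a sketch of what it would look like, run here against a small in-memory table so it's self-contained (the column set is trimmed down and the row values are made up):

```python
import sqlite3
import pandas as pd

# A tiny stand-in for ../data/test_db.sqlite so the example runs on its own
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weather_2012 (date_time TEXT, temp REAL, weather TEXT)")
con.executemany(
    "INSERT INTO weather_2012 VALUES (?, ?, ?)",
    [("2012-01-01 00:00:00", -1.8, "Fog"),
     ("2012-01-03 19:00:00", -16.9, "Clear"),
     ("2012-01-05 18:00:00", -7.1, "Clear")],
)

# The sorting happens in SQL, before pandas ever sees the data
df = pd.read_sql("SELECT * FROM weather_2012 ORDER BY weather LIMIT 3", con)
print(df["weather"].tolist())  # ['Clear', 'Clear', 'Fog']
```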
|   | index | Date/Time           | Temp (C) | Dew Point Temp (C) | Rel Hum (%) | Wind Spd (km/h) | Visibility (km) | Stn Press (kPa) | Weather |
|---|-------|---------------------|----------|--------------------|-------------|-----------------|-----------------|-----------------|---------|
| 0 | 67    | 2012-01-03 19:00:00 | -16.9    | -24.8              | 50          | 24              | 25              | 101.74          |         |
| 1 | 114   | 2012-01-05 18:00:00 | -7.1     | -14.4              | 56          | 11              | 25              | 100.71          |         |
| 2 | 115   | 2012-01-05 19:00:00 | -9.2     | -15.4              | 61          | 7               | 25              | 100.80          |         |
If you have a PostgreSQL or MySQL database, reading from it works exactly the same as reading from a SQLite database. You create the connection with psycopg2.connect() or MySQLdb.connect(), and then use

```python
pd.read_sql("SELECT whatever from your_table", con)
```
9.3 Connecting to other types of databases
To connect to a MySQL database:

Note: for this to work, you need to actually have a MySQL/PostgreSQL database running, with the correct localhost, database name, and so on.

```python
import MySQLdb
con = MySQLdb.connect(host="localhost", db="test")
```
To connect to a PostgreSQL database:

```python
import psycopg2
con = psycopg2.connect(host="localhost")
```
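Note that newer versions of pandas prefer an SQLAlchemy engine over a raw DBAPI connection for everything except SQLite. The pattern is the same for every backend and only the connection URL changes; here is a sketch in which the MySQL/PostgreSQL URLs are illustrative placeholders and only the serverless SQLite one actually runs:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder URLs for other backends (host/db names are made up):
#   create_engine("mysql+mysqldb://localhost/test")
#   create_engine("postgresql+psycopg2://localhost/test")
engine = create_engine("sqlite:///:memory:")  # SQLite needs no server

# Round-trip a tiny made-up DataFrame through the engine
pd.DataFrame({"temp": [-1.8, -7.1]}).to_sql("weather_2012", engine, index=False)
df = pd.read_sql("SELECT * FROM weather_2012", engine)
print(df["temp"].tolist())  # [-1.8, -7.1]
```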