Use Python pandas to convert between DataFrame and SQL

Use Python pandas to convert between DataFrame and SQL

1: Python dependency installation

Since its about converting between DataFrame and SQL, of course we need to install both packages for DataFrame(pandas) and SQL(SQLAlchemy).

pip3 install -U pandas sqlalchemy

SQLAlchemy is a SQL toolkit and Object Relational Mapper(ORM) that gives application developers the full power and flexibility of SQL. So in addition, depending on different SQL database you are using, you should also install different corresponding package. Here are few options you can choose from:

  • PostgreSQL: psycopg2 as default api.
    • other: psycopg2, pg8000
  • MySQL: mysql-python as default api.
    • other: mysqldb, mysqlconnector, oursql, pymysql
  • Oracle: cx_oracle as default api.
  • MS SQL Server: pyodbc as default api.
    • other: pymssql
  • SQLite: python built-in module as default api.

Or you can go to SQLAlchemy official site for more info about api choices.

2: Convert from SQL to DataFrame

Pandas provides 3 functions to read SQL content: read_sql, read_sql_table and read_sql_query, where read_sql is a convinent wrapper for the other two. So for the most of the time, we only uses read_sql, as depending on the provided sql input, it will delegate to the specific function for us. For more reference, check pandas.read_sql.

function prototype:

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

The second parameter above con is a SQLAlchemy connectable. Pandas uses it to decide which database to connect and how to connect etc. We can create an connectable by sqlalchemy.create_engine function that accepts a string of connection configuration by format dialect[+driver]://user:password@host/dbname.

So to summarize, we read from SQL by following command:

from sqlalchemy import create_engine
from pandas import read_sql
engine = create_engine('mysql+pymysql://root:123456@127.0.0.1:3306/my_db')
my_data = read_sql("my_table_name", engine)
my_data2 = read_sql("SELECT * FROM my_table_name", engine)

3: Convert from DataFrame to SQL

similarly, there is also a function called to_sql in DataFrame.

function prototype:

DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)

So basically, we do the almost same things here by first creating a connectable and then call to_sql.

from pandas import DataFrame
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://root:123456@127.0.0.1:3306/my_db')
my_data = DataFrame([[1,2],[3,5]])
my_data.to_sql("my_table", engine)

… that easy!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s