Use Python pandas to convert between DataFrame and SQL
1: Python dependency installation
Since its about converting between DataFrame and SQL, of course we need to install both packages for DataFrame(pandas) and SQL(SQLAlchemy).
pip3 install -U pandas sqlalchemy
SQLAlchemy is a SQL toolkit and Object Relational Mapper(ORM) that gives application developers the full power and flexibility of SQL. So in addition, depending on different SQL database you are using, you should also install different corresponding package. Here are few options you can choose from:
psycopg2as default api.
mysql-pythonas default api.
cx_oracleas default api.
- MS SQL Server:
pyodbcas default api.
- SQLite: python built-in module as default api.
Or you can go to SQLAlchemy official site for more info about api choices.
2: Convert from SQL to DataFrame
Pandas provides 3 functions to read SQL content:
read_sql is a convinent wrapper for the other two. So for the most of the time, we only uses
read_sql, as depending on the provided sql input, it will delegate to the specific function for us. For more reference, check pandas.read_sql.
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
The second parameter above
con is a SQLAlchemy connectable. Pandas uses it to decide which database to connect and how to connect etc. We can create an connectable by
sqlalchemy.create_engine function that accepts a string of connection configuration by format
So to summarize, we read from SQL by following command:
from sqlalchemy import create_engine from pandas import read_sql engine = create_engine('mysql+pymysql://root:firstname.lastname@example.org:3306/my_db') my_data = read_sql("my_table_name", engine) my_data2 = read_sql("SELECT * FROM my_table_name", engine)
3: Convert from DataFrame to SQL
similarly, there is also a function called
to_sql in DataFrame.
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)
So basically, we do the almost same things here by first creating a connectable and then call
from pandas import DataFrame from sqlalchemy import create_engine engine = create_engine('mysql+pymysql://root:email@example.com:3306/my_db') my_data = DataFrame([[1,2],[3,5]]) my_data.to_sql("my_table", engine)
… that easy!