
How to boost SQLAlchemy data fetch speed from Snowflake to a Pandas DataFrame

Dmitry Ermolchik · Jan 24, 2023


SQLAlchemy is a well-known ORM (https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping) library for working with databases in Python. Many data analysts and Python developers use it on a day-to-day basis.

This library abstracts away the low-level interaction with the database driver, but in exchange you lose access to some of the driver's advanced features.

In Snowflake's case, one of those features is transferring data between the database and your application in the Apache Arrow (https://arrow.apache.org/) columnar format.

Easy solution

Usually, you create an engine like this:

import pandas
from sqlalchemy import create_engine

engine = create_engine("snowflake://user:pass@account.us-east-1/mydatabase?schema=my_schema&warehouse=my_warehouse")
df = pandas.read_sql("SELECT * FROM BIG_TABLE", engine)

This is the simplest and most universal way to connect to a database. You can copy and paste this snippet, replace the Snowflake connection string with, say, an MS SQL Server one, and start a new cell in a Jupyter notebook or a Python script.

But if you need to pull a massive amount of data from the database, you will spend a lot of time just waiting for all the bytes to reach your machine. The root cause is the transfer format: with this setup, the data travels as JSON.
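One way to get the Arrow-based transfer back, while keeping the engine from the snippet above, is to drop down to the raw Snowflake connection and use the connector's fetch_pandas_all() method. A minimal sketch, assuming snowflake-connector-python is installed with the pandas extra (pip install "snowflake-connector-python[pandas]") and reusing the engine and query from before:

import pandas
from sqlalchemy import create_engine

engine = create_engine("snowflake://user:pass@account.us-east-1/mydatabase?schema=my_schema&warehouse=my_warehouse")

# Borrow the underlying DBAPI connection from the SQLAlchemy pool.
raw_connection = engine.raw_connection()
try:
    cursor = raw_connection.cursor()
    cursor.execute("SELECT * FROM BIG_TABLE")
    # fetch_pandas_all() builds the DataFrame from Arrow record batches
    # instead of decoding the result set row by row.
    df = cursor.fetch_pandas_all()
finally:
    # Return the connection to the pool.
    raw_connection.close()

Because the result set arrives as Arrow record batches, the conversion to a DataFrame skips the per-row JSON decoding described above.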
