Tutorials

How to Connect Snowflake with Python (Jupyter) Tutorial | Census

Parker Rogers · November 30, 2021

Parker is a data community advocate at Census with a background in data analytics. He's interested in finding the best and most efficient ways to make use of data and helping other data folks in the community grow their careers. Salt Lake City, Utah, United States

In this article, you'll find a step-by-step tutorial for connecting Python with Snowflake. Specifically, you'll learn how to:

  • Connect Python (Jupyter Notebook) with your Snowflake data warehouse
  • Retrieve the results of a SQL query into a Pandas data frame

As always, if you're looking for more resources to further your data skills (or just make your current data day-to-day easier) check out our other how-to articles here.

If your title contains “data” or “engineer”, you likely have strict programming language preferences. Good news: Snowflake hears you! You can now connect Python (and several other languages) with Snowflake to develop applications. If you're a Python lover, here are some advantages of connecting Python with Snowflake:

  • Optimized development time
  • Improved machine learning and linear regression capabilities
  • Accelerated operational analytics capabilities

In this tutorial, I'll run you through how to connect Python with Snowflake. Let's get into it.

Software requirements

First, you need to make sure you have all of the following programs, credentials, and expertise:

  1. A table in your Snowflake database with some data in it
  2. The username, password, and host for your Snowflake database
  3. Jupyter Notebook
  4. Familiarity with Python and programming constructs

Next, we'll go to Jupyter Notebook to install Snowflake's Python connector.

Install Snowflake Connector for Python

The Snowflake Connector for Python gives users a way to develop Python applications connected to Snowflake, as well as perform all the standard operations they know and love. You can install the connector in Linux, macOS, and Windows environments by following this GitHub link, or reading Snowflake’s Python Connector Installation documentation.

To connect Snowflake with Python, you'll need the snowflake-connector-python connector (say that five times fast). You can install the package with pip and, since we're using Jupyter, you'll run all commands in the Jupyter web interface.

Before you go through all that though, check to see if you already have the connector installed with the following command:


pip show snowflake-connector-python

However, if the package doesn’t already exist, install it using this command:


pip install snowflake-connector-python

If you followed those steps correctly, you'll now have the required package available in your local Python ecosystem. Next, we'll tackle connecting our Snowflake database to Jupyter Notebook by creating a configuration file, creating a Snowflake connection, installing the Pandas library, and running our read_sql function.

Create a configuration file

To start off, create a configuration file as a nested dictionary using the following authentication credentials:

  • Username: The username for your Snowflake account. You can find this by querying current_user() in a Snowflake worksheet.
  • Password: The password used to log in to Snowflake.
  • Host: The hostname of your Snowflake warehouse. Example: abc1234567.us-east-2

Here's an example of the configuration file Python code:


conns = {
    'SnowflakeDB': {
        'UserName': 'python',
        'Password': 'Pythonuser1',
        'Host': 'ne79526.ap-south.1.aws'
    }
}

I created a nested dictionary with the topmost level key as the connection name SnowflakeDB. Then, I wrapped the connection details as a key-value pair. In the future, if there are more connections to add, I could use the same configuration file.
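If you do add more connections later, each one gets its own top-level key in the same dictionary. Here's a minimal sketch with a hypothetical second connection (the 'AnalyticsDB' name and its credentials are made up for illustration):


conns = {
    'SnowflakeDB': {'UserName': 'python', 'Password': 'Pythonuser1', 'Host': 'ne79526.ap-south.1.aws'},
    # hypothetical second connection, for illustration only
    'AnalyticsDB': {'UserName': 'analyst', 'Password': 'AnalystPass1', 'Host': 'ab12345.us-east-2'}
}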

Now, we'll use the credentials from the configuration file we just created to successfully connect to Snowflake.

Create a Snowflake connection

First, we'll import snowflake.connector (Jupyter Notebook will recognize this import from your previous installation). Next, create a Snowflake connector connection that reads values from the configuration file we just created using snowflake.connector.connect. The code will look like this:


# import the module
import snowflake.connector

# create the connection using the credentials from the configuration file
connection = snowflake.connector.connect(
    user=conns['SnowflakeDB']['UserName'],
    password=conns['SnowflakeDB']['Password'],
    account=conns['SnowflakeDB']['Host']
)
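Before moving on, you can optionally confirm the connection works by running a quick query through a cursor. This is a minimal sketch using the connector's standard cursor interface; the query just asks Snowflake for its version:


# optional sanity check: run a trivial query against Snowflake
cursor = connection.cursor()
try:
    cursor.execute("SELECT current_version()")
    print(cursor.fetchone())
finally:
    cursor.close()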

You've officially created a Snowflake connection from Python! Now you can use the open-source Python library of your choice for these next steps. For this tutorial, I'll use Pandas.

Install the Pandas library

You may already have Pandas installed. You can check by importing it and running print(pd.__version__) in Jupyter Notebook. If it's not already installed, run pip install pandas first. Either way, import the library:


import pandas as pd
print(pd.__version__)  # confirm Pandas is available and check its version

Once you have the Pandas library installed, you can begin querying your Snowflake database using Python and go to our final step.

Run read_sql function

read_sql is a built-in function in the Pandas package that returns a data frame corresponding to the result set of a query string. In this example query, we'll do the following:

  1. Call our Pandas package pd
  2. Run the read_sql function
  3. Write a SQL query
  4. Specify our Snowflake connection

The query and output will look something like this:


pd.read_sql("SELECT * FROM PYTHON.PUBLIC.DEMO WHERE FIRST_NAME IN ('Michael', 'José')", connection)

[Image: Output of the read_sql function, showing the query results returned as a Pandas data frame]
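Because read_sql returns an ordinary Pandas data frame, you can assign the result and keep working with it. Here's a short sketch; the FIRST_NAME column comes from the query above, and the inspection calls are just examples:


df = pd.read_sql(
    "SELECT * FROM PYTHON.PUBLIC.DEMO WHERE FIRST_NAME IN ('Michael', 'José')",
    connection
)
print(df.head())                        # preview the first few rows
print(df['FIRST_NAME'].value_counts())  # row counts per first name

connection.close()  # close the connection when you're finished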

Congratulations! You've officially connected Snowflake with Python and retrieved the results of a SQL query into a Pandas data frame. You can now use your favorite Python operations and libraries on whatever data you have available in your Snowflake data warehouse. This will help you optimize development time, improve machine learning and linear regression capabilities, and accelerate operational analytics capabilities (more on that below).

How to accelerate operational analytics capabilities

Operational analytics is a type of analytics that drives growth within an organization by democratizing access to accurate, relatively real-time data. It requires moving data from point A (ideally, the data warehouse) to point B (day-to-day SaaS tools). This means your data isn't just trapped in a dashboard somewhere, getting more stale by the day. Instead, you're able to use Snowflake to load data into the tools your customer-facing teams (sales, marketing, and customer success) rely on every day.

Here are some of the high-impact use cases operational analytics unlocks for your company when you query Snowflake data using Python:

  • Sending product data from the data warehouse to Zendesk to help customer success teams prioritize tickets.
  • Sending marketing data from the data warehouse to Facebook to help marketers create highly-personalized ad campaigns.
  • Sending product analytics data of free trial users to Salesforce to help sales reps identify high-probability deals.

Now, you can get started with operational analytics using the concepts we went over in this article, but there's a better (and easier) way to do more with your data. That way is reverse ETL tooling, which takes all the DIY work of sending your data from A to B off your plate. If you'd like to learn more, sign up for a demo or try the product for free!

And, of course, if you have any questions about connecting Python to Snowflake or getting started with Census, feel free to drop me a line anytime.
