41
loading...
This website collects cookies to deliver better user experience
We also have a dedicated blog for manipulating and querying your Google BigQuery data using Python and R, in case you are interested.
For this post, we assume that you have the data already loaded in your Redshift instance. In case you haven't, the best way to load your data to Redshift is to leverage Customer Data Infrastructure tools such as RudderStack. They allow you to collect your data across all the customer touch-points, and load them securely into Redshift -- or any other warehouse of your choice, with minimal effort.
psycopg
Python driver to connect to our Redshift instance. That said, please feel free to experiment with any other library of your choice to do so.import psycopg2
con=psycopg2.connect(dbname= 'dbname', host='host',
port= 'port', user= 'user', password= 'pwd')
psycopg
, we perform these steps:cur = con.cursor()
cur.execute("SELECT * FROM `table`;")
O
nce the query is successfully executed, we instruct psycopg
to fetch the data from the database. For further data analysis, it makes sense to get the complete dataset. Hence, we run the following command:
cur.fetchall()
cur.close()
conn.close()
NumPy
and pandas
:NumPy
array. We initialize a new NumPy
array and pass the cursor containing the query results as a parameter.import numpy as np
data = np.array(cur.fetchall())
NumPy
for your data analysis. For this, however, the steps involved are a bit different.from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://scott:tiger@hredshift_host:<port_no>/mydatabase')
data_frame = pd.read_sql('SELECT * FROM `table`;', engine
RPostgreSQL
package to connect to our Redshift instance and then run queries on the data.install.packages("RPostgreSQL")
require("RPostgreSQL")
drv <- dbDriver("PostgreSQL")
con <-dbConnect(drv,dbname="dbname",host="host",port=1234,
user="user",password="password")
dbDisconnect(con)
Note: It is important to close the connection once you are done with pulling the data out of the database.
df_postgres <- dbGetQuery(con, "SELECT * from `table`")