This website collects cookies to deliver better user experience
SQL Query into Pandas DataFrame - Part 2
SQL Query into Pandas DataFrame - Part 2
Continuing the last part, we are going deep into intermediate SQL translated into Pandas DataFrame.
The Playground Database
We will be using the same SQLite database from the previous post, but now we will use the invoice table and the csv file of the table.
Preparing the DataFrame
import pandas as pd
invoices_df = pd.read_csv("invoices.csv")
Aggregation Queries into Pandas DataFrame
We will cover some aggregation function and groupby function in both sql and pandas.
Basic Aggregation
SUM:
SELECTSUM(Total)FROM invoices
invoices_df['Total'].sum()
COUNT():
SELECTCOUNT(BillingState)FROM invoices
invoices_df['BillingState'].count()
AVG():
SELECTAVG(Total)FROM invoices
invoices_df['Total'].mean()
MAX():
SELECTMAX(Total)FROM invoices
invoices_df['Total'].max()
MIN():
SELECTMIN(Total)FROM invoices
invoices_df['Total'].min()
GROUP BY
SELECT CustomerId,SUM(Total)AS Total
FROM invoices
GROUPBY CustomerId
## grouping with all (number) columns aggregatedinvoices_df.groupby(['CustomerId']).sum()## the same as the sql resultinvoices_df.groupby(['CustomerId']).sum().reset_index()[['CustomerId','Total']]