pip install pandasql
from pandasql import sqldf
from sklearn import datasets
data = datasets.load_wine(as_frame=True)['data']
sqldf("SELECT * FROM data LIMIT 10", globals())
The first argument to the sqldf function is straightforward to understand: it's a SQL query. The second argument, globals(), tells the function to use the global context to find the dataset. The other possible value for this argument is locals(), which tells the function to look only within the context of a function block. For comparison, the plain pandas way to get the first ten rows of a dataframe df is df[:10].
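To make the difference between the two environments concrete, here is a minimal sketch in plain Python. The find_table helper is a hypothetical stand-in, not part of pandasql; it only assumes that table names are resolved by looking them up in the dictionary that gets passed in.

```python
# Hypothetical stand-in for the lookup step: resolve a table name
# in whatever namespace dictionary the caller supplies.
def find_table(name, env):
    if name not in env:
        raise KeyError(f"no table named {name!r} in the supplied environment")
    return env[name]


rows = [1, 2, 3]  # module-level name, visible through globals()


def query_inside_function():
    local_rows = [10, 20]  # exists only inside this function block
    # locals() exposes names defined in this function...
    found_local = find_table("local_rows", locals())
    # ...but not module-level names such as `rows`.
    sees_global = "rows" in locals()
    return found_local, sees_global


found_local, sees_global = query_inside_function()
found_global = find_table("rows", globals())
```

If the dataframe is created inside a function, passing locals() is the natural choice; for dataframes defined at the top level of a script or notebook, globals() is the one that can see them.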
You can use the functools.partial utility to create a partial function that doesn't require you to mention the environment every time. Here's how to do it.
from functools import partial
gpandsql = partial(sqldf, env=globals())
gpandsql("SELECT * FROM data LIMIT 10")
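One detail worth knowing: globals() returns the live module namespace rather than a copy, so a partial created early should still see dataframes defined later. The sketch below illustrates this with a hypothetical run_query stand-in that has the same (query, env) shape as sqldf; a plain dict named namespace plays the role of globals().

```python
from functools import partial


# Hypothetical stand-in with the same (query, env) shape as sqldf.
# It just returns what it was called with so the binding is easy to inspect.
def run_query(query, env=None):
    return query, env


namespace = {}  # plays the role of globals() in this sketch
bound = partial(run_query, env=namespace)

# A name added after the partial was created is still visible,
# because partial stores a reference to the dict, not a copy.
namespace["data"] = ["row 1", "row 2"]
query, env_seen = bound("SELECT * FROM data")

# A keyword passed at call time overrides the one stored in the partial.
_, env_override = bound("SELECT * FROM other", env={"other": []})
```

The override in the last line means the shortcut does not lock you in: the same partial can still be pointed at a different environment for a single call.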
Pandasql uses SQLite as its temporary backend. You can perform all kinds of SQL operations SQLite supports. This includes GROUP BY, WHERE, and different kinds of JOINs.
from functools import partial
from pandasql import sqldf
from sklearn import datasets
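wine = datasets.load_wine(as_frame=True)
# Expose the row number as a column so the two tables share a join key
data = wine['data'].rename_axis('row_id').reset_index()
target = wine['target'].rename_axis('row_id').reset_index()
gpandsql = partial(sqldf, env=globals())
gpandsql("""
SELECT d.*, t.target
FROM data d
LEFT JOIN target t
  ON d.row_id = t.row_id
WHERE
d.ash = 1.36
LIMIT 5"""
).describe()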
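Because the queries end up in SQLite, it helps to know how SQLite treats a join condition. The sketch below uses Python's built-in sqlite3 module directly rather than pandasql, with a handful of made-up rows (the row_id column and all values are invented for illustration). It shows a LEFT JOIN with an ON clause, the same join without one, and a GROUP BY.

```python
import sqlite3

# In-memory database with two tiny, made-up tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (row_id INTEGER, ash REAL)")
conn.execute("CREATE TABLE target (row_id INTEGER, target INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(0, 2.43), (1, 1.36), (2, 2.67)])
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(0, 0), (1, 1), (2, 1)])

# LEFT JOIN with an ON clause: each data row is paired with its own target
joined = conn.execute("""
    SELECT d.row_id, d.ash, t.target
    FROM data d
    LEFT JOIN target t ON d.row_id = t.row_id
    WHERE d.ash = 1.36
""").fetchall()

# LEFT JOIN without ON: SQLite accepts it and pairs every data row
# with every target row
cross_count = conn.execute(
    "SELECT COUNT(*) FROM data d LEFT JOIN target t"
).fetchone()[0]

# GROUP BY: number of rows per target class
per_class = conn.execute("""
    SELECT target, COUNT(*)
    FROM target
    GROUP BY target
    ORDER BY target
""").fetchall()

conn.close()
```

With three rows on each side, the join without ON produces nine combinations, which is rarely what a LEFT JOIN is meant to return. Stating the join key explicitly keeps the result at one row per match.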