26
loading...
This website collects cookies to deliver better user experience
import colorcet
import datashader as ds
import pandas as pd
df = pd.read_sql_table('listing', conn)
cvs = ds.Canvas(plot_width=800, plot_height=500)
agg = cvs.points(df, x='longitude', y='latitude')
img = ds.tf.shade(agg, cmap=colorcet.fire, how='log')
CREATE VIEW listing_melbourne AS
SELECT *
FROM listing
WHERE (longitude BETWEEN 144.2589302195 AND 145.4866523875)
AND (latitude BETWEEN -38.517433861 AND -37.5733096779);
listing_melbourne
table like that:df = pd.read_sql_table('listing_melbourne', conn)
cvs = ds.Canvas(plot_width=800, plot_height=500)
agg = cvs.points(df, x='longitude', y='latitude')
img = ds.tf.shade(agg, cmap=colorcet.fire, how='log')
SELECT region_parent_name, COUNT(*) AS listing_count
FROM listing
WHERE region_parent_name IN ('New South Wales', 'Victoria','Queensland','Western Australia','South Australia','Tasmania','Australian Capital Territory', 'Northern Territory')
GROUP BY region_parent_name
ORDER BY listing_count DESC
SELECT region_parent_name as region, COUNT(*) AS listing_count
FROM listing
WHERE region_parent_name IN ('New South Wales', 'Victoria','Queensland','Western Australia','South Australia','Tasmania','Australian Capital Territory', 'Northern Territory')
GROUP BY region_parent_name
UNION ALL
SELECT 'New Zealand' as region, COUNT(*) AS listing_count
FROM listing
WHERE region_parent_name NOT IN ('New South Wales', 'Victoria','Queensland','Western Australia','South Australia','Tasmania','Australian Capital Territory', 'Northern Territory')
ORDER BY listing_count DESC
neighbourhood
, host_neighbourhood
, host_location
. The problem here is that a lot of these columns have NULL or empty values. For example: SELECT COUNT(*) FROM "listing" where "neighbourhood" !=''
will only return 124,363 rows, or 64% of all the listings. Coalescing these three fields also won't work properly.suburbs
, I will run the following query:SELECT
suburbs.sa3_name21 AS suburb_name,
COUNT(listing.*) AS listing_count
FROM listing
JOIN suburbs
ON ST_Contains(ST_SetSRID(
suburbs.geom::GEOMETRY,
7844
), ST_SetSRID(ST_MakePoint(listing.longitude, listing.latitude), 7844))
GROUP BY suburb_name
ORDER BY listing_count DESC
SELECT bedrooms, COUNT(*) AS listing_count
FROM airdata.listing
WHERE bedrooms>0 AND bedrooms<15
GROUP BY bedrooms
ORDER BY listing_count DESC
WITH count_by_host AS (
SELECT host_id, COUNT(*) AS listing_count
FROM airdata.listing
GROUP BY host_id
)
SELECT listing_count, COUNT(count_by_host.host_id) as host_count
FROM count_by_host
GROUP BY listing_count
ORDER BY listing_count DESC
calendar
table, which has an enormous 70M records. The queries will be running for some time, so it's a good thing to increase statement timeout in the Postgres settings.id=372
) we have:WITH suburb_listings AS (
SELECT lst.id as lstid FROM "airdata"."listing" AS lst
JOIN "airdata"."suburbs" AS subs
ON ST_Contains(ST_SetSRID(
subs.geom::GEOMETRY,
7844
), ST_SetSRID(ST_MakePoint(lst.longitude, lst.latitude), 7844))
WHERE subs.id = 372
)
SELECT DISTINCT(cal.listing_id) FROM "airdata"."calendar" AS cal
WHERE cal.listing_id IN (SELECT lstid FROM suburb_listings)
AND (cal.date BETWEEN '2021-12-20 00:00:00+00' AND '2021-12-28 00:00:00+00')
AND cal.available=TRUE
347
listings. So out of 2350
total listings in Bondi Beach area, only 347 (< 15%) have some availability during the Christmas period. Let's try to visualise it as a chart. In order to do that, I will iterate the above query over all suburbs in the suburbs
table, and visualise like that: