32
loading...
This website collects cookies to deliver better user experience
last_updated
field for records. For example, changes to computed fields do not affect this timestamp. So, we can't poll for changes to them.public
public_swap
public
schema. The public_swap
or "swap schema" is where the sync takes place during a rebuild.public_swap
as defined by the Airtable base's current schema. We then pull all the records for each table from Airtable and insert them into the tables in the swap schema.public
and "promote" every table in public_swap
to public
.public
schema suddenly receives all updates from Airtable, all at once, at discrete intervals.public
, only rebuild on schema changespublic
. This meant we were constantly marking Postgres tuples for deletion, keeping the vacuuming functionality very busy.public
schema. To pull this off, we needed to incorporate a few changes.public
schema. Here's an example of what the upsert statement looked like:insert into public.products (id,created_time,name,size,color)
values $1, $2, $3, $4, $5
on conflict (id) do update set
id=excluded.id, created_time=excluded.created_time, name=excluded.name, size=excluded.size, color=excluded.color
where (created_time, name, size, color) is distinct from (excluded.created_time, excluded.name, excluded.size, excluded.color)
on conflict (id)
is what switches Postgres from performing an insert to performing an update. When updating, the set
clause is used instead. The set
clause here is just a re-iteration of the mapping that the insert
clause makes between columns and their values.where ... is distinct from
is a key clause. Without it, the upsert would perform a write operation for every single row in the upsert clause, regardless if there were any changes. This trades a write-heavy characteristic for a read-heavy characteristic, which is more efficient for Postgres. Furthermore, for customers using replication slots, this means we'll only generate a WAL entry when something has actually been created/updated.public
. But, we still need to trigger a full rebuild if the base's schema is modified in some way (eg a table is added or a column is changed).public
. If the hashes are different, we kick off a full rebuild.%{ "record_id" => "record_hash" }
, where record_hash
is an MD5 hash of all the key/value pairs of a given record. When we request 100 records from Airtable, we can compare their latest hashes with what we have in memory. Then, we only perform an upsert of the records that have new hashes. After Postgres confirms the write succeeded, we write the latest hashes to our in-memory cache.PATCH https://proxy.syncinc.so/api.airtable.com/v0/appF0qbTS1QiA025N/Users
public
: We first reduced vacuuming by syncing directly to public
and keeping tables between syncs.where ... is distinct from
. This traded high write IOPs for high read IOPs and got rid of our vacuuming problem. We still had high network.