CREATE TABLE posts_by_user(
user_id bigint,
post_id bigint generated always as identity,
group_ids bigint[] null,
tag_ids bigint[] null,
content text null,
created_date timestamptz,
PRIMARY KEY (user_id, created_date, post_id),
UNIQUE (user_id, post_id)
);
In YugabyteDB, this primary key defaults to PRIMARY KEY (user_id HASH, created_date ASC, post_id ASC). The real key is (user_id, post_id), and I enforce it with a UNIQUE constraint. But for the primary key, I'm adding the date. The drawback of adding created_date to the primary key is a full rewrite of the whole document in case created_date is updated, which is probably not the case here. The advantage is fast access to a time range when looking at one user's posts. This is something to decide when knowing all access patterns. For YugabyteDB, it would be better to set the sharding options explicitly, with a descending order: PRIMARY KEY (user_id HASH, created_date DESC, post_id ASC).
Posts will also be queried by tag_id or by group_id to fill the feed list. With this single-table design, I can create GIN indexes on those arrays:

create index posts_by_user_group_ids on posts_by_user using gin (group_ids);
create index posts_by_user_tag_ids on posts_by_user using gin (tag_ids);
However, those GIN indexes cannot include created_date in them. This is a limitation from PostgreSQL: trying to add it in the INCLUDE clause will raise ERROR: access method "gin" does not support included columns, and trying to add it in the indexed columns will raise ERROR: data type timestamp with time zone has no default operator class for access method "gin". So, from those indexes, I cannot filter a time range on created_date.
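For reference, these are the two attempts that raise those errors (the index name posts_by_user_tag_ids_date is just illustrative):

```sql
-- raises: ERROR: access method "gin" does not support included columns
create index posts_by_user_tag_ids_date on posts_by_user
 using gin (tag_ids) include (created_date);

-- raises: ERROR: data type timestamp with time zone
--         has no default operator class for access method "gin"
create index posts_by_user_tag_ids_date on posts_by_user
 using gin (tag_ids, created_date);
```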
However, there is still a place where a GIN index may help: text search on the post content. I described this in a previous post.

CREATE TABLE posts_by_tag(
tag_id bigint not null,
user_id bigint not null,
post_id bigint not null,
created_date timestamptz not null,
PRIMARY KEY (tag_id, created_date, user_id, post_id),
UNIQUE (tag_id, user_id, post_id),
FOREIGN key (user_id, created_date, post_id) references posts_by_user (user_id, created_date, post_id) on delete cascade
);
CREATE TABLE posts_by_group(
group_id bigint not null,
user_id bigint not null,
post_id bigint not null,
created_date timestamptz not null,
PRIMARY KEY (group_id, created_date, user_id, post_id),
UNIQUE (group_id, user_id, post_id),
FOREIGN key (user_id, created_date, post_id) references posts_by_user (user_id, created_date, post_id) on delete cascade
);
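These secondary tables are meant to be read with a time-range scan, then joined back to the main table for the content. A sketch of a group feed query under this design (illustrative values, not from the original post):

```sql
select p.*
from posts_by_group g
join posts_by_user p using (user_id, created_date, post_id)
where g.group_id = 1
  and g.created_date > now() - interval '1 month'
order by g.created_date desc
limit 100;
```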
Both secondary tables have created_date in their primary key. This is on purpose, to be able to filter on a time range from this table before going to read the main table to get the content. This is why I didn't create the GIN indexes, which didn't allow it. I've also declared the uniqueness of the columns without created_date to enforce data integrity.

My goal here is to consider these posts_by_group and posts_by_tag tables like a secondary index. Exactly like the GIN index I wanted to build, but with additional columns to match the selectivity of the use cases. I'll not update those additional tables directly: they will be automatically maintained when updating the main posts_by_user table. And this is where I need a trigger.

I keep the tag_ids and group_ids arrays in posts_by_user to get everything in one document when accessing a post by its primary key. This also eases the logic to maintain the secondary tables posts_by_tag and posts_by_group, because any DML on the main table knows the old and new values. Thanks to this, there is no need for other indexes on those secondary tables. The logic is simple, and can be optimized if needed: here I delete the entries for the old values and insert those for the new ones. This is exactly how a secondary index works, but here I'm coding it in a procedure [Please read the comments, there's a much better version thanks to Pavel]:

CREATE OR REPLACE FUNCTION posts_by_user_dml()
RETURNS TRIGGER AS
$$
declare
loop_tag_id bigint;
loop_group_id bigint;
begin
if TG_OP <> 'INSERT' and old.tag_ids is not null then -- OLD is not assigned on INSERT
-- delete entries for old values
foreach loop_tag_id in array old.tag_ids loop
delete from posts_by_tag t
where t.tag_id = loop_tag_id
and t.user_id= old.user_id
and t.post_id= old.post_id;
end loop;
end if;
if TG_OP <> 'DELETE' and new.tag_ids is not null then -- NEW is not assigned on DELETE
-- insert entries for new values
foreach loop_tag_id in array new.tag_ids loop
insert into posts_by_tag(tag_id,user_id, post_id,created_date)
values (loop_tag_id,new.user_id, new.post_id, new.created_date);
end loop;
end if;
if TG_OP <> 'INSERT' and old.group_ids is not null then
-- delete entries for old values
foreach loop_group_id in array old.group_ids loop
delete from posts_by_group t
where t.group_id = loop_group_id
and t.user_id= old.user_id
and t.post_id= old.post_id;
end loop;
end if;
if TG_OP <> 'DELETE' and new.group_ids is not null then
-- insert entries for new values
foreach loop_group_id in array new.group_ids loop
insert into posts_by_group(group_id,user_id, post_id,created_date)
values (loop_group_id,new.user_id, new.post_id, new.created_date);
end loop;
end if;
return null; -- the return value is ignored for an AFTER ROW trigger
end;
$$
LANGUAGE plpgsql;
All the information needed is available in the old and new records passed to the trigger. Here is the declaration of the trigger:

CREATE TRIGGER posts_by_user_dml
AFTER insert or update or delete ON posts_by_user
FOR EACH ROW
EXECUTE PROCEDURE posts_by_user_dml();
Let's test it with a few inserts and an update:

delete from posts_by_user;
insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
values (1,array[1,2,3],array[1,2,3],'x',date'2021-01-01');
insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
values (2,array[1,2,3],array[]::bigint[],'x',date'2021-01-01');
update posts_by_user set tag_ids=tag_ids||'{4}' where user_id=1;
with join_secondary as (
select *,
array(
SELECT tag_id
from posts_by_tag t
where t.user_id=p.user_id
and t.post_id=p.post_id
and t.created_date=p.created_date
) tag_ids_secondary,
array(
SELECT group_id
from posts_by_group g
where g.user_id=p.user_id
and g.post_id=p.post_id
and g.created_date=p.created_date
) group_ids_secondary
from posts_by_user p)
select tag_ids<@tag_ids_secondary and group_ids<@group_ids_secondary
and tag_ids@>tag_ids_secondary and group_ids@>group_ids_secondary
"valid?",* from join_secondary;
valid? | user_id | post_id | group_ids | tag_ids | content | created_date | tag_ids_secondary | group_ids_secondary
--------+---------+---------+-----------+-----------+---------+-------------------------------+-------------------+---------------------
t | 2 | 2 | {1,2,3} | {} | x | 2021-12-20 10:18:56.922046+00 | {} | {3,2,1}
t | 1 | 1 | {1,2,3} | {1,2,3,4} | x | 2021-01-01 00:00:00+00 | {3,4,2,1} | {3,2,1}
(2 rows)
Now, let's load a larger volume to check the performance:

truncate posts_by_user cascade;
with
users as (select generate_series(1,10) user_id),
posts as (select generate_series(1,100) post_id),
ids as (select distinct (1000*random()*generate_series(1,20))::int id)
insert into posts_by_user (user_id,group_ids,tag_ids,content,created_date)
select user_id
,ARRAY(select id from ids) group_ids
,ARRAY(select id from ids) tag_ids
,'Lorem ipsum...' as content
, now() - random() * interval '10 year'
from users,posts;
I run this insert of 1000 posts in a loop, with \timing on and \watch 0.1 in psql. On a small server, each iteration takes about 8 seconds for 1000 posts, which means about 8 milliseconds per post:

Time: 10604.948 ms (00:10.605)
Mon 20 Dec 2021 11:29:17 AM GMT (every 0.1s)
INSERT 0 1000
Time: 8533.703 ms (00:08.534)
Mon 20 Dec 2021 11:29:25 AM GMT (every 0.1s)
INSERT 0 1000
Time: 8320.473 ms (00:08.320)
Mon 20 Dec 2021 11:29:34 AM GMT (every 0.1s)
INSERT 0 1000
Here are the objects maintained by this design, with their number of rows, from pg_class:

yugabyte=# select to_hex(oid::int) as "0xOID", relname,relkind, reltuples from (select oid,relname, relkind,relnamespace,reltuples from pg_class) c natural join (select oid relnamespace from pg_namespace where nspname='public') n where relname like 'posts%' order by 1;
0xOID | relname | relkind | reltuples
-------+---------------------------------------------+---------+-------------
433f | posts_by_user_post_id_seq | S | 1
4341 | posts_by_user | r | 4.644e+07
4344 | posts_by_user_pkey | i | 4.644e+07
4346 | posts_by_tag | r | 4.39074e+08
4349 | posts_by_tag_pkey | i | 4.39074e+08
4116 | posts_by_tag_tag_id_user_id_post_id_key | i | 4.39074e+08
4352 | posts_by_group | r | 5.40224e+08
4355 | posts_by_group_pkey | i | 5.40224e+08
4357 | posts_by_group_group_id_user_id_post_id_key | i | 5.40224e+08
430a | posts_by_user_group_ids | i | 4.644e+07
4400 | posts_by_user_tag_ids | i | 4.644e+07
To see what happens on one insert, I trace all statements and the DocDB requests:

yugabyte=# set log_statement='all';
SET
yugabyte=# set yb_debug_log_docdb_requests=true;
SET
yugabyte=# select pg_current_logfile();
pg_current_logfile
--------------------------------------------------------------------------
/home/opc/var/data/yb-data/tserver/logs/postgresql-2021-12-20_000000.log
(1 row)
yugabyte=# \! grep --color=auto -E '^|^.*(pg_session.cc|STATEMENT:)|(PGS|Y)QL[_A-Z]+|[0-9a-f]{4}"|Flushing|Applying|Buffering' /home/opc/var/data/yb-data/tserver/logs/postgresql-2021-12-20_000000.log
yugabyte=# 2021-12-20 10:21:09.424 UTC [17389] LOG: statement: insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
values (2,array[1,2,3],array[]::bigint[],'x',date'2021-01-01');
I1220 10:21:09.425124 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 31771648 schema_version: 0 targets { column_id: 1 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 4 } targets { column_id: 5 } targets { column_id: 6 } targets { column_id: 7 } targets { column_id: -8 } column_refs { ids: 1 ids: 2 ids: 3 ids: 4 ids: 5 ids: 6 ids: 7 } is_aggregate: false limit: 1024 return_paging_state: true table_id: "000033e1000030008000000000000a30" index_request { targets { column_id: 3 } column_refs { ids: 3 } is_forward_scan: true is_aggregate: false range_column_values { value { uint32_value: 1259 } } range_column_values { value { uint32_value: 17217 } } range_column_values { value { int32_value: 2 } } table_id: "000033e1000030008000000000000a72" }
I1220 10:21:09.426597 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30364144 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004341" schema_version: 0 ybctid_column_value { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\000\000\000\000\000\000\003!" } } column_values { column_id: 2 expr { value { binary_value: "\001\000\000\000\000\000\000\000\024\000\000\000\003\000\000\000\001\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000" } } } column_values { column_id: 3 expr { value { binary_value: "\000\000\000\000\000\000\000\000\024\000\000\000" } } } column_values { column_id: 4 expr { value { string_value: "x" } } } column_values { column_id: 5 expr { value { int64_value: 662774400000000 } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426748 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30530672 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\355\251I\200\000\000\000\000\000\000\001!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426813 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30441232 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 1 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\355\251I\200\000\000\000\000\000\000\001!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426929 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30438432 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426992 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30436192 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.427089 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30294576 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\010DI\200\000\000\000\000\000\000\003!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.427141 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 32457600 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 3 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\010DI\200\000\000\000\000\000\000\003!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.430510 17389 pg_session.cc:949] Flushing buffered operations, using transactional session (num ops: 7)
Here is the query to get the latest posts from one user, served by the primary key:

yugabyte=# explain analyze
select posts_by_user.*
from posts_by_user where user_id=1
and created_date > now() - 2*interval '1 day'
order by created_date desc;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan Backward using posts_by_user_pkey on posts_by_user (cost=0.00..17.00 rows=100 width=120) (actual time=3.589..84.544 rows=2173 loops=1)
Index Cond: ((user_id = 1) AND (created_date > (now() - '2 days'::interval)))
Planning Time: 0.081 ms
Execution Time: 85.111 ms
(4 rows)
The Index Scan Backward could be avoided by defining created_date as DESC instead of ASC (I didn't, to keep PostgreSQL compatibility for this blog post). On PostgreSQL, the same execution plan will have to read rows from the heap table, probably scattered, as one user doesn't post all at the same time.

Here is the query to get the latest posts for one tag, through the posts_by_tag table:

yugabyte=# explain analyze
select posts_by_user.*
from posts_by_user
join posts_by_tag
using(user_id, created_date, post_id)
where posts_by_tag.created_date
> now() - interval '1 month'
and tag_id =1
order by created_date desc limit 100
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..28.40 rows=100 width=120) (actual time=2.691..70.550 rows=100 loops=1)
-> Nested Loop (cost=0.00..28.40 rows=100 width=120) (actual time=2.689..70.513 rows=100 loops=1)
-> Index Scan Backward using posts_by_tag_pkey on posts_by_tag (cost=0.00..17.00 rows=100 width=24) (actual time=1.830..1.971 rows=100 loops=1)
Index Cond: ((tag_id = 1) AND (created_date > (now() - '1 mon'::interval)))
-> Index Scan using posts_by_user_pkey on posts_by_user (cost=0.00..0.11 rows=1 width=120) (actual time=0.670..0.670 rows=1 loops=100)
Index Cond: ((user_id = posts_by_tag.user_id) AND (created_date = posts_by_tag.created_date) AND (post_id = posts_by_tag.post_id))
Planning Time: 0.269 ms
Execution Time: 71.185 ms
I1220 11:01:53.145462 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 33622848 schema_version: 1 partition_column_values { value { int64_value: 1 } } targets { column_id: 0 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 1 } targets { column_id: -8 } column_refs { ids: 0 ids: 1 ids: 2 ids: 3 } is_forward_scan: true is_aggregate: false limit: 1024 return_paging_state: true ysql_catalog_version: 308 table_id: "000033e1000030008000000000004382" condition_expr { condition { op: QL_OP_AND operands { condition { op: QL_OP_GREATER_THAN_EQUAL operands { column_id: 1 } operands { value { int64_value: 692708513144764 } } } } } }
I1220 11:01:53.145519 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 33622848 schema_version: 1 partition_column_values { value { int64_value: 2 } } targets { column_id: 0 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 1 } targets { column_id: -8 } column_refs { ids: 0 ids: 1 ids: 2 ids: 3 } is_forward_scan: true is_aggregate: false limit: 1024 return_paging_state: true ysql_catalog_version: 308 table_id: "000033e1000030008000000000004382" condition_expr { condition { op: QL_OP_AND operands { condition { op: QL_OP_GREATER_THAN_EQUAL operands { column_id: 1 } operands { value { int64_value: 692708513144764 } } } } } }
Compare this with the GIN index on the single table:

yugabyte=# explain analyze
select *
from posts_by_user
where created_date > now() - interval '1 month'
and tag_ids @>'{1}'
order by created_date desc limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=52.11..52.36 rows=100 width=120) (actual time=289549.350..289549.380 rows=100 loops=1)
   ->  Sort (cost=52.11..54.61 rows=1000 width=120) (actual time=289549.348..289549.363 rows=100 loops=1)
         Sort Key: created_date DESC
         Sort Method: top-N heapsort Memory: 128kB
         ->  Index Scan using posts_by_user_tag_ids on posts_by_user (cost=4.00..13.89 rows=1000 width=120) (actual time=8720.517..289510.325 rows=250376 loops=1)
Index Cond: (tag_ids @> '{1}'::bigint[])
Filter: (created_date > (now() - '1 mon'::interval))
Rows Removed by Filter: 2845624
Planning Time: 0.093 ms
Execution Time: 289550.383 ms
(10 rows)
Because the GIN index cannot filter on created_date, the time-range predicate has to be applied later (Rows Removed by Filter). Does it matter? Test it with your use case and your data, and also with your YugabyteDB version. I'm on 2.11 here, and maybe one day the GIN index will be able to filter on the date: being part of the primary key, it is possible to get it from the index entry. This is an advantage of YugabyteDB storing the tables in the primary key LSM tree: the secondary indexes reference the row by its primary key. The possibility to use it to filter may be related to #10169. On standard PostgreSQL you will see another join method for this one, but let's keep that for the next post.