Valentine's tech log: Table partitioning automation triggers in PostgreSQL

Table partitioning is described in the Postgres documentation Partitioning chapter. Unfortunately until partition data distribution is done automatically in some future version of the Postgres we need some triggers to handle partitioning automatically.

Here is one such example trigger script, that can be useful when developing a tailor made one:

CREATE OR REPLACE FUNCTION myschema.ruled_indexed_partition_multiplexer_by_view_day()
RETURNS TRIGGER AS 
$BODY$
-- $Header: $
/**
* This is a common trigger function that can be used to partition any table 
* that has a VIEW_DAY partitioning column.
* This function will only work on BEFORE INSERT row level triggers.
* If the first parameter is specified, it can only be 'week' or 'month' 
* to indicate the needed partitioning schedule.
*/
DECLARE
  schema_name_prefix CONSTANT text := quote_ident( TG_TABLE_SCHEMA ) || '.';
  table_name_prefix CONSTANT text := TG_TABLE_NAME || '_';
  needed_month_table_name text;
  partitioning_interval CONSTANT text := coalesce( TG_ARGV[0], 'week' );
  s text;
BEGIN
  if not ( TG_WHEN = 'BEFORE' and TG_LEVEL = 'ROW' and TG_OP = 'INSERT' ) then 
    raise exception 'This trigger function can only be used with BEFORE INSERT row level triggers!';
  end if;
  -- raise info 'starting partition_multiplexer for %.%', TG_TABLE_SCHEMA, TG_TABLE_NAME;
  if new.view_day is null then 
    raise exception 'partitioning column "view_day" cannot be NULL';
  end if;

  needed_month_table_name := 
myschema_partitions.need_ruled_indexed_partition_table(
TG_TABLE_SCHEMA, 
"name" 'myschema_partitions', 
TG_TABLE_NAME, 
"name" 'view_day', 
new.view_day, partitioning_interval );

  -- raise info 'needed_month_table_name is %', needed_month_table_name;
  select new into s;
  s := $$INSERT INTO myschema_partitions.$$ || needed_month_table_name || 
  $$ SELECT ($$ || quote_literal( s ) || $$::$$ || 
  schema_name_prefix || TG_TABLE_NAME || $$).*  $$;
  -- raise info 'executing statement [%]', s;
  EXECUTE s;
  RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;

The trigger function that will use a VIEW_DAY table column of type DATE and can be assigned to a root table with the following command:

CREATE TRIGGER mytable_multiplexer_trigger
BEFORE INSERT
ON myschema.mytable
FOR EACH ROW
EXECUTE PROCEDURE myschema.ruled_indexed_partition_multiplexer_by_view_day('week');

The function uses an additional helper function that creates a needed partition table when it is needed and creates a INSTEAD INSERT rule, that will prevent the system from calling this trigger (that is actually doing at least one catalog look-up for every inserted record) again, if the table already exists. The rules probably should be dropped by hand for the older partitions, so the planner does not have to check too many rule conditions when rewriting the original insert statement, trying to insert into the root table (I suppose here, that we actively insert only into some recent table partitions and when the rule does not exist and we still have to insert something in to an old table the trigger will still work and choose a needed one).

Here is the helper function:

CREATE OR REPLACE FUNCTION myschema_partitions.need_ruled_indexed_partition_table
(TG_TABLE_SCHEMA "name", 
TG_ARCHIVE_SCHEMA "name", 
TG_TABLE_NAME "name", 
partitioning_column_name "name", 
needed_partitioning_date date, 
partitioning_interval text)
RETURNS "name" AS 
$BODY$
/**
* This stored procedure checks if the needed partitioning table exists, as if not,
* it creates it. 
* It also creates all the indexes, that exist on the parent table renaming it
* according to the new partition table name. 
* 
* Be careful about the maximum length of the object name. 
* 
* It is usually to be called from the trigger function like 
* myschema.ruled_indexed_partition_multiplexer_by_view_day()
*
* @param TG_TABLE_SCHEMA - the source (shallow) table schema name
* @param TG_ARCHIVE_SCHEMA - name of the schema, where the partitioning table should be created
* @param TG_TABLE_NAME - the source (shallow) table name
* @param partitioning_column_name - the name of the column, that is used to perform the partitioning (this column should exist in the source table) 
* @param needed_partitioning_date - the value of the partitioning column, this value is used to determine the name of the needed partitioning table
* @param partitioning_interval - partitioning interval. can be 'week' or 'month'
*
* @author Valentine Gogichashvili
*/
DECLARE
partition_beginning_date CONSTANT date := date_trunc( partitioning_interval, needed_partitioning_date )::date;
needed_partition_table_name "name";
BEGIN
-- raise info 'starting partition_multiplexer for %.%, needed table is %, partitioning date is %', TG_TABLE_SCHEMA, TG_TABLE_NAME, needed_partition_table_name, needed_partitioning_date;
-- calculate the name of the needed table
-- we start with the beginning of the week (week partitioning)
needed_partition_table_name := TG_TABLE_NAME || 
to_char( partition_beginning_date, '_YYYYMMDD_') || partitioning_interval;

-- check that the needed table exists on the database
perform 1 
from pg_class, pg_namespace
where relnamespace = pg_namespace.oid 
and relkind = 'r'::"char"
and relname = needed_partition_table_name
and nspname = TG_ARCHIVE_SCHEMA;

if not found then 
DECLARE
archive_schema_name_prefix CONSTANT text := quote_ident( TG_ARCHIVE_SCHEMA ) || '.';
base_schema_name_prefix CONSTANT text := quote_ident( TG_TABLE_SCHEMA ) || '.';
base_table_name CONSTANT text := base_schema_name_prefix || quote_ident( TG_TABLE_NAME );
quoted_column_name CONSTANT text := quote_ident( partitioning_column_name );
partition_beginning_date CONSTANT date := date_trunc( partitioning_interval, needed_partitioning_date )::date;
next_partition_beginning_date date := date_trunc( partitioning_interval, needed_partitioning_date + ( '1 ' || partitioning_interval )::interval )::date;
quoted_needed_table_name CONSTANT text := archive_schema_name_prefix || quote_ident ( needed_partition_table_name );
quoted_rule_name CONSTANT text := quote_ident( 'rule_' || TG_TABLE_NAME || to_char( partition_beginning_date, '_YYYYMMDD') );
base_table_owner name;
s text;
a text;
parent_index_name text;
parent_index_has_valid_name boolean;
BEGIN
SET search_path = myschema_partitions, myschema, public;
-- we have to create a needed table now
-- check if the partitioning date has been passed correctly
if needed_partitioning_date is null then 
raise exception 'partitioning_date should not be NULL';
end if;
-- check if the partitioning interval is correct
-- we check it here and not in the trigger function to improve the performance
if partitioning_interval not in ( 'week', 'month' ) then 
raise exception $$partitioning_interval is set to [%] and should be 'week' or 'month'$$, partitioning_interval;
end if;
-- check for the base table and extract the table owner
select pg_roles.rolname into base_table_owner
from pg_class, pg_namespace, pg_roles
where relnamespace = pg_namespace.oid 
and relkind = 'r'::"char"
and relowner = pg_roles.oid
and relname = TG_TABLE_NAME
and nspname = TG_TABLE_SCHEMA;
if not found then 
raise exception 'cannot find base table %.%', TG_TABLE_SCHEMA, TG_TABLE_NAME;
end if;
-- now check that the base table contains the partitioning column
perform 1 from information_schema.columns where table_schema = TG_TABLE_SCHEMA and table_name = TG_TABLE_NAME and column_name = partitioning_column_name;
if not found then 
raise exception 'cannot find partitioning column % in the table %.%', quoted_column_name, TG_TABLE_SCHEMA, TG_TABLE_NAME;
end if;

s := $$
CREATE TABLE $$ || quoted_needed_table_name || $$ (
CHECK ( $$ || quoted_column_name || $$ >= DATE $$ || quote_literal( partition_beginning_date ) || $$ AND 
$$ || quoted_column_name || $$ < DATE $$ || quote_literal( next_partition_beginning_date ) || $$ )
) INHERITS ( $$ || base_table_name || $$ ); $$;
raise notice 'creating table as [%]', s;
EXECUTE s;

if coalesce(length(base_table_owner), 0) = 0 then 
raise exception 'base_table_owner is unknown';
end if;
s := $$
ALTER TABLE $$ || quoted_needed_table_name || 
$$ OWNER TO $$ || base_table_owner;
raise notice 'changing owner as [%]', s;
EXECUTE s;

-- extract all the indexes existing on the parent table and apply them to the newly created partition
for a, s, parent_index_name, parent_index_has_valid_name
in  SELECT CASE indisclustered WHEN TRUE THEN 'ALTER TABLE ' || needed_partition_table_name::text || ' CLUSTER ON ' || replace( i.relname, c.relname, needed_partition_table_name::text ) ELSE NULL END as clusterdef,
replace( pg_get_indexdef(i.oid), TG_TABLE_NAME::text, needed_partition_table_name::text ),
i.relname,
strpos( i.relname, TG_TABLE_NAME::text ) > 0
FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = i.reltablespace
WHERE c.relkind = 'r'::"char" 
AND i.relkind = 'i'::"char"
AND n.nspname = TG_TABLE_SCHEMA
AND c.relname = TG_TABLE_NAME
loop
if parent_index_has_valid_name then 
if strpos( s, quote_ident( TG_TABLE_SCHEMA ) || '.' ) then 
raise info 'create index statement contains original schema name, removing it';
s := replace( s, quote_ident( TG_TABLE_SCHEMA ) || '.', '' );
end if;
raise notice 'creating index as [%]', s;
EXECUTE s;
if a is not null then 
if strpos( a, quote_ident( TG_TABLE_SCHEMA ) || '.' ) then 
raise info 'alter index statement contains original schema name, removing it';
a := replace( a, quote_ident( TG_TABLE_SCHEMA ) || '.', '' );
end if;
raise notice 'setting clustering as [%]', a;
EXECUTE a;
end if;
else 
raise exception 'parent index name [%] should contain the name of the parent table [%]', parent_index_name, TG_TABLE_NAME;
end if;
end loop;

-- now we create a rule, that will be assigned to the original table
s := $$
CREATE RULE $$ || quoted_rule_name || $$ AS
ON INSERT TO $$ || base_table_name || $$ 
WHERE ( $$ || quoted_column_name || $$ >= DATE $$ || quote_literal( partition_beginning_date ) || $$ AND 
$$ || quoted_column_name || $$ < DATE $$ || quote_literal( next_partition_beginning_date ) || $$ )
DO INSTEAD
INSERT INTO $$ || quoted_needed_table_name || $$ VALUES (NEW.*);$$;
-- raise notice 'creating a rule as [%]', s;
EXECUTE s;
END;
end if;
return needed_partition_table_name;
END;
$BODY$
LANGUAGE plpgsql strict volatile;

Note: when using inherited tables, to make real use of setting constraint_exclusion on, we have actually to use constant values for partition criteria checks. That means in practice, that we have to always construct SQL statements (not forgetting to use quote_ident() and quote_literal()) and then EXECUTE them (when writing PL/pgSQL code of course)

P.S.: Creation of the partitions on the fly will cause parallel transactions the fail on the moment of creation the tables, but this happens only at that moment and the client should be ready to retry the attempt in case of failure...

2 comments:

Sergey said...: Great article. Helped me a lot!
Thank you!; 03 January, 2009 17:46
Marko Tiikkaja said...: This example isn't safe and should NOT be used in production.

Creating partitions automatically in a trigger is a big no-no because of deadlocks. You should always create the partitions using some other method. For time-based partitions, cronjobs are handy.; 31 October, 2010 13:58

Valentine's tech log

2008-04-10

Table partitioning automation triggers in PostgreSQL

2 comments:

Subscribe Now: Google

Labels

Blog Archive