Though, strictly speaking, using arrays in relational databases is kind of not correct, still we use arrays, especially in stored procedures.
Actually PostgreSQL provides us with quite an number of nice array manipulation functions. That can be used at least starting from PostgreSQL 8.0. But what about set operations. How to sort an array in Postgres? How to merge arrays without repetitions? How to remove duplicate elements keeping the order of the appearance in the original array?
If you are working with the integer based arrays, there is quite an old and effective module
intarray that allows us to perform some basic set operations on the integer based arrays.
But what to do with not integer based arrays? Or if we want to do some relatively complicated operation on integer bases array? SQL is a set manipulation language, so that means that actually, we can use internal SQL mechanics to manipulate, merge and sort arrays. But for that, first we have to be able to convert an array into kind of a table, so that we can use usual SQL manipulation on it's elements.
unnest() are the most important functions in our case. Unfortunately
unnest() are only available starting from PostgreSQL 8.4, but one can easily create a small helper
unnest() function using
generate_series(), that will be not as efficient as the build in one, but still it is quite good trade off.
CREATE OR REPLACE FUNCTION unnest(a anyarray) RETURNS SETOF anyelement AS $BODY$ /** * Unnests given array into a table */ /* -- testing select unnest(ARRAY[1,2,3,4,5]) as i select * from unnest(ARRAY[1,2,3,4,5]) as u(i) */ select ($1)[s.i] from generate_series( array_lower($1, 1), array_upper($1, 1 ) ) as s(i); $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT;
A good thing about
unnest()written in SQL and not PL/pgSQL is that one can use such a set retuning function directly after
SELECTnot necessarily pushing it in the list of used tables after
FROMclause. The reason was explained by Tom Lane some time back in one of the newsgroup threads and is related to the implementation drawbacks of the PL/pgSQL.
Ok, so now I will simply give several examples of how to use these functions.
|Merge 2 or more text arrays excluding duplicates:|
select ARRAY( select unnest(ARRAY[ 'a', 'b', 'c' ]) union select unnest(ARRAY[ 'c', 'd', 'e' ]) )
UNION ALLinstead, we are doing just the same as
array_cat(), but slower :)
UNIONactually sorts the elements of the result set fist, to eliminate the duplicates, so the resulting array will be sorted usually. But one cannot relay on that behavior and if you really need a sorted array on the output, you have to use
|Merge 2 or more text arrays excluding duplicates and sorting elements:|
select ARRAY( select unnest(ARRAY[ 'a', 'b', 'c' ]) as e union select unnest(ARRAY[ 'c', 'd', 'e' ]) as e order by e )
|Trim or rewrite text array elements:|
select ARRAY( select btrim(lower(regexp_replace(unnest(ARRAY['a a','B b','c C ']), $$\s+$$, ' ', 'g') ) ) as e )
By now, with this syntax we cannot actually filter some elements out of the original array.
|Filter elements from an array depending on element properties:|
select ARRAY( select a.e from unnest(ARRAY[0,4,2,5,4,1,6,9,8,7]) as a(e) where a.e % 2 = 0 )
FROMlist to be able to name the virtual table, that
unnest()creates as well as name the element column so that we can reference it in the where clause. Again there are not guaranties here that our array will be contain elements in the same order that it was in the original one.
The problem is that all these queries do not know anything about the original position of the elements in the original array. To solve this problem we need element indexes to be visible inside the query. One can do it using a plain old
generate_series()or not so old
generate_subscripts()(one actually should prefer the later one if you are on the PostgreSQL 8.4+) to unnest our original array. You cannot use the
unnest()directly as there is not way to determine the position or index other then using window functions like
row_number(), but this is much much slower on practice.
|Filtering elements from an array depending on element position (index):|
select ARRAY( select (ARRAY[0,4,2,5,4,1,6,9,8,7])[s.i] from generate_series(array_lower(ARRAY[0,4,2,5,4,1,6,9,8,7], 1), array_upper(ARRAY[0,4,2,5,4,1,6,9,8,7], 1) ) as s(i) where s.i % 2 = 0 order by s.i )
generate_subscripts()shows a small problem actually. We have to push the original array at least 2 times in the query. Inside a PL/pgSQL code the arrays are usually replaced with the variable names, and the queries do not look so bulky there, but there is a possibility to create an additional helper function that will unnest an array returning not only a element itself, but also it's index in the original array:
CREATE OR REPLACE FUNCTION array_enumerate(IN a anyarray, OUT i integer, OUT e anyelement) RETURNS SETOF record AS $BODY$ /** * Unnests array into a table together with element indexes */ /* -- testing select * from array_enumerate(ARRAY['a', 'b', 'c']) as u */ select s.i, ($1)[s.i] from generate_series( array_lower($1, 1), array_upper($1, 1 ) ) as s(i) $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT;on the PostgreSQL versions after 8.4 one can write it using
CREATE OR REPLACE FUNCTION array_enumerate(IN a anyarray, OUT i integer, OUT e anyelement) RETURNS SETOF record AS $BODY$ /** * Unnests array into a table together with element indexes */ /* -- testing select * from array_enumerate(ARRAY['a', 'b', 'c']) as u */ select s.i, ($1)[s.i] from generate_subscripts( $1, 1 ) as s(i) $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT;Using
generate_subscripts()should be preferable as one can create some crazy array with elements having noncontinuous indexes actually. And it should be also a little bit faster, then
This actually gives us a possibility to rewrite previous query like this:
select ARRAY( select s.e from utils.array_enumerate(ARRAY[0,4,2,5,4,1,6,9,8,7]) as s(i,e) where s.i % 2 = 0 order by s.i )
|Remove duplicate elements from array keeping the element appearance order:|
select ARRAY( select s.e from utils.array_enumerate(ARRAY[0,4,2,5,4,1,6,9,8,7]) as s(i,e) group by s.e order by min(s.i) )
array_to_string()methods, or on newer versions of PostgreSQL
regexp_split_to_array()to remove the repeating space or punctuation characters during splitting directly.
Practically everywhere in the queries, one could use the
array_agg()to generate the resulting arrays or, in the coming 9.0, a very efficient
string_agg()to generate strings directly, without creating preliminary arrays and then using
On my tests comparing the performance of the
array_agg()I could not find any significant performance difference.