2011-10-30

How to remove SIM Card PIN from your GSM/UMTS modem on Ubuntu (Linux)

If the SIM card in your GSM modem is protected with a PIN, NetworkManager keeps asking for that PIN on every wake-up, which is quite annoying indeed. The easy way to remove the PIN protection from your SIM card under Ubuntu is:
sudo apt-get install gsm-utils
sudo gsmctl -d /dev/ttyACM0 -o unlock sc all 1234
here 1234 stands for your actual SIM card PIN, which is being removed, and /dev/ttyACM0 is the serial device of the modem.
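If you are not sure which device file your modem is attached to, a quick check is to list the usual serial device nodes (the device names here are an assumption; your modem may show up under a different node):

```shell
# GSM/UMTS USB modems usually appear as /dev/ttyACM* (USB ACM class)
# or /dev/ttyUSB* (generic USB-serial). Print whichever exist, or a
# fallback message when no modem is attached.
ls /dev/ttyACM* /dev/ttyUSB* 2>/dev/null || echo "no modem serial device found"
```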

2011-04-06

Index sizes depending on the type of the field being indexed

The sizes of indexes (and of tables, of course) directly influence lookup speed: the smaller the corresponding files on the file system, the faster they can be scanned, to say nothing of how much of them has to be kept in the memory caches.

So I ran a simple experiment to demonstrate how the choice of the field type used for an index affects the sizes of the indexes and tables.

/*
DROP DATABASE IF EXISTS eval;
CREATE DATABASE eval;
\c eval
*/
SET work_mem TO '128MB';
SET maintenance_work_mem TO '128MB';

DROP SCHEMA IF EXISTS eval_schema CASCADE;
CREATE SCHEMA eval_schema;

SET search_path to eval_schema;

CREATE TABLE eval_config ( table_name text, id_field_type text, id_field_expression text, row_count integer DEFAULT 100000 );
INSERT INTO eval_config VALUES 
( 'integer_short_id_table', 'integer', 's.i' ),
( 'integer_large_id_table', 'integer', 's.i * 123' ),
( 'bigint_short_id_table', 'bigint', 's.i' ),
( 'bigint_large_id_table', 'bigint', 's.i * 123' ),
( 'text_number_short_id_table', 'text', 's.i' ),
( 'text_number_large_id_table', 'text', 's.i * 123' ),
( 'numeric_short_id_table', 'numeric', 's.i' ),
( 'numeric_large_id_table', 'numeric', 's.i * 123' ),
( 'binary_md5_text_table', 'bytea', $$decode( md5( s.i::text || '-text-filler'), 'hex' ) $$ ),
( 'md5_text_table', 'text', $$md5( s.i::text || '-text-filler' )$$ );

CREATE VIEW eval_table_stats AS
  SELECT t.relname as "Table name",
       c.id_field_type as "Indexed field type",
       c.id_field_expression as "Expression",
       c.row_count as "Row count",
       s.stawidth as "Average id field width",
       pg_size_pretty(pg_table_size(t.oid)) as "Table size without index",
       pg_size_pretty(pg_indexes_size(t.oid)) as "Index size" /*,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       n_tup_hot_upd,
       n_live_tup,
       n_dead_tup */
  FROM eval_config as c
  JOIN pg_class as t
    ON c.table_name = t.relname
   AND t.relkind = 'r'
  JOIN pg_namespace as n
    ON relnamespace = n.oid
   AND n.nspname = 'eval_schema'
  LEFT 
  JOIN pg_statistic as s
    ON s.starelid = t.oid
   AND s.staattnum = 1
  LEFT
  JOIN pg_stat_user_tables as w
    ON w.relid = t.oid;

DO $SQL$
DECLARE 
  config record;
BEGIN
  FOR config IN SELECT * FROM eval_config
  LOOP
    RAISE INFO 'Creating table %', quote_ident( config.table_name );
    EXECUTE $$
    CREATE TABLE $$ || quote_ident( config.table_name ) || $$
    ( id $$ || config.id_field_type || $$, data text );
    $$;
    RAISE INFO 'Filling table %', quote_ident( config.table_name );
    EXECUTE $$
    INSERT INTO $$ || quote_ident( config.table_name ) || $$
    SELECT ( $$ || config.id_field_expression || $$ )::$$ || config.id_field_type || $$, 'some filling data'
      FROM generate_series(1, $$ || config.row_count || $$) as s(i);
    $$;
    RAISE INFO 'Building index on %', quote_ident( config.table_name );
    EXECUTE $$
    CREATE INDEX ON $$ || quote_ident( config.table_name ) || $$ ( id );
    $$;
    RAISE INFO 'Analyzing table %', quote_ident( config.table_name );
    EXECUTE $$
    ANALYZE $$ || quote_ident( config.table_name ) || $$;
    $$;
  END LOOP;
END;
$SQL$;

SELECT * FROM eval_table_stats;

DO $SQL$
DECLARE 
  config record;
BEGIN
  FOR config IN SELECT * FROM eval_config
  LOOP
    RAISE INFO 'Bloating table % (phase 1)', quote_ident( config.table_name );
    EXECUTE $$
    UPDATE $$ || quote_ident( config.table_name ) || $$
       SET data = data
     WHERE random() > 0.5;
    $$;
    RAISE INFO 'Bloating table % (phase 2)', quote_ident( config.table_name );
    EXECUTE $$
    UPDATE $$ || quote_ident( config.table_name ) || $$
       SET data = data
     WHERE random() > 0.5;
    $$;
    RAISE INFO 'Analyzing table %', quote_ident( config.table_name );
    EXECUTE $$
    ANALYZE $$ || quote_ident( config.table_name ) || $$;
    $$;
  END LOOP;
END;
$SQL$;

SELECT * FROM eval_table_stats;

Executing this script produces a set of tables and some statistics on the table and index sizes.
The id field was created with the types integer, bigint, text and numeric, plus bytea and text for the md5-hash indexes. Note that each table also contains the filler text data, so its size reflects more than just the field being evaluated:

Table sizes just after insertion of 100k rows

Table name | Indexed field type | Expression | Row count | Average id field width | Table size without index | Index size
integer_short_id_table | integer | s.i | 100000 | 4 | 5128 kB | 1768 kB
integer_large_id_table | integer | s.i * 1234 | 100000 | 4 | 5128 kB | 1768 kB
bigint_short_id_table | bigint | s.i | 100000 | 8 | 5552 kB | 2208 kB
bigint_large_id_table | bigint | s.i * 1234 | 100000 | 8 | 5552 kB | 2208 kB
text_number_short_id_table | text | s.i | 100000 | 5 | 5128 kB | 2200 kB
text_number_large_id_table | text | s.i * 1234 | 100000 | 9 | 5552 kB | 2624 kB
numeric_short_id_table | numeric | s.i | 100000 | 8 | 5552 kB | 2616 kB
numeric_large_id_table | numeric | s.i * 1234 | 100000 | 9 | 5624 kB | 2656 kB
binary_md5_text_table | bytea | decode( md5( s.i::text || '-text-filler'), 'hex' ) | 100000 | 17 | 6336 kB | 3552 kB
md5_text_table | text | md5( s.i::text || '-text-filler' ) | 100000 | 33 | 7880 kB | 5328 kB

Table sizes after random bloating (2 times random update of 50% of rows)

Table name | Indexed field type | Expression | Row count | Average id field width | Table size without index | Index size
integer_short_id_table | integer | s.i | 100000 | 4 | 10232 kB | 5288 kB
integer_large_id_table | integer | s.i * 1234 | 100000 | 4 | 10232 kB | 5272 kB
bigint_short_id_table | bigint | s.i | 100000 | 8 | 11 MB | 6592 kB
bigint_large_id_table | bigint | s.i * 1234 | 100000 | 8 | 11 MB | 6560 kB
text_number_short_id_table | text | s.i | 100000 | 5 | 10232 kB | 5896 kB
text_number_large_id_table | text | s.i * 1234 | 100000 | 9 | 11 MB | 7096 kB
numeric_short_id_table | numeric | s.i | 100000 | 8 | 11 MB | 7752 kB
numeric_large_id_table | numeric | s.i * 1234 | 100000 | 9 | 11 MB | 7880 kB
binary_md5_text_table | bytea | decode( md5( s.i::text || '-text-filler'), 'hex' ) | 100000 | 17 | 12 MB | 7336 kB
md5_text_table | text | md5( s.i::text || '-text-filler' ) | 100000 | 33 | 15 MB | 11 MB

Table sizes just after insertion of 5M rows

Table name | Indexed field type | Expression | Row count | Average id field width | Table size without index | Index size
integer_short_id_table | integer | s.i | 5000000 | 4 | 249 MB | 86 MB
integer_large_id_table | integer | s.i * 123 | 5000000 | 4 | 249 MB | 86 MB
bigint_short_id_table | bigint | s.i | 5000000 | 8 | 269 MB | 107 MB
bigint_large_id_table | bigint | s.i * 123 | 5000000 | 8 | 269 MB | 107 MB
text_number_short_id_table | text | s.i | 5000000 | 7 | 269 MB | 107 MB
text_number_large_id_table | text | s.i * 123 | 5000000 | 9 | 269 MB | 128 MB
numeric_short_id_table | numeric | s.i | 5000000 | 8 | 269 MB | 129 MB
numeric_large_id_table | numeric | s.i * 123 | 5000000 | 10 | 284 MB | 129 MB
binary_md5_text_table | bytea | decode( md5( s.i::text || '-text-filler'), 'hex' ) | 5000000 | 17 | 308 MB | 172 MB
md5_text_table | text | md5( s.i::text || '-text-filler' ) | 5000000 | 33 | 383 MB | 259 MB

Table sizes after random bloating (2 times random update of 50% of rows)

Table name | Indexed field type | Expression | Row count | Average id field width | Table size without index | Index size
integer_short_id_table | integer | s.i | 5000000 | 4 | 498 MB | 257 MB
integer_large_id_table | integer | s.i * 123 | 5000000 | 4 | 498 MB | 257 MB
bigint_short_id_table | bigint | s.i | 5000000 | 8 | 539 MB | 321 MB
bigint_large_id_table | bigint | s.i * 123 | 5000000 | 8 | 539 MB | 321 MB
text_number_short_id_table | text | s.i | 5000000 | 7 | 538 MB | 287 MB
text_number_large_id_table | text | s.i * 123 | 5000000 | 9 | 539 MB | 340 MB
numeric_short_id_table | numeric | s.i | 5000000 | 8 | 539 MB | 384 MB
numeric_large_id_table | numeric | s.i * 123 | 5000000 | 10 | 569 MB | 384 MB
binary_md5_text_table | bytea | decode( md5( s.i::text || '-text-filler'), 'hex' ) | 5000000 | 17 | 615 MB | 354 MB
md5_text_table | text | md5( s.i::text || '-text-filler' ) | 5000000 | 33 | 766 MB | 545 MB

Ok, we see here that it is really the byte width of the field that matters. For small numbers the difference is not dramatic, but an index on an integer (4 bytes wide) field is still a little smaller than an index on the text representation of the same ids, even for tables holding only 10000 records (those results are not posted here).

I deliberately created each table in two variants: one with the id taken straight from the 1..N series, and one with the id multiplied by 1234 (or 123 in the 5M-row run) to make the numbers wider in text representation, to see how the length of the text representation of a number influences the size of the index.

Some more play with the smallint type showed that it does save space in the table itself, but the index size is the same as for integer.
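This is easy to verify yourself; a minimal sketch (table names are arbitrary, and 10000 values fit into smallint's range):

```sql
CREATE TABLE small_id ( id smallint );
CREATE TABLE int_id   ( id integer );

INSERT INTO small_id SELECT s.i FROM generate_series(1, 10000) AS s(i);
INSERT INTO int_id   SELECT s.i FROM generate_series(1, 10000) AS s(i);

CREATE INDEX ON small_id ( id );
CREATE INDEX ON int_id   ( id );

-- The table sizes differ, but the index sizes come out equal,
-- because btree index entries are padded to the same alignment.
SELECT pg_size_pretty( pg_table_size( 'small_id' ) )   AS small_table,
       pg_size_pretty( pg_table_size( 'int_id' ) )     AS int_table,
       pg_size_pretty( pg_indexes_size( 'small_id' ) ) AS small_index,
       pg_size_pretty( pg_indexes_size( 'int_id' ) )   AS int_index;
```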

What really surprised me was that the numeric type produced bigger indexes than text did for the same numbers.

And of course, again and again: an index on a long text field can instead be built on the md5 hash of that text and be much more performant, especially when you index the bytea version of the md5 hash instead of its text representation. Though md5 hash collisions can theoretically happen, I have not yet experienced one in practice.
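A minimal sketch of that technique (the table and column names here are made up for illustration); note that the lookup must repeat the exact index expression for the planner to use it:

```sql
-- Hypothetical table with a long text column we want to look up by value.
CREATE TABLE documents ( id serial PRIMARY KEY, body text );

-- Expression index on the 16-byte bytea digest instead of the text itself
-- (or its 32-character hex representation).
CREATE INDEX documents_body_md5_idx
    ON documents ( decode( md5( body ), 'hex' ) );

-- Lookups use the same expression on both sides:
SELECT id
  FROM documents
 WHERE decode( md5( body ), 'hex' ) = decode( md5( 'needle text' ), 'hex' );
```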

2011-02-17

pgAdmin III macros: get table fields

When writing SQL statements or stored procedure code in pgAdmin, it is often quite handy to see the names of and constraints on the fields of some table.

Here is a simple macro that can be assigned to a key combination in pgAdmin -> Query Window -> Macros -> Manage macros...

select quote_ident(nspname) || '.' || quote_ident(relname) as table_name, 
       quote_ident(attname) as field_name, 
       format_type(atttypid,atttypmod) as field_type, 
       case when attnotnull then ' NOT NULL' else '' end as null_constraint,
       case when atthasdef then 'DEFAULT ' || 
                                ( select pg_get_expr(adbin, attrelid) 
                                    from pg_attrdef 
                                   where adrelid = attrelid and adnum = attnum )::text else '' 
       end as default_value,
       case when nullif(confrelid, 0) is not null 
            then confrelid::regclass::text || '( ' || 
                 array_to_string( ARRAY( select quote_ident( fa.attname ) 
                                           from pg_attribute as fa 
                                          where fa.attnum = ANY ( confkey ) 
                                            and fa.attrelid = confrelid
                                          order by fa.attnum 
                                        ), ',' 
                                 ) || ' )' 
            else '' end as references_to
  from pg_attribute 
       left outer join pg_constraint on conrelid = attrelid 
                                    and attnum = conkey[1] 
                                    and array_upper( conkey, 1 ) = 1,
       pg_class, 
       pg_namespace
 where pg_class.oid = attrelid
   and pg_namespace.oid = relnamespace
   and pg_class.oid = btrim( '$SELECTION$' )::regclass::oid
   and attnum > 0
   and not attisdropped
 order by attrelid, attnum;

Just select the name of the table you are interested in and press the key binding you chose for the macro. pgAdmin will execute the query and show the list of all the columns of that table, including default values and foreign key references...

My preferred key binding is CTRL+1

Type of NULL is important

The following SQL query shows how different the results of concatenating arrays with NULL values can be:

select '{e1,e2}'::text[] || 't'::text     as normal_array_concatenation,
       NULL::text[] || 't'::text         as appending_element_to_existing_NULL_value_array, 
       NULL || 't'::text                 as appending_element_to_NULL_value,
       '{e1,e2}'::text[] || NULL::text   as appending_typed_NULL,
       '{e1,e2}'::text[] || NULL::text[] as appending_typed_NULL_array;

The result of the execution (on 9.0) is:

─[ RECORD 1 ]──────────────────────────────────┬─────────────
normal_array_concatenation                     │ {e1,e2,t}
appending_element_to_existing_null_value_array │ {t}
appending_element_to_null_value                │ 
appending_typed_null                           │ {e1,e2,NULL}
appending_typed_null_array                     │ {e1,e2}

That explains why in PL/pgSQL one can simply declare a new array variable and immediately start concatenating values to it, without pre-initializing it with an empty array. With text strings you do have to pre-initialize: you cannot take a NULL::text variable and start concatenating strings to it, you have to first set it to an empty string. With arrays you can skip that step.
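A small sketch of that difference in PL/pgSQL (variable names are arbitrary):

```sql
DO $$
DECLARE
  arr text[];  -- starts as NULL, no pre-initialization needed
  str text;    -- also starts as NULL
BEGIN
  -- NULL::text[] || element behaves like appending to an empty array:
  arr := arr || 'first';
  arr := arr || 'second';
  RAISE INFO 'arr = %', arr;   -- arr = {first,second}

  -- NULL || text stays NULL, so string accumulation silently goes nowhere:
  str := str || 'first';
  RAISE INFO 'str = %', str;   -- str = <NULL>
END;
$$;
```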

Another very important issue related to this behaviour of arrays is that, when creating dynamic SQL queries for the EXECUTE command in PL/pgSQL, you SHOULD always put explicit type casts on the values you quote with quote_literal() or quote_nullable():

DO $SQL$
DECLARE
  r text[];
  t text   := NULL;
  a text[] := NULL;
BEGIN

  SELECT '{e1,e2}'::text[] || a INTO r;
  RAISE INFO 'array concatenate: r is %', r;

  EXECUTE $$SELECT '{e1,e2}'::text[] || $$ || quote_nullable(a) || $$::text[] $$ INTO r;
  RAISE INFO 'array concatenate from EXECUTE: r is %', r;

  SELECT '{e1,e2}'::text[] || t INTO r;
  RAISE INFO 'array append: r is %', r;

  EXECUTE $$SELECT '{e1,e2}'::text[] || $$ || quote_nullable(t) || $$::text $$ INTO r;
  RAISE INFO 'array append from EXECUTE: r is %', r;

  /* These 2 examples will fail at runtime,
     throwing the exception (operator is not unique: text[] || unknown):
  --EXECUTE $$SELECT '{e1,e2}'::text[] || $$ || quote_nullable(a) INTO r;
  --RAISE INFO 'r is %', r;
  --
  --EXECUTE $$SELECT '{e1,e2}'::text[] || $$ || quote_nullable(t) INTO r;
  --RAISE INFO 'r is %', r;
  */

END;
$SQL$;