2010-11-24

Reflecting generic type parameters in Java

If you one is introspecting an object class in Java, it was always a problem to be able to find out, what is the actual type of the generic type parameter for a class field or property.

Ok, yesterday Daniel and I ware listening to a presentation about EclipseLink from a guy from Oracle (or a company, that is working with Oracle, but this is not so important). The question about Generic Type erasure was in our minds all the time, when the guy showed how EclipseLink is generating adapter classes for his model classes. Ok, we asked about the issue, and he showed us how one can extract generic type parameter information for object fields using reflection methods. This approach will work only on the Field classes actually, but this is more then enough to be able to extract actual generic parameter types for class fields :)

Here is a simple example, that I made inspired by this information:

import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.Arrays;
import java.util.Map;

import org.junit.Before;
import org.junit.Test;

public class GenericTypeParameterDemo {

 static final class MyGenericType<e extends Number> {
  public Map<String, E> map;
 }
 
 @Before
 public void setUp() throws Exception {
 }
 
 @Test
 public void extractGenericParameters() throws Exception {
  Field field = MyGenericType.class.getDeclaredField("map");

  Type fieldGenericType = field.getGenericType();
  ParameterizedType parametrizedFieldType = (ParameterizedType) fieldGenericType;

  for (Type actualParameterType : parametrizedFieldType.getActualTypeArguments()) {
   if ( actualParameterType instanceof TypeVariable<?>) {
    Type[] bounds = TypeVariable.class.cast(actualParameterType).getBounds();
    System.out.println(actualParameterType.toString() + 
     " with bounds " + Arrays.toString(bounds) );
   } else {
    System.out.println(actualParameterType);
   }
  }
 }
}

It is really simple, as you can see on lines 25 and 26. The trick is to really check if the result of Field.getGenericType()
is actually an instance of ParameterizedType. One can look up also other children of Type interface, that can bring more details on the structure of your field types.

2010-08-08

ALTER ENUM in PostgreSQL

Note: do not use this method on PostgreSQL 9.1. This can corrupt your catalog as new enum model differentiates between even and odd OIDs for enum values. But in PostgreSQL 9.1 there is a nice ALTER TYPE command, that can alter you enum correctly.

Unfortunately PostgreSQL (9.0 and before) does not support altering ENUM types, that makes their use quite limited. Very often, it is important to have a possibility to extend already existing enumeration type and in very few cases delete or rename an element of an enumeration type.

The standard way to do that, would be to create a new enumeration type with some temporary name, convert the fields of the type of your old enumeration type to the new enumeration type you just created, then drop the old one and rename the temporary name to the original enum type. This operation is doable, but conversion of the field type is usually getting a full lock on that table and can be quite a long operation (maybe in 9.1 this will change).

By now, the only way to do it without table lock and messing up with temporary enumeration types, is to play with the catalog directly.

As internally postgres is storing OIDs of the enumeration type labels in the tables and types, where this enumeration is being used, one can actually add new labels and change the names of existing labels in the needed enumeration without a fear to destroy integrity of existing data structures, that use that enumeration labels already (i.e. OIDs of that enumeration labels).

To add one additional label in an existing ENUM one can do the following:

insert into pg_enum(enumtypid, enumlabel)
select t.oid, 'NEW_ENUM_VALUE'
  from pg_type as t
 where t.typtype = 'e'
   and t.oid = 'existing_enum'::regtype::oid
   and string_to_array( split_part(version(), ' ', 2), '.' )::int[] < ARRAY[9,1]
returning enumtypid::regtype::text, enumlabel;

This will add a label of NEW_ENUM_VALUE to an existing enumeration type existing_enum (be sure to put the right letter case for your enumeration type and enumeration label names).

To rename an existing value name one can do the following:

update pg_enum as e
   set enumlabel = 'UPDATED_ENUM_VALUE'
 where e.enumtypid = 'existing_enum'::regtype::oid
   and e.enumlabel = 'OLD_ENUM_VALUE'
   and string_to_array( split_part(version(), ' ', 2), '.' )::int[] < ARRAY[9,1]
returning e.enumtypid::regtype::text, e.enumlabel;

Deletion of the enumeration labels can be really dangerous as one can accidentally drop an OID of the label, that is still being used in the tables. Dmitry Koterov wrote a stored procedure, that does some easiest checks on tables (not types or types of types) that use the enumeration that you want to change. So if you really need to drop an enumeration label from an existing enumeration, think twice first and then read the post with the source of this stored procedure :)

2010-05-28

Merging and manipulating arrays in PostgreSQL

Though, strictly speaking, using arrays in relational databases is kind of not correct, still we use arrays, especially in stored procedures.

Actually PostgreSQL provides us with quite an number of nice array manipulation functions. That can be used at least starting from PostgreSQL 8.0. But what about set operations. How to sort an array in Postgres? How to merge arrays without repetitions? How to remove duplicate elements keeping the order of the appearance in the original array?

If you are working with the integer based arrays, there is quite an old and effective module intarray that allows us to perform some basic set operations on the integer based arrays.

But what to do with not integer based arrays? Or if we want to do some relatively complicated operation on integer bases array? SQL is a set manipulation language, so that means that actually, we can use internal SQL mechanics to manipulate, merge and sort arrays. But for that, first we have to be able to convert an array into kind of a table, so that we can use usual SQL manipulation on it's elements.

generate_series(), generate_subscripts() and unnest() are the most important functions in our case. Unfortunately generate_subscripts() and unnest() are only available starting from PostgreSQL 8.4, but one can easily create a small helper unnest() function using generate_series(), that will be not as efficient as the build in one, but still it is quite good trade off.

CREATE OR REPLACE FUNCTION unnest(a anyarray)
  RETURNS SETOF anyelement AS
$BODY$
/**
 *  Unnests given array into a table
 */
 /* -- testing
     select unnest(ARRAY[1,2,3,4,5]) as i
     select * from unnest(ARRAY[1,2,3,4,5]) as u(i)
  */
select ($1)[s.i] 
  from generate_series( array_lower($1, 1), array_upper($1, 1 ) ) as s(i);
$BODY$
  LANGUAGE 'sql' IMMUTABLE STRICT;

A good thing about unnest() written in SQL and not PL/pgSQL is that one can use such a set retuning function directly after SELECT not necessarily pushing it in the list of used tables after FROM clause. The reason was explained by Tom Lane some time back in one of the newsgroup threads and is related to the implementation drawbacks of the PL/pgSQL.

Ok, so now I will simply give several examples of how to use these functions.

Merge 2 or more text arrays excluding duplicates:
select ARRAY(
  select unnest(ARRAY[ 'a', 'b', 'c' ])
   union
  select unnest(ARRAY[ 'c', 'd', 'e' ])
)
If we use UNION ALL instead, we are doing just the same as array_cat(), but slower :)

Note, that UNION actually sorts the elements of the result set fist, to eliminate the duplicates, so the resulting array will be sorted usually. But one cannot relay on that behavior and if you really need a sorted array on the output, you have to use ORDER BY explicitly.

Merge 2 or more text arrays excluding duplicates and sorting elements:
select ARRAY(
  select unnest(ARRAY[ 'a', 'b', 'c' ]) as e
   union
  select unnest(ARRAY[ 'c', 'd', 'e' ]) as e
   order by e
)
Here we are forcing PostgreSQL to sort the elements alphabetically. If you check the execution plan for that query, you will see that it is exactly the same as for the previous query. But here we can actually reverse the order or sort by element length or first by element length and then alphabetically and so on. This makes this approach quite flexible not still the whole construct is very readable and understandable. But what if we want to trim text of each of the elements, lower the case or even rewrite each element using regular expression replace?

Trim or rewrite text array elements:
select ARRAY(
  select btrim(lower(regexp_replace(unnest(ARRAY['a a','B b','c  C  ']), 
                                    $$\s+$$, 
                                    ' ', 
                                    'g')
                     )
               ) as e
)
We can use this approach in all our previous queries as well, so we actually can merge several arrays reducing all repeated space characters into one, trimming and lowering each element before, removing duplicates from the resulting array and all that in one simple and readable SQL statement.

By now, with this syntax we cannot actually filter some elements out of the original array.

Filter elements from an array depending on element properties:
select ARRAY(
  select a.e 
    from unnest(ARRAY[0,4,2,5,4,1,6,9,8,7]) as a(e)
   where a.e % 2 = 0
)
This query actually rewrites an integer array so, that it drops all the add elements from it. We have to push unnest() to FROM list to be able to name the virtual table, that unnest() creates as well as name the element column so that we can reference it in the where clause. Again there are not guaranties here that our array will be contain elements in the same order that it was in the original one.

The problem is that all these queries do not know anything about the original position of the elements in the original array. To solve this problem we need element indexes to be visible inside the query. One can do it using a plain old generate_series() or not so old generate_subscripts() (one actually should prefer the later one if you are on the PostgreSQL 8.4+) to unnest our original array. You cannot use the unnest() directly as there is not way to determine the position or index other then using window functions like row_number(), but this is much much slower on practice.

Filtering elements from an array depending on element position (index):
select ARRAY(
  select (ARRAY[0,4,2,5,4,1,6,9,8,7])[s.i]
    from generate_series(array_lower(ARRAY[0,4,2,5,4,1,6,9,8,7], 1),
                         array_upper(ARRAY[0,4,2,5,4,1,6,9,8,7], 1) ) as s(i)
   where s.i % 2 = 0
   order by s.i
)
This query is dropping all the elements of the original array that are located on the odd positions and ensures that the elements are ordered in the resulting array the same way as they were in the original one. forcing ordering the elements here is probably not the best thing to do for performance reasons, but I want to demonstrate that it is possible and, strictly speaking, even a proper way to do.

But using generate_series() or generate_subscripts() shows a small problem actually. We have to push the original array at least 2 times in the query. Inside a PL/pgSQL code the arrays are usually replaced with the variable names, and the queries do not look so bulky there, but there is a possibility to create an additional helper function that will unnest an array returning not only a element itself, but also it's index in the original array:

CREATE OR REPLACE FUNCTION array_enumerate(IN a anyarray, 
  OUT i integer, OUT e anyelement)
  RETURNS SETOF record AS
$BODY$
/** 
 *  Unnests array into a table together with element indexes
 */
 /* -- testing
     select * from array_enumerate(ARRAY['a', 'b', 'c']) as u
  */
select s.i, ($1)[s.i] from generate_series( array_lower($1, 1), array_upper($1, 1 ) ) as s(i)
$BODY$
  LANGUAGE 'sql' IMMUTABLE STRICT;
on the PostgreSQL versions after 8.4 one can write it using generate_subscripts() like:
CREATE OR REPLACE FUNCTION array_enumerate(IN a anyarray, 
  OUT i integer, OUT e anyelement)
  RETURNS SETOF record AS
$BODY$
/** 
 *  Unnests array into a table together with element indexes
 */
 /* -- testing
     select * from array_enumerate(ARRAY['a', 'b', 'c']) as u
  */
select s.i, ($1)[s.i] from generate_subscripts( $1, 1 ) as s(i)
$BODY$
  LANGUAGE 'sql' IMMUTABLE STRICT;
Using generate_subscripts() should be preferable as one can create some crazy array with elements having noncontinuous indexes actually. And it should be also a little bit faster, then generate_series().

This actually gives us a possibility to rewrite previous query like this:

select ARRAY(
  select s.e
    from utils.array_enumerate(ARRAY[0,4,2,5,4,1,6,9,8,7]) as s(i,e)
   where s.i % 2 = 0
   order by s.i
)
Now we can even do something practical using this approach. for example

Remove duplicate elements from array keeping the element appearance order:
select ARRAY(
  select s.e
    from utils.array_enumerate(ARRAY[0,4,2,5,4,1,6,9,8,7]) as s(i,e)
   group by s.e
   order by min(s.i)
)
Actually this gives a possibility to process lists of words using string_to_array() and array_to_string() methods, or on newer versions of PostgreSQL regexp_split_to_array() to remove the repeating space or punctuation characters during splitting directly.

Practically everywhere in the queries, one could use the array_agg() to generate the resulting arrays or, in the coming 9.0, a very efficient string_agg() to generate strings directly, without creating preliminary arrays and then using array_to_string().

On my tests comparing the performance of the ARRAY(select) and array_agg() I could not find any significant performance difference.

2010-02-26

Global database configuration and context aware connection pool extention for psycopg2

Just created a recipe of the context aware connection pool that I am using for quite a long time now.

ActiveState Python Recipe #577072

Hope somebody would get at least some ideas for his own project.

2010-02-24

re-assigning values to the parameters in plpythonu

As discussed in Postgres BUG #5232 assigning values to the parameters fails with the following error:

Detail: <type 'exceptions.UnboundLocalError'>: local variable 'src' 
referenced before assignment

If assignment to the parameter is still needed one can solve the issue by adding global definition for such a parameter.

CREATE OR REPLACE FUNCTION pyreplace(src text, s text) 
  RETURNS text AS 
$BODY$ 
global src
src=src.replace(s,'') 
return src 
$BODY$ 
  LANGUAGE 'plpythonu' VOLATILE 
  COST 100; 

2010-01-14

Awesome code syntax highlighting made easy | Carter Cole's Blog

I finally tried out the highlighting JavaScript library SyntaxHighlighter. This is quite an impressive work done by the author. My respect to him. And here is a nice blog post about how no configure the library to use when posting to blogger:
Awesome code syntax highlighting made easy | Carter Cole's Blog

2010-01-13

64 bit python installation issues on 64 bit Windows 7

After installing 64 bit Windows 7 at work, I got a problem of installing many python utilities that search for installed python locations is windows registry. distutils (setuptools-0.6c11.win32-py2.6.exe) still could not find 64 bit python installation, as well as develer's Unofficial MinGW GCC binaries for Windows (I still do not have a solution for that one :( ).

Actually, what helped me with disttools, was a http://www.mail-archive.com/distutils-sig@python.org/msg10512.html group thread. The currently available 64 bit python installation is registering python in a new location in windows registry. So one have to use a modified script for registering python the old way:

#
# script to register Python 2.0 or later for use with win32all
# and other extensions that require Python registry settings
#
# written by Joakim Loew for Secret Labs AB / PythonWare
#
# source:
# http://www.pythonware.com/products/works/articles/regpy20.htm
#
# modified by Valentine Gogichashvili as described in http://www.mail-archive.com/distutils-sig@python.org/msg10512.html

import sys

from _winreg import *

# tweak as necessary
version = sys.version[:3]
installpath = sys.prefix

regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % (version)
installkey = "InstallPath"
pythonkey = "PythonPath"
pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (
    installpath, installpath, installpath
)

def RegisterPy():
    try:
        reg = OpenKey(HKEY_CURRENT_USER, regpath)
    except EnvironmentError as e:
        try:
            reg = CreateKey(HKEY_CURRENT_USER, regpath)
            SetValue(reg, installkey, REG_SZ, installpath)
            SetValue(reg, pythonkey, REG_SZ, pythonpath)
            CloseKey(reg)
        except:
            print "*** Unable to register!"
            return
        print "--- Python", version, "is now registered!"
        return
    if (QueryValue(reg, installkey) == installpath and
        QueryValue(reg, pythonkey) == pythonpath):
        CloseKey(reg)
        print "=== Python", version, "is already registered!"
        return
    CloseKey(reg)
    print "*** Unable to register!"
    print "*** You probably have another Python installation!"

if __name__ == "__main__":
    RegisterPy()

or just inject the following REG file into your registry if you are using python 2.6 installed in C:\Python26 directory

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Python]

[HKEY_CURRENT_USER\Software\Python\Pythoncore]

[HKEY_CURRENT_USER\Software\Python\Pythoncore\2.6]

[HKEY_CURRENT_USER\Software\Python\Pythoncore\2.6\InstallPath]
@="C:\\Python26"

[HKEY_CURRENT_USER\Software\Python\Pythoncore\2.6\PythonPath]
@="C:\\Python26;C:\\Python26\\Lib\\;C:\\Python26\\DLLs\\"