2008-11-04 09:32:18

SQLAlchemy: With ancient methods against modern IT

There is a small number of projects the Python developers are especially proud of: Zope and SQLAlchemy.

The fact that the Python core team deliberately keeps breaking compatibility for Zope is probably bad enough, but at least SQLAlchemy appears to work well enough with the various versions. Well, but how does it work?

The answer is: bizarre.

Never trust the database to get sequences right!

The first thing I discovered was with PostgreSQL and sequences. Some tables use sequences for creating their primary key, which is then going to be a 64-bit integer. This is done automatically by either not naming the primary key column on insert, or by using the keyword DEFAULT.

Looking through the debug output, however, items like these can be spotted:

2008-11-04 09:26:24,258 INFO sqlalchemy.engine.base.Engine.0x..d0 select nextval('data_from_box_id_seq')

This is wrong on a number of levels. At this point, SQLalchemy manually queries the next value returned by the sequence for the id column. This value is consequently used in the insert statement for the id column. This breaks the atomicity of the insert statement — rather than letting PostgreSQL handle things internally as it's supposed to, the insert statement is artificially split into two statements. Of course, this also means that two SQL operations are performed, which almost doubles the network overhead (especially over encrypted connections).

The most likely reason why this is done is to have the row ID of the newly inserted row for use in the client side object representation. However, PostgreSQL has two much better mechanisms achieving the same: Firstly, the ID can be queried using the last_insert_id operation, or, even better, everything can be handled in one statement by using the RETURNING keyword:

INSERT INTO data_from_box VALUES (DEFAULT, something, else) RETURNING id;

Prepared statements are for the weak and timid

An even more bizarre affection manifests itself in the support for MySQL. Rather than to make use of the prepared statements, which are mandated by the API specification, SQLAlchemy attempts to reproduce prepared statements in the SQLAlchemy code, and to naturalize SQL statements manually. Then, the resulting statement is PREPAREd and EXECUTEd.

This of course constitutes a duplication of functionality, and more than this, the SQLAlchemy users are now prone to SQLAlchemy prepared statement naturalization bugs which cannot really occurr in SQL prepared statements (because they are never synchronized into a string but used as-is).

Overall, it is very hard to say what the SQLAlchemy developers had in mind, but it seems clear that it was nothing useful.

Posted by Tonnerre Lombard | Permanent link | File under: broken, programming