Monthly Archives: June 2008

Using SQLAlchemy with Django

This post is an introduction to using [SQLAlchemy][] and [Django][] together
to make a simple web-based asset catalogue. The examples make use of
[SQLAlchemy 0.4][sa04] and [Django 0.97-pre][django097].

[Extensis Portfolio][] is digital asset management software that has
been available on Macinctosh (and Windows) for donkeys years.
The most expensive edition of Portfolio replaces the
native data store with an SQL back-end (currently supported databases
are MySQL, MSSQL and Oracle). Extensis also sells [NetPublish][], a web front-end
to a Portfolio catalogue.

For various reasons I decided to write a custom web front-end for
Portfolio using Django. [As with Farmers Wife][farmerswife], the existing
database schema
was not well-suited to Django’s ORM. More than one table used a compound
primary key which Django does not support (there is a page [discussing
how support would be implemented][wiki] on the Django Wiki).

Defining models with SQLAlchemy
——————————-

The meat of the application consists of two models:

* **Items** represent items in the catalogue. Each picture is an `Item`, and
has properties for things like file type, file size, path to file on
disk.
* **Keywords** represent keywords related to an `Item`, using a
many-to-many relationship.

I defined two tables named `item_table` and `keyword` and a third that
is used to hold the many-to-many relationship:

item_table = Table(‘item_table’, metadata,
Column(‘Record_ID’, Integer(), primary_key=True, nullable=False, key=”id”),
Column(u’Filename’, MSString(length=249), key=”filename”),
Column(u’Path’, MSString(length=249), key=”path”),
Column(u’Created’, MSDateTime(timezone=False)),
Column(u’Last_Modified’, MSDateTime(timezone=False)),
Column(u’File_Size’, MSInteger(length=11)),
# Additional columns omitted here
)

keyword_table = Table(‘keyword’, metadata,
Column(‘Record_ID’, Integer(), primary_key=True, nullable=False, key=”id”),
Column(u’Keyword’, MSString(length=249), nullable=False, key=”keyword”),
)

item_keyword_table = Table(‘Item_Keyword’, metadata,
Column(u’Item_ID’, MSInteger(length=11), ForeignKey(‘item_table.id’), primary_key=True, nullable=False, default=PassiveDefault(u’0′)),
Column(u’Keyword_ID’, MSInteger(length=11), ForeignKey(‘keyword.id’), primary_key=True, nullable=False, default=PassiveDefault(u’0′)),
)

Of note here is how one can rename the table columns for use in the
table definition (roughly equivalent to the `db_column` field option in
Django). I prefer using the short name `id` for the primary key in place
of the original designer’s choice of `Record_ID`.

Putting `primary_key=True` on more than one column for the
`item_keyword_table` tells SQLAlchemy to use a compound primary key.
And notice the `ForeignKey` relationship uses our customized
column names for the related tables.

Then we need classes to associate the `Item` and `Keyword` objects with
the underlying tables,

class Keyword(object): pass
class Item(object): pass

mapper(Keyword, keyword_table)
mapper(Item, item_table, properties=dict(
keywords = relation(Keyword, secondary=item_keyword_table, backref=”items”, lazy=False, order_by=[keyword_table.c.keyword]),
))

The second statement above added a `keywords` property to an `Item` object
to define the many-to-many relationship with a `Keyword` object. Specifying
`lazy=False` causes the keywords related to an item to be fetched when the
item itself is loaded – the default behaviour is to leave the related
queries until the related object is accessed.

Making models more Django-friendly
———————————-

The mapped classes for `Keyword` and `Item` behave in ways similar to a
Django model. Any features that are missing can be easily put in place.

One can provide the `pk` shortcut for a model using a Python property:

class Keyword(object):
# Extra pk property to access primary key
def pk(self):
return self.id
pk = property(pk)

And `get_absolute_url` is just as straight-forward:

class Item(object):
# Match a URL pattern like ‘^(?P\d+)/$’
def get_absolute_url(self):
return reverse(‘catalogue_item’, kwargs={‘item_id’:unicode(self.pk)})

These extra methods help SQLAlchemy model objects behave almost interchangeably
with Django model objects in your template and view methods.

Using the Django paginator class with SQLAlchemy `Query` objects
requires one to override a single method calculating the total number of hits:

from django.core.paginator import ObjectPaginator as DjangoPaginator

class ObjectPaginator(DjangoPaginator):
“””SQLAlchemy compatible flavour of Django’s ObjectPaginator.”””
def _get_hits(self):
if self._hits is None:
self._hits = self.query_set.offset(0).limit(0).count()
return self._hits

Using the models in views
————————-

Unfortunately there’s no simple means for using Django’s generic
view methods (or the admin application) with SQLAlchemy’s query methods.
Django’s views rely on accessing `model._meta` from the queryset and use
an incompatible query syntax (albeit a syntax with similar intent).

But there’s nothing to stop you using SQLAlchemy query sets in custom
views and in the template context.

The following view method shows the object detail page for an `Item` object;
the primary key of the item is used in the URL and passed to the
view as `item_id`:

# Session, Item are imported from a module with the model definitions
def item(request, item_id):
“””Detail page for a Portfolio item.”””
item_id = int(item_id)
session = Session()
item = session.query(Item).filter_by(id=item_id).one()
return render_to_response(“catalogue/item.html”, {‘object’:item})

To complete the example one ought to handle the situation where
there is not exactly one matching item. If SQLAlchemy throws
`sqlalchemy.exceptions.InvalidRequestError: Multiple rows returned for one()`
the view should in turn throw [the corresponding Django exceptions][exceptions] so
that the middleview machinery can operate as normal.

The future
———-

Many parts of the API changed (or were deprected for compatibility) when
SQLAlchemy progressed from 0.3 to the current 0.4 branch, and it is
interesting how some of those changes seemed to be bringing SQLAlchemy’s ORM
approach closer to Django’s ORM. I expect more similarities to appear
on both sides in future releases.

[The Django/SQLAlchemy branch][django-sqlalchemy] has the explicit goal of combining the
two without changing the Django model API, which would eliminate the
need for any of the code in this post.

Until that gets merged to trunk, I hope you find this useful.

[SQLAlchemy]: http://www.sqlalchemy.org/
[Django]: http://www.djangoproject.com/documentation/
[sa04]: http://www.sqlalchemy.org/docs/04/
[django097]: http://www.djangoproject.com/
[NetPublish]: http://www.extensis.com/en/products/asset_management/product_information.jsp?id=2020
[wiki]: http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
[Extensis Portfolio]: http://www.extensis.com/portfolio
[farmerswife]: http://reliablybroken.com/b/2008/06/choosing-sqlalchemy-over-django/
[exceptions]: http://www.djangoproject.com/documentation/db-api/#get-kwargs
[django-sqlalchemy]: http://gitorious.org/projects/django-sqlalchemy/

Choosing SQLAlchemy over Django

In the [DJUGL][] post-meet pub chat [Simon Willison][] was curious about
people’s experiences combining [Django][] with [SQLAlchemy][]. I’ve used
SQLAlchemy’s ORM with Django in two projects; on both occasions I
quickly chose to substitute Django’s ORM with SQLAlchemy’s because I was
dealing with an existing SQL schema which I could not alter and which
did not fit well with Django’s ORM.

The first project was to produce reports on data exported by a
scheduling application called [Farmers Wife][]. The database has three
dozen tables or so, with some tables using compound keys, other tables
lacking primary keys altogether and one relation in particular using
combined *parts* of columns to refer to other tables.

I gave an involuntary, rather hysterical giggle when I discovered that
particular corner of the data model.

I tried mapping the data using Django’s models but quickly found that
the schema was simply too irregular to fit Django’s requirements for
meaningful relations between tables (this project started a bit before
the [0.96 branch][096] was released).

I decided to try SQLAlchemy. The immediate benefit in using SQLAlchemy
was its introspection of table definitions allowed me to start mapping
objects with very few lines of code yet having column data available as
properties of objects.

**N.B. The examples in this post are for [SQLAlchemy 0.3][sa3] ([current release
is 0.4.6][sa4]).**

Here is a [MySQL][] definition for a table named `objects_users3`:

mysql> describe objects_users3;
+———————-+————-+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+———————-+————-+——+—–+———+——-+
| id | varchar(32) | YES | UNI | NULL | |
| name | varchar(64) | YES | | NULL | |
| icon | varchar(32) | YES | | NULL | |
| buy_hour | float(11,2) | YES | | NULL | |
| sell_hour | float(11,2) | YES | | NULL | |
| buy_day | float(11,2) | YES | | NULL | |
| sell_day | float(11,2) | YES | | NULL | |
| daybased | tinyint(4) | YES | | NULL | |
| ref | varchar(64) | YES | | NULL | |
| password | varchar(32) | YES | | NULL | |
| firstname | varchar(32) | YES | | NULL | |
| lastname | varchar(32) | YES | | NULL | |
| email | varchar(64) | YES | | NULL | |
| active | tinyint(4) | YES | | NULL | |
| permission | int(11) | YES | | NULL | |
| textnote | text | YES | | NULL | |
| tel_home | varchar(64) | YES | | NULL | |
| tel_work | varchar(64) | YES | | NULL | |
| tel_cell | varchar(64) | YES | | NULL | |
| stock_inform | tinyint(4) | YES | | NULL | |
| lib_write | tinyint(4) | YES | | NULL | |
| mediaorders | tinyint(4) | YES | | NULL | |
| muleaccess | tinyint(4) | YES | | NULL | |
| days_in_liueu_offset | int(11) | YES | | NULL | |
| reportaccess | tinyint(4) | YES | | NULL | |
| aux_hour | float(11,2) | YES | | NULL | |
| aux_day | float(11,2) | YES | | NULL | |
+———————-+————-+——+—–+———+——-+
27 rows in set (0.20 sec)

27 different rows, 27 different attributes I would need to define in my
Django `models.py` in order to access all the values.

Here’s how to use SQLAlchemy to introspect a table definition for
the same table:

from sqlalchemy import BoundMetaData, Column, Table, mapper

metadata = BoundMetaData(“mysql://name:passwd@hostname/DatabaseName”)

objects_users3 = Table(‘objects_users3’, metadata,
Column(‘id’, String(32), primary_key=True),
autoload=True,
)

And we’re done.

The next piece of the puzzle is to map a Python class to this table:

class ObjectUser3(object):
pass

mapper(ObjectUser3, objects_users3)

And we’re done.

These few lines give us a class that provides similar functionality to
that of `django.db.models.Model`. You can use the class to create a new
**ObjectUser3** or to retrieve one or more existing **ObjectUser3**
objects with column filters, etc.

In SQLAlchemy 0.3 one uses the ORM within the context of a session, which
has a `query` method that returns an object that can be used to retrieve
the objects from the database:

>>> session = create_session()
>>> q = session.query(ObjectUser3)
>>> users = q.all()
>>> len(users)
17
>>> me = q.get_by(name=’David’)
>>> me.lastname
‘Buxton’

Note how one can specify the column for filtering the results using
named arguments, just like Django.

Like Django, SQLAlchemy provides means for defining relationships and
allows one to add whatever additional methods one chooses to the model
class. Unlike Django, SQLAlchemy provides a comprehensive (if somewhat
daunting) set of tools for generating SQL queries, allowing one to move
between manipulating the SQL table data and manipulating Python objects
constructed from that data without having to manually write any SQL at
all.

And therein lies the major difference between SQLAlchemy and the
Django ORM: the former is intended to be *a toolkit for SQL*, whereas
Django provides a system for storing Python objects and exposes
relatively little of its query construction tools.

SQLAlchemy 0.4 improves things for an existing developer and for a
developer coming from Django. The sessions are simpler to work with. The
`Query` objects have changed to support slicing syntax like Django’s
`QuerySets`.

I find Django’s models simpler to write, easier to understand than the
equivalent SQLAlchemy approach. The business of defining relations
between models exposes a little more of the underlying SQL concepts when
working with SQLAlchemy, but then that’s precisely why it was such a
great choice for this project; SQLAlchemy allows one to customize its
default object mapping behaviour in ways that Django does not. For
example one of the more mind-bending features allows one to specify [a
custom class for handling collections][custom] of related objects, so
what would be a simple list could just as easily be treated as a
dictionary where the key is determined by a column’s value.

SQLAlchemy combines the convenience of a good ORM engine with an
incredibly flexible SQL abstraction. For gnarly databases it rocks.

I want to write more about how Django and SQLAlchemy fit together, but
I’ll leave that to a discussion of the second project.

[DJUGL]: http://djugl.eventwax.com/djugl
[Simon Willison]: http://simonwillison.net/
[SQLAlchemy]: http://www.sqlalchemy.org/
[Django]: http://www.djangoproject.com/
[Farmers Wife]: http://www.farmerswife.com/

[096]: http://www.djangoproject.com/documentation/0.96/
[sa3]: http://www.sqlalchemy.org/docs/03/
[sa4]: http://www.sqlalchemy.org/docs/04/
[MySQL]: http://www.mysql.com/
[custom]: http://www.sqlalchemy.org/docs/03/adv_datamapping.html#advdatamapping_properties_customlist

Halcali, It’s PARTY TIME!

I guess [Halcali][] never made a video for *Doo The HAMMER!!*, so this is the video for the second song from *Cyborg Oretachi*. I don’t understand it on every level, but I like it, I like it a lot.

*It’s PARTY TIME!*

As good as that is, *Doo The HAMMER!!* is better. And has twice as many exclamation marks.

[Halcali]: http://www.halcali.com/

Encoding lists in Django for jQuery

Several times I have needed to implement a form for a [Django][] model where
one field’s value determines the available choices for a second field.
Recently I finished a project where the user had to choose from a list of
departments in our office, and then choose from a list of staff working
for the chosen department.

[Django]: http://www.djangoproject.com/

The Django model looks like this:

EMPLOYEES = (
(‘Finance’, ‘Adam’),
(‘Finance’, ‘Clare’),
(‘Finance’, ‘Dave’),
(‘Housekeeping’, ‘Frank’),
(‘Housekeeping’, ‘Nat’),
(‘Marketing’, ‘Nigel’),
(‘Marketing’, ‘Valerie’),
)

def department_choices():
depts = list(set(d for d, e in EMPLOYEES))
depts.sort()
for d in depts:
yield (d, d)

def employee_choices():
for d, e in EMPLOYEES:
yield (e, e)

class EmployeeOfTheYear(models.Model):
department = models.CharField(max_length=250, choices=department_choices())
employee = models.CharField(max_length=250, choices=employee_choices())

In this application the site visitor will be creating `EmployeeOfTheYear`
objects, and the new object form should display department and employee
fields as lists of pre-defined department names and pre-defined
employee names.

So then wickles! A quick Django view and template for making a new object
creates markup with SELECT inputs for department and employee fields
with the OPTION elements restricted to just those
departments and employees defined by the evil management types on
the 100th floor. The markup will be similar to this:

(If your corporate culture lacks a 100th floor you might imagine a
variation where departments and employees are defined in [your corporate
directory][msad], and where you have mad skillz sufficient to
create lists of the departments and employees using [LDAP and
Python][pyldap].)

[msad]: http://www.microsoft.com/windowsserver2003/technologies/directory/activedirectory/
[pyldap]: http://python-ldap.sourceforge.net/

The problem is that choosing a department from the department SELECT menu
has no bearing on the choices available in the employee SELECT menu. It
jolly well ought to.

I use [jQuery][] for nearly every piece of JavaScript functionality in my
projects. jQuery makes so many tedious tasks a matter of a few lines, I
wish I could reclaim the hours spent debugging my scripts before I
discovered this magical library.

[jQuery]: http://jquery.com/

With jQuery and the [texotela plugin for select boxes][texotela] we can
create an event handler that fires whenever the visitor makes a choice from
the department menu so that the choices in the employee menu are restricted to just
the employees matching the chosen department. This is known as a cascading
select input or two-level select input.

[texotela]: http://www.texotela.co.uk/code/jquery/select/

The script to do this does the following:

* Take the chosen value for the department
* Remove all options for the employee SELECT input element
* Query the server via AJAX for a list of employees in the chosen department
* Insert the results as options for the employee SELECT element

Here is the JavaScript to do all that, using jQuery’s excellent selector
syntax to install the handler for the form’s `id_department` SELECT element:

employee_url = ‘/find_employees/’

$(document).ready(function() {
$(“#id_department”).change(function() {
var dept = $(this).selectedValues();
$(“#id_employee”).removeOption(/./);
$(“#id_employee”).ajaxAddOption(employee_url, {dept: dept}, false);
});
});

There are implementation details hard-coded in there.

1. The URL for retrieving a list of employees is `/find_employees/`
2. The `/find_employees/` URL is queried with the chosen department
passed in as the `dept` query variable
3. The Django form inputs must be named `id_department` and `id_employee`

When a visitor chooses ‘Housekeeping’ from the department SELECT the
event handler will GET `/find_employees/?dept=Housekeeping`
and will expect a JSON-encoded list of options for insertion in the
employees menu. The texotela plugin says the format for the employee list
has to be like so:

{
“option_value_1”: “option_text_1”,
“option_value_2”: “option_text_2”
}

(At which point I wonder if my mistrust of JavaScript is misplaced. That
damn JSON-encoded data just is a Python dictionary! But no, a Python
dictionary is unordered, whereas those conniving JavaScript curly-braces
denote object properties. And the absence of a final comma on the last
value/option pair can make all the debugging difference in the world. Not
the same thing at all.)

And finally the view itself. This view must return a JSON-encoded dictionary
of employee names that match the department given by the ‘dept’ query
parameter. However a regular Python dictionary is no good because the order
of items is significant: the choices in the employee SELECT should be listed
alphabetically. We can use Django’s `SortedDict`, a sub-class of dict that
maintains key order.

from django.http import HttpResponse

def find_employees(request):
“””Return a JSON list matching search term.”””
from django.utils.datastructures import SortedDict
from django.utils import simplejson
from models import EMPLOYEES

dept = request.GET.get(‘dept’, ”).lower()

if dept:
employees = [e for d, e in EMPLOYEES if dept == d.lower()]
else:
employees = [e for d, e in EMPLOYEES]
employees.sort()

d = SortedDict([(e, e) for e in employees])
return HttpResponse(simplejson.dumps(d, ensure_ascii=False), mimetype=’application/json’)

I need a matching rule in urls.py to direct requests to the JSON view:

urlpatterns = patterns(‘myproject.myapp.views’,
url(r’^find_employees/$’, ‘find_employees’, name=”find_employees”),
)

Things I like about this approach:

* The form works perfectly without JavaScript
* If the submitted form doesn’t validate, the visitor’s department and
employee choices are still selected when the form is redisplayed
* The JavaScript is clear and concise

In the real application the EMPLOYEES and DEPARTMENTS are lists of objects
wrapping LDAP results, but I hope this explanation is clear enough to show
how the it all hangs together to help the visitor use what would otherwise
be an unhelpful couple of SELECT inputs.

Using relative paths in your settings.py

Several important settings in your [Django project’s
settings.py][settings] are annotated with warnings about the need for
absolute path names. When I start a new project with the development
server I don’t want to have to think what directory my MEDIA_ROOT should
point to. Isn’t all that going to change anyway when it’s deployed to
the gigantic server in the sky?

[settings]: http://www.djangoproject.com/documentation/settings/

My first tip is that when you are running the development server, you
*can* get away with relative paths. Paths will be relative to the
directory you were in when you started the development server (in my
case I always `cd ~/my_project && ./manage.py runserver` so that
means relative to my project’s directory).

Therefore if you want to keep project-wide templates within the project,
just create a `templates` directory and add it to settings.py:

TEMPLATE_DIRS = (
‘templates’,
)

Sweet! Relative paths and it all works and I don’t have to edit my
`TEMPLATE_DIRS` setting whenever I am editing on a work machine (where
the project is in `/Users/dbuxton/my_project`) instead of home (where
the project is in `/Users/david/my_project`).

But it all goes pear-shaped when you move the project to the deployment
platform. There the project is running under mod_python where the notion
of the current working directory is going to be very different. All I
know is my relative paths *do not get resolved* and I wonder if them
Django developers knew a thing or two when they warned me to use
absolute paths.

My current approach to this is to refuse to do what I am told. Instead
one can take advantage of a [Python module’s `__file__` attribute][__file__] to establish where on disk your `settings.py` is,
and armed with that knowledge you can construct absolute paths from the
relative settings.

[__file__]: http://www.python.org/doc/2.5.1/ref/types.html#l2h-111

Near the top of `settings.py` I have:

import os

INSTALL_DIR = os.path.dirname(os.path.abspath(__file__))

Then anywhere I need an absolute path in settings.py I have something
similar to this:

TEMPLATE_DIRS = (
os.path.join(INSTALL_DIR, ‘templates’),
)

Shabooba! These paths change according to the value of `INSTALL_DIR`,
and that in turn is determined when the Django project is loaded. When
my project’s templates are sitting on a distant FreeBSD server in
`/home/webapps/django/my_project/templates` mod_python can locate them
just as surely as Django’s development server can locate the folder
`/Users/david/my_project/templates` on my MacBook.

My Mac is broken: Spotlight example 1

Gah.

Just thinking about having to find files on a Macintosh fills me with dread. Spotlight is useless for finding anything unless you know where it is already. I am in the half of the Venn diagram labelled *believes names of files and folders have meaning*. In the other half are the interface designers of [Apple’s Spotlight technology][spotlight].

Here’s a plum example of how a Spotlight window in Mac OS X 10.5.3 does an alphabetic sort-by-name of results when searching for files using *File Name* criteria.

Spotlight's notion of an alphabetical sort

I hate Spotlight. Makes me wonder what the point of all that amazing search engine technology is.

[spotlight]: http://developer.apple.com/macosx/spotlight.html
[pic]: http://reliablybroken.com/b/wp-content/uploads/spotlight-broken-1.gif

Old-style Python class system and parent methods

I rather like Python’s explicit object reference requirement, whereby method definitions for a class instance have to use `self` as the first parameter (I should write some classes that use `this` instead of `self` some time, just to annoy myself).

But I was tripped up debugging a problem that centred on a simple class I had that needed to do a bit of housekeeping for byte streams:

from StringIO import StringIO

class MyString(StringIO):
def __init__(self, *args, **kwargs):
self._customized = True # Or similar housekeeping
super(MyString, self).__init__(*args, **kwargs)

Creating an object from this class raises a `TypeError`:

>>> s = MyString()
Traceback (most recent call last):
File “”, line 1, in
File “”, line 4, in __init__
TypeError: super() argument 1 must be type, not classobj

Wuh? I always used `super` like that before, and it always worked. But my mistake here was I was sub-classing an [*old-style* class][old], and `super` only works with [*new-style* classes][new].

The correct way of calling the parent method for old-style classes:

class MyString(StringIO):
def __init__(self, *args, **kwargs):
self._customized = True # Or similar housekeeping
StringIO.__init__(self, *args, **kwargs)

That works! But the distinction between the two class models is so inelegant, so clunky. It is a nasty bit of Python’s historic implementation that one needs to keep in mind, and it is knowledge that makes me no cleverer (although it does mean I’m less stupid than I was).

Of course Python 3000 removes this distinction…

[old]: http://docs.python.org/ref/node33.html
[new]: http://docs.python.org/tut/node11.html#SECTION0011500000000000000000

Humour in FreeBSD man pages

Was reading the [tunefs man page][tunefs] today while working out how to enable ACLs for a filesystem in [FreeBSD][freebsd]. The last line made me laugh.

> You can tune a file system, but you can’t tune a fish.

The man pages are one of the BSD’s great strengths.

[tunefs]: http://www.freebsd.org/cgi/man.cgi?query=tunefs
[freebsd]: http://www.freebsd.org/