[SOLVED] Add Unique Constraints to Google App Engine databases

The problem:

Google App Engine rules! The truth is that I’m starting to feel confortable programming in Python, although I still like the curly braces to indentate.

Anyway, the datastore used by Google is superpowerful and supersimple to use, but it has some limitations. With the App Engine SDK you can easily say which attributes you want to index (indexed = True), which ones you want to be required (required = True), which is the default value (default=”whatever”) but you cannot add a unique constraint on an attribute or set of attributes (there is no unique=True).

This is the typical nick or e-mail field in a database. You don’t want two users share the same nick or the same e-mail. You want all users to be identified by their nicks or their e-mails.

¿What’s the problem? As I said, you can’t do that using the Google App Engine SDK… hopefully the solution is really easy 😉

The solution:

The only thing you have to do is to override put inside our model, and launch an exception when a new element that violates our unique constraints is going to be added to the database.

In order to provide a simple example, let’s create a model named User using name, email, language and creation_date as the attributes:

class User (db.Model): 
   creation_date = db.DateTimeProperty (auto_now_add=True)
   name          = db.StringProperty (required = True)
   email         = db.EmailProperty (required = True)
   language      = db.StringProperty (default="en")

Now, to reproduce the problem in GAE, the following code will create two different entries:

User (name="Mark Johnson", email="test@codigomanso.com").put()
User (name="JohnMarkinson", email="test@codigomanso.com", language="es").put()

Now let’s add a constraint to forbid two users to share the same e-mail. With this constraint the second putshould fail or launch an exception. ¿How do we do that? It’s as simple as it seems. We just need to redefine put so when an new element is going to be added to the database, we check if violates our rules and we launch an exception.

The new class with the constraints would be as follows:

class User (db.Model): 
   creation_date = db.DateTimeProperty (auto_now_add=True)
   name          = db.StringProperty (required = True)
   email         = db.EmailProperty (required = True)
   language      = db.StringProperty (default="en")
   def put (self):
     # Make sure e-mails are unique for each user
     if (not self.is_saved()) and (User.gql ('WHERE email = :1', self.email).count() > 0):
       raise DuplicatedInstanceError ('User.email', self.email)
     # call the parent method
     db.Model.put (self)

We needed only 3 lines of code (4 counting the def line)… it can’t be easier

Some inconveniences

After thinking on it for a while, this implementation is far from being perfect, although it will work on most cases. I see two problems:

  • You have to make an extra query before inserting an item (updates behave as always)
  • Using this solution you can end up with two users having the same e-mail. How can that be? Have you heard about race conditions? The count-put operation is not atomic, so if two users try to register the same e-mail at almost the same time (you will have two instances of the application running), then it can happen that they do the count at the same time (returning zero) and so none of both will raise an exception.

The first problem is not a problem at all, just something to be aware of.

The second problem it can be a huge problem. There are two solutions that come to my mind right now, one is to run the count-put in a transaction for new instances (transactions are atomic), and the other one would be to create a mutex around the count-put. To me, this is something I’m not worried about, but you need to know that it can happen.

Apendix: DuplicatedInstanceError

If you are smart, you have probably realiced that I launched an exception of the type DuplicatedInstanceError. This exception does not exist, I created to be able to detect duplicated instances. My implementation is below, but you can use whatever you want:

class DuplicatedInstanceError (Exception):
  def __init__(self, attrpath, value = None):
    Accept a dict of attribute and value pairs, or just one attribute and a value:
       # user unique constraint has been violated (User.nick to be unique)
       DuplicatedInstanceError ('User.nick', 'jsmith')
       # student unique constraint has been violated (student nick's are fin for different schools)
       DuplicatedInstanceError ({
         'Student.nick'   : 'jsmith',
         'Student.school' : 'demo-primary-school'
    self.attrpath = ''
    # accept a dict as the only parameter
    if isinstance (attrpath, dict):
      for (k, v) in attrpath.items():
        self.attrpath += ', ' + str (k) + ' = ' + str(v)
      self.attrpath = self.attrpath[2:]
    # accept a couple of strings
      self.attrpath = str(attrpath) + ' = ' + str(value) 
    self.attrpath = '(' + self.attrpath + ')'
  def __str__ (self):
    return str (self.attrpath)

That’s all!

Trackback URL

, , , ,

  1. Ikai Lan
    17/09/2010 at 9:38 am Permalink

    Hey, nice article. Remember that Memcache’s INCR operation is atomic, so you can create a key and increment – if the number is greater than +1 of the value you started with, you can back off. This kind of optimistic locking is probably a better approach.

  2. Pau Sánchez
    20/09/2010 at 1:40 am Permalink

    @Ikai you are right! I haven’t realized about that 🙂

    Thanks for the comment!

    I think now the solution is more than 3 lines long…

       def put (self):
         # Make sure e-mails are unique for each user
         isnew = (not self.is_saved()) and (User.gql ('WHERE email = :1', self.email).count() > 0)
         if isnew:
           if memcache.incr ("DB::User.email=" + repr (self.email)) is not None:
             raise DuplicatedInstanceError ('User.email', self.email)
         # call the parent method
           db.Model.put (self)
            # always cleanup memcache
            if isnew:
               memcache.delete ("DB::User.email=" + repr (self.email))

    I just writen the code down as I thought it should be… I haven’t tested it.

    I think this is still easy to implement, but now it might be more interesting to create a declarator in Python that does the job for the programmer.

    I’ll have to write another article soon…

    Thanks for the comment.