XOR Media

Coding, Operations, Etc.

Django Model Validation On Save

Posted on Sat 05 January 2013 by Ross McFarland

In what is probably my biggest WTF with Django to date, it doesn't validate your models before saving them to the database. All of the necessary code exists, and when a dev sets up her models she usually adds the relevant validations using EmailField, URLField, blank, null, unique, ..., but unless you explicitly add code, the constraints won't be adequately enforced. Some things will be caught with IntegrityErrors, but not everything and not consistently.
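To illustrate that last point, here's a small sketch that talks to SQLite directly through Python's sqlite3 module instead of going through Django. The table and column names are made up for the example, and the schema is only my assumption of roughly what an EmailField(max_length=254, unique=True) column maps to. The database rejects the duplicate, but it has no idea what an email address or a "blank" value is.

```python
import sqlite3

# Hypothetical table, assumed to resemble what a unique EmailField maps to.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE employee ('
    ' id INTEGER PRIMARY KEY,'
    ' email VARCHAR(254) NOT NULL UNIQUE)'
)

# Neither of these is a valid email address, yet both are stored without
# complaint because the database only knows about NOT NULL and UNIQUE.
conn.execute('INSERT INTO employee (email) VALUES (?)',
             ('this.is.not.an.email',))
conn.execute('INSERT INTO employee (email) VALUES (?)', ('',))

# The unique constraint is the one thing the database does enforce here.
try:
    conn.execute('INSERT INTO employee (email) VALUES (?)', ('',))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = [row[0] for row in
          conn.execute('SELECT email FROM employee ORDER BY id')]
```

After this runs, stored holds both bogus values and duplicate_rejected is True, which is the "not everything and not consistently" situation in miniature.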

Since the validation code is sitting there waiting to be hooked up, the only reason I can imagine for not having it by default is backwards compatibility. That seems to be the reason given elsewhere. It's a big enough problem in my mind to deserve a breaking change in a 1.x release, with a configuration variable to disable it and a large-print warning in the release notes. If not that, it at least needs to be featured very prominently in the getting started and general documentation. If it's there, it's not obvious enough that I've run across it, and Google doesn't seem to point there when searching for relevant terms either. Oh well.

So now that I've told you how I feel about it, let's get to what to do about it. You have two basic options: a signal or a base class. Both have advantages and disadvantages, and I'll quickly list the ones that come to mind as we look at the necessary code.

Pre-Save Signal

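from django.db.models.signals import pre_save


def validate_model(sender, instance, raw=False, **kwargs):
    # raw is True when loading fixtures; skip validation in that case
    if not raw:
        instance.full_clean()


# no sender given, so this runs for every model; dispatch_uid keeps the
# handler from being connected twice if this module is imported again
pre_save.connect(validate_model, dispatch_uid='validate_model')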

Ignoring the fact that the method is called full_clean, which seems a better fit for ModelForm checking than Model enforcement, the above code will check all models used by your app. We connect a handler to the model pre_save signal, and each time it fires the handler calls full_clean unless we're saving in raw mode (from fixtures).

The pre_save signal will be sent out for every object being saved, whether it's one of ours or an upstream dependency's. That's both the advantage and disadvantage of this method. If you use it from the start, all of your code will handle ValidationErrors, and as you bring in 3rd-party apps/code you'll be able to quickly see if it causes problems for them. But you can run into problems.

You also shouldn't use this method if you're developing a shared app as it would cause anyone who uses that app to unexpectedly start seeing ValidationErrors, even if it's for their own good. You could add senders to the connect calls for each of your models, but at that point you're better off going with the mixin below.

In my use of the signal approach I've run into a problem with custom Celery Task states. Celery's docs give examples of arbitrary task states, but when full_clean is called on them on their way to their backing store, validation complains about the non-standard values. The easiest way I could find to deal with it was to keep a list of opted-out models. It's not the cleanest thing in the world, but it gets the job done.

dont_validate = {'TaskMeta'}


def validate_model(sender, instance, raw=False, **kwargs):
    # opt out by class name, e.g. Celery's TaskMeta
    cls = instance.__class__.__name__
    if not raw and cls not in dont_validate:
        instance.full_clean()

Mixin With An Overridden save

from django.db import models


class ValidateOnSaveMixin(object):

    def save(self, force_insert=False, force_update=False, **kwargs):
        if not (force_insert or force_update):
            self.full_clean()
        super(ValidateOnSaveMixin, self).save(force_insert, force_update,
                                              **kwargs)


class Employee(ValidateOnSaveMixin, models.Model):
    name = models.CharField(max_length=128)
    # need to specify the max_length here or else it'll be too short for
    # some rfc emails
    email = models.EmailField(max_length=254, unique=True)

Basically the same logic, but here it's explicit which models are going to be validated. This is essentially the opposite of the signal approach. You don't have to worry about other models validating correctly or code working with them handling ValidationErrors, but you do have to explicitly include ValidateOnSaveMixin in each model's hierarchy. You'll also have to take a bit of care if you override the save method in any of the classes where the mixin is used to make sure you do things in an appropriate order and that the mixin's save method is called.
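To make the ordering point concrete, here's a minimal sketch that runs without Django. FakeModel is a hypothetical stand-in for models.Model, and its full_clean uses a toy '@' check instead of Django's real EmailField validation, so treat it as an illustration of the method resolution order and nothing more. The override fixes up the instance first and then calls super, which reaches the mixin's save (validate) before the base save (store).

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


class FakeModel(object):
    """Hypothetical stand-in for models.Model; records saves in a list."""

    saved = []

    def full_clean(self):
        # Toy rule standing in for EmailField validation.
        if '@' not in self.email:
            raise ValidationError('invalid email: %r' % self.email)

    def save(self, force_insert=False, force_update=False, **kwargs):
        FakeModel.saved.append(self.email)


class ValidateOnSaveMixin(object):

    def save(self, force_insert=False, force_update=False, **kwargs):
        if not (force_insert or force_update):
            self.full_clean()
        super(ValidateOnSaveMixin, self).save(force_insert, force_update,
                                              **kwargs)


class Employee(ValidateOnSaveMixin, FakeModel):

    def __init__(self, email):
        self.email = email

    def save(self, *args, **kwargs):
        # Normalize first, then let the mixin validate and the base save.
        self.email = self.email.strip().lower()
        super(Employee, self).save(*args, **kwargs)


Employee('  Bob@Example.COM ').save()
print(FakeModel.saved)  # ['bob@example.com']
```

If Employee.save skipped the super call, or if the mixin were listed after the base class, the validation step would never run.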

Things to Watch For

One thing to consider with either of these approaches is that you cannot rely on pre_save signals or field save methods to make objects valid. Both would happen too late. In the case of the mixin, they run after we've called full_clean and passed things up to super. With the pre_save signal, the fields' save methods are called at a later point, and there are no assurances on the order of signal handlers, so you can't rely on the fixers being called before validate_model.
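The handler-ordering problem can be sketched with a toy dispatcher. Nothing here is Django: ToySignal, strip_email, try_save and the whitespace rule in full_clean are all made up for the illustration, and the only assumption is that receivers get called in whatever order they happened to be connected.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


class ToySignal(object):
    """Hypothetical dispatcher that calls receivers in connection order."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **named):
        for receiver in self.receivers:
            receiver(sender, **named)


class Employee(object):

    def __init__(self, email):
        self.email = email

    def full_clean(self):
        # Toy rule: no surrounding whitespace, and must contain an '@'.
        if self.email != self.email.strip() or '@' not in self.email:
            raise ValidationError('invalid email: %r' % self.email)


def validate_model(sender, instance, raw=False, **kwargs):
    if not raw:
        instance.full_clean()


def strip_email(sender, instance, **kwargs):
    # A "fixer" that would make the instance valid, if it ran in time.
    instance.email = instance.email.strip()


def try_save(handlers, email):
    """Send a toy pre_save with the handlers connected in the given order."""
    pre_save = ToySignal()
    for handler in handlers:
        pre_save.connect(handler)
    try:
        pre_save.send(Employee, instance=Employee(email))
    except ValidationError:
        return 'rejected'
    return 'saved'


fixer_first = try_save([strip_email, validate_model], ' bob@example.com ')
validator_first = try_save([validate_model, strip_email], ' bob@example.com ')
```

The same object is saved when the fixer happens to run first and rejected when the validator does, which is why the fixing needs to happen somewhere you control, such as before the call to save.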

Unit Testing

I'm a fan of thorough unit testing, and this is a place where it can come in extra handy and the tests are trivial to write. You don't have to test the actual validation unless you're doing something custom; you can hope/assume that the Django unit tests have that covered. You can/should check that validations are being invoked.

from django.test import TestCase
from django.core.exceptions import ValidationError


class EmployeeTest(TestCase):

    def test_validation(self):

        with self.assertRaises(ValidationError):
            Employee(name='Bob', email='this.is.not.an.email').save()

That's enough of a smoke test to tell you whether or not the validation mixin or signal is getting called. If 6 months down the road you tweak the signal handler or change the inheritance hierarchy you'll have tests in place to make sure that things are still being validated.


About the Author

Ross McFarland

Ross is a 17-year veteran of the software industry with experience spanning low-level signal processing, web and mobile user interfaces, high-scale distributed web services, infrastructure, and networking. He has made extensive contributions to open source, highlighted by his time as a primary maintainer of Gtk2-Perl and author of the requests-futures and python-asynchttp libraries.