When writing Django applications, we're accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django's built-in apps:

    >>> from django.contrib.auth.models import User
    >>> user = User.objects.get(pk=5)
    >>> user.set_password('super-sekrit')
    >>> user.save()

Here `set_password` is a method defined on the `django.contrib.auth.models.User` model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):

    from django.contrib.auth.hashers import make_password
    from django.db import models

    class User(models.Model):

        # fields go here..

        def set_password(self, raw_password):
            # Store only the hashed form of the password
            self.password = make_password(raw_password)

We're building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we're increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.

So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?

A toy problem: the Todo List

To illustrate the approach, we're going to use a simple todo list app. The usual caveats apply: this is a toy problem. It's hard to show a real-world, useful example without huge piles of code. Don't concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.

Here's our application's `models.py`:

    from django.db import models

    PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        is_done = models.BooleanField(default=False)
        owner = models.ForeignKey('auth.User')
        priority = models.IntegerField(choices=PRIORITY_CHOICES,
                                       default=1)

Now, let's consider a query we might want to make across this data. Say we're creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged in user. Here's our first stab at the code:

    def dashboard(request):

        todos = Todo.objects.filter(
            owner=request.user
        ).filter(
            is_done=False
        ).filter(
            priority=1
        )

        return render(request, 'todos/list.html', {
            'todos': todos,
        })

(And yes, I know this can be written as `request.user.todo_set.filter(is_done=False, priority=1)`. Remember, toy example!)

Why is this bad?

- First, it's verbose. Seven lines (depending on how you prefer to deal with newlines in chained method calls) just to pull out the rows we care about. And, of course, this is just a toy example. Real-world ORM code can be much more complicated.
- It leaks implementation details. Code that interacts with the model needs to know that there exists a property called `is_done`, and that it's a `BooleanField`. If you change the implementation (perhaps you replace the `is_done` boolean with a `status` field that can have multiple values) then this code will break.
- It's opaque: the meaning or intent behind it is not clear at a glance (which can be summarised as "it's hard to read").
- Finally, it has the potential to be repetitive. Imagine you are given a new requirement: write a management command, called via `cron` every week, to email all users their list of incomplete, high-priority todo items. You'd have to essentially copy-and-paste these seven lines into your new script. Not very DRY.

So, how can we improve on this?
Managers and QuerySets

Before diving into solutions, we're going to take a slight detour to cover some essential concepts.

Django has two closely related constructs for table-level operations: managers and querysets.

A manager (an instance of `django.db.models.manager.Manager`) is described as "the interface through which database query operations are provided to Django models." A model's `Manager` is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called `objects`.

A queryset (`django.db.models.query.QuerySet`) represents "a collection of objects from your database." It is essentially a lazily-evaluated abstraction of the result of a `SELECT` query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It's responsible for creating and manipulating `django.db.models.sql.query.Query` instances, which are compiled into actual SQL queries by the database backends.

Phew. Confused? While the distinction between a `Manager` and a `QuerySet` can be explained if you're deeply familiar with the internals of the ORM, it's far from intuitive, especially for beginners.

This confusion is made worse by the fact that the familiar `Manager` API isn't quite what it seems...
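To make the "lazily-evaluated" and chainable description of a queryset a little more concrete, here is a minimal pure-Python sketch. It is only an analogy, not Django's implementation: `CountingSource`, `LazyQuery` and `ROWS` are hypothetical names invented for this illustration, and rows are plain dicts rather than model instances.

```python
class CountingSource:
    """Hypothetical stand-in for a database table that counts how often it is read."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def __iter__(self):
        self.reads += 1  # one "query" per full iteration
        return iter(self._rows)


class LazyQuery:
    """Toy queryset: remembers its filters and only reads the source when iterated."""

    def __init__(self, source, predicates=()):
        self._source = source
        self._predicates = tuple(predicates)

    def filter(self, **conditions):
        def matches(row):
            return all(row.get(key) == value for key, value in conditions.items())

        # Return a new object instead of mutating this one, so calls can be chained.
        return LazyQuery(self._source, self._predicates + (matches,))

    def __iter__(self):
        # The source is only touched here, when the results are actually needed.
        for row in self._source:
            if all(predicate(row) for predicate in self._predicates):
                yield row


ROWS = [
    {"content": "Write report", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": False, "priority": 2},
    {"content": "File taxes", "is_done": True, "priority": 1},
]

source = CountingSource(ROWS)
everything = LazyQuery(source)
urgent = everything.filter(is_done=False).filter(priority=1)

reads_before = source.reads  # chaining alone has not read the source
urgent_titles = [row["content"] for row in urgent]
reads_after = source.reads  # iterating triggered exactly one read
```

Building `urgent` costs nothing until it is iterated, and `everything` is left untouched by the chained `filter` calls. A real queryset achieves the same effect by building up a SQL query rather than a tuple of Python predicates.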
The `Manager` API is a lie

`QuerySet` methods are chainable. Each call to a `QuerySet` method (such as `filter`) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django's ORM.

But the fact that `Model.objects` is a `Manager` (not a `QuerySet`) presents a problem: we need to start our chain of method calls on `objects`, but continue the chain on the resulting `QuerySet`.

So how is this problem solved in Django's codebase? This is where the API lie is exposed: all of the `QuerySet` methods are reimplemented on the `Manager`. The versions of these methods on the `Manager` simply proxy to a newly-created `QuerySet` via `self.get_query_set()`:

    class Manager(object):

        # SNIP some housekeeping stuff..

        def get_query_set(self):
            return QuerySet(self.model, using=self._db)

        def all(self):
            return self.get_query_set()

        def count(self):
            return self.get_query_set().count()

        def filter(self, *args, **kwargs):
            return self.get_query_set().filter(*args, **kwargs)

        # and so on for 100+ lines...

To see the full horror, take a look at the `Manager` source code.

We'll return to this API sleight-of-hand shortly...
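The proxying described in this section can be tried out without Django. The toy classes below are a rough, runnable analogue of the excerpt, written under the assumption that rows are plain dicts. `ToyQuerySet`, `ToyManager` and `ROWS` are hypothetical names, and nothing here is Django's real code.

```python
class ToyQuerySet:
    """Toy stand-in for a queryset holding an in-memory list of dict rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, **conditions):
        # Return a new ToyQuerySet containing only the matching rows.
        return ToyQuerySet(
            row for row in self._rows
            if all(row.get(key) == value for key, value in conditions.items())
        )

    def count(self):
        return len(self._rows)


class ToyManager:
    """Toy stand-in for a manager: every method proxies to a fresh ToyQuerySet."""

    def __init__(self, rows):
        self._rows = rows

    def get_query_set(self):
        return ToyQuerySet(self._rows)

    def all(self):
        return self.get_query_set()

    def filter(self, **conditions):
        return self.get_query_set().filter(**conditions)

    def count(self):
        return self.get_query_set().count()


ROWS = [
    {"content": "Write report", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": False, "priority": 2},
    {"content": "File taxes", "is_done": True, "priority": 1},
]

objects = ToyManager(ROWS)
first_link = objects.filter(is_done=False)   # starts on the manager...
second_link = first_link.filter(priority=1)  # ...but continues on a queryset
```

The first call in a chain is answered by the manager, but what comes back is a `ToyQuerySet`, so every later link in the chain is a queryset method. The manager is only ever involved in the first step.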
Back to the todo list

So, let's get back to solving our problem of cleaning up a messy query API. The approach recommended by Django's documentation is to define custom `Manager` subclasses and attach them to your models.

You can either add multiple extra managers to a model, or you can redefine `objects`, maintaining a single manager but adding your own custom methods.

Let's try each of these approaches with our Todo application.
Approach 1: multiple custom Managers

    class IncompleteTodoManager(models.Manager):
        def get_query_set(self):
            return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

    class HighPriorityTodoManager(models.Manager):
        def get_query_set(self):
            return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = models.Manager()  # the default manager

        # attach our custom managers:
        incomplete = IncompleteTodoManager()
        high_priority = HighPriorityTodoManager()

The API this gives us looks like this:

    >>> Todo.incomplete.all()
    >>> Todo.high_priority.all()

Unfortunately, there are several big problems with this approach.

- The implementation is very verbose. You need to define an entire class for each custom piece of query functionality.
- It clutters your model's namespace. Django developers are used to thinking of `Model.objects` as the "gateway" to the table. It's a namespace under which all table-level operations are collected. It'd be a shame to lose this clear convention.
- Here's the real deal breaker: it's not chainable. There's no way of combining the managers: to get todos which are incomplete and high-priority, we're back to low-level ORM code: either `Todo.incomplete.filter(priority=1)` or `Todo.high_priority.filter(is_done=False)`.
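The chaining problem in the last point can be reproduced with a small pure-Python analogue. This is a sketch under the assumption that rows are plain dicts. All class names and `ROWS` are hypothetical, and Django itself is not involved.

```python
class ToyQuerySet:
    """Toy queryset over an in-memory list of dict rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, **conditions):
        return ToyQuerySet(
            row for row in self._rows
            if all(row.get(key) == value for key, value in conditions.items())
        )

    def __iter__(self):
        return iter(self._rows)


class ToyManager:
    """Toy manager whose methods all start from get_query_set()."""

    def __init__(self, rows):
        self._rows = rows

    def get_query_set(self):
        return ToyQuerySet(self._rows)

    def all(self):
        return self.get_query_set()

    def filter(self, **conditions):
        return self.get_query_set().filter(**conditions)


class IncompleteManager(ToyManager):
    def get_query_set(self):
        # Narrow the starting set, as the custom managers above do.
        return super().get_query_set().filter(is_done=False)


class HighPriorityManager(ToyManager):
    def get_query_set(self):
        return super().get_query_set().filter(priority=1)


ROWS = [
    {"content": "Write report", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": False, "priority": 2},
    {"content": "File taxes", "is_done": True, "priority": 1},
]

incomplete = IncompleteManager(ROWS)
high_priority = HighPriorityManager(ROWS)

# The object returned by one manager knows nothing about the other one.
can_chain = hasattr(incomplete.all(), "high_priority")

# Combining the two conditions means falling back to a raw filter call.
both = [row["content"] for row in incomplete.filter(priority=1)]
```

Each manager is a separate entry point that hands back a plain queryset, so the only way to apply both conditions is to spell out one of them again by hand.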
Approach 2: Manager methods

So, let's try the other Django-sanctioned approach: multiple methods on a single custom `Manager`.

    class TodoManager(models.Manager):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = TodoManager()

Our API now looks like this:

    >>> Todo.objects.incomplete()
    >>> Todo.objects.high_priority()
This is better. It's much less verbose (only one class definition) and the query methods remain namespaced nicely under `objects`.

It's still not chainable, though. `Todo.objects.incomplete()` returns an ordinary `QuerySet`, so we can't then call `Todo.objects.incomplete().high_priority()`. We're stuck with `Todo.objects.incomplete().filter(priority=1)`. Not much use.

Approach 3: custom QuerySet

Now we're in uncharted territory. You won't find this in Django's documentation...

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class TodoManager(models.Manager):
        def get_query_set(self):
            return TodoQuerySet(self.model, using=self._db)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = TodoManager()

Here's what this looks like from the point of view of code that calls it:

    >>> Todo.objects.get_query_set().incomplete()
    >>> Todo.objects.get_query_set().high_priority()
    >>> # (or)
    >>> Todo.objects.all().incomplete()
    >>> Todo.objects.all().high_priority()
    >>> Todo.objects.all().incomplete().high_priority()
We're nearly there! This is not much more verbose than Approach 2, gives the same benefits, and additionally (drumroll please...) it's chainable!

However, it's still not perfect. The custom `Manager` is nothing more than boilerplate, and that `all()` is a wart, which is annoying to type but more importantly is inconsistent: it makes our code look weird.

Approach 3a: copy Django, proxy everything

Now our discussion of the "Manager API lie" above becomes useful: we know how to fix this problem. We simply redefine all of our `QuerySet` methods on the `Manager`, and proxy them back to our custom `QuerySet`:

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class TodoManager(models.Manager):
        def get_query_set(self):
            return TodoQuerySet(self.model, using=self._db)

        def incomplete(self):
            return self.get_query_set().incomplete()

        def high_priority(self):
            return self.get_query_set().high_priority()

This gives us exactly the API we want:

    >>> Todo.objects.incomplete().high_priority()  # yay!
Except that's a lot of typing, and very un-DRY. Every time you add a new method to your `QuerySet`, or change the signature of an existing method, you have to remember to make the same change on your `Manager`, or it won't work properly. This is a recipe for problems.

Approach 3b: django-model-utils

Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called `django-model-utils`. Just run `pip install django-model-utils`, then..

    from model_utils.managers import PassThroughManager

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = PassThroughManager.for_queryset_class(TodoQuerySet)()
This is much nicer. We simply define our custom `QuerySet` subclass as before, and attach it to our model via the `PassThroughManager` class provided by `django-model-utils`.

The `PassThroughManager` works by implementing the `__getattr__` method, which intercepts calls to non-existing methods and automatically proxies them to the `QuerySet`. There's a bit of careful checking to ensure that we don't get infinite recursion on some properties (which is why I recommend using the tried-and-tested implementation supplied by `django-model-utils` rather than hand-rolling your own).

How does this help?

Remember that view code from earlier?

    def dashboard(request):

        todos = Todo.objects.filter(
            owner=request.user
        ).filter(
            is_done=False
        ).filter(
            priority=1
        )

        return render(request, 'todos/list.html', {
            'todos': todos,
        })

With a bit of work (including a `for_user` method on our `TodoQuerySet`, which isn't shown above), we could make it look something like this:

    def dashboard(request):

        todos = Todo.objects.for_user(
            request.user
        ).incomplete().high_priority()

        return render(request, 'todos/list.html', {
            'todos': todos,
        })

Hopefully you'll agree that this second version is much simpler, clearer and more readable than the first.
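For readers curious about the `__getattr__` trick that `PassThroughManager` is described as relying on, here is a minimal pure-Python sketch of the idea. It is not the `django-model-utils` implementation and does not use Django at all: `SketchQuerySet`, `PassThroughSketch` and `ROWS` are hypothetical names invented for this illustration, and rows are plain dicts.

```python
class SketchQuerySet:
    """Toy queryset with the two domain-specific methods from the Todo example."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, **conditions):
        # type(self) keeps subclasses intact, so custom methods survive chaining.
        return type(self)(
            row for row in self._rows
            if all(row.get(key) == value for key, value in conditions.items())
        )

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

    def __iter__(self):
        return iter(self._rows)


class PassThroughSketch:
    """Toy manager that forwards unknown attributes to a fresh queryset."""

    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_query_set(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup has failed.
        # Refusing underscore-prefixed names keeps a half-initialised instance
        # (one that has no _queryset_class yet) from recursing forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


ROWS = [
    {"content": "Write report", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": False, "priority": 2},
    {"content": "File taxes", "is_done": True, "priority": 1},
]

objects = PassThroughSketch(SketchQuerySet, ROWS)
urgent = [row["content"] for row in objects.incomplete().high_priority()]
```

New methods added to `SketchQuerySet` become reachable from `objects` without touching the manager class, which is the property that removes the boilerplate of Approach 3a. The underscore guard is only one simple way to avoid the recursion problem mentioned earlier. A production implementation has more cases to consider.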
Can Django help?

Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there's an associated ticket. Zachary Voase proposed the following:

    class TodoManager(models.Manager):

        @models.querymethod
        def incomplete(query):
            return query.filter(is_done=False)

This single decorated method definition would make `incomplete` magically available on both the `Manager` and the `QuerySet`.

Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky". My gut feeling is that adding methods to a `QuerySet` subclass (rather than a `Manager` subclass) is a better, simpler approach.

Perhaps we could go further. By stepping back and re-examining Django's API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?

I'm fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.
So, to recap:

Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom `QuerySet` APIs and attaching them to your models with a `PassThroughManager` from `django-model-utils` gives you the following benefits:

- Makes code less verbose, and more robust.
- Increases DRYness, raises abstraction level.
- Pushes business logic into the domain model layer where it belongs.