A QuerySet, in essence, is a list of objects of a given model. I say ‘list’
and not ‘group’ or the more formal ‘set’ because it is ordered. In fact, you’re
probably already familiar with how to get QuerySets because that’s what you
get when you call various Book.objects.XXX() methods. For example, consider
the following statement:
Book.objects.all()
What all() returns is a QuerySet of Book instances which happens to
include all Book instances that exist. There are other calls which you probably
already know:
# Return all books published after 1990
Book.objects.filter(year_published__gt=1990)
# Return all books *not* written by Richard Dawkins
Book.objects.exclude(author='Richard Dawkins')
# Return all books, ordered by author name, then
# chronologically, with the newer ones first.
Book.objects.order_by('author', '-year_published')
The cool thing about QuerySets is that, since every one of these functions both
operates on and returns a QuerySet, you can chain them up:
# Return all books published after 1990, except for
# ones written by Richard Dawkins. Order them by
# author name, then chronologically, with the newer
# ones first.
Book.objects.filter(year_published__gt=1990) \
    .exclude(author='Richard Dawkins') \
    .order_by('author', '-year_published')
And that’s not all! It’s also fast:
Internally, a QuerySet can be constructed, filtered, sliced, and generally
passed around without actually hitting the database. No database activity
actually occurs until you do something to evaluate the queryset.
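To make that laziness concrete without needing a Django project, here is a toy stand-in built on the standard library's sqlite3 module. It is only a sketch of the idea, not how Django implements QuerySets: the LazyQuery class, the book table and its columns are all assumptions made up for this illustration. Each chained call just records a clause, and SQL runs only when the object is iterated.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a QuerySet (hypothetical, not Django's implementation).

    Chained calls only record clauses; SQL runs when the object is iterated.
    """

    def __init__(self, conn, where=(), params=(), order=()):
        self.conn = conn
        self.where = tuple(where)
        self.params = tuple(params)
        self.order = tuple(order)

    def filter(self, clause, *params):
        # Return a new object, leaving the original untouched.
        return LazyQuery(self.conn, self.where + (clause,),
                         self.params + params, self.order)

    def exclude(self, clause, *params):
        return self.filter("NOT (" + clause + ")", *params)

    def order_by(self, *columns):
        return LazyQuery(self.conn, self.where, self.params, columns)

    def __iter__(self):
        # The only place that touches the database.
        sql = "SELECT title FROM book"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.order:
            sql += " ORDER BY " + ", ".join(self.order)
        rows = self.conn.execute(sql, self.params).fetchall()
        return (title for (title,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (title TEXT, author TEXT, year_published INTEGER)")
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    ("The Selfish Gene", "Richard Dawkins", 1976),
    ("The God Delusion", "Richard Dawkins", 2006),
    ("Pale Blue Dot", "Carl Sagan", 1994),
    ("The Demon-Haunted World", "Carl Sagan", 1995),
    ("The Grand Design", "Stephen Hawking", 2010),
])
conn.commit()

# Record every SQL statement sqlite3 executes from this point on.
executed = []
conn.set_trace_callback(executed.append)

query = (LazyQuery(conn)
         .filter("year_published > ?", 1990)
         .exclude("author = ?", "Richard Dawkins")
         .order_by("author", "year_published DESC"))
statements_before = len(executed)  # counted before anything is iterated

titles = list(query)               # iterating is what runs the SELECT
statements_after = len(executed)
```

Comparing statements_before with statements_after shows that building and chaining the query cost nothing, and that the database was only consulted once the results were actually needed.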
So we’ve established that QuerySets are cool. Now what?
Return QuerySets Wherever Possible
I’ve recently worked on a Django app where I had a Model that represented a
tree (the data structure, not the Christmas decoration). It meant that every
instance had a link to its parent in the tree. It looked something like this:
This worked pretty well. Trouble was, I had to add another method,
get_larger_ancestors, which should return all the ancestors whose value was
larger than the value of the current node. This is how I could have implemented
this:
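As a rough, framework-free illustration of the list-based approach, here is a plain-Python sketch. It is not a Django model: the Node class, its value and parent attributes and the sample tree are assumptions made for the example. It only shows the shape of a get_ancestors that returns a list and a get_larger_ancestors that filters that list in Python.

```python
class Node:
    """Plain-Python stand-in for the tree model (not a Django model)."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent

    def get_ancestors(self):
        """Return every ancestor as a list, nearest first."""
        ancestors = []
        current = self.parent
        while current is not None:
            ancestors.append(current)
            current = current.parent
        return ancestors

    def get_larger_ancestors(self):
        """Second pass over the list: keep ancestors with a larger value."""
        return [node for node in self.get_ancestors()
                if node.value > self.value]


# Hypothetical three-level tree: root -> middle -> leaf.
root = Node(10)
middle = Node(3, parent=root)
leaf = Node(5, parent=middle)

ancestor_values = [node.value for node in leaf.get_ancestors()]
larger_values = [node.value for node in leaf.get_larger_ancestors()]
```

Note that get_larger_ancestors has to walk the whole list that get_ancestors already built, which is exactly the double pass discussed next.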
The problem with this is that I’m essentially going over the list twice – once
in Django and once more in my own code. It got me thinking – what if
get_ancestors returned a QuerySet instead of a list? I could have done
this:
Pretty straightforward. The important thing here is that I’m not looping over
the objects. I could apply however many filters I want to what
get_larger_ancestors returns and feel safe that I’m not re-iterating over a list
of objects of unknown size. The key advantage here is that I keep using the
same interface for querying. When users get a bunch of objects, we don’t
know how they’ll want to slice and dice them. When we return QuerySet objects,
we guarantee that they will know how to handle them.
But how do I implement get_ancestors to return a QuerySet? That’s a little
bit trickier. It’s not possible to collect the data we want with a single
query, nor is it possible with any pre-determined number of queries. The nature
of what we’re looking for is dynamic and the alternative implementation will
look pretty similar to what it is now. Here’s the alternative, better
implementation:
Take a while, soak it in. I’ll go over the specifics in just a minute.
The point I’m trying to make here is that whenever you return a bunch of
objects – you should always try to return a QuerySet instead. Doing so will
allow the user to freely filter, slice and order the result in a way that’s
easy, familiar and provides better performance.
(On a side note – I am hitting the database in get_ancestors, since I’m
using self.parent recursively. There is an extra hit on the database here –
once when executing the function and another in the future, when actually
inspecting the results. We do get the performance upside when we perform
further filters on the results which would have meant more hits on the database
or heavy in-memory operations. The example here is to show how to turn
non-trivial operations into QuerySets).
Common QuerySet Manipulations
So, returning a QuerySet where we perform a simple query is easy. When we
want to implement something with a little more zazz, we need to perform
relational operations (and some helpers, too). Here’s a handy cheat sheet (as
an exercise, try to understand my implementation of get_larger_ancestors).
Union – The union operator for QuerySets is |, the pipe symbol. qs1 | qs2
returns a QuerySet with all the items from qs1 and all the items in qs2
while handling duplicates (items that are in both QuerySets will only appear
once in the result).
Intersection – QuerySets do support the & operator, but you rarely need it,
because you already know how to do it! Chaining calls such as filter and
exclude in fact performs an intersection between the original QuerySet and the
new filter.
Difference – a difference (mathematically written as qs1 \ qs2) is all the
items in qs1 that do not exist in qs2. Note that this operation is
asymmetrical (as opposed to the previous operations). I’m afraid there is no
built-in operator for this in Django, but you can do this:
qs1.exclude(pk__in=qs2)
Nothing – seems useless, but it actually isn’t, as the above example
shows. A lot of the time, when you’re dynamically building a QuerySet with
unions, you need to start off with what would have been an empty list. This
is how to get it: MyModel.objects.none().
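If it helps to see the four operations side by side, here is a plain-Python analogy in which each "QuerySet" is modelled as a set of primary keys. This is only a sketch of the membership semantics: real QuerySets are ordered and lazy, sets are neither, and the parents mapping and ancestor_pks helper are hypothetical names invented for the example.

```python
# Each "QuerySet" is modelled as a plain set of primary keys (an analogy only).
qs1 = {1, 2, 3, 4}
qs2 = {3, 4, 5}

union = qs1 | qs2          # mirrors qs1 | qs2; duplicates appear once
intersection = qs1 & qs2   # mirrors chaining a second filter onto the first
difference = qs1 - qs2     # mirrors qs1.exclude(pk__in=qs2)
nothing = set()            # mirrors MyModel.objects.none()

# Hypothetical tree stored as child pk -> parent pk (None marks the root).
parents = {4: 3, 3: 2, 2: 1, 1: None}


def ancestor_pks(pk):
    """Start from "nothing" and union in one ancestor at a time."""
    result = set()
    current = parents[pk]
    while current is not None:
        result |= {current}
        current = parents[current]
    return result


ancestors_of_4 = ancestor_pks(4)
```

The last part is the pattern that makes the empty starting point useful: the result begins as nothing and grows by one union per step.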