Note: At the time of writing, Django is at v1.4.1
So what actually happens when you have a Django query like the following?
cities = City.objects.all()
venues = Venue.objects.filter(city__in=cities)
QuerySet
as the input for the __in
clause, and in turn, generates a subquery. The raw query generated is something like this:SELECT ...
FROM "venue"
WHERE "venue"."city_id"
IN (
SELECT U0."id"
FROM "city" U0
)
What's wrong?
In this scenario, nothing at all. We've saved a round trip to the database, all is good. Let's throw caching into the mix.cities = cache.get('cities')
if cities is None:
cities = City.objects.all()
cache.set('cities', cities)
venues = Venue.objects.filter(city__in=cities)
City
QuerySet
gets cached for us like we expect, but when Django gets to the filter()
, it sees that you're passing it a QuerySet
and generates a subquery again. We've effective cached nothing because the exact same query is being run.This also applies when you want to use the same
__in
across multiple filters.cities = City.objects.all()
venues = Venue.objects.filter(city__in=cities)
users = UserAccount.objects.filter(city__in=cities)
Well, how can we fix this?
The reason why Django is doing this is solely because the argument passed intofilter()
is a QuerySet
. If we just pass a list
, Django will use an array of ids inside the IN
SQL.Let's look at an improved version with caching.
cities = cache.get('cities')
if cities is None:
cities = list(City.objects.all()) # Force query evaluation
cache.set('cities', cities)
venues = Venue.objects.filter(city__in=cities)
QuerySet
to be evaluated by converting it to a list
. Now, cities
is a list
, so Django won't try to perform a subquery anymore. This could be simplified to just caching the primary keys as a list
as well, depending on what you're trying to achieve.The resulting SQL will be something like:
SELECT ...
FROM "venue"
WHERE "venue"."city_id"
IN (
1, 2, 3
)
One more gotcha!
We all know that Django'sQuerySet
s are lazy, right? No query is actually performed until the QuerySet
is iterated over. Once a QuerySet
has been iterated over, it won't query again for the results. The results get cached into an internal _results_cache
object.At the moment, Django is not smart enough to detect that a
QuerySet
has already been evaluated before generating a subquery, effectively
causing the query to be run again when we already know the results.Picture this scenario:
cities = City.objects.all()
names = [city.name for city in cities] # QuerySet cache is filled
# ... do something awesome ...
venues = Venue.objects.filter(city__in=cities)
QuerySet
before passing it to the __in
clause, it'd be smart to just use the ids that we've already calculated
from before, but it doesn't. So pay attention and be careful.I've submitted a patch to Django core to try and get that behavior fixed:
No comments:
Post a Comment