Django's ManyToMany

Relating one kind of record to 0..n records of another kind is the cornerstone of relational database design.
Figure: One to many. One news feed may contain many articles.
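To make the one-to-many case concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The table and column names (feed, article, feed_id) are illustrative assumptions, not the project's actual schema. Each article row stores the id of the single feed it belongs to, which is the column a Django ForeignKey on a hypothetical Article model would generate.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each article points at exactly one feed; a feed can have many articles.
    CREATE TABLE article (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        feed_id INTEGER NOT NULL REFERENCES feed(id)
    );
""")
conn.execute("INSERT INTO feed (id, name) VALUES (1, 'Example feed')")
conn.executemany(
    "INSERT INTO article (title, feed_id) VALUES (?, ?)",
    [("First post", 1), ("Second post", 1)],
)

# All articles belonging to feed 1, in insertion order
titles = [row[0] for row in conn.execute(
    "SELECT title FROM article WHERE feed_id = ? ORDER BY id", (1,))]
print(titles)  # ['First post', 'Second post']
```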
Using an intermediary (join) table, a "many-to-many" relationship can be modeled. If I wanted to create some categories for these articles, where:
  1. an article could belong to many categories
  2. a category could contain many articles
I would use a many-to-many relationship.
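An intermediary table is just a third table whose rows each pair one article id with one category id. A minimal sketch with sqlite3 follows; the table names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    -- Intermediary table: one row per (article, category) pairing.
    CREATE TABLE article_category (
        article_id  INTEGER NOT NULL REFERENCES article(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (article_id, category_id)
    );
""")
conn.executemany("INSERT INTO article (id, title) VALUES (?, ?)",
                 [(1, "Intro to Django"), (2, "PHP tips")])
conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                 [(1, "Python"), (2, "Web")])
# Article 1 is in both categories; category 2 holds both articles.
conn.executemany("INSERT INTO article_category VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])

def categories_of(article_id):
    """Names of every category an article belongs to."""
    rows = conn.execute("""
        SELECT c.name FROM category c
        JOIN article_category ac ON ac.category_id = c.id
        WHERE ac.article_id = ? ORDER BY c.name""", (article_id,))
    return [name for (name,) in rows]

def articles_in(category_id):
    """Titles of every article filed under a category."""
    rows = conn.execute("""
        SELECT a.title FROM article a
        JOIN article_category ac ON ac.article_id = a.id
        WHERE ac.category_id = ? ORDER BY a.title""", (category_id,))
    return [title for (title,) in rows]
```

A plain Django ManyToManyField creates an equivalent join table automatically, so you only need to define it yourself when the pairing has to carry extra data.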
Figure: ManyToMany. Categories have many articles; articles may have many categories.
It is sometimes useful to attach metadata to the relationship between objects. Say I wanted to implement a white-listing feature, wherein articles from a given feed would only be added to a category if they matched a list of keywords assigned to that feed/category pair. This is a good time to use a many-to-many through relationship. The idea here is that we are describing the intersection of two objects, a feed and a category:
Figure: ManyToMany Through. The intersection of feeds and categories contains an extra piece of data: which filter to apply to that feed before inserting its articles into the category.
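At the SQL level, a through relationship is the same kind of join table with extra columns describing the pairing itself. The sqlite3 sketch below uses assumed names and simplifies the design by storing each pair's whitelist as a comma-separated text column instead of linking to separate filter records. What it shows is that one feed can carry a different filter for each category it feeds into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The "through" table: besides the two foreign keys it stores data
    -- about the pairing (simplified here to a comma-separated whitelist).
    CREATE TABLE feed_category (
        feed_id     INTEGER NOT NULL REFERENCES feed(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        whitelist   TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (feed_id, category_id)
    );
""")
conn.execute("INSERT INTO feed VALUES (1, 'Tech news')")
conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Python"), (2, "All")])
# Same feed, two categories, two different filters.
conn.executemany("INSERT INTO feed_category VALUES (?, ?, ?)",
                 [(1, 1, "python, django"), (1, 2, "")])

def whitelist_for(feed_id, category_id):
    """Keywords gating this feed's articles into this category.

    Returns None when the feed is not linked to the category at all.
    """
    row = conn.execute(
        "SELECT whitelist FROM feed_category WHERE feed_id = ? AND category_id = ?",
        (feed_id, category_id)).fetchone()
    if row is None:
        return None
    return [kw.strip() for kw in row[0].split(",") if kw.strip()]
```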

Django Implementation

This is essentially the database schema for a project I started this afternoon, Django News. Since I had already written an RSS aggregator in PHP, it was mostly a matter of figuring out how to implement the same features in Python. I also wanted to add a couple of extra features, like infinite-depth categories, a black-list of keywords assignable to feeds/categories, and white/black-listing of HTML. Some feeds play nice, containing only links and paragraph tags, while others contain script, embed, img and other tags that you really don't want on your site, so blocking certain HTML elements can be very useful.
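The HTML white-listing idea can be sketched with the standard library's html.parser. This is an illustration, not the project's code: the allowed-tag set, dropping script and style contents entirely, and keeping only http(s) href attributes on links are all hypothetical policy choices. For hostile input, a dedicated, maintained sanitizer library is a safer choice than a hand-rolled parser like this one.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: only these tags survive; all other tags are dropped.
ALLOWED_TAGS = {"p", "a", "em", "strong"}
# Tags whose contents are discarded too, not just the tags themselves.
DROP_CONTENT = {"script", "style"}


class TagWhitelister(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            kept = ""
            if tag == "a":
                # Keep only an http(s) href; every other attribute is dropped.
                href = dict(attrs).get("href") or ""
                if href.startswith(("http://", "https://")):
                    kept = f' href="{escape(href)}"'
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept (re-escaped) unless we are inside script/style.
        if not self.skip_depth:
            self.out.append(escape(data))


def clean_html(fragment):
    """Return fragment with disallowed tags removed and text escaped."""
    parser = TagWhitelister()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Note that disallowed tags like img and embed are removed but the text around them is kept, while script and style lose their contents as well.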
The biggest stumbling block I anticipated was implementing the many-to-many through relationship between feeds, categories, and keyword filters. Django, however, comes to the rescue with the through option of ManyToManyField:
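from django.db import models

class Feed(models.Model):
    name = models.CharField(max_length=255)
    url = models.URLField()
    # String references let Django resolve models defined elsewhere in the file
    categories = models.ManyToManyField('Category', through='FeedCategoryRelationship')
    # on_delete is required since Django 2.0; CASCADE matches the old default
    source = models.ForeignKey('Source', on_delete=models.CASCADE)
    last_download = models.DateField(auto_now=True)  # updated on every save()
    new_articles_added = models.PositiveSmallIntegerField(default=0, editable=False)
    active = models.BooleanField(default=True)
    ...

class FeedCategoryRelationship(models.Model):
    """One row per (feed, category) pair, plus data about that pairing."""
    feed = models.ForeignKey(Feed, on_delete=models.CASCADE)
    category = models.ForeignKey('Category', on_delete=models.CASCADE)
    # Keyword filters applied to this feed's articles for this category only
    white_list = models.ManyToManyField('WhiteListFilter', blank=True)
    ...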
Accessing related data is a snap:
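def perform_download(self):
    """Download articles associated with this feed (a method on Feed)."""
    # Iterate the through-model rows directly: each row is one
    # (feed, category) pair together with its whitelist filters.
    relationships = (FeedCategoryRelationship.objects
                     .filter(feed=self)
                     .select_related('category')
                     .prefetch_related('white_list'))

    for relationship in relationships:
        category = relationship.category
        whitelist = []
        for white_list in relationship.white_list.all():
            # keywords is stored as a comma-separated string
            whitelist += [kw.strip() for kw in white_list.keywords.split(',') if kw.strip()]
        ...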
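The download method stops at the point where it has collected a whitelist. One way that list might then be applied to each incoming article is sketched below. passes_filters is a hypothetical helper, and its policy is an assumption: case-insensitive substring matching, an empty whitelist accepts everything, and any blacklisted keyword rejects the article (the blacklist stands in for the keyword black-list feature mentioned earlier).

```python
def _contains_any(text, keywords):
    """True if any non-blank keyword occurs in text.

    Case-insensitive substring match, so "java" also matches "javascript".
    """
    lowered = text.lower()
    return any(kw.strip().lower() in lowered for kw in keywords if kw.strip())


def passes_filters(text, whitelist, blacklist=()):
    """Hypothetical rule for whether an article may enter a category."""
    if _contains_any(text, blacklist):
        return False  # any blacklisted keyword rejects outright
    cleaned = [kw for kw in whitelist if kw.strip()]
    if not cleaned:
        return True  # assumption: no whitelist means "accept everything"
    return _contains_any(text, cleaned)
```

Inside the loop, a call such as passes_filters(entry_title + " " + entry_summary, whitelist) could decide whether an entry is saved into category; entry_title and entry_summary are placeholders for whatever the feed parser returns.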

Admin Interface

Figure: ManyToMany Through Admin.
By default, the admin does not display a widget for a ManyToManyField that uses a custom through model, so FeedCategoryRelationship is not exposed in either the Feed or the Category admin. We add it using an inline:
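from django.contrib import admin

# Assumes the models live in this app's models.py
from .models import Category, Feed, FeedCategoryRelationship

class FeedCategoryRelationshipInline(admin.TabularInline):
    model = FeedCategoryRelationship
    extra = 1  # show one blank row for adding a new feed/category pairing

class FeedAdmin(admin.ModelAdmin):
    inlines = (FeedCategoryRelationshipInline,)

class CategoryAdmin(admin.ModelAdmin):
    inlines = (FeedCategoryRelationshipInline,)
    prepopulated_fields = {"slug": ("name",)}

admin.site.register(Feed, FeedAdmin)
admin.site.register(Category, CategoryAdmin)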
