Relating one data structure to 0..n of another is the cornerstone of relational database design.
One news feed may contain many articles
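As a rough illustration of that one-to-many idea, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The table names, column names, and sample rows are my own assumptions for the example, not the actual Django News schema; the point is only that each article row carries a foreign key back to exactly one feed.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE feed (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE article (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        -- Each article points at exactly one feed.
        feed_id INTEGER NOT NULL REFERENCES feed(id)
    );
""")

conn.execute("INSERT INTO feed (id, name) VALUES (1, 'Example Feed')")
conn.execute("INSERT INTO feed (id, name) VALUES (2, 'Empty Feed')")
conn.executemany(
    "INSERT INTO article (title, feed_id) VALUES (?, ?)",
    [("First post", 1), ("Second post", 1)],
)


def article_titles(feed_id):
    """Return the titles of every article belonging to one feed."""
    rows = conn.execute(
        "SELECT title FROM article WHERE feed_id = ? ORDER BY id", (feed_id,)
    )
    return [title for (title,) in rows]
```

A feed with no articles simply yields an empty list, which is the "0" end of the 0..n range.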
By using an intermediary table, a "many-to-many" relationship can be modeled. Suppose I wanted to create some categories for these articles, where:
- an article could belong to many categories
- a category could contain many articles
Categories have many articles, articles may have many categories
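To make the intermediary table concrete, here is a small sketch, again with sqlite3 instead of Django. The article_category table and the sample data are hypothetical names I picked for the example. Each row in the intermediary table records one article/category pairing, and a join in either direction walks the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    -- The intermediary table: one row per (article, category) pairing.
    CREATE TABLE article_category (
        article_id  INTEGER NOT NULL REFERENCES article(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (article_id, category_id)
    );
    INSERT INTO article  VALUES (1, 'Django tips'), (2, 'Python news');
    INSERT INTO category VALUES (1, 'Web'), (2, 'Programming');
    INSERT INTO article_category VALUES (1, 1), (1, 2), (2, 2);
""")


def categories_for(article_id):
    """Names of all categories a given article belongs to."""
    rows = conn.execute(
        """SELECT c.name FROM category c
           JOIN article_category ac ON ac.category_id = c.id
           WHERE ac.article_id = ? ORDER BY c.id""",
        (article_id,),
    )
    return [name for (name,) in rows]


def articles_in(category_id):
    """Titles of all articles filed under a given category."""
    rows = conn.execute(
        """SELECT a.title FROM article a
           JOIN article_category ac ON ac.article_id = a.id
           WHERE ac.category_id = ? ORDER BY a.id""",
        (category_id,),
    )
    return [title for (title,) in rows]
```

The composite primary key is a design choice for the sketch: it keeps the same pairing from being stored twice.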
It is sometimes useful to attach metadata to the relationship between objects. Say I wanted to implement a white-listing feature, wherein articles would only get added to categories if they matched a list of keywords assigned to that category. This is a good time to use a many-to-many through relationship. The idea here is that we are describing the intersection of two objects:
The intersection of feeds and categories contains an extra piece of data: which filter to apply to that feed before inserting articles into the category
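Here is a sketch of what "metadata on the relationship" can look like at the table level, using sqlite3. As a simplification I store the keywords as a comma-separated column directly on the intermediary row; the feed_category table, its keywords column, and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The intermediary table now carries data of its own: the keyword filter
    -- that applies to this particular feed/category pairing.
    CREATE TABLE feed_category (
        feed_id     INTEGER NOT NULL REFERENCES feed(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        keywords    TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (feed_id, category_id)
    );
    INSERT INTO feed     VALUES (1, 'Tech Feed');
    INSERT INTO category VALUES (1, 'Python'), (2, 'Everything');
    INSERT INTO feed_category VALUES (1, 1, 'python,django'), (1, 2, '');
""")


def keywords_for(feed_id, category_id):
    """Return the keyword filter attached to one feed/category pairing."""
    row = conn.execute(
        "SELECT keywords FROM feed_category WHERE feed_id = ? AND category_id = ?",
        (feed_id, category_id),
    ).fetchone()
    # No pairing, or a pairing with no keywords, means no filter.
    if row is None or not row[0]:
        return []
    return row[0].split(",")
```

The same feed can therefore have a different filter for each category it feeds into, which is exactly what a plain many-to-many table cannot express.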
Django Implementation
This is essentially the database schema for a project I started this afternoon - Django News. Since I had already written an RSS aggregator in PHP, it was mostly a matter of figuring out how to implement the same features in Python. I also wanted to add a couple of extra features, like infinite-depth categories, a black-list of keywords assignable to feed/category pairs, and white/black-listing of HTML. Some feeds play nice, containing only links and paragraph tags, while others contain script, embed, img and other tags that you really don't want on your site, so blocking certain HTML elements can be very useful.

The biggest stumbling block I anticipated was implementing the many-to-many through relationship of feeds, categories, and keyword filters. Django, however, comes to the rescue with a great implementation:
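    from django.db import models

    # Category, Source and WhiteListFilter are defined elsewhere in the project

    class Feed(models.Model):
        name = models.CharField(max_length=255)
        url = models.URLField()
        # Route the many-to-many through an explicit intermediary model
        categories = models.ManyToManyField(Category, through='FeedCategoryRelationship')
        # on_delete is required since Django 2.0; CASCADE was the earlier implicit default
        source = models.ForeignKey(Source, on_delete=models.CASCADE)
        last_download = models.DateField(auto_now=True)
        new_articles_added = models.PositiveSmallIntegerField(default=0, editable=False)
        active = models.BooleanField(default=True)
        ...

    class FeedCategoryRelationship(models.Model):
        feed = models.ForeignKey(Feed, on_delete=models.CASCADE)
        category = models.ForeignKey(Category, on_delete=models.CASCADE)
        # Extra data carried by the relationship itself
        white_list = models.ManyToManyField(WhiteListFilter, blank=True)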
Accessing related data is a snap:

    class Feed(models.Model):
        ...

        def perform_download(self):
            """Download articles associated with this feed"""
            for category in self.categories.all():
                # Look up the intermediary rows joining this feed to the category
                relationships = FeedCategoryRelationship.objects.filter(feed=self, category=category)
                for relationship in relationships:
                    # Collect keywords from every white-list filter on the relationship
                    whitelist = []
                    for white_list in relationship.white_list.all():
                        whitelist += white_list.keywords.split(',')
                    ...
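The snippet stops once the keyword list is built, so here is one way the filtering step could look in plain Python. This is only a sketch under my own assumptions: build_whitelist and passes_whitelist are hypothetical helpers, matching is a case-insensitive substring test, and an empty white-list lets every article through. The real project may well do this differently.

```python
def build_whitelist(keyword_strings):
    """Flatten comma-separated keyword strings into one list of clean keywords."""
    whitelist = []
    for keywords in keyword_strings:
        for keyword in keywords.split(","):
            # Normalise so that ' Django ' and 'django' are the same keyword.
            keyword = keyword.strip().lower()
            if keyword:
                whitelist.append(keyword)
    return whitelist


def passes_whitelist(text, whitelist):
    """Return True if the whitelist is empty or any keyword occurs in text."""
    if not whitelist:
        # Assumption: no filter configured means nothing is blocked.
        return True
    lowered = text.lower()
    return any(keyword in lowered for keyword in whitelist)
```

With helpers like these, each downloaded article's title or body would be checked against the white-list for the feed/category pairing before the article is added to that category.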
Admin Interface
By default, the FeedCategoryRelationship is not exposed in either the Category or the Feed admin, so we add it using an inline:
    from django.contrib import admin

    class FeedCategoryRelationshipInline(admin.TabularInline):
        model = FeedCategoryRelationship
        extra = 1  # number of blank extra rows to display

    class FeedAdmin(admin.ModelAdmin):
        inlines = (FeedCategoryRelationshipInline,)

    class CategoryAdmin(admin.ModelAdmin):
        inlines = (FeedCategoryRelationshipInline,)
        prepopulated_fields = {"slug": ("name",)}