Really Serviceable Syndication

September 20, 2014

What is the goal of RSS?

Disclaimer: in this article, I will almost uniformly use “RSS” to refer to both the RSS and Atom feed formats. For the purposes of this discussion, RSS and Atom are two philosophically different specs that implement the same behavior.

An RSS feed is a regularly updated XML file containing a list of items (e.g., blog posts for a blog, or episodes for a podcast) with metadata (or full text content). This makes it extremely easy to look at a summary of recently published content (which makes sense, given that one of the acronym’s early expansions was Rich Site Summary), but is this the goal of RSS? It seems like an interesting goal, but I’ve never once heard of RSS being used this way. Some services like FeedBurner and FeedPress display an RSS feed elegantly enough that it can serve this purpose, but it’s not reliable: a novice user who clicks a link to a random RSS feed doesn’t know whether they’ll end up with a blob of XML or a nicely browsable summary, and I suspect users are immediately intimidated when they hit the XML.

RSS is rarely used for displayable summaries of content, so what is it used for? Content subscription. A scheme for providing updates to content should answer the following questions:

What are the new items since X?
Which old items have been modified since X?

and it should answer them:

concisely
quickly
reliably
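To make the two questions concrete, here is a minimal sketch of the interface a subscription scheme might expose. Every name in it (`Item`, `SubscriptionSource`, `new_since`, `modified_since`, `InMemorySource`) is hypothetical and not part of RSS or Atom; the toy implementation assumes the publisher keeps both a publication time and a last-modified time for each item, so it can answer each question directly instead of shipping every item.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Protocol


@dataclass(frozen=True)
class Item:
    guid: str
    title: str
    published: datetime
    modified: datetime


class SubscriptionSource(Protocol):
    """Hypothetical interface: the two questions a subscription scheme should answer."""

    def new_since(self, x: datetime) -> list[Item]: ...

    def modified_since(self, x: datetime) -> list[Item]: ...


class InMemorySource:
    """Toy implementation that answers both questions from stored timestamps."""

    def __init__(self, items: list[Item]) -> None:
        self._items = items

    def new_since(self, x: datetime) -> list[Item]:
        # Items first published after X.
        return [i for i in self._items if i.published > x]

    def modified_since(self, x: datetime) -> list[Item]:
        # Items that already existed at X but were changed after X.
        return [i for i in self._items if i.published <= x < i.modified]
```

The point of the sketch is the shape of the answer: a client asking either question gets back only the items it needs, rather than the whole archive.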

Let’s look at how an application could use an RSS feed to answer these questions¹. When a user adds an RSS feed to an application, the client reads the XML file and caches a representation of the items it finds (e.g., a podcast client creates episode objects and probably begins downloading the most recent episode’s audio file), then saves a timestamp recording the last time the feed was read. On subsequent updates, the client reads the new XML file and merges it with the cached content; it is up to the app to create new objects for new items and to modify or delete the previously cached ones. So how does the feed answer the questions raised above? “I dunno, here’s all our content. You figure it out!” Assuming an app is a perfect citizen and implements all of the diffing and merging logic without error, can it use the feed to infer the answers to these questions concisely, quickly, and reliably?
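The diffing and merging step described above might look roughly like this. It is a minimal sketch, not any particular client’s code: `parse_items` and `merge` are hypothetical names, it assumes the RSS 2.0 layout (`<item>` elements, not Atom’s namespaced `<entry>`), and it assumes each item can be keyed by its `<guid>`, falling back to `<link>` and then `<title>`.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml: str) -> dict[str, dict[str, str]]:
    """Map each <item>'s identifier to a flat dict of its child elements."""
    root = ET.fromstring(rss_xml)
    items = {}
    for item in root.iter("item"):
        fields = {}
        for child in item:
            # Repeated tags (e.g. several <category>) overwrite each other here.
            fields[child.tag] = (child.text or "").strip()
            # Keep attributes too, so a changed <enclosure url="..."> counts as a modification.
            for attr, value in child.attrib.items():
                fields[f"{child.tag}@{attr}"] = value
        key = fields.get("guid") or fields.get("link") or fields.get("title", "")
        items[key] = fields
    return items


def merge(cached, fresh):
    """Return (new, modified, removed) identifiers between a cached snapshot and a fresh fetch."""
    new = [k for k in fresh if k not in cached]
    modified = [k for k in fresh if k in cached and fresh[k] != cached[k]]
    removed = [k for k in cached if k not in fresh]
    return new, modified, removed
```

Even a correct merge leaves policy questions open. Many feeds only list their most recent items, so an item in `removed` may have been deleted or may simply have aged out of the file, and the feed gives the client no way to tell which.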

If the XML file has not been modified since the last check and the client is smart about sending the “If-Modified-Since” header, the server can reply with a “304 Not Modified” response, concisely and quickly. However, every time the file has been modified, the server must send the entire feed, redundant and mostly unchanged content included: if there’s a new episode of The Flop House podcast, the response must include 258 KB of XML with metadata for all 144 episodes of the show for the client to get the roughly 2 KB describing the latest one. Fetching several such feeds from a mobile device (which is especially relevant for podcast feeds) can take a very long time. And if items are modified after they have been cached, different apps handle the change differently, so neither users nor publishers can reliably predict how that content will be treated. If a podcast feed is updated with a new enclosure link, does the client delete the old episode and download the new one? If a blog post’s content is included in full and is modified, does the client show diffs like NetNewsWire and NewsBlur do?
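A conditional GET with “If-Modified-Since” can be sketched with the Python standard library as below. `fetch_if_modified` is a hypothetical helper; it assumes the server sends a `Last-Modified` header and that the client echoes that exact value back on the next request. `urllib` reports a 304 by raising `HTTPError`, which the sketch treats as “keep the cached copy”.

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None):
    """Conditional GET: returns (body or None, Last-Modified value to send next time)."""
    headers = {}
    if last_modified:
        # Echo the server's own Last-Modified value rather than a local clock reading.
        headers["If-Modified-Since"] = last_modified
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read(), response.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing to download, keep the cached copy.
            return None, last_modified
        raise
```

This only helps when nothing changed. When something did, the 200 response carries the whole file: 258 KB spread over 144 episodes is about 1.8 KB per episode, so one new episode costs roughly 140 times the bytes it actually adds.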

I have some thoughts about alternative syndication formats, but nothing fully formed that could answer the important subscription-related queries more efficiently without losing the simplicity of RSS². Do you have ideas? Do you think these issues with RSS are silly? Let me know on Twitter.

¹ I have never written an application that reads an RSS feed, so these are all just reasonable guesses. If you have experience with this and have feedback, I’d love to hear about it on Twitter.
² Something backed by a git repository seems like the obvious thing, and version-controlled XML would give backwards compatibility with RSS for free, but it would lack any understanding of whether a commit represents a new item (in which case it should only insert lines at a specific place near the top of the file), a modification to an old item, or a modification to the feed itself. Maybe the git object model could be used to track the tree of XML…? There’s also the issue that publishers shouldn’t be able to modify content without “announcing” it (e.g., adding a commit if it were git-backed), but should be able to delete content, both on demand and on a regular schedule, without it remaining in any public history (as it would if everything were just a public git repo without rewritten history). I also doubt fetching from a git repo would be as fast as being served a static file.