May 24, 2004
Syndication, the web, the future, XML and flying pigs
Well, in a break from my tradition of posting browser news, I'll turn to the various syndication formats, web applications, XML-based formats in general, where the world is heading, and flying pigs. Well, not so many flying pigs, really, though I can point you to p0nju - PiggyHunter! if you want to read more on that topic.
XML ≠ Web syndication
Well, I guess anyone reading Webgraphics is also reading mezzoblue, but in any case, I'll refer you to the What is RSS/XML/Atom/Syndication? post Dave wrote recently. What I want to bring to your attention is that the image most often used to link to a feed is labelled "XML", not "RDF", "RSS", "RSS2", "Atom" or anything else. RSS is not even a homogeneous format, which makes it worse. The problem here is one of levels of abstraction. XML is a general syntax for structured data, and RDF is likewise a generic grammar and syntax, while the syndication formats are vocabularies built on top of the XML or RDF syntaxes. It's a clear mistake to associate the term XML with syndication feeds, because XML's areas of application are so much wider than that. A recent test of mine shows why this matters. I sent XHTML as 'text/html', 'application/xhtml+xml', 'text/xml' and 'application/xml' and checked four things: whether each browser supported XHTML and the xml-stylesheet PI, whether scripts were loaded, whether scripts added through the DOM were executed, and whether styles were applied. OmniWeb 5.0 beta 2 managed all of those things, which was surprising since Safari doesn't execute scripts added using the DOM. However, recent OmniWeb 5.0 betas have added web syndication to their set of features, and now all XML is treated as if it were a feed. That effectively destroys any possibility of using XML for anything other than syndication.
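The MIME-type test described above is easy to reproduce. Below is a minimal sketch that assumes a Python WSGI server, which may not match the setup of the original test. It serves one XHTML page under each of the four content types. The page exercises the xml-stylesheet PI, an external script, and a script element added through the DOM. The URL paths (`/html`, `/xhtml`, `/text-xml`, `/app-xml`) and the file names `pi-test.css`, `loaded.js` and `dom-added.js` are hypothetical placeholders.

```python
from wsgiref.simple_server import make_server

# The four content types from the test, keyed by a hypothetical URL path.
MIME_TYPES = {
    "/html": "text/html",
    "/xhtml": "application/xhtml+xml",
    "/text-xml": "text/xml",
    "/app-xml": "application/xml",
}

# One XHTML page served under every content type. pi-test.css, loaded.js and
# dom-added.js are hypothetical test files that report whether they took effect.
TEST_PAGE = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="pi-test.css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>MIME type test</title>
<script type="text/javascript" src="loaded.js"></script>
<script type="text/javascript">
var s = document.createElementNS("http://www.w3.org/1999/xhtml", "script");
s.setAttribute("type", "text/javascript");
s.setAttribute("src", "dom-added.js");
document.getElementsByTagName("head")[0].appendChild(s);
</script>
</head>
<body><p id="status">Static content.</p></body>
</html>
"""

def app(environ, start_response):
    """WSGI app: pick the Content-Type from the request path."""
    content_type = MIME_TYPES.get(environ.get("PATH_INFO", "/"))
    if content_type is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return ["Try one of: {}\n".format(" ".join(MIME_TYPES)).encode("utf-8")]
    body = TEST_PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", content_type + "; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

def serve(port=8000):
    """Serve the test page on localhost until interrupted."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Call `serve()` and point each browser at the four URLs in turn to see which combinations it handles. The page only references the test files, so what those files contain decides what "loaded" or "applied" looks like.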
Syndication formats
There are numerous syndication formats. RSS is the most common, though it is split across several versions. Atom and its earlier cousins are also well represented on the web. These formats are XML-based and rather verbose. ESF is another, very slim syndication format, based on plain text. These formats, and one other, are the ones I will discuss here.
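Here is a minimal ESF reader, to show how little machinery a plain-text format needs. It rests on an assumption about the layout, so check the ESF specification before relying on the details. The assumed layout is:

- a block of tab-separated `name<TAB>value` metadata lines
- a blank line
- one `timestamp<TAB>title<TAB>URL` line per entry, with Unix timestamps
- `#` at the start of a line marking a comment

```python
from datetime import datetime, timezone

def parse_esf(text):
    """Parse an ESF-style feed under the layout assumed above.

    Returns (metadata, entries): metadata is a dict such as
    {"title": ..., "contact": ..., "link": ...}, entries a list of
    (datetime, title, url) tuples. Malformed entry lines raise ValueError.
    """
    metadata, entries = {}, []
    in_header = True
    for line in text.splitlines():
        if line.startswith("#"):
            continue                      # comment line
        if not line.strip():
            in_header = False             # a blank line ends the metadata block
            continue
        if in_header:
            name, _, value = line.partition("\t")
            metadata[name.strip()] = value.strip()
        else:
            stamp, title, url = line.split("\t", 2)
            when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
            entries.append((when, title, url))
    return metadata, entries
```

The whole reader is a loop over lines and a string split. Under these assumptions, no general-purpose parser is involved.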
As I see it, a feed should contain a minimum amount of content. It should contain the relevant entries since the last update, either as links to the full entries or with the full text included. It should contain a date-time stamp and a title for each entry. It should contain the feed title, contact and link. If it provides the full entries, it should contain author data. Presentation should be up to the client.
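The minimum content listed above can be written down as a small data model. In this hypothetical sketch, the `Entry`/`Feed` names and the `problems()` check are illustrative and not part of any format. Each rule in the paragraph becomes one check, and there are deliberately no presentation fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Entry:
    title: str
    updated: Optional[datetime]      # date-time stamp of the entry
    link: Optional[str] = None       # address of the full entry
    content: Optional[str] = None    # full text, if the feed carries it
    author: Optional[str] = None     # needed when content is present

@dataclass
class Feed:
    title: str
    contact: str
    link: str
    entries: List[Entry] = field(default_factory=list)
    # Deliberately no style or layout fields: presentation is up to the client.

    def problems(self) -> List[str]:
        """Return the minimum-content rules this feed breaks (empty if none)."""
        issues = []
        for name in ("title", "contact", "link"):
            if not getattr(self, name):
                issues.append(f"feed is missing {name}")
        for i, entry in enumerate(self.entries):
            if not entry.title:
                issues.append(f"entry {i} is missing a title")
            if entry.updated is None:
                issues.append(f"entry {i} is missing a timestamp")
            if not entry.link and entry.content is None:
                issues.append(f"entry {i} has neither a link nor full text")
            if entry.content is not None and not entry.author:
                issues.append(f"entry {i} has full text but no author")
        return issues
```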
What I don't want to see in a feed, on the other hand, is data that is neither entry contents, entry metadata nor feed metadata. That rules out excessively verbose XML and data sent as XHTML. It also rules out the xml-stylesheet PI.
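A feed can be checked for the two kinds of extra data named above using the standard library's DOM. This is a minimal sketch. It assumes the feed is well-formed XML and that "data sent as XHTML" means elements in the XHTML namespace. `lint_feed` is a hypothetical helper name.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

def lint_feed(xml_text):
    """Flag feed content that is presentation rather than content or metadata."""
    doc = minidom.parseString(xml_text)
    warnings = []
    # Processing instructions in the prolog are children of the Document node.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            warnings.append("xml-stylesheet PI: " + node.data)
    # Any element in the XHTML namespace, at any depth.
    xhtml = doc.getElementsByTagNameNS(XHTML_NS, "*")
    if xhtml:
        warnings.append(f"{len(xhtml)} XHTML element(s) embedded in the feed")
    return warnings
```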
Of the formats mentioned, ESF comes closest to meeting those goals. So close, in fact, that I would say its single weak point is that it is limited to sending the entry addresses only. Many will disagree with me about this, but if you want more than that, there is a format explicitly made for it: (X)HTML. And that's where I, not for the last time today, turn to Ian Hickson. On his site you can find, among other goodies, a working draft for another syndication format based on XHTML, co-written with Tantek Çelik. This format, HSF (HTML Syndication Format), provides any HTML features you might want to add to your feed. It uses XHTML elements and attributes to mark up a feed that may be embedded inside any XML document. It allows XHTML contents and offers the same facilities for styling. This format is what you want if you need your feeds to be nicely rendered in browsers. Atom or RSS is what you want if you want to send entry contents with the entry but don't need XHTML contents. ESF is what you want if you are just going to link to the entry. That is my view on it.
Web applications
I've said before that the web will not be a battlefield for Microsoft or others for some time, and I'll repeat it again. However, web applications are about user interfaces, not documents, and that is the distinction I make between the web and web applications. With that distinction in place, I say with confidence that XAML, XUL, Java, Flash, PDF or even SVG will make no significant difference on the web until Mac OS 9 and Windows versions before Longhorn are largely phased out. Sure, the browser and plugin manufacturers will try. Internet Explorer is our roadblock. As long as versions 6 and below hold any kind of majority, the web will stay as it is. There!
However, the web applications scene faces a shaky future. If the wrong decisions are made, a proprietary, Microsoft-controlled technology, whether open or closed, will emerge as the de facto standard for web applications, even if we agree on a single champion against it. Why? Internet Explorer is not necessarily as major a factor here, but it will still be what most people use to access web applications.
Web applications are of course in the spotlight right now. The W3C is holding a workshop on them in a week, and the position papers are available. Note especially the joint position of Mozilla and Opera, and the unsurprising positions of Microsoft, Adobe, IBM and the companies working with mobile phones and/or PDAs. Also note that Macromedia is absent.
I can see a trend here. All these heavyweight companies seem to be leveraging the technology they work with and have spent millions in R&D money on, but none of them takes the single most important factor into account: the user. Users live on Windows. Users use current versions of Internet Explorer, and will continue to do so for years. Windows Longhorn might be fitted with a new version of Trident (the rendering engine of Internet Explorer for Windows), but the glimpses we have seen so far have not included one. And even if Longhorn does bring us a new browser, how long will it take until that browser is mainstream? No, the single component of Longhorn relevant to this discussion is Avalon. XAML and Microsoft InfoPath are the future directions taken by Microsoft. Those are the technologies that will rule the web applications world if the status quo remains.
What opposition do we have, then? Well, prominent among the technologies are SVG and XForms. PDF, Java and Flash are possible proprietary contenders. And then, of course, we have the joint position of Opera and Mozilla. In my opinion, the proprietary technologies are worth a glance but nothing more. Java has fallen out of favour, and not even Sun is promoting it as an option. We know nothing of Macromedia's position, but I would not be surprised if they promoted Flash. Adobe, of course, positions itself with SVG and not PDF. However, SVG is not the ideal solution. The SVG WG never seems to look at the work of others, and the standards it develops are not interoperable at the level XML should be. XForms, on the other hand, is better when it comes to interoperability. It lacks support, though, and already competes with two similar technologies, notably one I've already mentioned, Microsoft InfoPath. And it's overly complicated. Looking again at Hixie's website, we find a much more appealing specification to base it on: Web Forms 2, an extension of the HTML forms system.
That leaves only the Mozilla and Opera position. Their position is clear: backwards compatibility is essential. No plug-in should be needed for current Internet Explorer versions to support it. It should be based on HTML/XHTML, ECMAScript and the DOM. These technologies are already device-independent and should not need profiling. Again, Hixie provides an entry on the issues, linking to a document by David Baron on the fragmentation of the web. I don't believe this standpoint will win out in the W3C. I do believe it is the one standpoint that could effectively give the W3C control of future web applications. The alternative is building a technology that does not work on current Microsoft software, which would allow Microsoft to roll its own de facto standard.
My belief is that SVG will win out as the base technology here. The browser developers are covering their backs and already implementing SVG, but requiring a plug-in like Adobe's SVG Viewer is not enough to reach the Internet Explorer audience. It's not backward compatibility per se that is needed, but native Internet Explorer compatibility. And SVG does not provide that.
Finally, I want to point again to something you can find on Hixie's website: Web Applications Markup Language. I'm not sure the browser vendors will go with the W3C-approved technology on this, because they don't want to lose this war. Rather than lose to Microsoft because of a bad decision on the W3C's part, they may choose to back a technology that works both in their browsers and in Internet Explorer, given some HTCs to augment its behavior. This technology may be it. And that's all the speculation in this area I'm willing to do today...
XML-based formats
I don't know if it's only me, but I see a clear overuse of XML formats. XML is a good format for creating hierarchical structures. Its benefits include:

- strict requirements on document syntax
- XML-based formats can be extended with new elements and attributes in new versions without breaking backwards compatibility
- it can mix with other XML languages in other namespaces
- native association of style sheets, links, and character data that isn't parsed as markup (through CDATA sections)
- a single parser can handle many different languages

Its drawbacks are that it is overly verbose, lacks error recovery, and doesn't allow HTML content to be parsed unless that content is well-formed XML. It also contains lots of redundant code and requires a heavier parser than most formats really need.
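Two of those points can be seen directly with Python's `xml.etree.ElementTree`: namespace mixing and the lack of error recovery. In this sketch, the `urn:example:feed` namespace is a made-up vocabulary, and the XHTML namespace is the real one.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if text parses as XML; ElementTree has no error-recovery mode."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# A made-up feed vocabulary with XHTML mixed in through a second namespace.
MIXED = """<feed xmlns="urn:example:feed" xmlns:h="http://www.w3.org/1999/xhtml">
  <entry><title>Hello</title><h:p>Body with<h:br/>a line break</h:p></entry>
</feed>"""
NS = {"f": "urn:example:feed", "h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(MIXED)
print(root.find("f:entry/f:title", NS).text)  # Hello
print(len(root.findall(".//h:br", NS)))        # 1

# Markup a browser renders happily as HTML is a fatal error for an XML parser.
print(is_well_formed("<p>Body with<br>a line break</p>"))   # False: <br> is never closed
print(is_well_formed("<p>Body with<br/>a line break</p>"))  # True
```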
Thus, when people want fuzzy XML parsing (that is, error recovery) because their format would benefit from it, my answer is that the strict well-formedness constraints are part of what makes XML XML. If you want to get rid of the well-formedness constraints, you should turn to another base syntax instead. XML does not allow what you wish for. (The people involved all know who I mean and which language I happen to be thinking of.) Also, as the syndication formats show, ESF is much slimmer than its XML equivalent would be. CSS is much slimmer than its XML equivalent would be, and so is JavaScript. Of those languages, JavaScript is probably the one whose syntactic concepts most resemble XML's, especially its hierarchical object model and its variable structure. Yet although XML can be used to describe JavaScript, it's far from optimal, because the language is not used in a way that benefits from XML parsing and handling. The more syntactically limited, linear serial structure of CSS or ESF would fit XML even worse.
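Serializing the same entry both ways gives a rough sense of the size difference. This is only an illustration. The tab-separated line follows an assumed ESF-like field order (timestamp, title, URL), and the XML element names are hypothetical, not taken from RSS or Atom.

```python
import xml.etree.ElementTree as ET

ENTRY = {
    "timestamp": "1085400000",
    "title": "Syndication, the web",
    "link": "http://example.com/2004/05/24",
}
FIELDS = ("timestamp", "title", "link")

# Plain-text form: one tab-separated line (ESF-like field order, assumed).
text_form = "\t".join(ENTRY[name] for name in FIELDS) + "\n"

# XML form: the same fields as child elements of a hypothetical <entry>.
item = ET.Element("entry")
for name in FIELDS:
    ET.SubElement(item, name).text = ENTRY[name]
xml_form = ET.tostring(item, encoding="unicode")

print(len(text_form.encode("utf-8")), "bytes as text")
print(len(xml_form.encode("utf-8")), "bytes as XML")
```

For this entry, the XML form comes out at about twice the size. That is before any namespace declarations, feed wrapper or XML declaration are added.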
In fact, XML is seldom a good choice of base format. The exception is data with a highly variable structure that is handled as data and not processed further. YAML or JSON map more closely onto the data structures of most programming languages, need far smaller and slimmer parsers, and are often easier to read. A plain-text format such as ESF requires a custom parser, but that parser is still more efficient than a pre-written XML parser would be for a format with the same purpose. No, such languages are not what XML is made for.
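The point about mapping onto a language's own data structures is easiest to see side by side. This sketch uses a made-up, feed-like structure and only JSON, since Python's standard library has no YAML parser. `json.loads` returns dicts and lists directly. The XML version needs hand-written mapping code that encodes knowledge the document itself doesn't carry, such as which elements form a list.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up structure in both syntaxes.
JSON_TEXT = '{"title": "Webgraphics", "entries": [{"title": "Post", "tags": ["xml", "rss"]}]}'
XML_TEXT = ("<feed><title>Webgraphics</title><entries>"
            "<entry><title>Post</title><tags><tag>xml</tag><tag>rss</tag></tags></entry>"
            "</entries></feed>")

# JSON: the parser returns dicts, lists and strings directly.
from_json = json.loads(JSON_TEXT)

def feed_from_xml(text):
    """XML: the parser returns an element tree, and the mapping to native
    structures (which elements are lists, which are scalars) is hand-written."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "entries": [
            {"title": entry.findtext("title"),
             "tags": [tag.text for tag in entry.findall("tags/tag")]}
            for entry in root.findall("entries/entry")
        ],
    }

from_xml = feed_from_xml(XML_TEXT)
print(from_json == from_xml)  # True
```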
Well, enough negativity. XML is a great format when used right. It's easier to read than most other hierarchical languages, especially when deeply nested. It's verbose and consistent, which makes nested hierarchical structures easier to understand than they might be in other languages. It's unambiguous and strict, which makes catching structural errors easier. It can be validated, to ensure that your document means what you believe it means. It's easy for humans to read, and not that bad for humans to write.
Comments
1. May 24, 2004 02:21 PM
2. May 24, 2004 02:21 PM
Andrew Posted…
This is a good survey of these issues. But discussions like this often have an all-or-nothing tone to them: NO format is good enough until standards are settled, NO language or *ML is safe to use until ALL old browsers are gone. Wait until Longhorn before deciding on anything. And so on. I know these aren't your words, although you seem primarily concerned with things that "will [or won't] make larger difference on the web..."
This stuff matters a lot if you're building large-scale, commercial projects that will get an unpredictably wide swath of users, browsers, and OSs, a situation that won't change even when Longhorn arrives. Frankly, that's not really what most developers do. Intranet projects, special-purpose applications for a small user base, or other definable user populations might be good places to use other formats or tools.
3. May 24, 2004 03:34 PM
Paul G Posted…
Disclaimer: I'm not trying to rip into the author, these are just the thoughts that run through my head as I read this fascinating article, which I'll admit is a bit over my head in places.
I often wonder at the amount of effort that goes into arguing/defining/navel-gazing the finest of fine points in technological circles. I am as guilty as anyone else, but I do think that maybe we spend too much time babbling about the details when most users don't know what "RSS" stands for, don't know what that gibberish on the screen is when they click on one of the orange "XML" buttons that they see on so many sites these days, and couldn't define "web syndication" if their life depended on it.
Maybe we should focus on helping users rather than arguing over the technical merits of one format vs. another. Syndication is a powerful information-gathering tool once you learn how to use it. It is capable of cutting through the oceans of irrelevant data on the web (irrelevant with respect to the individual user; what is irrelevant to me is not necessarily irrelevant to you) and bringing the content you want directly to you. No ads, no spam, no email address required. Unfortunately, I think it's still too hard for most users to figure out. The "XML" label does a piss-poor job of properly explaining what exactly you're looking at when you click on the orange button. Most users probably click it once, then ignore it completely, not realizing that they just had a brush with some genuine usability on the web.
I have written a bit more on this recently on my site: XSL for RSS: enhancing the news feed
Okay, a bit more on topic: I had similar thoughts to JonB's as I read the specification for HSL. Isn't that what we already have? I was interested to read about ESF, since I have come late to the syndication party and am not exactly up to speed. It sounds interesting, but I think it suffers from the same usability/user-education problems stated above.
4. May 24, 2004 03:41 PM
Anne Posted…
Jon B misunderstands XHTML.
liorean, would it be possible to make multiple short posts in the future? I really like what you are saying, but I think there is a rule like: "keep posts short" ;-).
5. May 24, 2004 04:39 PM
Jon B Posted…
Anne, I don't misunderstand XHTML, I know exactly what it is and can describe it at great length with superior attention to detail ;o)
My point was based on:
QUOTE - "HSL (HTML Syndication Format), provides any features from HTML that you might want to add to your feed. It uses XHTML elements and attributes to signify a feed that may be embedded inside any XML document. It allows for XHTML contents and the same facilities for styling. This format is what you want if you need your feeds to be nicely rendered in browsers."
Although I'm starting to think differently now - maybe I misinterpreted what he said and in actuality feeds would only be styled beautifully in a web browser, otherwise it would be embedded in an XML document (or just called ???.xml since clean XHTML qualifies as XML compliant). If this is the case then it is potentially a great thing because the markup needn't be changed for web or feed display in the same way as it currently has to - maybe I should read up on it a bit more before shooting my mouth off. Still, I like to cause a little controversy, makes me more memorable ;o)
6. May 24, 2004 06:15 PM
liorean Posted…
Jon B: The comments on HSF were prompted by posts such as Another crack at user-friendly feeds and Exploring pretty feeds. I really see no reason to serve such a feed instead of using (X)HTML, and if you want to serve a feed that is displayed like that if you visit it using a browser, HSF is your answer. HSF is made to be usable as XHTML websites by browsers, but still have the simplicity and required features of a syndication feed. If you on the other hand are using RSS, Atom or ESF, you should keep it down to a syndication feed, not make it behave like a webpage - we already have a language for webpages.
Paul G: Well, the thing is that an HSF feed can do what your RSS feed and that XSL transformation sheet do, without needing a transformation, and is still not much larger. It has the benefits of RSS for syndication purposes, but is vastly superior in its full capabilities. What I'm getting at is that RSS is redundant in that respect, as a parser in a feed reader wouldn't have any more trouble with the HSF feed than it would with the RSS feed. As for ESF, it's a little different, since it's such a small format. It's as close to the ultimate format for the task as you are going to get.
When it comes to the user education issues, that's not really the problem of the feed format. It's the problem of users just throwing in those "XML" buttons without even providing a link to an explanatory "What is this?" page somewhere.
Anne: Then you're probably glad I decided to drop the rant on non-computer devices and non-visual representations. I had a great idea for it, but it vanished somewhere between Friday, when I began writing this, and today, when I finished it...
7. May 24, 2004 07:01 PM
Paul G Posted…
Ah, I think I understand a bit better now. It's more like a consolidation of technologies already available, using a language that's already out there.
It seems to me that the ideal would be to have syndication be like an extension or evolution of bookmarks. Something built into the browser (I currently use a plugin for Firefox, but a few shortcomings keep it from being everything I want), a "Press Ctrl-D to subscribe to this site" sort of ease-of-use. Ideally, this would work on any page of the site: Create a bookmark to that particular page, but also autodiscover and subscribe to the site's news feed. Also useful would be to be able to put a link to the feed that, when clicked, adds the bookmark, rather than displaying the XML/XHTML/ESF.
Hmm. Definitely bears more pondering on my part. See? I told you some of this stuff was a bit over my head :)
Jon B Posted…
Ok, well it strikes me as odd that someone should decide to create a syndication format that is based on XHTML. The key benefit to feeds is they come through unstyled allowing for a coherent look between entries depending on the needs of the user. When you start styling them with XHTML markup and people start receiving posts in a desktop application, what you have essentially is similar to what we call 'websites'.
Maybe that is the idea - to force a recreation of the whole web into a more semantically correct and clean format. Except that if feed readers are just apps built around browsers like Internet Explorer as it stands at the moment, then the XHTML rendering is still going to fail to conform to the W3C standards.
I love RSS and RDF and maybe even Atom too, but it is for their simplicity and ease of use. I have around 200 feeds I'm subscribed to, and if each of them started to use a syndication format based on XHTML I will be back to where I started with loads of awkward, differently styled bits of info, except this time with a bookmarks/favourites list that downloads itself in one go.