October 27, 2004
Joining the XHTML vs HTML discussions
Since last week, I have been writing a rather long response to this. After managing to write about a quarter of what I wanted to say, I realised it was already too long for a single post, so I decided to split it into a series of smaller posts. Well, let's start with the first one...
From the XHTML standpoint
So, let's look at the choice, XHTML versus HTML, from the standpoint of the XHTML specification. For now, discard the possibility of just switching the document over to HTML, and take XHTML1.0 as a given.
XHTML documents may - as you already know - be sent as either XML (meaning content types 'text/xml' or 'application/xml') or XHTML (content type 'application/xhtml+xml'). This goes for any XHTML document, no matter what coding conventions are used in it. In both cases, XML well-formedness and syntax rules are a requirement. However, there are some very notable differences between the two.
One of them is that documents served as XML to user agents that support XML but not XML namespaces - or that support XML and XML namespaces but don't have knowledge of the XHTML1 namespace - will be treated as any other XML, and will not have any semantics associated with them. That means those documents will not get any of the linking, embedding, external file association, scripting, styling, form handling or other special handling that XHTML1 normally gets. This includes, of course, the XHTML1 default style sheet. The XML semantics still apply, and thus so do the xml-stylesheet PI, the five predefined XML entity references, and any other XML companion technologies the user agent supports, but the document is crippled. This warning isn't as far-fetched as you may think: it is exactly what happens if you serve an XHTML1 document as XML to Internet Explorer. So, that is not a good idea unless you browser sniff and take measures against it.
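To make the namespace point a bit more concrete, here is a minimal Python sketch. It is not taken from any browser; the helper name is_xhtml_root and both sample documents are made up for illustration. It uses the standard library's namespace-aware XML parser to show that an html element only counts as XHTML when it sits in the XHTML1 namespace. An agent that ignores or doesn't know that namespace has nothing but generic XML in front of it.

```python
import xml.etree.ElementTree as ET

# The XHTML1 namespace name, as given in the XHTML 1.0 specification.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two hypothetical documents: identical except for the xmlns attribute.
WITH_NS = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)
WITHOUT_NS = "<html><head><title>t</title></head><body/></html>"


def is_xhtml_root(source):
    """Return True if the root element is 'html' in the XHTML1 namespace."""
    root = ET.fromstring(source)
    # ElementTree spells namespaced names as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS


print(is_xhtml_root(WITH_NS))     # True
print(is_xhtml_root(WITHOUT_NS))  # False
```

Both documents are well-formed XML, but only the first one gives a namespace-aware agent any reason to apply XHTML1 semantics.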
"Well how is the situation with XHTML1 served as XHTML then?" you should probably ask at this point. The answer I'll split in two: current browser problems, and specification problems.
The first is the case where a browser is an XHTML1 user agent but not an XML user agent. It's required to do XML well-formedness checking, but that is all, which may cause some trouble. E.g. the single XHTML1 user agent that is not at the same time an XML user agent (MSN for OS X) doesn't support the xml-stylesheet PI.
As for the specification, there are some things of note. For instance, only the id attribute may be treated as a fragment identifier when handled as XML.
Enter HTML compatibility
Consider Appendix C of XHTML1.0. This appendix is the whole reason why this discussion has arisen. Why? Because it allows us to serve our XHTML1 documents as 'text/html' if they conform to the compatibility guidelines, so that old HTML browsers can also display the documents. This of course leads to problems. Again, let me remind you that we are taking a perspective where the given factor is that the document is XHTML1, and everything else is negotiable.
There are many problems with serving XHTML documents as HTML. One of them is that the default character set differs. Another lies in the differences between the SGML declaration implicit in HTML and the XML declaration. E.g. XHTML documents must be XML well-formed, but in SGML the XML empty element terminator is treated as a tag close followed by the literal character ">". This might seem like nothing, since it's only a problem for real SGML user agents, not for tag soup HTML browsers. But it's just the tip of the iceberg. Another problem is invalid element nesting. E.g. if you try to wrap a div element in a p element, tag soup parsers will end the p element before starting the div element, while non-validating XML parsers will keep the div nested inside the p. This means that your code can change in meaning between being served as XHTML and being served as HTML.
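As a rough illustration of the XML half of that nesting example, the following Python sketch feeds a p element wrapping a div to the standard library's non-validating XML parser. The sample markup and the helper child_tags are my own assumptions. The tag soup half isn't reproduced here, since Python's standard library doesn't include a tag soup tree builder.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: invalid XHTML1 (a div may not appear inside a p),
# but still perfectly well-formed XML.
SOURCE = "<p>before<div>inside</div>after</p>"

# A non-validating parser only checks well-formedness, not the DTD.
root = ET.fromstring(SOURCE)


def child_tags(element):
    """Return the tag names of an element's direct children."""
    return [child.tag for child in element]


print(root.tag, child_tags(root))  # p ['div']
```

The XML parser keeps the div as a child of the p, which is exactly the tree a tag soup parser would not build.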
There are other changes as well. Notably, the content model of the style and script elements is CDATA in HTML but #PCDATA in XHTML. XHTML is required to be XML well-formed in any case, so here we have a real conflict of specifications.
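The CDATA versus #PCDATA difference is easy to see with two parsers from Python's standard library. This is only a sketch under my own assumptions: the script snippet, the ScriptCollector class and both helper functions are made up for illustration, and Python's html.parser merely stands in for an HTML parser that treats script content as CDATA.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical inline script containing "<" and "&", which are markup in XML.
SNIPPET = "<script>if (a < b && c) { run(); }</script>"
# The same script wrapped in a CDATA section, as XML requires.
ESCAPED = "<script><![CDATA[if (a < b && c) { run(); }]]></script>"


class ScriptCollector(HTMLParser):
    """Collects character data; html.parser treats script content as CDATA."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_script_text(source):
    """Return the text an HTML-style parser sees inside the markup."""
    parser = ScriptCollector()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


def xml_is_well_formed(source):
    """Return True if an XML parser accepts the source."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(html_script_text(SNIPPET))    # if (a < b && c) { run(); }
print(xml_is_well_formed(SNIPPET))  # False
print(xml_is_well_formed(ESCAPED))  # True
```

The unescaped script is fine as HTML but is not well-formed XML, so it cannot be XHTML; the CDATA-wrapped version is well-formed XML, but an HTML parser would see the CDATA markers as part of the script text.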
Enter real world support
Well, then we should just add this: The only browsers that have XHTML1 support are Mozilla Gecko based, Opera, MSN for OS X, Safari/WebKit, OmniWeb/WebCore, and Konqueror/KHTML. Those support XHTML1 served as XHTML. Internet Explorer has no support for documents served as XHTML on any platform, has no support for the XHTML1 namespace in XML, and has in fact no XHTML1 support at all. It has only the tag soup HTML support that Appendix C was targeting.
... and what is the conclusion of all this?
An XHTML document that is not XML well-formed is not XHTML, so the argument that you should serve XHTML1 documents as HTML to allow for badly formed content is void. An XHTML document should never be served as HTML if it can't be served as XHTML or XML - it should either be modified so that it can be served equally well in either of those two forms as it can in HTML form, or it should not have been XHTML in the first place.
There are incompatibilities between XHTML1 documents served as XML, XHTML and HTML that should not be forgotten. These incompatibilities make XHTML1 documents less useful on the web. Essentially, XHTML1 documents should either target only XHTML1-savvy user agents, or be written for XHTML1 user agents and then tweaked for HTML compatibility. HTML compatibility is a bonus, not something that comes with the XHTML1 format. If HTML compatibility is a major factor for you, use HTML4 instead.
I haven't mentioned XHTML1.1 at all in this discussion, and I won't now, but check back when the other perspectives are written and I might.
That was everything from the XHTML1 standpoint. Let's see what other perspectives there are in my future posts...
Comments
1. October 27, 2004 10:39 PM
2. October 28, 2004 10:19 AM
patrick h. lauke Posted…
You may send XHTML as text/html but it will never be treated as XHTML in this case.
with simple content negotiation server side, you'd only send it as text/html to old browsers who can't deal with XHTML anyway, and would therefore not treat it as XHTML even if you wanted to...which then makes it a moot point in my eyes. but i know feelings about this are running quite high, so that's all i'll say here.
3. October 28, 2004 11:39 AM
Faruk Ates Posted…
An XHTML document that is not XML well formed is not XHTML, so the argument that you should serve XHTML1 documents as HTML to allow for bad formed content is void. An XHTML document should never be served as HTML if it can’t be served as XHTML or XML - it should either be modified so that it can be served equally well in either of those two forms as it can in HTML form, or it should not have been XHTML in the first place.
Conclusion: your site is a colossal error? You're sending XHTML 1.0 as text/html yet you preach that this is completely unacceptable? Additionally, XHTML1 less useful on the web - excuse me? What about the whole fight for web standards, getting more compliance and awareness on the web, and so forth and so forth? If you're hoping that HTML will be our salvation, you're in for a major disappointment.
What exactly are you trying to accomplish with this article? Do you suggest people revert back to HTML? Do you think they should only use XHTML when they can send it with the right MIME-type? I'm all for paying more attention to the issue of XHTML on the web in its various forms and implementations, but (apologies for my rudeness) I fail to see what message exactly you're trying to convey here, other than that you seem to be complaining about XHTML not being the end-all-be-all solution (yet?) and how it shouldn't really be used even though you're doing so yourself...
4. October 28, 2004 12:42 PM
Curcan Ovidiu Posted…
I don't think the post was anti-XHTML, but rather the idea was that XHTML should be served as application/xhtml+xml (after all, that's what the specs say).
XHTML served as text/html is not treated as XML at all. It's just some fancy-looking HTML.
5. October 28, 2004 03:14 PM
liorean Posted…
Faruk: First of all, I mentioned that this was just a part of the entire discussion I wanted to have on this topic. This was - as stated multiple times - a take on the issue from the XHTML point of view. It deals with the wording and intention of the XHTML, XHTML MIME types, style sheets in XML, XML names and XML specifications. That is just one of a number of perspectives that I will be discussing, to eventually reach a final conclusion.
Second, read through that again, with the thought that the document being XHTML is the single fixed point. Reformulated, you could say it like this: If it is an XHTML document, it is well-formed according to the XML rules. No document that is not well-formed is an XHTML document, even if you have slapped an XHTML doctype on it. So, if you are serving your documents as 'text/html' just to allow ill-formed content, you are effectively lying - no ill-formed content is XHTML, no matter how you serve it. If the document doesn't work just as well when served as 'application/xhtml+xml' to XHTML1 user agents, then it is not XHTML and should not pretend to be.
Third, I did not at any point state that serving XHTML as HTML is wrong. If you read that into what I was saying, you didn't listen. I repeat: HTML compatibility is a bonus, not something that comes with the XHTML1 format. This means that if you are using XHTML, your document should be XHTML and XML conformant independent of how you serve it. If that means that it doesn't work when served as HTML, then you have two options - change it to HTML4 and serve it as HTML, or serve it as XHTML or XML instead.
Fourth, I utilise content negotiation. My site is served as XHTML to those that will take it. It works perfectly in that format. It is fully compatible with HTML and conforms to Appendix C, and thus I serve it as HTML if the client doesn't state XHTML support.
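As a rough sketch of that kind of negotiation (not the code this site actually runs; the function names are made up for illustration), the following Python picks a content type from the request's Accept header. It assumes a client states XHTML support only by listing application/xhtml+xml with a non-zero q value, and it deliberately ignores wildcards such as */*.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def accepts_xhtml(accept_header):
    """Return True if the header explicitly lists XHTML with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        return quality > 0
    return False


def choose_content_type(accept_header):
    """Serve XHTML to clients that state support for it, HTML otherwise."""
    return XHTML_TYPE if accepts_xhtml(accept_header) else HTML_TYPE


# A header that lists XHTML explicitly gets XHTML.
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# A header with only wildcards falls back to text/html.
print(choose_content_type("image/gif, image/jpeg, */*"))
```

Ignoring */* is a design choice in this sketch: a client that sends only wildcards hasn't actually stated XHTML support, so it gets the Appendix C compatible HTML.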
Finally, don't be hasty in forming your opinion; I plan to contradict myself a few times in future posts taking other standpoints.
6. October 28, 2004 10:05 PM
Rimantas Posted…
What about the whole fight for web standards, getting more compliance and awareness on the web, and so forth and so forth? If you’re hoping that HTML will be our salvation, you’re in for a major disappointment.
Awareness! That's the word. Now, tell me, how many of those coding in XHTML are aware of what happens if XHTML is served as application/xhtml+xml? 1%? Less? I am pessimistic about this. It does not look like the majority of developers in web shops can tell what a DOCTYPE is, not to mention its effect on rendering mode, so I have no illusions about their knowledge of MIME types. Am I painting it all too black?
Differences in the treatment of HTML and XHTML are significant and fatal in many cases. I will always prefer HTML4.01 Strict written professionally to XHTML written by the uninformed.
I'd love to see a simple proxy-like service where you can enter the URL of a site coded in XHTML and get it served with the application/xhtml+xml MIME type. That would be fun to play with.
I have nothing against XHTML, only that served as text/html it works only by chance, and has absolutely no technical advantages.
Using it as an educational and awareness-raising tool? Maybe. Only we should not forget the consequences that can be brought about by a simple change in one HTTP header. What I am afraid of is the scenario laid out by Hixie:
1. Authors write XHTML that makes assumptions that are only valid for tag soup or HTML4 UAs, and not XHTML UAs, and send it as text/html. (The common assumptions are listed below.)
2. Authors find everything works fine.
3. Time passes.
4. Author decides to send the same content as application/xhtml+xml, because it is, after all, XHTML.
5. Author finds site breaks horribly.(...)
6. Author blames XHTML.
Bearing in mind what I see now, this scenario is more than likely to happen too often. And after step 6 what will the author revert to? HTML? We'd be lucky if so. How about tag soup with "f*** the standards" salt in it?
This is not to say you should stop advocating XHTML. By all means do. But advocacy should include the necessary "issues waiting ahead" section too.
7. November 8, 2004 06:29 AM
Faruk Ates Posted…
Rimantas, you're obsessing over Hixie's document like far too many others, and ALL OF YOU fail to see that even Hixie himself points out that point 6 will never, never happen. People working with XHTML already know enough about the whole situation to know that XHTML itself is not to blame, but they are (or, their code is). Stop clinging to Hixie's document that was written 2.5 years ago, is thoroughly outdated, hardly relevant anymore and only taken out of context by people trying to prove that using XHTML is dangerous. It's not.
Liorean, sorry about my tone before, but your article made a bunch of claims and failed to really specify what the point behind it was (to me). I was mainly just wondering what you were trying to achieve. :-) no harm intended.
8. November 8, 2004 10:14 AM
Ben de Groot Posted…
While I agree that Hixie's document is outdated, to say that it is "hardly relevant anymore" is just ignoring the real issue. I don't think anybody is "trying to prove that using XHTML is dangerous". Not me, not Anne, not even Hixie. But we do stress that the MIME-type issue is important, and when ignored will create certain problems down the road. That is why raising awareness of the issues involved, and discussing these problems is a good and necessary thing.
9. November 19, 2004 10:49 AM
i need help Posted…
How do you link a video file on a web site and get it to open in the PC's default media player?
10. November 3, 2005 02:50 PM
Wake Posted…
Honestly, I fail to see why everyone argues about this so much. If you are only attempting to display a web page, then you are more than likely going to use xhtml:html as your root element anyway, therefore making it readable by a “lesser” user agent when served as text/html.
The only problem is that this breaks any attempt to use elements of a different namespace. XHTML, being an XML rendition of HTML, is HTML. You just have to make sure you follow XML's coding rules. HTML 4 was already trying to make everyone clean up their code anyway, so this is merely the next step.
The problem with serving XHTML as text/html is that you lose the ability to pull in elements from other namespaces (MathML, as the classic example). This is not a problem with XHTML or a page that uses it, but simply the fact that the user agents don't know what to do with it. Just as you shouldn't use proprietary HTML code for compatibility's sake, for now, one shouldn't use non-XHTML with XHTML just yet.
However, simply throwing your hands up and saying the heck with it is not the answer. If pages are coded as per XHTML, then they will be ready for whatever comes. User agents will evolve, just as they always have, and with time, more and more will be able to handle XHTML in its proper application/xhtml+xml type. Until then, strip out those PIs, use CSS stylesheets linked by XHTML's link elements, and slap a MIME type of text/html on it.
...then put in some coding to send it application/xhtml+xml for those 10% of us who use Firefox. ^_^
Rimantas Posted…
I agree with you totally on the conclusions. I was explaining these very things yesterday.
You may send XHTML as text/html but it will never be treated as XHTML in this case.
I have a draft example script prepared which illustrates some effects on rendering depending on the DOCTYPE, the presence of the XML declaration, the absence of the xmlns attribute for XHTML, and the MIME type, as well as what happens if the XHTML is not well-formed.
The script demonstrates that document.write() stops working, but I should add more to this, particularly the effect of HTML-style comments inside the SCRIPT tag.
It is also available only in Lithuanian so far, but I will make an English version.
And I should really tweak my WordPress installation to make it output HTML4.01 Strict instead of XHTML.