[Author's Note: It's been many years since I penned this page, and time has moved on, leaving XHTML in the dust. For a brief summary see my blog post http://mitchfincher.blogspot.com/2011/12/html5-is-not-xml-time-to-get-over-it.html]. I leave this page here for historical research. You should be learning HTML5, not XHTML. Sigh.
- Some of my favorite online references for XHTML:
- w3's overview of XHTML
- w3.org's validator checks whether a document really is XML
- w3.org's css validator
- w3.org's Cascading Style Sheets
- What is XHTML?
XHTML is a stricter, more formal version of HTML. It is defined by an XML DTD, which makes documents much easier for software to process.
- Advantages of using XHTML instead of HTML
- Documents can be validated much more easily
- Documents can be transformed via tools like XSLT into other documents for consumption by devices like handhelds
- Fragments of documents can be retrieved faster
- Text can be stored more efficiently in object-oriented databases
- XHTML Versions
XHTML (like Gaul) is divided into three parts (or flavors): transitional, strict, and frameset.
- How to convert most HTML pages to XHTML
- Heading lines at top
At the beginning of documents we need to include a few lines:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
The DOCTYPE's reference to the DTD lets validating parsers check the document. Most browsers never fetch the DTD or validate against it.
- Downcase HTML tags, attributes, and HTML defined values
<BODY BGCOLOR="RED">
becomes
<body bgcolor="red">
(Capitals are OK in user-defined attribute values like <img src="..." alt="My Favorite Picture" />.)
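To see why case matters, here is a minimal Java sketch that uses the JDK's built-in JAXP parser to test whether a fragment is well-formed XML. The class and method names are my own and hypothetical. XML is case-sensitive, so `</body>` does not close `<BODY>`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CaseCheck {
    // Turns every parser complaint into an exception instead of printing to stderr.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // True if the string parses as well-formed XML (no DTD validation is done).
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<body bgcolor=\"red\"></body>")); // true
        System.out.println(isWellFormed("<BODY BGCOLOR=\"RED\"></body>")); // false: </body> does not close <BODY>
    }
}
```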
- Attribute values must be in double or single quotes
<ol type=1>
becomes
<ol type="1">
or
<ol type='1'>
- Every element must have an end tag, even when it doesn't really matter.
<br>
<input type="text" value="Amazon.com" size="20" >
becomes
<br />
<input type="text" value="Amazon.com" size="20" />
For compatibility with older browsers it's best to put a single space before the "/". Some browsers have trouble with "<br></br>", so it's best to use "<br />".
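A regular expression can handle the common cases mechanically. The sketch below is a hypothetical helper, not part of Tidy or any library. It rewrites HTML's empty elements (`<br>`, `<hr>`, `<img>`, `<input>` and a few others) into the `<br />` form, with one space before the slash. It assumes no attribute value contains a literal `>`. A real converter should use a proper parser.

```java
import java.util.regex.Pattern;

public class CloseEmptyTags {
    // Hypothetical list: only these HTML 4 empty elements are handled.
    // Group 1 is the element name, group 2 its attributes; any existing "/" is dropped.
    private static final Pattern EMPTY_TAG = Pattern.compile(
        "<(br|hr|img|input|meta|link|area|base|col|param)\\b([^>]*?)\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Rewrites <br>, <br/>, <input ...> etc. as "<br />", "<input ... />".
    public static String closeEmptyTags(String html) {
        return EMPTY_TAG.matcher(html).replaceAll("<$1$2 />");
    }

    public static void main(String[] args) {
        System.out.println(closeEmptyTags("<br>"));
        // prints <input type="text" value="Amazon.com" size="20" />
        System.out.println(closeEmptyTags("<input type=\"text\" value=\"Amazon.com\" size=\"20\" >"));
    }
}
```

Running it twice is harmless, because `<br />` is rewritten to itself.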
- Every attribute must have a value
<ol compact>
<input type="radio" name="title" value="decline" checked>decline
becomes
<ol compact="compact">
<input type="radio" name="title" value="decline" checked="checked" />decline
- Tags may not overlap
This is <em> emphasized text and <b>bold </em>text</b>
becomes
This is <em>emphasized text</em> and <b>bold text</b>
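You can detect overlap with a stack. Push each start tag, and require every end tag to match the element on top of the stack. This is a minimal Java sketch of that idea, with hypothetical class and method names. It finds tags with a rough regex, so it skips comments and assumes no `>` appears inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestingCheck {
    // Group 1 is "/" for end tags, group 2 the element name,
    // group 3 is "/" for self-closing tags such as <br />.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    public static boolean properlyNested(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(3).isEmpty()) {
                continue;                 // self-closing: opens and closes at once
            }
            if (m.group(1).isEmpty()) {
                open.push(name);          // start tag
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;             // end tag doesn't close the innermost open element
            }
        }
        return open.isEmpty();            // everything opened was closed
    }

    public static void main(String[] args) {
        System.out.println(properlyNested("This is <em> emphasized text and <b>bold </em>text</b>")); // false
        System.out.println(properlyNested("This is <em>emphasized text</em> and <b>bold text</b>"));  // true
    }
}
```

The name comparison is exact, so `<BODY>...</body>` also fails, just as it does in a real XML parser.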
- Only certain tags may nest inside other tags
Looking at the XHTML 1.0 Transitional DTD, the definition of the "ol" element is:
<!ELEMENT ol (li)+>
<!ATTLIST ol
  %attrs;
  type     %OLStyle;    #IMPLIED
  compact  (compact)    #IMPLIED
  start    %Number;     #IMPLIED
  >
This means that an ordered list ("ol") element may contain only list items ("li"), not paragraphs or bare text.
<ol>
These are some of my favorite animals:
<li>octopus</li>
<li>shrew</li>
<li>lemur</li>
and my most favorite
<li>meerkats</li>
</ol>
becomes
<p>These are some of my favorite animals:</p>
<ol>
<li>octopus</li>
<li>shrew</li>
<li>lemur</li>
<li>meerkats</li>
</ol>
What do we do with the phrase "and my most favorite"?
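To watch a parser enforce a content model like `(li)+`, you can validate against a DTD. The sketch below uses a tiny internal DTD subset that declares only `ol` and `li`. That subset is an assumption for illustration and is not the real XHTML DTD, which keeps the example from downloading anything. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {
    // Cut-down internal DTD subset (NOT the XHTML DTD): an ol holds one or more li.
    private static final String DTD =
          "<!DOCTYPE ol [\n"
        + "  <!ELEMENT ol (li)+>\n"
        + "  <!ELEMENT li (#PCDATA)>\n"
        + "]>\n";

    // True if `body` (an <ol> element) is valid against the DTD above.
    public static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check content models, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity errors arrive here; rethrow so parse() aborts.
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<ol><li>octopus</li><li>shrew</li></ol>"));                          // true
        System.out.println(isValid("<ol>These are some of my favorite animals:<li>octopus</li></ol>")); // false
    }
}
```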
- Ampersands in hrefs must be escaped: write "&amp;" instead of "&" in the URI
<a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View">Sample List</a>
becomes
<a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&amp;Password=test&amp;action=View">Sample List</a>
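You can automate the escaping. This hypothetical Java helper replaces every `&` that does not already begin an entity or character reference (such as `&amp;`, `&#38;` or `&#x26;`). Because of that check, running it twice does not double-escape anything. It assumes the input is a raw attribute value, not a whole page.

```java
import java.util.regex.Pattern;

public class EscapeAmpersands {
    // An "&" NOT followed by name; or #digits; or #xhex; (i.e. not already a reference).
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#x[0-9a-fA-F]+;)");

    public static String escapeAmpersands(String attributeValue) {
        return BARE_AMP.matcher(attributeValue).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String href = "http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View";
        // prints ...?ListID=Test&amp;Password=test&amp;action=View
        System.out.println(escapeAmpersands(href));
    }
}
```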
- The attribute "name" becomes "id" when used for a locator inside a document
For example, to reference a section within a document with a URI, we usually write something like
<a href="favoriteAnimals.html#meerkats">Meerkats</a>
In the target document,
<h2><a name="meerkats">Meerkats of Africa</a></h2>
becomes
<h2><a id="meerkats">Meerkats of Africa</a></h2>
or, better yet, for backwards compatibility:
<h2><a id="meerkats" name="meerkats">Meerkats of Africa</a></h2>
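One payoff of `id` is that XML tools can find the target directly. This sketch uses hypothetical names. It assumes the page parses as XML and that the id contains no apostrophe, because the id is spliced into the query. It resolves a fragment such as `#meerkats` with a standard JAXP XPath query.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FindFragment {
    // Returns the element whose id attribute matches the fragment, or null if none does.
    public static Node findFragment(Document doc, String fragment) throws XPathExpressionException {
        String id = fragment.startsWith("#") ? fragment.substring(1) : fragment;
        // Assumption: id has no apostrophe, since it is spliced into the XPath string.
        return (Node) XPathFactory.newInstance().newXPath()
            .evaluate("//*[@id='" + id + "']", doc, XPathConstants.NODE);
    }

    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><h2><a id=\"meerkats\" name=\"meerkats\">"
            + "Meerkats of Africa</a></h2></body></html>");
        System.out.println(findFragment(doc, "#meerkats").getTextContent()); // Meerkats of Africa
    }
}
```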
- Tidy
Tidy is a tool that automatically converts HTML to XHTML. You can find it at http://www.w3.org/People/Raggett/tidy/.
Java Section
- GetWebPage.java
Example of using Java to get web pages via HTTP
GetWebPage http://www.cnn.com
- ValidateXML.java
Example of using Java to validate XML pages
ValidateXML http://www.w3.org
- SurveyTaker.java
Example of using Java to punch through surveys
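The source of ValidateXML.java isn't reproduced here, so what follows is only a sketch of the same idea under my own assumptions, with a different, hypothetical class name. It checks well-formedness, not DTD validity, using the JDK's JAXP parser, and it reports the line and column of the first error. Even without validation, the JDK parser may still try to download the DTD named in a DOCTYPE, and that download can be slow or blocked.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CheckWellFormed {
    // Returns null if the document at `uri` is well-formed XML,
    // otherwise "line:column: message" for the first problem found.
    public static String check(String uri) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(uri); // accepts file: and http: URIs
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    // Usage (hypothetical path): java CheckWellFormed file:///tmp/page.xhtml
    public static void main(String[] args) throws Exception {
        String problem = check(args[0]);
        System.out.println(problem == null ? args[0] + " is well-formed" : problem);
    }
}
```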
- Misc:
One of many web tuning sites
- Differences between XML and HTML
Since XML and HTML are both derived from SGML, they are similar, but they differ in the following ways:
- XML is case-sensitive
- XML requires quotes (single or double) around attribute values
- Most interpreters of HTML are very forgiving about missing end tags; XML parsers are not.
- Comments start with <!-- and end with -->. Inside a comment, "--" may not appear. Although browsers tolerate this in HTML, it confuses XML parsers.
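A quick way to find comments that would trip an XML parser is to scan each `<!-- ... -->` and reject any whose body contains `--` or ends with `-`. The second rule matters because XML also forbids `--->`. This hypothetical Java helper does that with a non-greedy regex. It is a rough pre-check, not a substitute for running a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentCheck {
    // Non-greedy: each match runs from "<!--" to the FIRST following "-->".
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // True if every comment body is legal in XML: no "--" inside and no trailing "-".
    public static boolean commentsAreXmlSafe(String markup) {
        Matcher m = COMMENT.matcher(markup);
        while (m.find()) {
            String body = m.group(1);
            if (body.contains("--") || body.endsWith("-")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(commentsAreXmlSafe("<!-- menu starts here -->")); // true
        System.out.println(commentsAreXmlSafe("<!-- old -- new -->"));       // false
    }
}
```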
It's a good idea to check your XHTML source pages against the W3C validators listed at the top of this page.