Notes from a lecture I gave circa 2000 on what XML is. The discussion of DTDs is still valid, although DTDs have largely been superseded by XML Schemas.
- Pedantic Overview
What we will learn today:
Why use XML?
What is a tag?
What is an element?
What does it mean that an XML document is "well formed"?
What does it mean that an XML document is "valid"?
Disclaimer: These lessons are streamlined to teach how to read and write XML files. The real definition of XML is at http://www.w3.org/TR/REC-xml
- Introduction
XML is an acronym for "eXtensible Markup Language". XML has many advantages over other data formats:
Error checking is done automatically
Data can be shared by many different programs
Extensive programming libraries are available for free
Microsoft is making it the foundation of ".NYET". (Speaker at XMLOne: "If you cut someone at Microsoft, they bleed XML").
- Naming
Names of the basic building blocks of XML, elements and attributes, are restricted by the following rules:
- Must be composed only of letters (upper and lower case), digits, hyphens (-), underscores (_), colons (:), and periods (.)
- Must start with a letter, '_', or ':'
- Cannot begin with "XML" (lower or uppercase or any combination)
(Although not formally required, colons are used for namespaces)
Quiz - which are good names?:
- survey1
- red car
- stoplight4_1
- -go
- go!
- _go
- 4me
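The three naming rules above can be sketched as a small Python check. This is a simplification that only handles ASCII characters (the real XML specification also permits many non-ASCII letters), and the function name is_good_name is made up for illustration. You can use it to check your quiz answers.

```python
import re

# Rule 1 and rule 2: letters, digits, hyphens, underscores, colons, periods,
# with a first character that is a letter, '_' or ':'. ASCII only (an assumption).
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def is_good_name(name: str) -> bool:
    """Return True if `name` satisfies the three naming rules above."""
    if name.lower().startswith("xml"):
        return False  # rule 3: may not begin with "xml" in any mix of case
    return _NAME_RE.fullmatch(name) is not None

if __name__ == "__main__":
    for candidate in ["survey1", "red car", "stoplight4_1", "-go", "go!", "_go", "4me"]:
        print(candidate, is_good_name(candidate))
```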
- Start Tags
A start "Tag" is a name that is prepended with a "<" and appended with ">". For example the paragraph tag, "<p>".
Note: Tags are case sensitive. <House> is different from <house> or <housE>
Quiz - which are nice start tags?:
- <car>
- <Car>
- >Car<
- <caR>
- <test one>
- <test_2>
- <test3
- End Tags
An end "Tag" is a name that is prepended with a "</" and appended with ">". For example the paragraph end tag, "</p>".
Quiz - which are nice end tags?:
- </car>
- <Car/>
- >Car<
- Elements
Elements have a start tag, optional content, and an end tag. The end tag is the same as the start except it's prepended with a "/". e.g.,
<p> I'm an element</p>
<car>...content...</car>
In common usage "tag" and "element" are often interchanged, although technically an element is composed of a start tag, content, and an end tag.
If an element has no content, you have the option to abbreviate it to a single tag by placing a "/" before the closing ">" of the start tag. For example:
<br />
<hr />
<meta/>
Quiz - which are nice elements?:
- <car>
- <art>...contents...</art>
- <dna></dna>
- <coffee>...contents...</Coffee>
- <tea>...contents...<tea/>
- <hr/>
- </ol>
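A quick way to see that the abbreviated form and the start/end pair mean the same thing is to parse both. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name describe is made up for illustration.

```python
import xml.etree.ElementTree as ET

def describe(elem):
    """Return (tag name, text content, number of child elements)."""
    return (elem.tag, elem.text, len(elem))

# The abbreviated form and the start/end pair describe the same empty element.
short_form = ET.fromstring("<br />")
long_form = ET.fromstring("<br></br>")
print(describe(short_form))
print(describe(long_form))

# Tag names are case sensitive, so these are two different names.
print(ET.fromstring("<House/>").tag == ET.fromstring("<house/>").tag)
```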
- Element Nesting
Elements are typically nested inside one another.
<book name="The Persian Expedition">
<author name="Xenophon" />
<chapter title="Persia Awaits">
<p>It was the spring of the year...</p>
</chapter>
</book>
An important thing to remember is that elements may not overlap:
The following is incorrect:
<p> The <em>most <b>important </em> thing </b> is ...
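Because elements nest and never overlap, a document forms a tree. As a rough sketch, the book example can be walked with Python's standard xml.etree.ElementTree module; the helper name outline is made up for illustration, and the overlapping example has a closing </p> added so that the overlap is the only problem.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book name="The Persian Expedition">
  <author name="Xenophon" />
  <chapter title="Persia Awaits">
    <p>It was the spring of the year...</p>
  </chapter>
</book>
"""

def outline(elem, depth=0):
    """Return a list of (depth, tag) pairs in document order."""
    rows = [(depth, elem.tag)]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows

for depth, tag in outline(ET.fromstring(BOOK)):
    print("  " * depth + tag)

# Overlapping elements are rejected by the parser.
try:
    ET.fromstring("<p> The <em>most <b>important </em> thing </b> is ...</p>")
except ET.ParseError as err:
    print("not well formed:", err)
```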
- Attributes
"Attributes" are name-value pairs inside start tags. An example is the "border" attribute of the table tag,
<table border="1">
The general format is
<tag name1="value1" name2="value2" ... >
General Notes
The attribute value must be surrounded by quotes (unlike in HTML).
Either single or double quotes may be used to surround the attribute value. Single quotes may be embedded inside double quotes and vice versa, but the same type of quote must open and close the value.
Most characters may be placed inside the attribute value. The exceptions are < and &, which must be written as &lt; and &amp; (a literal > is allowed).
XML purists advocate using very few attributes and putting most information in child elements.
- Quiz - which are nice attributes inside these start tags?:
- <table bgcolor=red>
- </table width="80%">
- <img src='mypic.png" />
- <img src=mypic.png" />
- <br clear="all">
- <div class="codeblock" align="left">
- <meta http-equiv="pragma" content="no-cache" />
- <prefix prefixtype=">" />
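The quoting rules can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: the helper attributes_of and the element names quote and cmp are made up for illustration.

```python
import xml.etree.ElementTree as ET

def attributes_of(element_xml: str) -> dict:
    """Parse a single element and return its attributes as a dict."""
    return dict(ET.fromstring(element_xml).attrib)

print(attributes_of('<table border="1" />'))
# Single quotes may contain double quotes, and vice versa.
print(attributes_of("""<quote text='She said "hi"' />"""))
# The parser accepts a literal > inside a value; < must be escaped as &lt;.
print(attributes_of('<prefix prefixtype=">" />'))
print(attributes_of('<cmp op="&lt;" />'))

# Unquoted values are an error in XML (unlike HTML).
try:
    attributes_of("<table bgcolor=red />")
except ET.ParseError as err:
    print("rejected:", err)
```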
- Entities
Entities are string variables. Entities come in two flavors, general and parameter. They are both defined in the DTD.
- General
Although general entities are defined in the DTD, they are referenced in your XML document. HTML coders will recognize "&amp;" as an example of a general entity reference. Another example is the copyright symbol: "&copy;" displays as ©.
Entity references start with "&" and end with ";". The syntax for creating your own Entity in the DTD is
<!ENTITY Name EntityDefinition>
An example would be
<!ENTITY Computer "DELL">
Now everywhere "&Computer;" appears in your document, it will be replaced with "DELL".
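To watch the replacement happen, the entity can be declared in a DTD embedded at the top of a small document and parsed with Python's standard xml.etree.ElementTree module. The note element and its sentence are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The DTD is embedded in the document so the parser can see the entity definition.
DOC = """<!DOCTYPE note [
  <!ENTITY Computer "DELL">
]>
<note>My &Computer; is on my desk.</note>"""

note = ET.fromstring(DOC)
print(note.text)  # the reference has been replaced by the entity's value
```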
- Parameter
These live only in your DTD. If you are repeating a series of attributes many times inside different elements, it may be good to define a parameter entity for them. For example, if many of your elements have an x, y, and z dimension, you could replace the tedious (and perhaps error prone) repeating of them by putting them in a parameter.
<!ENTITY % dimensions "x CDATA #IMPLIED y CDATA #IMPLIED z CDATA #IMPLIED">
- Well Formed
An XML document is "Well Formed" when it follows the general rules of XML. For simple documents these include:
All elements have a start and an end tag.
All elements are properly nested.
All name-value pairs in attributes are properly formatted with the appropriate quotes.
How to test for well formed documents? Open it in IE5.0 and it will complain if it is not.
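Any XML parser can do the same job as the browser. A minimal sketch using Python's standard xml.etree.ElementTree module follows; the function name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    samples = [
        "<art>...contents...</art>",
        "<coffee>...contents...</Coffee>",  # end tag differs in case
        "<car>",                            # no end tag
        "<hr/>",
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```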
- Valid
"Valid" XML documents are well formed, but also conform to a specific Document Type Definition (DTD). The DTD includes rules on
How elements may be nested
What attributes an element may have
The contents of those attribute values
Variables that have been defined
A single DTD may be used for many documents. Millions of XHTML documents may all use the same DTD.
How to test for valid documents? Our friends at Microsoft have a plugin for IE5 to validate files. Visit http://msdn.microsoft.com/downloads/default.asp?URL=/code/topic.asp?URL=/msdn-files/028/000/072/topic.xml and download the "Internet Explorer Tools for Validating XML and Viewing XSLT Output" package. When you right click on an xml document it will have an option to validate the document.
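Keep in mind that many parsers only check that a document is well formed and never check validity. As an illustration, Python's standard xml.etree.ElementTree module is non-validating: in the sketch below the embedded DTD requires a name attribute and at least one Page, the document supplies neither, and it still parses. The DTD syntax is explained in the following sections.

```python
import xml.etree.ElementTree as ET

# The DTD requires a name attribute and at least one Page,
# but this survey1 element supplies neither, so the document is not valid.
DOC = """<!DOCTYPE survey1 [
  <!ELEMENT survey1 (Head*,Page+)>
  <!ATTLIST survey1 name NMTOKEN #REQUIRED>
]>
<survey1></survey1>"""

root = ET.fromstring(DOC)     # no error: the document is well formed
print(root.tag, root.attrib)  # validity was never checked
```

To actually validate against a DTD you need a validating parser.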
- Document Type Definition (DTD)
The two major components of a DTD are the "Element Type Declarations" and "Attribute List Declarations".
- Element Type Declarations - general
One of the great things about xml is that you can define exactly what can be inside it and the order of its contents.
Inside a DTD, 'Element Type Declarations' describe what an element may contain.
The general syntax is
<!ELEMENT elementname contents >
The contents can be one of the following:
- list of elements - e.g., (apple|banana|pear)
- EMPTY
No value is contained inside the element.
- ANY
Anything can be inside. Dangerous, in a way, but useful sometimes.
- mixed-content - character data and elements
- character data
Examples:
<!ELEMENT br EMPTY>
<!ELEMENT container ANY>
(The use of ANY is discouraged, but sometimes it could be helpful).
- Element Type Declarations - with children
Example:
<!ELEMENT book (author,chapter+)>
The allowable children are listed in a group inside parentheses.
Special operators tell the parser how many of each type are allowed and in what order.
- Element Type Declarations - exercise 1:
Operator  Description
()        groups elements
,         separates items that must appear in this order
|         or operator
?         0 or 1 elements
*         0 or more elements
+         1 or more elements
Given the following ETD:
<!ELEMENT survey (Head*,Page+)>
Quiz - which are valid contents of survey?:
- <Head /><Page /><Page /><Page />
- <Head /><Page /><Page />
- <Head />
- <Page /><Page /><Page />
- <Page /><Page /><Page /><Head />
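The operators behave much like their regular-expression namesakes, so one way to get a feel for a content model is to hand-translate it into a regular expression over the child element names. The Python sketch below does this for (Head*,Page+). It is a teaching aid, not how a validating parser is implemented, and the names matches and SURVEY are made up for illustration.

```python
import re

def matches(model_regex: str, children: list) -> bool:
    """Check a list of child element names against a hand-translated content model.

    Each name is followed by one space so that names cannot run together."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_regex, sequence) is not None

# (Head*,Page+): "," becomes plain sequencing, "*" and "+" keep their meaning.
SURVEY = r"(Head )*(Page )+"

if __name__ == "__main__":
    print(matches(SURVEY, ["Head", "Page", "Page"]))
    print(matches(SURVEY, ["Head"]))
```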
- Element Type Declarations - exercise 2:
Given the following ETD:
<!ELEMENT Page ((Question*|p*)*,Buttons*)>
Quiz - which are valid contents of survey1?:
- <Page></Page>
- <Page><Buttons /></Page>
- <Page><Question /><Buttons /></Page>
- <Page><Question /><Question /><Buttons /></Page>
- <Page><Question /><Question /><Buttons /><Question /></Page>
- <Page><Question /><p /><p /><Question /><Buttons /></Page>
- Element Type Declarations - plain ol' text
Some children will contain just regular old plain text. You declare these with "#PCDATA".
For Example:
<!ELEMENT QuestionText (#PCDATA)>
Given the following DTD fragment:
<!ELEMENT survey1 (Head*,Page+)>
<!ELEMENT Page ((Question*|p*)*,Buttons*)>
<!ELEMENT Question (QuestionText*)>
<!ELEMENT QuestionText (#PCDATA)>
Quiz - which are valid contents of survey1?:
- <Page><Question><QuestionText>What is your name?</QuestionText></Question></Page>
- <Page><Question><QuestionText><p>What is your name?</p></QuestionText></Question></Page>
- <Page><Question><QuestionText><p>What is your name?</p></QuestionText></Question><Buttons /></Question></Page>
- <Page><p>Thanks for taking our survey!</p></Page>
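An element declared as (#PCDATA) may hold text but no child elements. A rough way to check that by hand, using Python's standard xml.etree.ElementTree module, is sketched below; the helper name is_text_only is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_text_only(elem) -> bool:
    """True if the element has no child elements, only character data (possibly none)."""
    return len(elem) == 0

good = ET.fromstring("<QuestionText>What is your name?</QuestionText>")
bad = ET.fromstring("<QuestionText><p>What is your name?</p></QuestionText>")

print(is_text_only(good), good.text)
print(is_text_only(bad))  # the <p> child is not allowed by (#PCDATA)
```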
- Attribute List Declarations
What attributes can my element have, and what can be in their values?
<!ATTLIST elementName
name1 type1 default1
name2 type2 default2
...
>
Where type is one of
- CDATA character data
- ID unique value - each value may appear only once per document
- IDREF or IDREFS - points to an ID
- ENTITY or ENTITIES
- NMTOKEN or NMTOKENS - name tokens (name characters only, no spaces)
- enumeration - list of valid strings
- NOTATION
and default is one of
- #REQUIRED element in the xml document must supply this attribute
- #IMPLIED this attribute is optional
- #FIXED value - the attribute always has this value
- value - if no value is specified, this one is used
- Attribute List Declarations 2
The two most common are CDATA and enumeration.
Examples:
<!ATTLIST ChoiceList
tableAttributes CDATA #IMPLIED
HTMLWidget (dropdownlist|radio|checkbox) "dropdownlist"
debug (yes|no) #IMPLIED
choicelistdef IDREF #IMPLIED
>
The default value appears after an enumeration of choices. If the element in the xml document does not supply a value for this attribute, the default is used.
When "#IMPLIED" is used, it means the name-value pair is optional.
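Defaults can be seen in action by embedding part of the ChoiceList declaration in a small document. The sketch below relies on Python's standard xml.etree.ElementTree module, which reads attribute defaults from a DTD embedded in the document even though it does not validate.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE ChoiceList [
  <!ATTLIST ChoiceList
    HTMLWidget (dropdownlist|radio|checkbox) "dropdownlist"
    debug (yes|no) #IMPLIED
  >
]>
<ChoiceList />"""

choice_list = ET.fromstring(DOC)
print(choice_list.get("HTMLWidget"))  # not in the document, so the DTD default is used
print(choice_list.get("debug"))       # #IMPLIED: optional and no default
```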
- Attribute List Declarations - quiz
Given the following Attribute List Declaration:
<!ATTLIST survey1
name NMTOKEN #REQUIRED
host CDATA #IMPLIED
test (yes|no) "yes"
debug "yes"
>
Quiz - which are valid attributes?:
- <survey1 host="dizzy" />
- <survey1 name="ae4" host="dizzy" />
- <survey1 name="ae4" host="dizzy" test="null" />
- <survey1 name=" xyz " test="null" />
- <survey1 host="dizzy" />
- Attribute List Declarations with Entities
Typically a DTD makes use of entities. For Example:
<!ENTITY % YesNo "(yes|no)">
<!ENTITY % Integer "CDATA">
<!ATTLIST survey1
show %YesNo; #IMPLIED
border %Integer; "1.0"
>
- XML documents
XML documents start with a declaration of version and encoding,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
This is typically followed by a reference to where the DTD document resides. The word "Coins" below refers to the top level element (the outermost) in the document. The actual DTD file may be across the Internet or on the same machine.
<!DOCTYPE Coins SYSTEM "Coins1.dtd">
In smaller documents the DTD may be embedded in the actual XML document.
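A parser exposes the pieces of the DOCTYPE line. As a sketch, Python's standard xml.dom.minidom module records the top level element name and the DTD location; as far as I know it does not fetch Coins1.dtd itself. The empty Coins element is made up to complete the document.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Coins SYSTEM "Coins1.dtd">
<Coins></Coins>"""

document = minidom.parseString(DOC)
doctype = document.doctype

print(doctype.name)                      # top level element named in the DOCTYPE
print(doctype.systemId)                  # where the DTD resides
print(document.documentElement.tagName)  # the actual outermost element
```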
- Sample XML document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE survey1 SYSTEM "http://localhost/dtd/survey1.dtd">
<survey1 name="test">
<Page><Question><QuestionText>What is your name?</QuestionText></Question></Page>
<Page><p>Thanks for taking our survey!</p></Page>
</survey1>
- Sample DTD document
<!ELEMENT survey1 (Head*,Page+)>
<!ATTLIST survey1
name NMTOKEN #REQUIRED
host CDATA #IMPLIED
version CDATA "1.0"
test (yes|no) "yes"
>
<!ELEMENT Page ((Question*|p*)*,Buttons*)>
<!ELEMENT Question (QuestionText*)>
<!ELEMENT QuestionText (#PCDATA)>
<!ELEMENT Head (#PCDATA)>
<!ELEMENT Buttons (#PCDATA)>
<!ELEMENT p (#PCDATA)>
- Sample XML and DTD in one document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE survey1 [
<!ELEMENT survey1 (Head*,Page+)>
<!ATTLIST survey1
name NMTOKEN #REQUIRED
host CDATA #IMPLIED
version CDATA "1.0"
test (yes|no) "yes"
>
<!ELEMENT Page ((Question*|p*)*,Buttons*)>
<!ELEMENT Question (QuestionText*)>
<!ELEMENT QuestionText (#PCDATA)>
<!ELEMENT Head (#PCDATA)>
<!ELEMENT Buttons (#PCDATA)>
<!ELEMENT p (#PCDATA)>
]>
<survey1 name="test">
<Page><Question /><Buttons /></Page>
<Page><Buttons /></Page>
<Page><Question /><Question /><Buttons /></Page>
<Page><Question /><Question /><Buttons /></Page>
<Page><Question /><p /><p /><Question /><Buttons /></Page>
<Page><Question><QuestionText>What is your name?</QuestionText></Question></Page>
<Page><Question><QuestionText>What is your name?</QuestionText></Question></Page>
<Page><p>Thanks for taking our survey!</p></Page>
</survey1>
- How to include one DTD in another. From the XML FAQ.
<!ENTITY % mylists PUBLIC
"-//Foo, Inc//ENTITIES Common list structures//EN"
"dtds/listfrag.ent">
...
%mylists;
- How to use CDATA to tell the parser to ignore markup for elements
<AttributeScript attribute="firstBlock">
<![CDATA[
if(count < 10) {
answer.add("Block1");
} else {
answer.add("Block2");
}
]]> </AttributeScript>
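The effect of the CDATA section can be checked with Python's standard xml.etree.ElementTree module: the "<" inside it arrives as ordinary text. The script body is shortened here, and the element name s in the counter-example is made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<AttributeScript attribute="firstBlock">
<![CDATA[
if(count < 10) {
    answer.add("Block1");
}
]]> </AttributeScript>"""

script = ET.fromstring(DOC)
print(script.text)  # the "<" is delivered as plain character data

# Without the CDATA section the same "<" is treated as markup and rejected.
try:
    ET.fromstring("<s>if(count < 10) { }</s>")
except ET.ParseError as err:
    print("rejected:", err)
```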
- Online References for XML:
- http://www.xmlaustin.org
- XML Notepad
- www.w3schools.com great tutorials on xml, html, xsl
- Guide to XML software
- World Wide Web Consortium's standards for XML
- http://www.arbortext.com/index.html
- http://architag.com/xmlu/
- http://msdn.microsoft.com/xml (use ie)
- http://www.xml.com
- Yahoo's XML links
- Sun's development with xml notes
- IBM's XML site
- http://www.webdeveloper.com/xml/
- http://developerlife.com/
- Index of free xml tools
- http://www.ucc.ie/xml/ XML FAQ
- XHTML
- http://www.w3.org/MarkUp/
- http://www.xhtmlquickref.com/
- Pedantic Review
What we learned today:
Why use XML?
What is a tag?
What is an element?
What does it mean that an XML document is "well formed"?
What does it mean that an XML document is "valid"?
How do I read a simple DTD?