This deed is not made simpler by the fact that HTML is a rapidly moving target. While HTML 2.0 is now out as an Internet Draft (i.e., it is fairly fixed), many issues (tables, math, character sets) aren't part of it, and HTML 3.0 is still pretty much in flux.
This collection of links grew out of my attempts to hunt down adequate documentation. In the course of this undertaking, I got interested in SGML, too. HTML is based on (or, an application of) SGML. Some knowledge of SGML is needed to read and understand the DTDs (document type definitions) that are used to describe all the different HTML variations.
Some Web developers want to make HTML-browsers more SGML-savvy, so that advanced SGML techniques (e.g., DSSSL transformations) can be used to extend HTML. HTML might even be the ``killer application'' that sells SGML to the world. Other extension ideas like style sheets are orthogonal to SGML.
The following list of resources, all related in some way to HTML and/or SGML, is divided into these parts:
There are three HTML definitions, none of which are RFCs. HTML 1.0 was the original conception by Tim Berners-Lee et al. It has been superseded by HTML 2.0, which was frozen and submitted as a draft RFC on or about 29 November 1994, and which (by intention) reflects the then-current usage of HTML by WWW browsers. HTML 3.0 is currently under development; the Arena browser, published by the World Wide Web Organization (W3O), supports all of its features. The home page for all three definitions is at CERN (09.07.95). (Which is about the most ``canonical'' WWW info site that exists.)
Pointers to the latest versions of the HTML 2.0 spec (versions of May 31 and June 16).
Earl Hood's hypertext version of the HTML 2.0 DTD.
There are two RFCs that describe URLs: RFC 1630 "Universal Resource Identifiers in WWW" (Ohio, local) and RFC 1738 "Uniform Resource Locators" (Ohio, local, local marked-up version)
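As a rough illustration of what these RFCs describe, here is a minimal Python sketch that splits a URL into its parts. The function name describe_url and the example.org address are made up for this example. Python's urllib.parse follows later revisions of the URL syntax rather than RFC 1738 to the letter, so treat this as an approximation.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the scheme, host, port, and path components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,      # None when the URL names no port
        "path": parts.path,
    }

# Hypothetical address, used only for illustration.
info = describe_url("http://www.example.org:8080/docs/html-spec.html")
for name, value in info.items():
    print(name, "=", value)
```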
HTML is based on the ISO 8859-1 character set, but has, or will have, some extra character entities, chief amongst them different kinds of spaces. Now, spaces are mysterious things: spaces, tabs, and newlines are mostly interchangeable, and multiple "spaces" are collapsed by most browsers. Olle Jarnefors has written an article that summarizes and criticizes the current status of spaces and hyphens in HTML. (Includes 2 follow-ups on width details.)
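The collapsing of spaces can be mimicked with a few lines of Python. This is only a sketch of the general idea, assuming that a run of spaces, tabs, and newlines is rendered as one space; it is not what any particular browser actually implements, and the function name collapse_whitespace is my own.

```python
import re

def collapse_whitespace(text):
    """Replace each run of spaces, tabs, and newlines with a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)

sample = "Spaces,\ttabs\nand   newlines"
print(collapse_whitespace(sample))  # prints: Spaces, tabs and newlines
```

Note that the pattern deliberately leaves out the non-breaking space (character 160 in ISO 8859-1), so runs of that character survive unchanged in this sketch.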
Apropos characters: Roman Czyborra has created a quite definitive page on the different Latin-[1-10] character sets.
The Draft HTML 3.0 specification and DTD, courtesy of Dave Raggett (local copy).
The next big thing to come is style sheets, which allow the definition of new logical tags and their default visual interpretation. The details are still very much in the discussion stage, with half a dozen proposals floating around. (DSSSL can be seen as a meta-stylesheet-language.)
The current (0.96) Arena supports some kind of style sheets.
The currently very popular WWW browser Netscape (formerly Mozilla) is notorious for its non-HTML-2.0/3.0-compatible extensions (<center>, <blink>), fondly called Mozillisms.
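For the curious, here is a small Python sketch that scans a page for such extensions using the standard html.parser module. The set MOZILLISMS is an assumption for illustration: it holds only the two tags named above, not a complete list of Netscape extensions, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: illustrative only; just the two extensions named in the text.
MOZILLISMS = {"center", "blink"}

class MozillismFinder(HTMLParser):
    """Collects (tag, line number) pairs for every listed extension tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in MOZILLISMS:
            line, _column = self.getpos()
            self.found.append((tag, line))

def find_mozillisms(markup):
    finder = MozillismFinder()
    finder.feed(markup)
    finder.close()
    return finder.found

page = "<h1>Welcome</h1>\n<CENTER><BLINK>New!</BLINK></CENTER>\n"
for tag, line in find_mozillisms(page):
    print("line", line, "uses <" + tag + ">")
```

A real checker would validate against the DTD with an SGML parser; this sketch only looks for tag names.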
(08.03.95) Netscape Communications got quite a bit of flak on these issues, and has installed a page on ``Questions And Answers About Netscape And Open Standards'' to define their stance. It's mostly about a security protocol called SSL, which creates new problems of its own (export licenses etc.).
Eventually, variants of many Mozillisms will become standard. This is one of the reasons Netscape is scorned by standardizers: why invent new things when generalized variants of those features are already being considered for the standard?
(26.08.95) Microsoft, not to be outdone by a start-up like Netscape, has built its own set of HTML extensions into its Internet Explorer browser. Amongst them: a <font> tag with colour (oops, color) and font attributes, and client-side imagemaps (no idea whether they want to use <FIG> or something else; they refer to a December '94 paper, written by a Spyglass employee, that uses the same idea but calls it <MAP>. Might be a precursor). Of course, nowhere on that page is HTML 3 mentioned.
(11.10.95) Netscape proposes client-side Cookies. With these, state-guided dynamic menus and such can finally be implemented. No idea whether they are part of the 2.0 browser yet.
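To give a feel for how a cookie carries state back and forth, here is a minimal Python sketch using the standard http.cookies module. The cookie name menu_state and its value are hypothetical, and the module implements cookie handling as it was later standardized, so details may differ from Netscape's original proposal.

```python
from http.cookies import SimpleCookie

# Server side: build the header that asks the browser to remember a value.
outgoing = SimpleCookie()
outgoing["menu_state"] = "expanded"   # hypothetical name and value
outgoing["menu_state"]["path"] = "/"
header_line = outgoing.output()       # a "Set-Cookie: ..." header line
print(header_line)

# Client side: on a later request the browser sends the pair back in a
# "Cookie:" header; the server parses it to recover the stored state.
incoming = SimpleCookie()
incoming.load("menu_state=expanded")
restored = incoming["menu_state"].value
print(restored)
```

The server can then pick which menu to generate based on the restored value, which is the kind of state-guided page described above.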
To avoid any further confusion over which "HTML" everyone is really talking about, Dave Raggett and Dan Connolly explain the new numbering scheme that has been adopted to describe the earliest HTML (1.0), which was not valid SGML; the HTML DTD (2.0), which is being written and tested by Dan Connolly; and the so-called HTML+, which is being written by Dave Raggett and will be referred to as HTML 3.0. In future, all claims of compliance with HTML will require reference to a version number.
DSSSL, the Document Style Semantics and Specification Language, the companion to SGML for formatting and transformation, has been largely re-written, and is out for balloting. Also, more relevant to the web:
Since the SGML Conference, an SGML Open technical committee, including experts from the ISO DSSSL committee, has begun work on defining a minimal subset of the formatting part of DSSSL such as would be appropriate for online delivery including World Wide Web SGML and HTML browsers. That work will be submitted to the HTML IETF Working Group and relevant lists for discussion.
> Do you know of anyplace to get a decent explanation of DSSSL?

The DIS is the best explanation I've found so far. It's heavy on formal specifications but light on examples and rationale, which may be off-putting. The whole thing is available in SGML and PostScript (formatted for A4, but prints OK on US letter) at:

ftp://ftp.ornl.gov:/pub/sgml/WG8/DSSSL/

James Clark's home page has some good introductory tutorial information, some STFP and STTP examples (*very* useful) and documentation about DSSSL-Lite.

http://www.jclark.com/dsssl/

Speaking of DSSSL Lite, there's the archives at

http://www.falch.no/~pepper/DSSSL-Lite/

The discussions on the comp-std-sgml mailing list have focussed on DSSSL lately; James Clark has posted several helpful messages to the list. The list is archived at

ftp://ftp.naggum.no:/pub/comp.std.sgml

The usual SGML repositories have some data:

http://navysgml.dt.navy.mil/dsssl.html (this is mostly a list of links to other sites)
http://gopher.sil.org/sgml/sgml.html#dssslRel (a more comprehensive list of links)
ftp://ftp.ifi.uio.no/pub/SGML/DSSSL (a mirror of the DIS)

I have to mention

http://www.art.com/~joe/dsssl.txt

I don't know of any other network-accessible resources. There are undoubtedly back-issues of <TAG> (to which I really ought to subscribe one of these days) with more comprehensive information. I don't think any books on DSSSL have been published yet.

--Joe English
Watch out for those .wrl files and x-world/x-vrml MIME messages!