An HTML Tag Parser

The package "html" contains an HTML scanner, a tag parser, and the exception class they share:
  1. HTMLScanner - produces a stream of characters, each paired with a boolean flag. The flag is true for the literal special characters <, >, and ". The escape sequences &lt;, &gt;, &quot;, and &amp; are converted to the characters <, >, ", and & with the flag set to false, so an escaped character is never mistaken for markup. Can throw an HTMLFormatException.

    For general-purpose use, it would be useful to decode more &name; entities, perhaps into the corresponding Unicode characters.

  2. HTMLTag - represents a single HTML tag. Its methods getTag( ) and getParameter( String ) implement the ParameterStub interface. The constructor accepts a String, an InputStream, or an HTMLScanner. Can throw an HTMLFormatException.

    IDVI uses this class to parse the contents of html: specials, so IDVI handles quoted arguments, extra tag attributes, and embedded &name; entities.

  3. HTMLFormatException - thrown by HTMLScanner if an & has no matching ; or if the client code reads past the end of the input, and by HTMLTag if an expected <, =, or > character is missing.
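As a rough illustration of item 1, here is a minimal Java sketch of a scanner with the same contract. The real HTMLScanner source is not shown here, so ScannerSketch, FormatError, and the methods next(), current(), and isSpecial() are hypothetical names, and the sketch reads from a String rather than an InputStream for brevity. The key idea is the flag: a literal < is reported as special, while &lt; yields the same character with the flag cleared, so code built on top can treat it as ordinary data.

```java
// Hypothetical stand-in for HTMLFormatException (a checked exception in this sketch).
class FormatError extends Exception {
    FormatError(String msg) { super(msg); }
}

// Sketch of a scanner that yields one character per call to next(),
// together with a flag that is true only for a literal <, > or ".
class ScannerSketch {
    private final String input;
    private int pos = 0;
    private char current;
    private boolean special;

    ScannerSketch(String input) { this.input = input; }

    boolean hasNext() { return pos < input.length(); }

    char current() { return current; }

    boolean isSpecial() { return special; }

    // Advances to the next character; reading past the end is an error.
    void next() throws FormatError {
        if (pos >= input.length()) throw new FormatError("read past end of input");
        char c = input.charAt(pos++);
        if (c != '&') {
            current = c;
            special = (c == '<' || c == '>' || c == '"');
            return;
        }
        // An & starts an escape sequence that must be closed by a ;
        int semi = input.indexOf(';', pos);
        if (semi < 0) throw new FormatError("& without matching ;");
        String name = input.substring(pos, semi);
        pos = semi + 1;
        switch (name) {
            case "lt":   current = '<'; break;
            case "gt":   current = '>'; break;
            case "quot": current = '"'; break;
            case "amp":  current = '&'; break;
            // Assumption of this sketch: other entities are rejected, not passed through.
            default: throw new FormatError("unsupported entity &" + name + ";");
        }
        special = false; // an escaped character is data, never markup
    }

    public static void main(String[] args) throws FormatError {
        ScannerSketch s = new ScannerSketch("<b title=\"x &lt; y\">");
        while (s.hasNext()) {
            s.next();
            System.out.println(s.current() + (s.isSpecial() ? "   special" : ""));
        }
    }
}
```

On the input <b title="x &lt; y">, the sketch flags the opening <, both quotation marks, and the closing > as special, but not the < decoded from &lt;. Rejecting unknown entities is a choice made here; the description above leaves that behavior open.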
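Item 2 can be illustrated in the same hedged way. The sketch below parses one tag of the form <name attr=value attr2="quoted value">, requiring the <, =, and > characters that item 3 mentions. TagSketch, TagFormatError, and ParamSource (a stand-in for ParameterStub, whose full definition is not given here) are hypothetical names; lower-casing names and returning null for an absent attribute are assumptions of this sketch. For brevity it works on a String and decodes entities itself instead of using a scanner, but it keeps the same rule: a value ends only at a literal quote, so &quot; inside a value cannot terminate it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for ParameterStub: only the two methods named in the text.
interface ParamSource {
    String getTag();
    String getParameter(String name);
}

// Hypothetical unchecked exception playing the role of HTMLFormatException.
class TagFormatError extends RuntimeException {
    TagFormatError(String msg) { super(msg); }
}

// Parses one tag of the form <name attr=value attr2="quoted value" ...>.
class TagSketch implements ParamSource {
    private final String src;
    private int pos = 0;
    private final String tag;
    private final Map<String, String> params = new HashMap<>();

    TagSketch(String text) {
        src = text;
        skipSpace();
        expect('<');
        skipSpace();
        tag = readName();
        while (true) {
            skipSpace();
            if (pos >= src.length()) throw new TagFormatError("missing >");
            if (src.charAt(pos) == '>') break;
            String name = readName();
            skipSpace();
            expect('=');
            skipSpace();
            params.put(name, readValue());
        }
    }

    public String getTag() { return tag; }

    // Assumption of this sketch: a missing attribute yields null.
    public String getParameter(String name) {
        return params.get(name.toLowerCase(Locale.ROOT));
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) throw new TagFormatError("missing " + c);
        pos++;
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '=' || c == '<' || c == '>' || c == '"';
    }

    // Names are lower-cased because HTML tag and attribute names are case-insensitive.
    private String readName() {
        int start = pos;
        while (pos < src.length() && !isDelimiter(src.charAt(pos))) pos++;
        if (pos == start) throw new TagFormatError("expected a name at offset " + start);
        return src.substring(start, pos).toLowerCase(Locale.ROOT);
    }

    // A value is either "quoted" (may contain spaces) or runs to the next delimiter.
    private String readValue() {
        int start = pos;
        int end;
        if (pos < src.length() && src.charAt(pos) == '"') {
            start = pos + 1;
            end = src.indexOf('"', start);
            if (end < 0) throw new TagFormatError("missing closing quote");
            pos = end + 1;
        } else {
            while (pos < src.length() && !isDelimiter(src.charAt(pos))) pos++;
            end = pos;
        }
        return decodeEntities(src.substring(start, end));
    }

    // Decoding after the value is split means &quot; can never end a quoted value.
    // &amp; is replaced last so that, e.g., "&amp;lt;" becomes "&lt;", not "<".
    // Unknown entities are left unchanged in this sketch.
    static String decodeEntities(String raw) {
        return raw.replace("&lt;", "<").replace("&gt;", ">")
                  .replace("&quot;", "\"").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        TagSketch t = new TagSketch("<A HREF=\"x.html#sec &amp; more\" name=top>");
        System.out.println(t.getTag() + " " + t.getParameter("href") + " " + t.getParameter("NAME"));
    }
}
```

For <A HREF="x.html#sec &amp; more" name=top>, getTag() returns a, getParameter("href") returns x.html#sec & more, and getParameter("NAME") returns top.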