SAX

SAX stands for Simple API for XML, and was originally a Java API for reading XML. (Full details at http://saxproject.org). SAX implementations exist for most common modern computer languages.

FoX includes a SAX implementation, which translates most of the Java API into Fortran and makes it accessible to Fortran programs, enabling them to read XML documents in a fashion as close as possible to that familiar from other languages.

SAX is a stream-based, event callback API. Conceptually, running a SAX parser over a document results in the parser generating events as it encounters different XML components and sending them to the main program, which can read them and take suitable action.

Events

Events are generated when the parser encounters, for example, an element opening tag, or some text, and most events carry some data with them - the name of the tag, or the contents of the text.

The full list of events is quite extensive, and may be seen below. For most purposes, though, users are unlikely to need more than the 5 most common events, documented here.

Given these events and accompanying information, a program can extract data from an XML document.

Invoking the parser.

Any program using the FoX SAX parser must a) use the FoX module, and b) declare a derived type variable to hold the parser, like so:

   use FoX_sax
   type(xml_t) :: xp

The FoX SAX parser then works by requiring the programmer to write a module containing subroutines to receive any of the events they are interested in, and passing these subroutines to the parser.

Firstly, the parser must be initialized by passing it XML data. This can be done either by giving a filename, which the parser will open and read, or by passing a string containing an XML document. Thus:

  call open_xml_file(xp, "input.xml", iostat)

The iostat variable will report back any errors in opening the file.

Alternatively,

  call open_xml_string(xp, XMLstring)

where XMLstring is a character variable.

To now run the parser over the file, you simply do:

 call parse(xp, list_of_event_handlers)

And once you're finished, you can close the file, and clean up the parser, with:

 call close_xml_t(xp)

Options to parser

It is unlikely that most users will need to operate any of these options, but the following are available for use; all are optional boolean arguments to parse.

Receiving events

To receive events, you must construct a module containing event handling subroutines. These are subroutines of a prescribed form - the input & output is predetermined by the requirements of the SAX interface, but the body of the subroutine is up to you.

The required forms are shown in the API documentation below, but here are some simple examples.

To receive notification of character events, you must write a subroutine which takes as input one string, which will contain the characters received. So:

module event_handling
  use FoX_sax
contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    print*, chars
  end subroutine
end module

That does very little - it simply prints out the data it receives. However, since the subroutine is in a module, you can save the data to a module variable, and manipulate it elsewhere; alternatively you can choose to call other subroutines based on the input.

So, a complete program which reads in all the text from an XML document looks like this:

module event_handling
  use FoX_sax
contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    print*, chars
  end subroutine
end module

program XMLreader
  use FoX_sax
  use event_handling
  type(xml_t) :: xp
  call open_xml_file(xp, 'input.xml')
  call parse(xp, characters_handler=characters_handler)
  call close_xml_t(xp)
end program

Attribute dictionaries.

The other event most likely to be needed is the startElement event. Handling this involves writing a subroutine which takes as input three strings (which are, in order, the namespace URI, local name, and fully qualified name of the tag) and a dictionary of attributes.

An attribute dictionary is essentially a set of key:value pairs, where the key is the attribute's name, and the value is its value. (When considering namespaces, each attribute also has a URI and localName.)

Full details of all the dictionary-manipulation routines are given in AttributeDictionaries, but here we shall show the most common.

So, a simple subroutine to receive a startElement event would look like:

module event_handling
  use FoX_sax   ! provides dictionary_t, getLength, getQName, getValue
contains

 subroutine startElement_handler(URI, localname, name, attributes)
   character(len=*), intent(in)   :: URI  
   character(len=*), intent(in)   :: localname
   character(len=*), intent(in)   :: name 
   type(dictionary_t), intent(in) :: attributes

   integer :: i

   print*, name

   do i = 1, getLength(attributes)
      print*, getQName(attributes, i), '=', getValue(attributes, i)
   enddo

  end subroutine startElement_handler
end module

program XMLreader
 use FoX_sax
 use event_handling
 type(xml_t) :: xp
 call open_xml_file(xp, 'input.xml')
 call parse(xp, startElement_handler=startElement_handler)
 call close_xml_t(xp)
end program

Again, this does nothing but print out the name of the element, and the names and values of all of its attributes. However, by using module variables, or calling other subroutines, the data could be manipulated further.
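
As a rough sketch of the module-variable approach, the handler below counts elements and remembers the value of an attribute, and the main program reads the results after parsing. The variable names n_elements and last_id, the attribute name "id", and the fixed length of 200 characters are illustrative assumptions, not part of FoX; only the dictionary calls already shown above are used.

```fortran
module event_handling
  use FoX_sax
  implicit none
  ! Hypothetical module variables used to carry data out of the handler.
  integer, save            :: n_elements = 0
  character(len=200), save :: last_id = ''
contains

  subroutine startElement_handler(URI, localname, name, attributes)
    character(len=*), intent(in)   :: URI
    character(len=*), intent(in)   :: localname
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    integer :: i

    ! Count every opening tag.
    n_elements = n_elements + 1

    ! Remember the value of the most recent "id" attribute, if any.
    do i = 1, getLength(attributes)
      if (getQName(attributes, i) == 'id') then
        last_id = getValue(attributes, i)
      end if
    end do
  end subroutine startElement_handler
end module event_handling

program XMLreader
  use FoX_sax
  use event_handling
  implicit none
  type(xml_t) :: xp

  call open_xml_file(xp, 'input.xml')
  call parse(xp, startElement_handler=startElement_handler)
  call close_xml_t(xp)

  ! The module variables now hold whatever the handler collected.
  print *, 'elements seen: ', n_elements
  print *, 'last id attribute: ', trim(last_id)
end program XMLreader
```

Values longer than the fixed-length variable would be silently truncated on assignment, so a real program should size the variable to suit its data.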

Error handling

The SAX parser detects all XML well-formedness errors (and optionally validation errors). By default, when it encounters an error, it will simply halt the program with a suitable error message. However, it is possible to pass in an error handling subroutine if some other behaviour is desired - for example it may be nice to report the error to the user, finish parsing, and carry on with some other task.

In any case, once an error is encountered, the parser will finish. There is no way to continue reading past an error. (This means that all errors are treated as fatal errors, in the terminology of the XML standard).

An error handling subroutine works in the same way as any other event handler, with the event data being an error message. Thus, you could write:

subroutine fatalError_handler(msg)
  character(len=*), intent(in) :: msg

  print*, "The SAX parser encountered an error:"
  print*, msg
  print*, "Never mind, carrying on with the rest of the calculation."
end subroutine
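
To show how such a handler might be wired in, here is a minimal sketch of a complete program. It assumes that the keyword argument to parse is called fatalError_handler, following the naming pattern of the other handlers in this document; the parse_failed flag is a hypothetical module variable introduced for illustration.

```fortran
module error_handling
  implicit none
  ! Hypothetical flag so the main program can tell that parsing failed.
  logical, save :: parse_failed = .false.
contains

  subroutine fatalError_handler(msg)
    character(len=*), intent(in) :: msg

    print *, 'The SAX parser encountered an error:'
    print *, msg
    parse_failed = .true.
  end subroutine fatalError_handler
end module error_handling

program XMLreader
  use FoX_sax
  use error_handling
  implicit none
  type(xml_t) :: xp
  integer     :: iostat

  ! A non-zero iostat means the file could not be opened.
  call open_xml_file(xp, 'input.xml', iostat)
  if (iostat /= 0) then
    print *, 'Could not open input.xml'
    stop
  end if

  ! Assumed keyword name; with a handler supplied, the parser reports
  ! the error to it instead of halting the program.
  call parse(xp, fatalError_handler=fatalError_handler)
  call close_xml_t(xp)

  if (parse_failed) then
    print *, 'Carrying on without the XML input.'
  end if
end program XMLreader
```

Recording the failure in a flag, rather than acting on it inside the handler, keeps the decision about what to do next in the main program.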

Stopping the parser.

The parser can be stopped at any time. Simply do the following from within one of the callback subroutines:

call stop_parser(xp)

(where xp is the XML parser object). The current callback function will be completed, then the parser will be stopped, and control will return to the main program, the parser having finished.
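
Since the handler has to be able to see xp, one possible arrangement (an assumption here, not something FoX prescribes) is to declare the parser object as a module variable alongside the handlers. The sketch below stops parsing at the first element named "stop", which is a hypothetical element name chosen for illustration.

```fortran
module event_handling
  use FoX_sax
  implicit none
  ! Declared here, rather than in the main program, so that the
  ! handler below can pass it to stop_parser.
  type(xml_t), save :: xp
contains

  subroutine startElement_handler(URI, localname, name, attributes)
    character(len=*), intent(in)   :: URI
    character(len=*), intent(in)   :: localname
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    print *, name

    ! Hypothetical stopping condition: give up at the first <stop> tag.
    if (name == 'stop') then
      call stop_parser(xp)
    end if
  end subroutine startElement_handler
end module event_handling

program XMLreader
  use FoX_sax
  use event_handling
  implicit none

  call open_xml_file(xp, 'input.xml')
  call parse(xp, startElement_handler=startElement_handler)
  ! Control returns here once the parser stops or reaches the end.
  call close_xml_t(xp)
end program XMLreader
```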


Full API

Derived types

There is one derived type, xml_t. This is entirely opaque, and is used as a handle for the parser.

Subroutines

There are four subroutines:

open_xml_file opens a file. xp is initialized, and prepared for parsing. string must contain the name of the file to be opened. iostat reports on the success of opening the file; a value of 0 indicates success.

close_xml_t closes down the parser (and closes the file, if input was coming from a file). xp is left uninitialized, ready to be used again if necessary.

(Advanced: See above for the list of options that the parse subroutine may take.)

The full list of event handlers is in the next section. To use them, the interface must be placed in a module, and the body of the subroutine filled in as desired; then it should be specified as an argument to parse as:
name_of_event_handler = name_of_user_written_subroutine
Thus a typical call to parse might look something like:

  call parse(xp, startElement_handler = mystartelement, endElement_handler = myendelement, characters_handler = mychars)

where mystartelement, myendelement, and mychars are all subroutines written by you according to the interfaces listed below.


Callbacks.

All of the callbacks specified by SAX 2 are implemented. Documentation of the SAX 2 interfaces is available in the JavaDoc at http://saxproject.org, but as the interfaces needed adjustment for Fortran, they are listed here.

For documentation on the meaning of the callbacks and of their arguments, please refer to the Java SAX documentation.

Triggered when some character data is read from between tags.

NB: all character data is reported, including whitespace. Thus you will probably get a lot of characters events containing only whitespace in a typical XML document.

NB: a single chunk of character data is not required to arrive as one event; it may come as multiple consecutive events. You should concatenate the results of consecutive characters events before processing.
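
The following is a minimal sketch of such concatenation. The buffer is flushed when the element ends. It assumes an endElement handler that takes the same three strings as the startElement handler shown earlier; the names text_buffer and text_len and the buffer size of 1024 are illustrative choices. To keep the sketch runnable without an input file, the main program stands in for the parser and calls the handlers directly; in real use they would be passed to parse.

```fortran
module text_collector
  implicit none
  ! Fixed-size buffer plus a fill counter (illustrative, not part of FoX).
  character(len=1024), save :: text_buffer = ''
  integer, save             :: text_len = 0
contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars
    integer :: n

    ! Append this chunk, truncating if the buffer would overflow.
    n = min(len(chars), len(text_buffer) - text_len)
    if (n > 0) then
      text_buffer(text_len+1:text_len+n) = chars(1:n)
      text_len = text_len + n
    end if
  end subroutine characters_handler

  subroutine endElement_handler(URI, localname, name)
    character(len=*), intent(in) :: URI
    character(len=*), intent(in) :: localname
    character(len=*), intent(in) :: name

    ! The element is complete: process the whole text, then reset.
    print '(4a)', trim(name), ': "', text_buffer(1:text_len), '"'
    text_len = 0
  end subroutine endElement_handler
end module text_collector

program collect_demo
  use text_collector
  implicit none

  ! Stand-in for the parser: one text node delivered as two events.
  call characters_handler('Hello, ')
  call characters_handler('world')
  call endElement_handler('', 'greeting', 'greeting')
end program collect_demo
```

Using an explicit length counter, rather than trim, preserves any trailing whitespace that was genuinely part of the character data.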

Triggered when the parser reaches the end of the document.

Triggered by a closing tag.

Triggered when a namespace prefix mapping goes out of scope.

Triggered when whitespace is encountered within an element declared as having no PCDATA. (Only active in validating mode.)

Triggered by a Processing Instruction.

Triggered when either an external entity, or an undeclared entity, is skipped.

Triggered when the parser starts reading the document.

Triggered when an opening tag is encountered. (See AttributeDictionaries for documentation on handling attribute dictionaries.)

Triggered when a namespace prefix mapping starts.

Triggered when a NOTATION declaration is made in the DTD.

Triggered when an unparsed entity is declared.

Triggered when an error is encountered in parsing. Parsing will continue after this event.

Triggered when a fatal error is encountered in parsing. Parsing will cease after this event.

Triggered when a parser warning is generated. Parsing will continue after this event.

Triggered when an attribute declaration is encountered in the DTD.

Triggered when an element declaration is encountered in the DTD.

Triggered when a parsed external entity is declared in the DTD.

Triggered when an internal entity is declared in the DTD.

Triggered when a comment is encountered.

Triggered by the end of a CData section.

Triggered by the end of a DTD.

Triggered at the end of entity expansion.

Triggered by the start of a CData section.

Triggered by the start of a DTD section.

Triggered by the start of entity expansion.


Exceptions.

The FoX SAX implementation implements all of XML 1.0 and 1.1; all of XML Namespaces 1.0 and 1.1; xml:id and xml:base.

Although FoX tries very hard to work to the letter of the XML and SAX standards, it falls short in a few areas.

(It is impossible to implement IO of non-ASCII documents in a portable fashion using standard Fortran 95, and it is impossible to handle non-ASCII data internally using standard Fortran strings. A fully unicode-capable FoX version is under development, but requires Fortran 2003. Please enquire for further details if you're interested.)

Beyond this, any aspects of the listed XML standards to which FoX fails to do justice are bugs.


What of Java SAX 2 is not included in FoX?

The differences between Java and Fortran mean that none of the SAX APIs can be copied directly. However, FoX offers data types, subroutines, and interfaces covering most of the facilities offered by SAX. Where it does not, this is mentioned here.

org.xml.sax:

org.xml.sax.ext:

org.xml.sax.helpers: