
xmlparser2

The miniXML parser source file, xmlparser2.c, is a set of functions designed to work with XML files. The functions make it easy for programmers to use these files and to manipulate them. It is not part of Baby X and therefore doesn't take the bbx prefix.

The functions

These are the functions in the library.


    #ifndef xmlparser_h
    #define xmlparser_h

    #include <stdio.h>

    typedef struct xmlattribute
    {
      char *name;                /* attribute name */
      char *value;               /* attribute value (without quotes) */
      struct xmlattribute *next; /* next pointer in linked list */
    } XMLATTRIBUTE;

    typedef struct xmlnode
    {
      char *tag;                 /* tag to identify data type */
      XMLATTRIBUTE *attributes;  /* attributes */
      char *data;                /* data as ascii */
      int position;              /* position of the node within parent's data string */
      int lineno;                /* line number of node in document */
      struct xmlnode *next;      /* sibling node */
      struct xmlnode *child;     /* first child node */
    } XMLNODE;

    typedef struct
    {
      XMLNODE *root;             /* the root node */
    } XMLDOC;


    XMLDOC *loadxmldoc(const char *fname, char *errormessage, int Nerr);
    XMLDOC *floadxmldoc(FILE *fp, char *errormessage, int Nerr);
    XMLDOC *xmldocfromstring(const char *str,char *errormessage, int Nerr);
    void killxmldoc(XMLDOC *doc);
    void killxmlnode(XMLNODE *node);

    XMLNODE *xml_getroot(XMLDOC *doc);
    const char *xml_gettag(XMLNODE *node);
    const char *xml_getdata(XMLNODE *node);
    const char *xml_getattribute(XMLNODE *node, const char *attr);
    int xml_Nchildren(XMLNODE *node);
    int xml_Nchildrenwithtag(XMLNODE *node, const char *tag);
    XMLNODE *xml_getchild(XMLNODE *node, const char *tag, int index);
    XMLNODE **xml_getdescendants(XMLNODE *node, const char *tag, int *N);
    char *xml_getnesteddata(XMLNODE *node);

    int xml_getlineno(XMLNODE *node);
    XMLATTRIBUTE *xml_unknownattributes(XMLNODE *node, ...);

    #endif
    

loadxmldoc

Loads an XML file and returns an XMLDOC object.

    XMLDOC *loadxmldoc(const char *fname, char *errormessage, int Nerr);
    Params:
           fname - the name of the file to load.
           errormessage - return buffer for error messages.
           Nerr - size of the error message buffer.

    Returns: the constructed XMLDOC object.
    

This is the main function to load an XML file from disk and parse it. The loader reads UTF-16 as well as UTF-8, converting UTF-16 to UTF-8. Its error reporting is fairly strong: a description of any failure is written to the errormessage buffer.

floadxmldoc

Loads XML from an open stream and returns an XMLDOC object.

    XMLDOC *floadxmldoc(FILE *fp, char *errormessage, int Nerr);
    
    Params:
           fp - pointer to a file opened for reading.
           errormessage - return buffer for error messages
           Nerr - size of the error message buffer
           
    Returns: the constructed XMLDOC object.
           

loadxmldoc is of course just a wrapper for this function, which is exposed in case you have data coming from an open stream and can't provide a filename.

xmldocfromstring

Reads XML from a string, and returns an XMLDOC object.

    XMLDOC *xmldocfromstring(const char *str,char *errormessage, int Nerr);
    
    Params:
           str - a string containing xml.
           errormessage - return buffer for error messages.
           Nerr - size of the error message buffer.

    Returns: the constructed XMLDOC object.

Pass it a string with XML to use the system in an IO-free manner. Strings must be in UTF-8.

killxmldoc

Destroys an XMLDOC object.

    void killxmldoc(XMLDOC *doc);
    Params:
           doc - the XMLDOC object to destroy.
    

XML documents can get very large, so destroy them as soon as you have finished with them.

killxmlnode

Destroys an XMLNODE, its siblings, and its children.

    void killxmlnode(XMLNODE *node);
    Params:
           node - the XMLNODE object to destroy.
           

The function will also destroy all of the following siblings of the XML node, so the node must be unlinked from its siblings before calling.
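Since destroying a node also destroys the siblings that follow it, a node that is still part of a tree has to be detached first. Here is a minimal sketch of one way to do that. The struct definitions are copied from the header so the sketch compiles on its own, and sketch_unlinkchild is a hypothetical helper, not a function in the library.

```c
#include <assert.h>
#include <stddef.h>

/* Struct definitions copied from the header above. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/*
  Detach node from parent's list of children (hypothetical helper).
  Returns 1 if the node was found and detached, 0 if it is not a
  direct child of parent. On success node->next is null, so the node
  no longer leads to its former siblings.
*/
int sketch_unlinkchild(XMLNODE *parent, XMLNODE *node)
{
  XMLNODE **link;

  assert(parent != NULL && node != NULL);
  for (link = &parent->child; *link; link = &(*link)->next)
  {
    if (*link == node)
    {
      *link = node->next;   /* bypass the node in the sibling chain */
      node->next = NULL;    /* cut the node off from its siblings */
      return 1;
    }
  }
  return 0;
}
```

Once the helper returns 1, the detached node and its own children could be passed to killxmlnode without touching the rest of the tree.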

Document query functions

xml_getroot

Returns the root node of the XMLDOC object.

    XMLNODE *xml_getroot(XMLDOC *doc);
    Params:
           doc - the XMLDOC object.

    Returns: root node of the document.
    

Access function. Never access the root directly, unless actually writing to it for some reason.

xml_gettag

Gets the tag or element name associated with a node.

    const char *xml_gettag(XMLNODE *node);
    Params:
           node - the XMLNODE.
    
    Returns: the tag or element name associated with the node.

Access function. You get a bit of security with "const".

xml_getdata

Gets the data associated with the node.

    const char *xml_getdata(XMLNODE *node);
    Params:
           node - the XMLNODE object.
           
    Returns: the data associated with the node.

Some nodes have data, and others will just have whitespace, which nevertheless must be preserved. If a node was in the self-closing form (<mytag />) then the data element should be null.

xml_getattribute

Get an attribute attached to a node.

    const char *xml_getattribute(XMLNODE *node, const char *attr);
    Params:
           node - the XMLNODE object.
           attr - the name of the attribute to query
    Returns: the value of the attribute, or null if it does not exist.

The attributes are stored in a linked list. The function traverses the list and reports the first match.
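The first-match traversal can be sketched as follows. This is an illustration of the technique, not the library's own source: the struct is copied from the header, and sketch_findattribute is a hypothetical name. It assumes attribute names are compared exactly, including case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Struct definition copied from the header above. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

/*
  Walk the attribute list and return the value of the first
  attribute whose name matches, or null if there is none.
*/
const char *sketch_findattribute(const XMLATTRIBUTE *attributes, const char *name)
{
  const XMLATTRIBUTE *attr;

  assert(name != NULL);
  for (attr = attributes; attr; attr = attr->next)
  {
    if (attr->name && !strcmp(attr->name, name))
      return attr->value;
  }
  return NULL;
}
```

If a name appears twice in the list, only the earlier entry is ever reported, which is what "first match" means here.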

xml_Nchildren

Get the number of direct children of the node.

    int xml_Nchildren(XMLNODE *node);
    Params:
           node - the XMLNODE object.
           
    Returns: number of direct children of the node.

Convenience function to count the children.

xml_Nchildrenwithtag

Get the number of direct children of a node associated with a tag.

    int xml_Nchildrenwithtag(XMLNODE *node, const char *tag);
    Params:
           node - the XMLNODE object.
           tag - the tag or element name to query.
    Returns: the number of direct children with that tag type.

Convenience function to count the nodes of the tag type we are interested in.
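Both counting functions come down to walking the child pointer and then the chain of next pointers. A minimal sketch, assuming the struct layout from the header; sketch_countchildren is a hypothetical name, and treating a null tag as "count every child" is a convention of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Struct definitions copied from the header above. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/*
  Count the direct children of node. If tag is null every child is
  counted, otherwise only children whose tag matches exactly.
  Grandchildren are never visited.
*/
int sketch_countchildren(const XMLNODE *node, const char *tag)
{
  const XMLNODE *child;
  int answer = 0;

  assert(node != NULL);
  for (child = node->child; child; child = child->next)
  {
    if (!tag || (child->tag && !strcmp(child->tag, tag)))
      answer++;
  }
  return answer;
}
```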

xml_getchild

Get a direct child of the node by tag and index.

    XMLNODE *xml_getchild(XMLNODE *node, const char *tag, int index);
    Params:
           node - the XMLNODE object.
           tag - the tag to query.
           index - seniority of the child with that tag
    Returns: the child with that tag, as referenced by index

Access function for a child with a given tag. The index selects among the children that share that tag.

xml_getdescendants

Get all the descendants of a node associated with a tag.

    XMLNODE **xml_getdescendants(XMLNODE *node, const char *tag, int *N);
    Params:
           node - the XMLNODE object.
           tag - the tag to query
           N - return pointer for the number of descendants found.
    Returns: an allocated list of all the descendants of that node.

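This is often what you want to do: collect every node with a given tag, however deeply it is nested. The returned list is allocated, so the caller is responsible for freeing it.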
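A descendant search of this kind can be sketched as a recursive walk that first counts the matches and then fills an allocated array. This is not the library's own code. The structs are copied from the header, sketch_getdescendants and its two helpers are hypothetical names, and the document-order (pre-order) result is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Struct definitions copied from the header above. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/* Count the descendants of node (not node itself) that carry tag. */
static int count_matches(const XMLNODE *node, const char *tag)
{
  const XMLNODE *child;
  int answer = 0;

  for (child = node->child; child; child = child->next)
  {
    if (child->tag && !strcmp(child->tag, tag))
      answer++;
    answer += count_matches(child, tag);
  }
  return answer;
}

/* Store matching descendants in out from index pos, return the next free index. */
static int fill_matches(XMLNODE *node, const char *tag, XMLNODE **out, int pos)
{
  XMLNODE *child;

  for (child = node->child; child; child = child->next)
  {
    if (child->tag && !strcmp(child->tag, tag))
      out[pos++] = child;
    pos = fill_matches(child, tag, out, pos);
  }
  return pos;
}

/*
  Return an allocated array of the matching descendants and write
  the count to *N. The caller frees the array, but not the nodes,
  which still belong to the tree. Returns null if out of memory.
*/
XMLNODE **sketch_getdescendants(XMLNODE *node, const char *tag, int *N)
{
  XMLNODE **answer;
  int count;

  assert(node != NULL && tag != NULL && N != NULL);
  count = count_matches(node, tag);
  /* allocate at least one slot so an empty result is still non-null */
  answer = malloc((size_t) (count > 0 ? count : 1) * sizeof(XMLNODE *));
  if (!answer)
  {
    *N = 0;
    return NULL;
  }
  fill_matches(node, tag, answer, 0);
  *N = count;
  return answer;
}
```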

xml_getnesteddata

Get the data held by a node and its descendants.

    char *xml_getnesteddata(XMLNODE *node);
    Params:
           node - the XMLNODE object.
        
    Returns: allocated pointer to the data held by the node and its children.

This is needed for XML which is markup instead of tagged data. The node needs to know the positions of any children within its data string.
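To show why the positions matter, here is a sketch of how nested data could be reassembled. It assumes that a node's data string holds only its own text, and that each child's position field is the offset in that string at which the child's text belongs. The structs are copied from the header and sketch_getnesteddata is a hypothetical name. The library's own implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Struct definitions copied from the header above. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/* Append N bytes of src to the growing string *buf. Returns 0 on success. */
static int append(char **buf, size_t *len, const char *src, size_t N)
{
  char *temp = realloc(*buf, *len + N + 1);

  if (!temp)
    return -1;
  memcpy(temp + *len, src, N);
  *len += N;
  temp[*len] = 0;
  *buf = temp;
  return 0;
}

/* Append the node's own text with each child's text spliced in at its position. */
static int append_nested(char **buf, size_t *len, const XMLNODE *node)
{
  const char *data = node->data ? node->data : "";
  size_t datalen = strlen(data);
  size_t done = 0;   /* how much of data has been copied so far */
  const XMLNODE *child;

  for (child = node->child; child; child = child->next)
  {
    size_t pos = child->position < 0 ? 0 : (size_t) child->position;

    if (pos > datalen)
      pos = datalen;
    if (pos > done)
    {
      if (append(buf, len, data + done, pos - done))
        return -1;
      done = pos;
    }
    if (append_nested(buf, len, child))
      return -1;
  }
  return append(buf, len, data + done, datalen - done);
}

/* Return an allocated string which the caller frees, or null if out of memory. */
char *sketch_getnesteddata(const XMLNODE *node)
{
  char *answer = NULL;
  size_t len = 0;

  assert(node != NULL);
  if (append_nested(&answer, &len, node))
  {
    free(answer);
    return NULL;
  }
  return answer;
}
```

Under those assumptions, a p node holding "Hello  world" with a b child at position 6 holding "bold" comes back as "Hello bold world".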

xml_getlineno

Get the number of the line in the XML document where the node appeared.

    int xml_getlineno(XMLNODE *node);
    Params:
           node - the XMLNODE object.
           
    Returns: the line number the node appeared in the XML document.

This is an absolutely crucial little function for reporting errors. XML files often get very large, and without some reference to the place a corrupted node appears in the data, repair is hopeless.

xml_unknownattributes

Return any attributes which do not match a list of known attributes.

    XMLATTRIBUTE *xml_unknownattributes(XMLNODE *node, ...);
    Params:
           node - the XMLNODE object.
           ... - a list of known attributes associated with the node.
    Returns: an allocated linked list of attributes not in the list
             of known attributes.

This is another function mainly for debugging. Nodes might have unknown attributes associated with them, which will often indicate a bug somewhere upstream.

You use it like this:

    void *sentinel = 0;          /* null pointer ends the list of known names */
    XMLATTRIBUTE *attr;

    attr = xml_unknownattributes(node, "faith", "hope", "charity", sentinel);
    if (!attr)
    {
       /* all ok */
    }
    else
    {
       /* attr is a deep copy of the unknown attributes. */
    }
     

XML files

XML is a recognised format for files.