Malcolm's github site Baby X Resource compiler Baby X FileSystem    Baby X FS banner

The FileSystem XML file format

FileSystem is an XML file format designed for storing directories. It is intended for Baby X and works with the BabyXFS (Baby X FileSystem) suite of programs and functions. However it is independent of Baby X and is open to anyone for any use.

What is FileSystem XML

FileSystem XML is a format for storing hierarchical data. The root tag is <FileSystem> and then there are two descendant tags <directory> and <file> Files are leaf nodes which contain data, whilst directories can nest and contain files and other directories.

Example FileSystem XML file

    <FileSystem>>
        <directory name="owl_and_text">
            <directory name="owl">
                <file name="Cartoon_Owl_clip_art.png" type="binary">
    <![CDATA[M)"E3'U@":H````0#)A$12!````$```@/(8````PF!UQ;````!,G4')$`NZ,'
    MI#``;<02$%$5H50S;='=<6=E_W4U4ETHND5R]*N!&<PQ<=<H'@0HX0"F$(GE
    M4V-+)9A-;.DE-9A`!(]$(+_1.LA`$@L`!,$P8<!;<O)9)9999)9)9-2SH13O
    M7V?W/IQ:D0V28T>.[UGG^^FO7]>?W^[95X_[`-<HSFE\,I#Y:4GFRS2KV*TH
    M25I)2E:!LLH4`%PZ5I"P)_>CJ5IZ8)3EZ,1B'O[0Q2T3P88(6O+6\R2"6FV!

    More uuencoding removed for brevity

    M+"V\!9?&L7&70/8OM,$601^1[G\T8[--DD-W#)]9Z"^TZ'P$M.L/@GXKFYLI
    M,7!2BWZH1C967$-;OEPX-/2\H+LLX:V^Q#C-U0L7+44\[Y`$E!2U??NCM>3J
    MCD7'DNLEFC%WJN(J_;'.,^P3$/V"*-AFW@OOC3$__B21_%V'AX-M`3WF!E%E
    M:*7=)U7,^6T_<U].$>?>%8^\T3QSTY@8OU7(N-RE?BV'$/-;KX!G`Z,K%715
    M%B[CGH^`\8`/0-%@O(#I_P#X%M7;!XVX-?;W=Y"/+;;W*]8:Z/_O!MW)SX-<
    .9_(`````)5D3$YJ0@)(I
    ]]>
                </file>
            </directory>
            <file name="readme.txt" type="text">
    In the beginning was the Word, and the Word was with God, and the Word was God.
    The same was in the beginning with God.
    All things were made by Him, and without Him was made nothing that was made:
    in Him was life, and the life was the Light of men;
    and the Light shineth in darkness, and the darkness did not comprehend it.

            </file>
        </directory>
    </FileSystem>

image of a folder And it is fairly simple and self explanatory. Directories have names, but are otherwise just containers. Whilst files have names, and can be binary or text. Text is stored as plain ASCII, whilst binary data is uuencoded and put in a CDATA section. So it's a very clean format.

owl_and_text.xml

The <FileSystem> tag

    <FileSystem>
        <directory name="archive">
            <file name="readme.txt" type="text">
and round the neck of the bottles was a paper label, with the
words ‘DRINK ME’ beautifully printed on it in large letters
            </file>
        <directory>
    </FileSystem>

The FileSystem tag identifies the format. It should be the root element, and has one child, which is always a directory. The name of the root directory identifies the name of the FileSystem. Only directory and file tag elements are allowed within FileSystem elements.

The <directory> tag

<directory name="archive">
    <file name="readme.txt" type = >
and round the neck of the bottles was a paper label, with the
words ‘DRINK ME’ beautifully printed on it in large letters
    </file>
    <directory name="subfolder">
    </directory>
<directory>

The directory tag one compulsory element, which is the name. It can contain files and directories in any order. Directories are all normal and should not have names which suggest special directories like symbolic links on common computer systems, eg "..". A directory can be empty.

The <file> tag

      <file name="readme.txt" type="text">
and round the neck of the bottles was a paper label, with the
words ‘DRINK ME’ beautifully printed on it in large letters
      </file>
      <file name="rubbish.bin" type="binary">
      <![CDATA[M)"E3'U@":H````0#)A$1%
      ]]>
      </file>

The file tag is the leaf element which contains data. It has two compulsory attributes, a name and a type, which must be "text" or "binary". Text data is plain, whilst binary data is uuencoded. uuencoding is a common system and decoders are widely available, and there is of course code at the BabyXRC project.

The main consideration is to be extremely simple to parse, and robust. Text files are human readable. So if anything goes wrong with a file in FileSystem archive, you don't need any special sofware to diagnose the problem and fix it. Just a text editor which can handle large files. There is no way of making binary file human-readable, but uuencoding is the simplest widely use binary to text protocol there is. Almost anyone with any programming experience at all can write a decoder, uuncoding is a fairly simple system for encoding binary as ASCII. If the file has been corrupted and you have lost the data, a bedroom programmer may well have the skills to fix it.

Whitespace handling

Whitespace is difficult with XML. And most text files are robust to a bit of whitespace being added or trimmed. However ideally users want text whitespace perefectly preserved. And therefore the program babyxfs_dirtoxml, which generates the files, uses the following system. A newline is added immediately after the opening <file> tag. Then a newline is added imediately after the data. Then a series of tabs are added to indent the closing tag. So by replacing the opening and closing newlines with nulls, you obtain the original text.

The following code is used to trim text data.

    
        if (!strcmp(type, "text"))
        {
            leading = 0;
            len = (int) strlen(data);
            for (i = 0; data[i]; i++)
                if (!isspace((unsigned char) data[i]) || data[i] == '\n')
                    break;
            if (data[i] == '\n')
                leading = i + 1;
            
            trailing = 0;
            i = len - 1;
            for (i = len - 1; i > 0; i--)
                if (!isspace((unsigned char) data[i]) || data[i] == '\n')
                    break;
            if (i > 0 && data[i] == '\n')
                trailing = len - i;
            
            if (trailing + leading >= len )
                ;
            else
            {
                if (fwrite(data + leading, 1, len - trailing - leading, fp) != len - trailing - leading)
                    goto error_exit;
            }
        }
        

You should use this algorithm to trim text data. If the whitespace has not been manipulated, the leading space is always 1 whilst the trailing space is always the nesting level of the element plus one, and is the newline which terminates the text data.

Note that XML only allows newlines, carriage returns, and tabs as control characters under 32 (space). Some text files gave form feeds, backspaces, or other characters. So a "binary tag does not necessarily mean that the data is binary on the host computer.

Motivation

There was a need to allow users to packaged directories in the Baby X resource compiler. And because the spirit of the resource compiler is extreme portability, binary formats were not acceptable. And XML was the natural choice. There was also a desire to show off the XML parser, which has stood up to the files magnificently.

Baby X is designed for running small or baby programs on large computers, so there is not much need to save memory by going for a very compressed format. Instead I chose something which is robust and simple to use, and aimed at the hobby programmer, although of course professionals who derive their living from programming are perfectly welcome to use it, and as all of Baby X, it is free to anyone for any use.

Concluding

Enjoy programming - Malcolm.

Here are some useful links.