The Role of XML

This chapter describes the use of XML as an intermediary data layer for transferring information between the Ganymede database and DNS zone files. A middle layer is necessary because Ganymede itself does not emit zone files directly. Instead, it produces a data file that is parsed by an external script into the required zone files. We'll first give a brief overview of XML, then look at the benefits of having an XML layer, and finally describe the actual format used by the XML file.

What is XML?

XML stands for Extensible Markup Language. A markup language provides a way to describe the structure of a document by enclosing portions of that document within tags defined by the language. The tagged information can then be manipulated by an application that understands the markup. A web browser uses its knowledge of how to process HTML, for instance, to render the data inside a pair of <TABLE> tags in a meaningful manner.

HTML is probably the most familiar example of a markup language. It is, of course, the language that is used to write web pages. In general, it is useful for describing the types of documents that require headings, sections, lists, etc. It is not so good as a markup language for data in general, however, because the set of available tags is basically fixed by the HTML standard. There is no way for an author to add a new structuring tag like <customer-order> to an HTML document.

XML, on the other hand, is specifically designed to allow for the creation of new markup languages which can be customized to a specific application. It essentially provides a means for authors to define new tags to structure data in their domain of interest. For example, this is how XML might be used to describe a pizza:

<PIZZA>
   <TOPPINGS>
       <ONIONS></ONIONS>
       <TOMATO></TOMATO>
       <CHEESE></CHEESE>
   </TOPPINGS>
   <SAUCE TYPE="PIZZA"></SAUCE>
   <CRUST STYLE="THIN"></CRUST>
</PIZZA>

This should look somewhat familiar to anyone with HTML experience, but there are a few syntactic differences in XML that might not be obvious at first:

These points only illustrate a few of the characteristics of XML that might cause some confusion to those already familiar with HTML. Even as a simple overview of XML, it is by no means complete. For further information on the features of XML such as Unicode support, validation, detailed language specifications, and a more in-depth look at the items touched upon above, please see the numerous FAQs and tutorials available at sites such as www.xml.com and www.w3schools.com.
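As a rough illustration of how an application can manipulate tagged data, the following Python sketch reads the pizza document shown earlier using the standard library's xml.etree.ElementTree module. The variable names are ours and are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

PIZZA_XML = """
<PIZZA>
   <TOPPINGS>
       <ONIONS></ONIONS>
       <TOMATO></TOMATO>
       <CHEESE></CHEESE>
   </TOPPINGS>
   <SAUCE TYPE="PIZZA"></SAUCE>
   <CRUST STYLE="THIN"></CRUST>
</PIZZA>
"""

root = ET.fromstring(PIZZA_XML)

# Each child of TOPPINGS is an element whose tag names the topping.
toppings = [child.tag for child in root.find("TOPPINGS")]

# Attributes are read through a dictionary-like interface.
crust_style = root.find("CRUST").get("STYLE")

print(toppings)
print(crust_style)
```

The application, not XML itself, decides what a TOPPINGS or CRUST element means. The parser only hands back the structure.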

Why use XML?

DNS zone files can vary a surprising amount from one organization to another. The ordering of the resource records can differ, the amount of shorthand notation may vary, and even the types of data represented may be slightly different at each site. Because of this variation, it is reasonable to assume that organizations might want the ability to customize the output of the zone files generated using Ganymede.

It's a bit much, though, to expect people to have to modify the internals of the Ganymede server just to tweak zone file generation. A better solution is to have Ganymede create a standard data file that can be parsed by an external script. This script could be written in Perl, Java, Python, etc., and could be modified more easily and safely than the server code to provide the desired zone file output.

XML provides a good format for the data file emitted by Ganymede because of the following reasons:

How do we model DNS using XML?

The structure of an XML file follows a certain pattern. This pattern is determined by the Document Type Definition (DTD). Basically, a DTD describes in detail things like which elements can be contained within which, which ones are optional, and which ones have associated attributes.

We have developed a DTD called GANY_DNS that describes the structure of the XML data file used by Ganymede for DNS support. An in-depth discussion of the DTD itself might just muddy the water at this point, but this annotated *table* of the various elements gives a good feel for the overall structure and content.

Probably the most practical way to describe how DNS information is modeled in XML using the GANY_DNS document type definition is to do a walk-through of a simple example file. The remainder of this chapter will examine the XML file that corresponds to the zone files db.mydomain1, db.mydomain2, db.10.1, and db.10.5 from the chapter on DNS.

Walk-through of an example file

The file begins with the XML declaration and document type declaration. These basically describe the XML specification and DTD to which this file should conform.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE GANY_DNS [ ... dtd removed here for brevity ... ]>

Next comes the actual XML data. Before getting into the details, here is an overview of the whole structure:

<GANY_DNS>

    <FORWARD_ZONES>
        <FORWARD_ZONE_1>...</FORWARD_ZONE_1> 
        <FORWARD_ZONE_2>...</FORWARD_ZONE_2> 
        ...
    </FORWARD_ZONES>

    <REVERSE_ZONES>
        <REVERSE_ZONE_1>...</REVERSE_ZONE_1> 
        <REVERSE_ZONE_2>...</REVERSE_ZONE_2> 
        ...
    </REVERSE_ZONES>

    <SYSTEMS>
        <SYSTEM_1>...</SYSTEM_1>
        <SYSTEM_2>...</SYSTEM_2>
        ...
    </SYSTEMS>

</GANY_DNS>

The topmost element, GANY_DNS, doesn't really encode any information. Instead, it acts as the root of the document structure tree. Contained within this top element are the three elements which actually hold the DNS information: FORWARD_ZONES, REVERSE_ZONES, and SYSTEMS.
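An external script would typically begin by locating these three containers. The sketch below assumes a Python script using xml.etree.ElementTree. The skeleton document is a stripped-down stand-in for the real file, not actual Ganymede output.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the real data file: root plus its three containers.
SKELETON = """
<GANY_DNS>
    <FORWARD_ZONES></FORWARD_ZONES>
    <REVERSE_ZONES></REVERSE_ZONES>
    <SYSTEMS></SYSTEMS>
</GANY_DNS>
"""

root = ET.fromstring(SKELETON)

# Index the top-level containers by tag name for later processing.
sections = {child.tag: child for child in root}

print(root.tag)
print(list(sections))
```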

FORWARD_ZONES

The FORWARD_ZONES element is a container that holds a series of ZONE elements. These define the zone-specific data for each of the forward zones.

The first of the forward zones is the DEFAULT zone. This element provides default values for all subsequent forward zones.

    <ZONE TYPE="DEFAULT" 
          NAME="FORWARD" 
          REFRESH="10800" 
          RETRY="600" 
          EXPIRE="604800" 
          MINTTL="21600" 
          HOST="ns1.mydomain1.com." 
          MAILADDR="root.ns1.mydomain1.com.">

      <NAMESERVERS>
        <HOST>dns1.mydomain1.com.</HOST>
        <HOST>dns2.mydomain1.com.</HOST>
      </NAMESERVERS>

      <DEFAULT_MX>
         <MX_HOST COST="10">mxhost1.mydomain1.com.</MX_HOST>
         <MX_HOST COST="15">mxhost2.mydomain1.com.</MX_HOST>
      </DEFAULT_MX>

    </ZONE>

The various attributes correspond to the values of the SOA resource record, with the exception of the TYPE and NAME attributes. The TYPE indicates that this is a DEFAULT zone. Other possible values correspond to the zone type designation in the BIND configuration file. They include MASTER, SLAVE, or HINT, though only MASTER is currently supported here. The NAME attribute gives the name of the zone. In this case, it is simply FORWARD.

Following the attributes, the NAMESERVERS element contains the default name servers for the forward zones. These will be the name servers designated for any forward zones that do not specify their own.

The final element is the DEFAULT_MX element, which contains the default mail exchangers to be assigned to systems that don't otherwise specify a mail exchanger. The COST attribute describes the priority of each one.
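To make the link between the ZONE attributes and the SOA resource record concrete, here is a hedged Python sketch of how a script might render one. The helper format_soa is hypothetical. We assume that HOST supplies the primary name server field and MAILADDR the responsible-person field. The serial number is passed in by the caller because the XML shown here carries no serial attribute.

```python
def format_soa(zone_name, attrs, serial):
    """Render a single-line SOA record from ZONE attributes.

    Assumption: HOST is the primary name server and MAILADDR the
    responsible-person mailbox. `serial` comes from the caller.
    """
    return (
        f"{zone_name} IN SOA {attrs['HOST']} {attrs['MAILADDR']} ("
        f" {serial} {attrs['REFRESH']} {attrs['RETRY']}"
        f" {attrs['EXPIRE']} {attrs['MINTTL']} )"
    )

# Attribute values taken from the DEFAULT forward zone above.
default_attrs = {
    "REFRESH": "10800",
    "RETRY": "600",
    "EXPIRE": "604800",
    "MINTTL": "21600",
    "HOST": "ns1.mydomain1.com.",
    "MAILADDR": "root.ns1.mydomain1.com.",
}

# The serial below is an arbitrary illustrative value.
soa = format_soa("mydomain1.com.", default_attrs, serial=2000082301)
print(soa)
```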

The DEFAULT zone is then followed by the MASTER forward zones.


For MASTER zones, the FILE attribute is the zone file which will contain the resource records for the zone indicated by the NAME attribute.

    <ZONE TYPE="MASTER" 
          FILE="db.mydomain1" 
          NAME="mydomain1.com.">
    </ZONE>

Note how much less XML is needed to represent this zone. Attributes that are identical to the DEFAULT zone attributes are omitted, as are the NAMESERVERS and DEFAULT_MX elements. This not only reduces the size of the XML file considerably, but also makes it much easier to see the differences from zone to zone. The more zones there are, the more useful this space reduction and clarity become.


The next element is another MASTER zone.

    <ZONE TYPE="MASTER" 
          FILE="db.mydomain2" 
          NAME="mydomain2.com." 
          REFRESH="3600" 
          MINTTL="10800" 
          HOST="ns1.mydomain2.com." 
          MAILADDR="root.ns1.mydomain2.com.">

      <NAMESERVERS>
        <HOST>dns2.mydomain2.com.</HOST>
        <HOST>dns1.mydomain2.com.</HOST>
      </NAMESERVERS>

    </ZONE>

In this case, some of the attributes differ from the defaults. These are listed. The name servers for this zone are also different from the default values, so they too are listed. The mail exchangers are the same as the defaults and are omitted.
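The inheritance of DEFAULT values can be sketched in a few lines of Python. This is only one possible approach, not the script shipped with Ganymede. The function resolve_zones is a hypothetical helper, and the embedded XML is a condensed copy of the forward zones above (DEFAULT_MX is left out to keep the example short).

```python
import xml.etree.ElementTree as ET

FORWARD_ZONES = """
<FORWARD_ZONES>
  <ZONE TYPE="DEFAULT" NAME="FORWARD" REFRESH="10800" RETRY="600"
        EXPIRE="604800" MINTTL="21600"
        HOST="ns1.mydomain1.com." MAILADDR="root.ns1.mydomain1.com.">
    <NAMESERVERS>
      <HOST>dns1.mydomain1.com.</HOST>
      <HOST>dns2.mydomain1.com.</HOST>
    </NAMESERVERS>
  </ZONE>
  <ZONE TYPE="MASTER" FILE="db.mydomain1" NAME="mydomain1.com."></ZONE>
  <ZONE TYPE="MASTER" FILE="db.mydomain2" NAME="mydomain2.com."
        REFRESH="3600" MINTTL="10800"
        HOST="ns1.mydomain2.com." MAILADDR="root.ns1.mydomain2.com.">
    <NAMESERVERS>
      <HOST>dns2.mydomain2.com.</HOST>
      <HOST>dns1.mydomain2.com.</HOST>
    </NAMESERVERS>
  </ZONE>
</FORWARD_ZONES>
"""

def resolve_zones(container):
    """Return {zone name: settings} with DEFAULT values filled in."""
    zones = container.findall("ZONE")
    default = next(z for z in zones if z.get("TYPE") == "DEFAULT")
    default_ns = [h.text for h in default.findall("NAMESERVERS/HOST")]

    resolved = {}
    for zone in zones:
        if zone.get("TYPE") == "DEFAULT":
            continue
        # Start from the defaults, then let the zone's own attributes win.
        settings = {k: v for k, v in default.attrib.items()
                    if k not in ("TYPE", "NAME")}
        settings.update(zone.attrib)
        # A zone that lists no name servers inherits the default list.
        own_ns = [h.text for h in zone.findall("NAMESERVERS/HOST")]
        settings["NAMESERVERS"] = own_ns or default_ns
        resolved[zone.get("NAME")] = settings
    return resolved

zones = resolve_zones(ET.fromstring(FORWARD_ZONES))
print(zones["mydomain1.com."]["REFRESH"])
print(zones["mydomain2.com."]["REFRESH"])
```

Here mydomain1.com. picks up every value from the DEFAULT zone, while mydomain2.com. keeps the default RETRY and EXPIRE but uses its own REFRESH, MINTTL, HOST, MAILADDR, and name servers.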

REVERSE_ZONES

The REVERSE_ZONES element is very similar to its forward counterpart. It contains a DEFAULT reverse zone followed by a series of MASTER reverse zones.

    <ZONE TYPE="DEFAULT" 
          NAME="REVERSE" 
          REFRESH="3600" 
          RETRY="600" 
          EXPIRE="604800" 
          MINTTL="10800" 
          HOST="ns1.mydomain1.com." 
          MAILADDR="root.ns1.mydomain1.com.">
      <NAMESERVERS>
        <HOST>dns1.mydomain1.com.</HOST>
        <HOST>dns2.mydomain1.com.</HOST>
      </NAMESERVERS>
      <DEFAULT_MX>
        <MX_HOST COST="10">mxhost1.mydomain1.com.</MX_HOST>
        <MX_HOST COST="15">mxhost2.mydomain1.com.</MX_HOST>
       </DEFAULT_MX>
    </ZONE>

    <ZONE TYPE="MASTER" 
          FILE="db.10.1" 
          NAME="1.10.in-addr.arpa.">
    </ZONE>

    <ZONE TYPE="MASTER" 
          FILE="db.10.5" 
          NAME="5.10.in-addr.arpa.">
    </ZONE>

The only things specified for the MASTER reverse zones are the names of the zone and the file which will contain the resource records for that zone. The DEFAULT reverse values are used here for everything else.
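A script that writes PTR records needs some way to decide which reverse zone file an address belongs in. The text does not say how the Ganymede script does this. The Python sketch below shows one plausible approach, matching the address's in-addr.arpa name against the zone names. The function names are hypothetical.

```python
def reverse_name(ip):
    """Return the in-addr.arpa name for a dotted-quad IPv4 address."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."

def find_reverse_zone(ip, zones):
    """Return the file of the most specific zone containing `ip`, or None."""
    name = reverse_name(ip)
    # Require a label boundary so that 11.10.in-addr.arpa. does not
    # accidentally match the zone 1.10.in-addr.arpa.
    matches = [zone for zone in zones if name.endswith("." + zone)]
    if not matches:
        return None
    return zones[max(matches, key=len)]

# Zone name to file name, taken from the MASTER reverse zones above.
REVERSE_ZONES = {
    "1.10.in-addr.arpa.": "db.10.1",
    "5.10.in-addr.arpa.": "db.10.5",
}

print(reverse_name("10.1.2.4"))
print(find_reverse_zone("10.1.2.4", REVERSE_ZONES))
print(find_reverse_zone("10.5.17.4", REVERSE_ZONES))
```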

SYSTEMS

The SYSTEMS section of the XML contains a series of SYSTEM elements which represent all the information about the individual machines on the network.

Here is a SYSTEM in its simplest form, followed by a breakdown of its various components.

    <SYSTEM PTRTYPE="1">

      <DNS NAME="myhost1" DOMAIN="mydomain1.com."></DNS>

      <INFO CPU="Omni / 15.1" OS="WindowsNT 4.0" />

      <INTERFACE>
        <IP ADDR="10.1.2.4"></IP>
        <ETHER ADDR="aa:bb:cc:dd:ee:ff" />
      </INTERFACE>

    </SYSTEM>

Each SYSTEM has a PTRTYPE attribute which indicates how to automatically generate the reverse zone PTR resource records for that system. The possible values for the PTRTYPE are as follows:

0
For each IP element, there will be a PTR resource record generated for only those DNS names explicitly listed in the PTR fields of that IP element.

1
For each IP element, there will be a PTR resource record generated that points to the DNS entry of the SYSTEM. Used when all interface IPs should return the system name.

2
For each IP element, there will be a PTR resource record generated that points to each of the DNS entries of the INTERFACE containing that IP element. Used when every IP should point to the DNS name of the corresponding interface, rather than that of the system as a whole.

This system has a PTRTYPE of 1, meaning that there will be a PTR resource record generated that maps 10.1.2.4 to myhost1.mydomain1.com. That is, the following resource record will appear in the reverse zone file db.10.1:

10.1.2.4     IN     PTR    myhost1.mydomain1.com.

Next, there is a single DNS entry which says that this system's name is myhost1 and it belongs in the zone mydomain1.com. The fully qualified DNS name of this system is myhost1.mydomain1.com.

Following that, the INFO element gives a little information about the machine itself, such as CPU type and OS.

Finally, the interfaces of this system are listed. This particular system is not multi-homed, so there is only one INTERFACE element containing the IP and ethernet address of this machine's only interface. As previously mentioned, the PTRTYPE is 1, so there is no need to explicitly list the PTR field within the IP element.
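As a sketch of how a script might turn this SYSTEM into forward zone data, the Python fragment below builds the fully qualified name and pairs it with each interface address. The output line format and the helper fqdn are our own illustrative choices, not necessarily what the Ganymede script emits.

```python
import xml.etree.ElementTree as ET

SYSTEM_XML = """
<SYSTEM PTRTYPE="1">
  <DNS NAME="myhost1" DOMAIN="mydomain1.com."></DNS>
  <INFO CPU="Omni / 15.1" OS="WindowsNT 4.0" />
  <INTERFACE>
    <IP ADDR="10.1.2.4"></IP>
    <ETHER ADDR="aa:bb:cc:dd:ee:ff" />
  </INTERFACE>
</SYSTEM>
"""

def fqdn(dns):
    """Join NAME and DOMAIN; DOMAIN already ends with a dot."""
    return f"{dns.get('NAME')}.{dns.get('DOMAIN')}"

system = ET.fromstring(SYSTEM_XML)

# find("DNS") only looks at direct children, so it returns the
# system-level DNS element rather than one nested in an INTERFACE.
name = fqdn(system.find("DNS"))
addresses = [ip.get("ADDR") for ip in system.findall("INTERFACE/IP")]

# One address record per interface IP.
a_records = [f"{name} IN A {addr}" for addr in addresses]
print(a_records)
```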


Here is one more basic system, this time for myhost2.

    <SYSTEM PTRTYPE="1">
      <DNS NAME="myhost2" DOMAIN="mydomain1.com."></DNS>
      <INFO CPU="Apple / PB 180C" OS="MacOS" />
      <INTERFACE>
        <IP ADDR="10.1.170.5"></IP>
        <ETHER ADDR="0a:0b:0c:0d:0e:0f" />
      </INTERFACE>
    </SYSTEM>

The next system is similar to the others, but designates its own mail exchangers in the DNS element. By explicitly listing the two MX_HOST fields, mxhost1.mydomain2.com. and mxhost2.mydomain2.com., the DEFAULT_MX values for the zone are overridden.

 
    <SYSTEM PTRTYPE="1">
      <DNS NAME="yourhost1" DOMAIN="mydomain2.com.">
        <MX_HOST COST="10">mxhost1.mydomain2.com.</MX_HOST>
        <MX_HOST COST="15">mxhost2.mydomain2.com.</MX_HOST>
      </DNS>
      <INFO CPU="Dell / Dimension" OS="WinNT" />
      <INTERFACE>
        <IP ADDR="10.5.17.4"></IP>
        <ETHER ADDR="f0:e0:d0:c0:b0:a0" />
      </INTERFACE>
    </SYSTEM>

This next system also explicitly designates its own mail exchangers, but in addition demonstrates how aliases are represented. In the DNS section of the system, there is an ALIAS element for each alias of that system. These represent the corresponding CNAME resource records in the zone file. In this case, fred.mydomain2.com. is an alias of yourhost2.mydomain2.com.

  
    <SYSTEM PTRTYPE="1">
      <DNS NAME="yourhost2" DOMAIN="mydomain2.com.">
        <ALIAS NAME="fred" DOMAIN="mydomain2.com." />
        <MX_HOST COST="10">mxhost1.mydomain2.com.</MX_HOST>
        <MX_HOST COST="15">mxhost2.mydomain2.com.</MX_HOST>
      </DNS>
      <INFO CPU="HP / Custom" OS="Win95" />
      <INTERFACE>
        <IP ADDR="10.5.17.9"></IP>
        <ETHER ADDR="f0:e0:d0:c0:b0:a0" />
      </INTERFACE>
    </SYSTEM>
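The mapping from ALIAS elements to CNAME resource records is mechanical enough to sketch in Python. The helper cname_records is hypothetical and the record layout is illustrative only.

```python
import xml.etree.ElementTree as ET

DNS_XML = """
<DNS NAME="yourhost2" DOMAIN="mydomain2.com.">
  <ALIAS NAME="fred" DOMAIN="mydomain2.com." />
  <MX_HOST COST="10">mxhost1.mydomain2.com.</MX_HOST>
  <MX_HOST COST="15">mxhost2.mydomain2.com.</MX_HOST>
</DNS>
"""

def cname_records(dns):
    """Return one CNAME line per ALIAS child of a DNS element."""
    target = f"{dns.get('NAME')}.{dns.get('DOMAIN')}"
    return [
        f"{alias.get('NAME')}.{alias.get('DOMAIN')} IN CNAME {target}"
        for alias in dns.findall("ALIAS")
    ]

records = cname_records(ET.fromstring(DNS_XML))
print(records)
```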


The final system is the most complicated. Like before, there is a DNS section which identifies the DNS name of this system, ns1.mydomain1.com., along with its aliases. The INFO element follows as usual, but the INTERFACE elements are what make this system interesting. They demonstrate how multi-homed systems are handled.

For every network interface of the system, there is an INTERFACE element. The INTERFACE contains information such as the IP and ethernet addresses of that interface. In addition, each INTERFACE can contain a DNS section of its own. This is analogous to the DNS section of the SYSTEM element, and contains the name/domain/alias/mxhost information of any additional DNS names associated with the interface.

The MX_HOST element of the ns1c interface is interesting because it shows how to explicitly designate that no mail exchanger be associated with this DNS name. If there were simply no MX_HOST listed, then the zone defaults would be applied. By using this empty element, no MX resource records at all will be generated for ns1c.

  
    <SYSTEM PTRTYPE="1">
      <DNS NAME="ns1" DOMAIN="mydomain1.com.">
        <ALIAS NAME="gopher" DOMAIN="mydomain1.com." />
        <ALIAS NAME="pop-server" DOMAIN="mydomain1.com." />
        <ALIAS NAME="www" DOMAIN="mydomain1.com." />
      </DNS>
      <INFO CPU="Sun / SparcCenter 2000" OS="2.5.1" />
      <INTERFACE>
        <DNS NAME="ns1c" DOMAIN="mydomain1.com.">
          <MX_HOST COST="" />
        </DNS>
        <IP ADDR="10.5.192.2"></IP>
      </INTERFACE>
      <INTERFACE>
        <DNS NAME="ns1b" DOMAIN="mydomain1.com.">
          <ALIAS NAME="ftp" DOMAIN="mydomain1.com." />
        </DNS>
        <IP ADDR="10.5.224.2"></IP>
      </INTERFACE>
    </SYSTEM>
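The mail exchanger rules described in this section (explicit MX_HOST entries override the zone defaults, and an empty MX_HOST suppresses MX records altogether) could be implemented along the following lines. This is a sketch under our own assumptions: mx_records is a hypothetical helper, and the defaults are passed in as a plain list of (cost, host) pairs.

```python
import xml.etree.ElementTree as ET

def mx_records(dns, default_mx):
    """Return the (cost, host) pairs that apply to one DNS element."""
    own = dns.findall("MX_HOST")
    if not own:
        # No MX_HOST listed at all: the zone defaults apply.
        return list(default_mx)
    # An MX_HOST with no host name means "no mail exchanger",
    # so empty entries are dropped rather than converted.
    return [
        (int(mx.get("COST")), mx.text.strip())
        for mx in own
        if mx.text and mx.text.strip()
    ]

# Defaults taken from the DEFAULT_MX element of the DEFAULT forward zone.
DEFAULT_MX = [(10, "mxhost1.mydomain1.com."), (15, "mxhost2.mydomain1.com.")]

ns1 = ET.fromstring('<DNS NAME="ns1" DOMAIN="mydomain1.com."></DNS>')
ns1c = ET.fromstring(
    '<DNS NAME="ns1c" DOMAIN="mydomain1.com."><MX_HOST COST="" /></DNS>'
)
yourhost1 = ET.fromstring(
    '<DNS NAME="yourhost1" DOMAIN="mydomain2.com.">'
    '<MX_HOST COST="10">mxhost1.mydomain2.com.</MX_HOST>'
    '<MX_HOST COST="15">mxhost2.mydomain2.com.</MX_HOST>'
    '</DNS>'
)

print(mx_records(ns1, DEFAULT_MX))        # zone defaults
print(mx_records(ns1c, DEFAULT_MX))       # no MX records at all
print(mx_records(yourhost1, DEFAULT_MX))  # the system's own exchangers
```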

One final thing to take a look at, since there are multiple interfaces in this system, is the PTRTYPE attribute. Because the PTRTYPE is 1, there will be a PTR resource record automatically generated for each IP, pointing to the DNS name of the system:

10.5.192.2     IN     PTR    ns1.mydomain1.com.
10.5.224.2     IN     PTR    ns1.mydomain1.com.

However, if the PTRTYPE had a value of 2, they would point to the DNS name of the interface to which the IP belongs:

10.5.192.2     IN     PTR    ns1c.mydomain1.com.
10.5.224.2     IN     PTR    ns1b.mydomain1.com.
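The difference between PTRTYPE 1 and PTRTYPE 2 can be sketched in Python as follows. The function ptr_targets is hypothetical, and the XML is a trimmed copy of the ns1 system. PTRTYPE 0 is deliberately left out because the walk-through does not show the PTR fields it depends on.

```python
import xml.etree.ElementTree as ET

SYSTEM_XML = """
<SYSTEM PTRTYPE="1">
  <DNS NAME="ns1" DOMAIN="mydomain1.com."></DNS>
  <INTERFACE>
    <DNS NAME="ns1c" DOMAIN="mydomain1.com."></DNS>
    <IP ADDR="10.5.192.2"></IP>
  </INTERFACE>
  <INTERFACE>
    <DNS NAME="ns1b" DOMAIN="mydomain1.com."></DNS>
    <IP ADDR="10.5.224.2"></IP>
  </INTERFACE>
</SYSTEM>
"""

def fqdn(dns):
    return f"{dns.get('NAME')}.{dns.get('DOMAIN')}"

def ptr_targets(system, ptrtype=None):
    """Return (ip, target name) pairs for PTRTYPE 1 or 2.

    `ptrtype` overrides the SYSTEM's own attribute, for comparison.
    """
    ptrtype = ptrtype or system.get("PTRTYPE")
    pairs = []
    for iface in system.findall("INTERFACE"):
        if ptrtype == "1":
            # Every IP points at the system-level DNS name.
            targets = [fqdn(system.find("DNS"))]
        elif ptrtype == "2":
            # Every IP points at the DNS names of its own interface.
            targets = [fqdn(d) for d in iface.findall("DNS")]
        else:
            raise ValueError(f"unsupported PTRTYPE in this sketch: {ptrtype}")
        for ip in iface.findall("IP"):
            pairs.extend((ip.get("ADDR"), target) for target in targets)
    return pairs

system = ET.fromstring(SYSTEM_XML)
print(ptr_targets(system))       # uses PTRTYPE="1" from the XML
print(ptr_targets(system, "2"))  # what PTRTYPE="2" would produce
```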


Once all the systems are listed, the closing GANY_DNS tag signals the end of the file. *Here* is the file in its entirety.


Brian O'Mara
Last modified: Wed Aug 23 15:29:14 CDT 2000