The Partitioning Process

This section describes the process by which dns2xml transforms existing BIND data into XML. To demonstrate what happens at each stage of the process, some (pseudo-)DNS resource records are listed below. Assume they represent data from both forward and reverse zone files. For clarity's sake, DNS names have been shortened from the a.mydomain.com. form to simply a, and IP addresses from 10.0.0.1 to simply 1. In other words, DNS names are letters and IP addresses are numbers.

a  IN    A        1
a  IN    A        2
x  IN    CNAME    a
b  IN    A        3
b  IN    A        4
c  IN    A        1
d  IN    A        2

1  IN    PTR      a 
2  IN    PTR      a
3  IN    PTR      b
4  IN    PTR      b

Stage 1: Extracting the Existing Data from the BIND Files

The initial step in transforming the DNS data into an XML hierarchy is, of course, to extract the information stored in the zone files. This is a relatively straightforward task in Perl. Given the resource records above, the result is, conceptually, a set of DNS names and IP addresses, each mapped to its associated data as follows (see the dns2xml data structures section for more details):

a           b           c           d          
|           |           |           |
+-IPs       +-IPs       +-IPs       +-IPs
|  |           |           |           |
|  +-1         +-3         +-1         +-2
|  |           |
|  +-2         +-4
|
+-CNAMEs
   |
   +-x

1           2           3           4
|           |           |           |
+-PTRs      +-PTRs      +-PTRs      +-PTRs
   |           |           |           |
   +-a         +-a         +-b         +-b
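
The extraction step can be sketched in a few lines. dns2xml itself is written in Perl and reads real zone files; this Python sketch only understands the simplified four-field records used in this section, and the function name extract and its three dictionaries are hypothetical, chosen to mirror the conceptual mapping above rather than dns2xml's actual data structures. Note that the CNAME is written alias first (x IN CNAME a), which is how the Stage 3 XML represents it.

```python
from collections import defaultdict

# The example records in the simplified notation, one "owner IN TYPE data" per line.
RECORDS = """
a  IN    A        1
a  IN    A        2
x  IN    CNAME    a
b  IN    A        3
b  IN    A        4
c  IN    A        1
d  IN    A        2
1  IN    PTR      a
2  IN    PTR      a
3  IN    PTR      b
4  IN    PTR      b
"""


def extract(text):
    """Map each DNS name to its IPs and aliases, and each IP to its PTR targets."""
    ips = defaultdict(list)     # DNS name -> IP addresses (A records)
    cnames = defaultdict(list)  # canonical DNS name -> aliases (CNAME records)
    ptrs = defaultdict(list)    # IP address -> DNS names (PTR records)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[1] != "IN":
            continue  # skip blank lines and anything not in the simplified form
        owner, _, rtype, data = fields
        if rtype == "A":
            ips[owner].append(data)
        elif rtype == "CNAME":
            cnames[data].append(owner)  # owner is the alias, data the canonical name
        elif rtype == "PTR":
            ptrs[owner].append(data)
    return dict(ips), dict(cnames), dict(ptrs)


ips, cnames, ptrs = extract(RECORDS)
print(ips)     # {'a': ['1', '2'], 'b': ['3', '4'], 'c': ['1'], 'd': ['2']}
print(cnames)  # {'a': ['x']}
print(ptrs)    # {'1': ['a'], '2': ['a'], '3': ['b'], '4': ['b']}
```

Storing each CNAME under its canonical name, rather than under the alias, is what lets a's entry carry the alias x, as in the tree above.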

Stage 2: Partitioning the Original Set of Data into Systems

Once we have extracted the data, the process becomes a bit trickier. We need to group the resource record data into units that correspond to <SYSTEM> elements in the XML. We can do this by treating the DNS names and IP addresses as nodes in a graph, with the A and PTR resource records as the edges connecting them. Once the data is in graph form, it is easy to determine the relationships among the nodes and thereby separate them into systems. The first step is to visualize the resource record data as a graph, so here are some diagrams to help illustrate the process.

Here is the forward zone data. The arrows represent the links from DNS name to IP address indicated by the A resource records:

a --> 1
a --> 2
b --> 3
b --> 4
c --> 1
d --> 2

Here is the reverse zone data. The arrows in this case represent the links from IP address to DNS name indicated by the PTR resource records:

1 --> a
2 --> a
3 --> b
4 --> b

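And here is the completed graph, showing how all the DNS names and IP addresses are related through the resource record data. A double-headed arrow (<-->) marks a pair linked in both directions, by an A record and a matching PTR record:

c --> 1 <--> a <--> 2 <-- d

      3 <--> b <--> 4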

Hopefully, it is easy to see how the A and PTR resource record data translates to graph form. The next step is to find the connected components of the graph. A connected component is a maximal subgraph in which every node is linked, directly or indirectly, to every other node; here, each one represents a group of related data corresponding to a <SYSTEM> element.
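
To make the graph concrete, here is a minimal Python sketch that turns the example's A and PTR records into a directed graph stored as adjacency sets. The name build_graph and the dict-of-sets layout are hypothetical, not dns2xml's Perl data structures. It also assumes, as holds in the simplified notation, that a DNS name never looks like an IP address, so plain strings can serve as node identifiers.

```python
# A records as (DNS name, IP) pairs and PTR records as (IP, DNS name) pairs.
A_RECORDS = [("a", "1"), ("a", "2"), ("b", "3"), ("b", "4"), ("c", "1"), ("d", "2")]
PTR_RECORDS = [("1", "a"), ("2", "a"), ("3", "b"), ("4", "b")]


def build_graph(a_records, ptr_records):
    """Return the directed graph as a dict mapping each node to its successors."""
    graph = {}
    for src, dst in a_records + ptr_records:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # ensure nodes without outgoing edges still appear
    return graph


graph = build_graph(A_RECORDS, PTR_RECORDS)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))

# Pairs linked in both directions: an A record plus a matching PTR record.
two_way = sorted((u, v) for u in graph for v in graph[u] if u in graph[v] and u < v)
print(two_way)  # [('1', 'a'), ('2', 'a'), ('3', 'b'), ('4', 'b')]
```

The two-way pairs are exactly the double-headed links in the completed graph; c and d each reach their interface through an A record only.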

The algorithm (based on depth-first search) that finds the connected components ignores the direction of the arrows linking the nodes, so we can now visualize the graph as follows:

c --- 1 --- a --- 2 --- d

      3 --- b --- 4

The following diagram represents the partitioning of the original graph into its connected components, each of which contains the members of an eventual <SYSTEM> element:

System A:   c --- 1 --- a --- 2 --- d

System B:   3 --- b --- 4

These results show that System A will consist of the DNS names a, c, and d and the interfaces 1 and 2. Likewise, System B will consist of b and the interfaces 3 and 4.
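
The partitioning can be sketched as an iterative depth-first search that treats every edge as undirected. This is a generic textbook version in Python for illustration, not dns2xml's own Perl code, and connected_components is a hypothetical name.

```python
# Directed edges from the example: A records (name -> IP) and PTR records (IP -> name).
EDGES = [("a", "1"), ("a", "2"), ("b", "3"), ("b", "4"), ("c", "1"), ("d", "2"),
         ("1", "a"), ("2", "a"), ("3", "b"), ("4", "b")]


def connected_components(edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:
        # Store each edge both ways so the search treats the graph as undirected.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search with an explicit stack, starting from an unvisited node.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


for comp in connected_components(EDGES):
    print(sorted(comp))
# ['1', '2', 'a', 'c', 'd']
# ['3', '4', 'b']
```

Recording each edge in both directions is all that "ignoring the direction of the arrows" requires, and the explicit stack avoids Python's recursion limit on large zones.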

Stage 3: Generating the XML

At this point, all that is left is to determine how the above systems should map to XML. The first step is to decide which of the DNS names should be the name of the system. (Since the partitioning has already taken place, we can return to the directed graph representation.)

Since System B contains only one DNS name, b, that name is automatically the system. (As an aside, b has two interfaces in this example, but by far the most common situation is probably a single system with a single interface.)

The XML generated for System B is (roughly) as follows:

<SYSTEM>

  <DNS NAME="b"></DNS>  <-- 'b' is the only choice, so definitely the 'system'

  <INTERFACE>
    <IP ADDR="3">       <-- representing    b  IN  A    3
      <PTR>b</PTR>      <-- representing    3  IN  PTR  b
    </IP>
  </INTERFACE>

  <INTERFACE>
    <IP ADDR="4">       <-- representing    b  IN  A    4
      <PTR>b</PTR>      <-- representing    4  IN  PTR  b
    </IP>
  </INTERFACE>

</SYSTEM>

System A, on the other hand, has multiple DNS names: a, c, and d. Which of these should be the system? The answer is determined algorithmically, based on a series of rules such as "Given two DNS names that share an interface, if one of them has additional A records, it is more likely to be the system than the one with only a single A record."
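
The full rule set is not listed here, so the following Python sketch is only a hypothetical stand-in. pick_system_name ranks each DNS name by its number of A records, as in the rule quoted above, then by how many PTR records refer to it (the additional evidence applied to System A below), and finally breaks any remaining tie alphabetically. The second and third criteria, and their ordering, are assumptions rather than dns2xml's documented behavior.

```python
from collections import Counter

# System A from the example: A records (name, IP) and PTR records (IP, name).
A_RECORDS = [("a", "1"), ("a", "2"), ("c", "1"), ("d", "2")]
PTR_RECORDS = [("1", "a"), ("2", "a")]


def pick_system_name(a_records, ptr_records):
    """Choose the DNS name most likely to be the system (hypothetical scoring)."""
    a_count = Counter(name for name, _ in a_records)      # A records per DNS name
    ptr_count = Counter(name for _, name in ptr_records)  # PTRs referring to each name
    # More A records wins, then more PTR references, then alphabetical order.
    return min(a_count, key=lambda n: (-a_count[n], -ptr_count[n], n))


print(pick_system_name(A_RECORDS, PTR_RECORDS))  # a
```

For System B the same function trivially returns b, since it is the only candidate.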

By inspecting the directed graph of System A, we can see that not only does a have more A records associated with it than c or d, but it is also referenced by two PTR records, whereas no PTR record refers to c or d. In this case, a is clearly the best candidate to be the actual system; the other two are just alternate names for a's interfaces 1 and 2. Here is the XML:

<SYSTEM>

  <DNS NAME="a">           <-- 'a' was determined to be the system
    <ALIAS>x</ALIAS>       <-- representing      x  IN  CNAME   a
  </DNS>

  <INTERFACE>
    <DNS NAME="c"></DNS>   <-- representing      c  IN  A   1
    <IP ADDR="1">              (and implicitly)  a  IN  A   1
      <PTR>a</PTR>         <-- representing      1  IN  PTR a
    </IP>
  </INTERFACE>

  <INTERFACE>
    <DNS NAME="d"></DNS>   <-- representing      d  IN  A   2
    <IP ADDR="2">              (and implicitly)  a  IN  A   2
      <PTR>a</PTR>         <-- representing      2  IN  PTR a
    </IP>
  </INTERFACE>

</SYSTEM>
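
Emitting such an element can be sketched with Python's standard xml.etree.ElementTree module. The helper system_to_xml and the shape of its interfaces argument are hypothetical; only the tag and attribute names are taken from the XML above.

```python
import xml.etree.ElementTree as ET


def system_to_xml(name, aliases, interfaces):
    """Build a <SYSTEM> element.

    interfaces maps each IP address to a pair: (other DNS names with an A record
    for that IP, DNS names that the IP's PTR records point to).
    """
    system = ET.Element("SYSTEM")
    dns = ET.SubElement(system, "DNS", NAME=name)
    for alias in aliases:
        ET.SubElement(dns, "ALIAS").text = alias
    for ip, (other_names, ptr_names) in interfaces.items():
        iface = ET.SubElement(system, "INTERFACE")
        for other in other_names:
            ET.SubElement(iface, "DNS", NAME=other)
        ip_elem = ET.SubElement(iface, "IP", ADDR=ip)
        for ptr in ptr_names:
            ET.SubElement(ip_elem, "PTR").text = ptr
    return system


# System A: 'a' is the system, 'x' its alias, and c and d share its two interfaces.
system_a = system_to_xml("a", ["x"], {"1": (["c"], ["a"]), "2": (["d"], ["a"])})
ET.indent(system_a)  # pretty-print in place (Python 3.9+)
print(ET.tostring(system_a, encoding="unicode"))
```

ElementTree writes empty elements in self-closing form, such as <DNS NAME="c" />, which is equivalent XML to <DNS NAME="c"></DNS>.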

For most of the situations encountered while converting existing DNS data into XML, this process works well. There are, however, times when the systems generated in this manner are somewhat ambiguous; dns2xml checks for such situations and, when it finds one, performs additional processing to try to resolve the ambiguity. An example of such a case and how it is resolved can be found here.