A Developer's Guide to the Scripts

This page is intended to be an informal guide (i.e. not yet an official part of the overall documentation) to working with the Perl code for the scripts involved in DNS processing using Ganymede.

The focus is on some of the guts of the two main scripts, dns2xml and xml2dns, in order to provide any poor soul who has to work with and modify my code some insight into how it all currently works.

The d2xpp script was (unfortunately) written very quickly to accomplish something more "important" at the time. It works, but should be rewritten to be less fragile and more correct (and to add sanity checking for the DNS data). It is a pretty simple script, and its role is spelled out pretty well in the User's Guide, so rewriting it should be relatively easy, and it won't be covered here.

dns2xml

At the most general level, dns2xml consists of three main actions:

Everything else is just details (covered shortly).

Data Structures

The global data structures maintained by dns2xml are the $Zones, $DNS_Names, and $Partition hash references. A brief discussion of these is necessary before talking about how the rest of dns2xml works:

$Zones

$Zones is a hash ref that maps a zone name to that zone's data. The zones in $Zones are the zones listed in the BIND config file (named.conf) and the information comes from the corresponding zone files. There are also default forward and default reverse zones in the dns2xml.conf file that are added to $Zones.
$Zones => {
    'mydomain1.com' => {
        'NAME'     => 'mydomain1.com.',
        'TYPE'     => 'MASTER',
        'FILE'     => 'db.mydomain1',
        'DIR'      => 'FORWARD',
        'SERIAL'   => '2000083101',              # \
        'REFRESH'  => 10800,                     # |
        'RETRY'    => 600,                       # |
        'MINTTL'   => 21600,                     # | SOA info
        'NEGTTL'   => 21600,                     # |
        'EXPIRE'   => 604800,                    # |
        'HOST'     => 'ns1.mydomain1.com.',      # |
        'MAILADDR' => 'root.ns1.mydomain1.com.', # /
        'NS' => {
            'dns1.mydomain1.com.' => 1,          # nameservers
            'dns2.mydomain1.com.' => 1,
        },
        'MX' => {
            '10 mxhost1.mydomain1.com.' => 1,    # mx hosts
            '15 mxhost2.mydomain1.com.' => 1,
        },
    },
    'zone2.com.'   =>  { ... info ... },
    'zone3.com.'   =>  { ... info ... },
    'FORWARD'      =>  { ... default forward info ... },
    'REVERSE'      =>  { ... default reverse info ... },
}
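Reading a structure of this shape is just a matter of walking nested hash refs. The following is a minimal sketch, not code from dns2xml: the sample_zones and zone_summary subs are made up for illustration, and the sample data is a trimmed copy of the example above.

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the $Zones example (trimmed).
sub sample_zones {
    return {
        'mydomain1.com' => {
            'NAME'   => 'mydomain1.com.',
            'TYPE'   => 'MASTER',
            'FILE'   => 'db.mydomain1',
            'DIR'    => 'FORWARD',
            'SERIAL' => '2000083101',
            'NS'     => {
                'dns1.mydomain1.com.' => 1,
                'dns2.mydomain1.com.' => 1,
            },
        },
    };
}

# zone_summary is an illustrative helper, not a dns2xml routine.
# Returns one line of text describing a zone, or nothing for unknown zones.
sub zone_summary {
    my ($zones, $zone_name) = @_;
    my $zone = $zones->{$zone_name} or return;
    my @nameservers = sort keys %{ $zone->{NS} || {} };
    return sprintf "%s (%s, serial %s): NS %s",
        $zone->{NAME}, $zone->{DIR}, $zone->{SERIAL}, join(', ', @nameservers);
}

my $Zones = sample_zones();
print zone_summary($Zones, $_), "\n" for sort keys %$Zones;
```

Note that the NS and MX values are stored as hash keys (with a dummy value of 1), so they are read back with keys rather than as a list.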

$DNS_Names

$DNS_Names contains the graph structure and resource record data for all the dns names extracted from the zone files.

Given the following resource records:

    ; from zone file db.mydomain1 
    ns1.mydomain1.com.      IN   A       1.2.3.4
    ns1.mydomain1.com.      IN   A       1.2.3.5
    ns1.mydomain1.com.      IN   MX      10 mxhost1.mydomain1.com. 
    ns1.mydomain1.com.      IN   MX      15 mxhost2.mydomain1.com.
    www.mydomain1.com.      IN   CNAME   ns1.mydomain1.com.

    ; from zone file db.1.2.3
    4.3.2.1.in-addr.arpa.   IN   PTR     ns1.mydomain1.com.

$DNS_Names would contain the following two entries:

$DNS_Names => {
    'ns1.mydomain1.com.' => {
			     'NAME'    => 'ns1',
			     'DOMAIN'  => 'mydomain1.com.',

			     'P_TO'    => {'4.3.2.1.in-addr.arpa.' => 1,
                                          '5.3.2.1.in-addr.arpa.' => 1,
                                         },
			     'P_FROM'  => {'4.3.2.1.in-addr.arpa.' => 1},

			     'A'       => {'4.3.2.1.in-addr.arpa.' => 1,
                                          '5.3.2.1.in-addr.arpa.' => 1,
                                         },
			     'CNAME'   => {'www.mydomain1.com.' => 1},
			     'MX'      => {
				           '10 mxhost1.mydomain1.com.' => 1,
				           '15 mxhost2.mydomain1.com.' => 1,
				          }
			    },       
    '4.3.2.1.in-addr.arpa.' => {
			        'NAME'    => '4',
			        'DOMAIN'  => '3.2.1.in-addr.arpa.',

				'P_TO'    => {'ns1.mydomain1.com.' => 1},
				'P_FROM'  => {'ns1.mydomain1.com.' => 1},

				'PTR'     => {'ns1.mydomain1.com.' => 1},
			       },
                }
The keys under an individual dns name entry in $DNS_Names fall into some basic categories:

$Partition

While $DNS_Names is the place that all the resource record information gets bundled up and associated with the appropriate dns name, $Partition is where all those dns names get bundled into individual systems.

Essentially, the partitioning routines assign a number to each entry in $DNS_Names and then move all those entries to their corresponding spot in $Partition. ($DNS_Names is empty at the end of all the moves.)

If $DNS_Names looked like this:

$DNS_Names => {
               'dnsname1' => { ... info ... },
               'dnsname2' => { ... info ... },
               'dnsname3' => { ... info ... },
               'dnsname4' => { ... info ... },
               }

and the partitioning routines determined that dnsname1 and dnsname2 belonged to one system while dnsname3 and dnsname4 belonged to a different one, then $Partition might look like this:

$Partition => {
               '1' => {   # <---- system number, assigned in processing order
                      'dnsname1' => { ... info ... },
                      'dnsname2' => { ... info ... },
                      },
               '2' => {
                      'dnsname3' => { ... info ... },
                      'dnsname4' => { ... info ... },
                      },
               }
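The move itself can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: move_to_partition is a hypothetical name, and it assumes every entry already carries the SYSTEM_NUMBER key described in the Partitioning section.

```perl
use strict;
use warnings;

# move_to_partition is an illustrative helper, not the dns2xml routine.
# Assumes each entry in $dns_names already has a SYSTEM_NUMBER.
sub move_to_partition {
    my ($dns_names) = @_;
    my $partition = {};
    # keys() builds its list up front, so deleting inside the loop is safe.
    for my $name (keys %$dns_names) {
        my $entry = delete $dns_names->{$name};   # empties $dns_names as it goes
        $partition->{ $entry->{SYSTEM_NUMBER} }{$name} = $entry;
    }
    return $partition;
}

my $DNS_Names = {
    dnsname1 => { SYSTEM_NUMBER => 1 },
    dnsname2 => { SYSTEM_NUMBER => 1 },
    dnsname3 => { SYSTEM_NUMBER => 2 },
    dnsname4 => { SYSTEM_NUMBER => 2 },
};
my $Partition = move_to_partition($DNS_Names);

printf "system %s: %s\n", $_, join(', ', sort keys %{ $Partition->{$_} })
    for sort { $a <=> $b } keys %$Partition;
```

Because the entries are moved by reference, nothing is copied: the same inner hashes simply end up hanging off $Partition instead of $DNS_Names.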

Remember: whenever these nested hashes of hashes get confusing, inserting a print Dumper $variable; (with the Data::Dumper module loaded) will list the contents of the variable, which can be invaluable for working out what's going on. Dumping large structures can substantially increase the running time of the script, though.
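One way to keep the output (and the slowdown) manageable is to dump a single system rather than the whole structure. A minimal sketch, assuming a $Partition shaped like the example above; dump_system is a hypothetical helper and the sample data is invented:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;   # fixed indentation, more readable for deep structures

# dump_system is a hypothetical helper: it returns the dump of one system
# as a string, named after the system number.
sub dump_system {
    my ($partition, $system_number) = @_;
    return Data::Dumper->Dump(
        [ $partition->{$system_number} ],
        [ "system_$system_number" ],
    );
}

# Invented sample data in the shape of $Partition.
my $Partition = {
    1 => { dnsname1 => { NAME => 'dnsname1' }, dnsname2 => { NAME => 'dnsname2' } },
    2 => { dnsname3 => { NAME => 'dnsname3' } },
};

print STDERR dump_system($Partition, 1);
```

Printing to STDERR keeps the debugging output out of whatever the script writes to standard output.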

Algorithms

There are two main algorithms of interest, partitioning and subpartitioning. These both deal with grouping the extracted DNS information into systems so they can be structured into <SYSTEM> elements in the XML document.

Partitioning

The PartitionIntoSystems subroutine partitions the original set of dns names in $DNS_Names into numbered systems in $Partition. See the Data Structures section above for an example of the transformation and resulting structures.

The following pseudocode captures the essence of the algorithm for grouping the dns names into systems. It doesn't correspond exactly to the code, but it shows the process clearly:

     To begin with, no dns name entry in 
     $DNS_Names has an assigned SYSTEM_NUMBER 

     PartitionIntoSystems{
       $current_system_number = 0;
       foreach ($dns_name in $DNS_Names) {
         if ($dns_name doesn't have an assigned SYSTEM_NUMBER) {
           increment the $current_system_number;
           DFS($current_system_number, $dns_name);
         }
       }
     }

     DFS($system_num, $dns_name) {
       $dns_name->{SYSTEM_NUMBER} = $system_num;
       foreach ($adjacent_node of $dns_name) {
         if ($adjacent_node doesn't have an assigned SYSTEM_NUMBER) {
           DFS($system_num, $adjacent_node);
         }
       }
     }

The end result is that we find the connected components of the graph such that:

     if 
       $DNS_Names->{$dns_name1}{SYSTEM_NUMBER} is equal to  
       $DNS_Names->{$dns_name2}{SYSTEM_NUMBER}
     then 
       $dns_name1 and $dns_name2 belong to the same connected component
       (the same system) in $Partition.

So, if the SYSTEM_NUMBER for $dns_name1 and $dns_name2 was 1, then

     $Partition->{'1'} => {
                           $dns_name1 => {...info...},
                           $dns_name2 => {...info...},
                          }
An illustrated walkthru of the process can be viewed here.
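For anyone who prefers runnable code to pseudocode, here is a rough Perl sketch of the same idea. It is not the dns2xml implementation, and it makes two assumptions worth calling out: the adjacent nodes of a dns name are taken to be the keys of its P_TO and P_FROM hashes, and an explicit stack is used in place of the recursive DFS so that long chains of names don't trigger Perl's deep recursion warning. The sub names are made up.

```perl
use strict;
use warnings;

# neighbours: ASSUMED adjacency, the union of the P_TO and P_FROM keys.
sub neighbours {
    my ($entry) = @_;
    my %adjacent = ( %{ $entry->{P_TO} || {} }, %{ $entry->{P_FROM} || {} } );
    return sort keys %adjacent;
}

# partition_into_systems: stamp every entry with a SYSTEM_NUMBER so that
# entries in the same connected component share a number.
# Returns the number of systems found.
sub partition_into_systems {
    my ($dns_names) = @_;
    my $current_system_number = 0;
    for my $start (sort keys %$dns_names) {
        next if defined $dns_names->{$start}{SYSTEM_NUMBER};
        $current_system_number++;
        my @stack = ($start);
        while (defined(my $name = pop @stack)) {
            my $entry = $dns_names->{$name} or next;   # name with no entry of its own
            next if defined $entry->{SYSTEM_NUMBER};   # already visited
            $entry->{SYSTEM_NUMBER} = $current_system_number;
            push @stack, neighbours($entry);
        }
    }
    return $current_system_number;
}

my $DNS_Names = {
    dnsname1 => { P_TO   => { dnsname2 => 1 } },
    dnsname2 => { P_FROM => { dnsname1 => 1 } },
    dnsname3 => { P_TO   => { dnsname4 => 1 } },
    dnsname4 => { P_FROM => { dnsname3 => 1 } },
};
my $count = partition_into_systems($DNS_Names);
print "$_ => system $DNS_Names->{$_}{SYSTEM_NUMBER}\n" for sort keys %$DNS_Names;
print "$count systems\n";
```

The names are visited in sorted order here only to make the numbering repeatable; the grouping itself doesn't depend on the order.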

Subpartitioning

Sometimes the partitioning routine creates a grouping of data that doesn't quite fit into a <SYSTEM> element because it is ambiguous. An ambiguous system is one that contains an interface for which there is no A resource record linking the system's dns name with the interface IP. The IP in question is pulled into the system through a cross-link with one of the interface DNS names:
   x  IN  A  1     |  The DFS will pull together {x,y,1,2,3,4} as a system.
   x  IN  A  2     |  Then x is determined to be the system name, and
   x  IN  A  3     |  y is the name of the '2' interface of system x. 
   y  IN  A  2     |  The problem is that y also points to '4', which has 
   y  IN  A  4     |  nothing to do with system x from a DNS standpoint.

We need to figure out how to split y and 4 out so that they make the most sense. Here's the basic algorithm:

   foreach $dns_name that points to one or more of $system_name's IPs:
     ($dns_name can be $system_name)
  
     if all $dns_name's IPs are pointed to by $system_name
  
       MOVE $dns_name
       MOVE all $dns_name's IPs
  
     else // $dns_name points to some IP that $system_name doesn't
  
       if subpartition type == ip_unique
  
         COPY $dns_name
         MOVE $dns_name's IPs that are pointed to by $system_name
  
       else // subpartition type == sys_unique
  
         SKIP $dns_name
         COPY $dns_name's IPs that are pointed to by $system_name 
  
   MOVE = remove from original system and copy to new one 
   COPY = leave in original system and copy to new one 
   SKIP = leave in original system and don't copy to new one 
The processing of this algorithm actually occurs in two stages. The first is the loop shown above, where each $dns_name is actually "marked" with the action to take. Once the loop is finished, and all the $dns_names have been marked with an action, then we sweep through all those $dns_names and take the appropriate action.

During the looping of the marking phase, a $dns_name may be marked more than once. That's OK, but the following rule is observed:

A COPY can overwrite a MOVE, but a MOVE cannot overwrite a COPY
For another view of this problem and what is done to fix it, see this walkthru of subpartitioning.
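The marking phase can be sketched in Perl using the x/y example from above. This is an illustration, not the dns2xml code: the subs are hypothetical, and the input is simplified to a hash mapping each dns name to a hash of the IPs it points to. Only the stated overwrite rule is enforced; any other re-marking simply takes the latest mark, which is an assumption.

```perl
use strict;
use warnings;

# mark: record an action for an item. A COPY may overwrite a MOVE,
# but a MOVE may not overwrite a COPY.
sub mark {
    my ($marks, $item, $action) = @_;
    my $old = $marks->{$item};
    return if defined $old && $old eq 'COPY' && $action eq 'MOVE';
    $marks->{$item} = $action;
    return;
}

# mark_subpartition: $ips_of maps dns name => { ip => 1, ... } (assumed layout).
# Returns a hash ref of item => MOVE | COPY | SKIP for names and IPs.
sub mark_subpartition {
    my ($ips_of, $system_name, $type) = @_;
    my $system_ips = $ips_of->{$system_name};
    my %marks;
    for my $dns_name (sort keys %$ips_of) {
        my @ips    = sort keys %{ $ips_of->{$dns_name} };
        my @shared = grep { $system_ips->{$_} } @ips;
        next unless @shared;                 # doesn't touch this system's IPs
        if (@shared == @ips) {               # all IPs are pointed to by the system
            mark(\%marks, $_, 'MOVE') for ($dns_name, @ips);
        }
        elsif ($type eq 'ip_unique') {
            mark(\%marks, $dns_name, 'COPY');
            mark(\%marks, $_, 'MOVE') for @shared;
        }
        else {                               # sys_unique
            mark(\%marks, $dns_name, 'SKIP');
            mark(\%marks, $_, 'COPY') for @shared;
        }
    }
    return \%marks;
}

my $ips_of = { x => { 1 => 1, 2 => 1, 3 => 1 }, y => { 2 => 1, 4 => 1 } };
for my $type (qw(ip_unique sys_unique)) {
    my $marks = mark_subpartition($ips_of, 'x', $type);
    print "$type: ", join(', ', map { "$_=$marks->{$_}" } sort keys %$marks), "\n";
}
```

In the sys_unique case, IP 2 is first marked MOVE (by x) and then COPY (by y), and the COPY sticks regardless of which name is processed first. A second sweep over the marks would then carry out the actual moves and copies.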

xml2dns

Didn't get this far. It's very straightforward, though. Just use the Data::Dumper module to examine the data structures and take a look at the XML::Parser docs relating to the 'Tree' option. Sorry I didn't write more on this, but I figured that the dns2xml stuff needed more documentation and so spent my time on that one. It's 3am now and I must go to sleep....
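As a starting point, here is a small sketch of what walking a 'Tree' style parse result looks like. It is not taken from xml2dns. The tree is built by hand in the shape XML::Parser documents for that style, so the sketch runs without the module installed, and the element and attribute names (other than SYSTEM) are invented for illustration.

```perl
use strict;
use warnings;

# A tree in the shape XML::Parser returns with Style => 'Tree':
#   [ tag, [ \%attributes, child_tag, child_content, ... ] ]
# Text nodes use the pseudo-tag 0 with a plain string as content.
# With the module, the equivalent would come from
#   XML::Parser->new(Style => 'Tree')->parsefile($file)
# The DNS root element and the name attribute are hypothetical.
sub sample_tree {
    return [ 'DNS', [ {},
        'SYSTEM', [ { name => 'ns1' }, 0, 'first'  ],
        'SYSTEM', [ { name => 'www' }, 0, 'second' ],
    ] ];
}

# elements_named: collect the content arrays of every element with the
# given tag, at any depth below (and including) the starting element.
sub elements_named {
    my ($tag, $content, $wanted) = @_;
    my @found    = $tag eq $wanted ? ($content) : ();
    my @children = @{$content}[ 1 .. $#$content ];   # skip the attribute hash
    while (my ($child_tag, $child_content) = splice @children, 0, 2) {
        next if $child_tag eq '0';                   # text node, nothing to descend into
        push @found, elements_named($child_tag, $child_content, $wanted);
    }
    return @found;
}

my ($root_tag, $root_content) = @{ sample_tree() };
print "$_->[0]{name}\n" for elements_named($root_tag, $root_content, 'SYSTEM');
```

The first item of every content array is the attribute hash, which is why the name is read from $_->[0].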

If there are any questions on any parts of these scripts, please feel free to email me at bom@alumni.utexas.net.