Scripts/Tools

External Scripts

So far, the discussion has been a bit theoretical, setting up the general idea of how DNS management might be accomplished with Ganymede. The next part of this document deals more directly with the processes involved in actually working with the DNS data and converting it between the xml and BIND formats. As mentioned a little earlier, Ganymede doesn't produce BIND db files directly, but rather generates the xml, which is then converted to the db files by an external script. Likewise, Ganymede doesn't read the db files into the database directly; it relies on another external script to convert the db files into xml, which it can then read into the database. Two main (perl) scripts, dns2xml and xml2dns, handle these conversions, and the next few sections describe the roles they play.

Importing Existing BIND Data Using dns2xml

The goal of dns2xml is to take an existing set of DNS data contained in BIND db files and convert it into an xml representation that allows it to be imported quickly and easily into the Ganymede database. It basically extracts the resource record data and attempts to translate that information into the xml structure described earlier.

Because of the variations (abbreviations, directives, etc.) in db files, the first task in the importing process is to standardize them as much as possible. This is accomplished by the d2xpp script, a preprocessor that attempts to transform existing db files into a relatively uniform format that dns2xml can then read and process.

d2xpp

The dns2xml preprocessor (d2xpp) does several things. The first is that it converts all hosts to Fully Qualified Domain Name (FQDN) format. A common abbreviation used when writing db files by hand is to leave off the zone portion of the host name. If the host doesn't end with a period, it is implied that the current zone should be added. The preprocessor will append the current zone to any host that doesn't end in a period.

In the mydomain1.com. zone:

a     IN   A  10.1.2.3   becomes   a.mydomain1.com.    IN   A  10.1.2.3

(Note: The IP addresses in 'A' records are not touched. 10.1.2.3 does not become 3.2.1.10.in-addr.arpa.)

In addition, d2xpp will expand the $INCLUDE and $ORIGIN directives, explicitly replacing them with the data they represent.
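
The following is a rough sketch of the kind of transformation involved, not the actual d2xpp code; it handles only the $ORIGIN tracking and FQDN expansion described above ($INCLUDE expansion is omitted), and the starting zone is a hypothetical hard-coded value:

    #!/usr/bin/perl -w
    # Sketch of d2xpp-style FQDN expansion (illustrative only, not the real script).
    # Reads a db file on stdin and writes the expanded version to stdout.
    use strict;

    my $zone = 'mydomain1.com.';    # hypothetical default; the real preprocessor
                                    # would determine this from its input

    while (my $line = <STDIN>) {
        # Remember $ORIGIN directives and drop them from the output.
        if ($line =~ /^\$ORIGIN\s+(\S+)/) {
            $zone = $1;
            $zone .= '.' unless $zone =~ /\.$/;
            next;
        }
        # If the line starts with an owner name, make sure it's fully qualified.
        if ($line =~ /^([^\s;\$]\S*)(\s.*)$/) {
            my ($owner, $rest) = ($1, $2);
            $owner = $zone     if $owner eq '@';        # '@' stands for the current zone
            $owner .= ".$zone" unless $owner =~ /\.$/;  # append zone to unqualified names
            print "$owner$rest\n";
            next;
        }
        print $line;    # comments, continuation lines, other directives pass through
    }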

Finally, the bulk of sanity checking on the existing data will occur during this preprocessing stage. [This isn't in place yet - how strict should it be? Probably should just report suspicious values and let the user make the call.]

All the preprocessed files will be generated in a new directory and all further processing will occur on these copies, leaving the original data files intact.

dns2xml.conf

[The config file stuff is very rudimentary now - needs more work] In addition to the preprocessed data described above, the dns2xml script also requires some default data values to be provided. These are, at the very least, values for the forward and reverse zone refresh, retry, minttl, etc. These values are stored in the dns2xml.conf file. If this file does not exist, dns2xml will prompt for the required data and then generate the text file. Subsequent runs of dns2xml will then use the new file. [Provide some examples when a bit more refined]
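
The exact format and key names are still being worked out, so the following is purely a hypothetical illustration of the kind of defaults such a file might hold:

    # hypothetical dns2xml.conf (illustrative only; not the final format)
    forward_refresh = 10800
    forward_retry   = 3600
    forward_expire  = 604800
    forward_minttl  = 86400
    reverse_refresh = 10800
    reverse_retry   = 3600
    reverse_expire  = 604800
    reverse_minttl  = 86400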

How dns2xml Operates

This section describes the process by which dns2xml structures the BIND db file data into xml. [May be a little low level for right here. Of interest to someone actually trying to figure out what's going on and why.]

Stage 1: Extracting the Existing Data from the BIND Files

The first step in this process is to extract all the information stored in the existing db files. The result is that each host becomes associated with a list of the resource record data for that host. At this point, the only information gathered is a mapping from host to the data associated with that host. There is no knowledge of a host's relationship to any other hosts. For example, it's not possible to tell whether a given host is actually a 'system' host or just a named interface of another system (see the previous system/interface discussions).
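
As a rough sketch of what this stage amounts to (illustrative only, not the actual dns2xml code, and assuming the files have already been normalized by d2xpp), each resource record simply gets filed away under its owner name:

    # Sketch of stage 1: map each owner name to the resource records seen for it.
    use strict;

    my %records;    # owner name => list of [type, rdata] pairs

    foreach my $file (@ARGV) {
        open my $fh, '<', $file or die "can't open $file: $!";
        while (my $line = <$fh>) {
            next if $line =~ /^\s*$/ || $line =~ /^\s*;/;       # skip blanks and comments
            # owner [ttl] IN type rdata
            if ($line =~ /^(\S+)\s+(?:\d+\s+)?IN\s+(\S+)\s+(\S.*)$/) {
                my ($owner, $type, $rdata) = ($1, $2, $3);
                push @{ $records{$owner} }, [ $type, $rdata ];
            }
        }
        close $fh;
    }

    # At this point %records knows, for example, that x.ok.com. has an A record
    # for 10.0.1.1, but nothing about how x relates to any other host.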

In order to be able to generate xml based upon the system/interface structure, it's first necessary to determine the relationships among all the hosts gathered in the first part. All the data that existed in the db files has been extracted into numerous data "fragments", but it's necessary to make connections among these various fragments before we can determine their status as systems or interfaces. For example, we may know that host1 has two aliases, host2 has two associated mx hosts, and host3 only has an 'A' resource record. It's not known yet, for example, whether host3 shares an IP with host2 (which might make it a named interface of host2). Making all the appropriate connections is the job of Stage 2.

Stage 2: Creating a Complete Picture of the Links Between the Hosts Identified in Stage 1

[Graph terminology might not be totally correct - check up on it]

The determination of relationships between hosts is based on the idea of a directed graph, with hosts and IPs as the nodes. The edges of the graph are determined by the 'A' and 'PTR' resource records. As an example, consider the following data extracted from hypothetical db files (both forward and reverse):

         x.ok.com.               IN A   10.0.1.1 
         y.ok.com.               IN A   10.0.1.1
         1.1.0.10.in-addr.arpa.  IN PTR x.ok.com.

Based upon this, the graph would have three nodes (x), (y), and (10.0.1.1). An arrow would be drawn from (x) to (10.0.1.1), another would be drawn from (y) to (10.0.1.1), and a final one would be drawn from (10.0.1.1) to (x).

This is what it looks like graphically:

Note on the graphics:
These schematic diagrams are meant to represent, as a directed graph, the information stored in the 'A' and 'PTR' resource records of the BIND db files. The oval black nodes are the hosts and the rectangular red ones are the IP addresses. The black arrows from the hosts to the IPs indicate 'A' resource records and the red arrows from the IPs to the hosts indicate 'PTR' records. If there is no red arrow from an IP to a host, for example, there is no PTR record from that IP to that host.
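
In terms of data structures, one way to picture what is stored for this example (the hash names here are purely hypothetical, just to make the edges concrete):

    # The three records above become directed edges:
    my %a_edges = (                       # host -> IPs, from the A records
        'x.ok.com.' => [ '10.0.1.1' ],
        'y.ok.com.' => [ '10.0.1.1' ],
    );
    my %ptr_edges = (                     # IP -> hosts, from the PTR record
        '10.0.1.1'  => [ 'x.ok.com.' ],
    );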

Even though there is obviously some relationship between x and y above (they share the 10.0.1.1 interface), x doesn't know that y exists. In other words, there is no way that x can reach y by following the existing links on the graph. In this example, y could reach x by following its link to 10.0.1.1, then following 10.0.1.1's link to x. That's good, but what if there were no PTR record to allow 10.0.1.1 to act as a bridge? The hosts x and y would both share the same IP, but have no idea about the existence of each other. In order to generate a structured xml representation of the data, it's necessary that each host's relationship to the others be known. These relationships between the hosts are determined by how and to what they are connected.

[My whole way of describing x "knowing" this or that is really lame. I've been reading a little more on graphs and sets and I'm planning on upgrading the vocab soon. This'll do for right now....]

We need a way to make sure that given an arbitrary host in our extracted data, we know all the other hosts that will have an impact on the system/interface decision we must make for that host. In the example above, we need x to know about y.

Part of dns2xml's task is to visit each host in the original extracted information and ensure that it knows not only what it points to (as given by the 'A' or 'PTR' data), but also what points to it. The result is that any host knows all the other hosts with which it shares information.

Note - Throughout this discussion, the not-so-sophisticated terminology "points to" is used to describe the relationship between the host and IP in 'A' and 'PTR' records, since it maps well with the idea of nodes in a graph being pointed to by arrows from other nodes:

"x.ok.com. points to 10.0.1.1"    <==>   "x.ok.com.  IN A   10.0.1.1"

In the earlier example, the result of Stage 2 would be the following:

         x knows that it points to 10.0.1.1 (it already knew that, given the RR)
         y knows that it points to 10.0.1.1 (also already known)
         10.0.1.1 knows that it points to x (already known)
         10.0.1.1 knows that it is pointed to from x (new info)
         10.0.1.1 knows that it is pointed to from y (new info)
         x knows that it is pointed to from 10.0.1.1 (new info)
Basically, the information held by each node grows from just what that node points to, to both what it points to and what points to it.

So, now:

         x knows that it points to 10.0.1.1 (original info)
         10.0.1.1 knows that it is pointed to from y (new info)
therefore:

         Through 10.0.1.1, it can now be determined that x is related somehow to y.
Similar linkages are made for all the original hosts, resulting in a picture of the connectivity of all the hosts and IPs in the original data. No hierarchical inferences have been made yet, but all the information should be available to make them when the time comes. All hosts that have a relationship (share an IP, etc.) can now be grouped together.
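
This is only a rough sketch of the idea, not the actual dns2xml code, but it shows the shape of it: take the directed edges gathered so far, invert them so every node also records what points to it, and then walk the combined links to pull out each group of related hosts and IPs.

    # Directed edges from the A and PTR records (the %a_edges and %ptr_edges of
    # the earlier sketch, merged into a single hash).
    my %points_to = (
        'x.ok.com.' => [ '10.0.1.1' ],
        'y.ok.com.' => [ '10.0.1.1' ],
        '10.0.1.1'  => [ 'x.ok.com.' ],
    );

    # Build the reverse map so that every node also knows what points to it.
    my %pointed_to_by;
    foreach my $node (keys %points_to) {
        foreach my $target (@{ $points_to{$node} }) {
            push @{ $pointed_to_by{$target} }, $node;
        }
    }

    # Group together all nodes that can reach one another over the combined links.
    my (%seen, @groups);
    foreach my $start (keys %points_to, keys %pointed_to_by) {
        next if $seen{$start};
        my @queue = ($start);
        my @group;
        while (my $node = shift @queue) {
            next if $seen{$node}++;
            push @group, $node;
            push @queue, @{ $points_to{$node}     || [] },
                         @{ $pointed_to_by{$node} || [] };
        }
        push @groups, \@group;
    }

    # Each element of @groups now holds one set of related hosts and IPs; for
    # this example there is a single group: x.ok.com., y.ok.com. and 10.0.1.1.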

Stage 3: Determining Systems and Interfaces from the Linked Hosts

[This partition terminology is a little incorrect. A partition is a set of sets. All of the sets in the partition are pairwise disjoint and the union of all the sets = the given universal set. So each element that I call a partition here is really just one of the sets that makes up the actual partition. I'll need to spend a little time soon fixing this up.]

The next part of the process separates the original long list of all hosts into separate "partitions" consisting of hosts that are linked as described above. Each partition represents a system element in the xml. The partition will contain one or more hosts and IPs. One of these hosts is determined to be the 'system' host, based on a series of rules like "Given two hosts that share an IP, the host with many additional A records is more likely to be the system host than a host with only one A record." Once the system host is identified, the rest are assumed to be subordinate to it, and are considered the named interfaces. In the following case, x has several A records associated with it, while y only has one, shared by x. The x host is determined to be the system host, so y then becomes a named interface of x:

(The underlined host in the graphs is the one that has been determined to be the system host.)

SYSTEM: x.ok.com.

x.ok.com. IN  A  10.0.1.1  
x.ok.com. IN  A  10.0.1.2     
x.ok.com. IN  A  10.0.1.3


INTERFACE: y.ok.com.

y.ok.com. IN  A  10.0.1.1
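
A small sketch of the A-record-count rule as it applies to this example (hypothetical helper data, and a single rule in isolation; the real decision weighs several rules, and ties would need further tie-breaking):

    # Pick the host with the most A records as the system host for a group.
    sub pick_system_host {
        my ($a_count) = @_;    # host => number of A records for that host
        my ($system) = sort { $a_count->{$b} <=> $a_count->{$a} } keys %$a_count;
        return $system;        # the remaining hosts become named interfaces
    }

    my %a_count = ( 'x.ok.com.' => 3, 'y.ok.com.' => 1 );
    print pick_system_host(\%a_count), "\n";    # prints x.ok.com.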

[What follows takes a long time and a bit of complexity to explain handling of an abnormal, possibly rare situation. It's important, but might be in the way of the overall picture here. Maybe separate this out. Reader should probably skip down to "Stage 4" on first reading and then come back to this part.]

At this point, some checking is done to try to catch inconsistent results such as an interface host pointing to an IP that the system host doesn't point to. If something like this is found, an attempt is made to represent the information as best as can be done by sub-dividing the existing partition. The following is an example of a situation that would cause the sub-partitioning to go into effect, and the results that would be returned. This (obviously) isn't any sort of real world data, but it does show how a certain kind of "what if" scenario is handled.

The oval nodes represent hosts and the rectangular nodes represent IP addresses. The underlined host is the host which has been determined to be the 'system' host for the current partition.

The Original Data
Forward Data:
a.ok.com.    IN    A    10.0.0.1
a.ok.com.    IN    A    10.0.0.2
a.ok.com.    IN    A    10.0.0.3
a.ok.com.    IN    A    10.0.0.8
b.ok.com.    IN    A    10.0.0.2
b.ok.com.    IN    A    10.0.0.3
c.ok.com.    IN    A    10.0.0.3
c.ok.com.    IN    A    10.0.0.4
c.ok.com.    IN    A    10.0.0.5
d.ok.com.    IN    A    10.0.0.5
d.ok.com.    IN    A    10.0.0.6
d.ok.com.    IN    A    10.0.0.7
e.ok.com.    IN    A    10.0.0.6
f.ok.com.    IN    A    10.0.0.7
g.ok.com.    IN    A    10.0.0.8
g.ok.com.    IN    A    10.0.0.9

Reverse Data:
1.0.0.10.in-addr.arpa.    IN    PTR  a.ok.com.
2.0.0.10.in-addr.arpa.    IN    PTR  a.ok.com. 
2.0.0.10.in-addr.arpa.    IN    PTR  b.ok.com.
3.0.0.10.in-addr.arpa.    IN    PTR  c.ok.com.
4.0.0.10.in-addr.arpa.    IN    PTR  c.ok.com.
5.0.0.10.in-addr.arpa.    IN    PTR  c.ok.com.
5.0.0.10.in-addr.arpa.    IN    PTR  d.ok.com.
6.0.0.10.in-addr.arpa.    IN    PTR  d.ok.com.
7.0.0.10.in-addr.arpa.    IN    PTR  f.ok.com.
8.0.0.10.in-addr.arpa.    IN    PTR  a.ok.com.

With no partitioning

The parts of the data that are inconsistent are just ignored, resulting in loss of data, but the data that remains is correct.

Partitioning with the ip_unique option

The first sub-partitioning method assumes that it's ok for an interface of a system to have the same name as another separate system, but that IPs can't overlap between systems.

[Diagrams of Partition 1 through Partition 4]

Partitioning with the sys_unique option

The second method is the opposite of the first. It assumes that it's ok for IPs to be represented in multiple systems, but that the same name can't be used for both a system and an interface of another system.

[Diagrams of Partition 1 through Partition 4]

Both of these are attempts to make sense out of abnormal data, so it's highly doubtful that either result is exactly what was originally intended, but there should be no loss of data (of course, wrong data can be worse than missing data), and the user is given the option to use these methods or not, as needed. On the (limited) real world data dns2xml has been tested on, I haven't had to use this option. I've had to generate specific cases by hand to demonstrate and test this situation, so I'm not sure a) how useful this remedy is, b) how common this situation is, and c) if there are similar situations where even this totally fails to resolve ambiguities.

Stage 4: Finally...

After all the partitions are generated, and any sub-partitioning has taken place, the data should be structured enough to be able to generate xml from it. Each group, or partition, of hosts corresponds to a SYSTEM element in the xml. There will be a main 'system' host and possibly some associated hosts which will appear in the INTERFACE elements of the given SYSTEM.

[Show the xml for the preceding examples, either here or side-by-side w/pics above.]

The remainder of the dns2xml process just loops over all the partitions and fills in all the xml tags with the data that was extracted and stored in Stage 1.
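
Schematically, that final loop looks something like the following. The emit_* routines and the partition fields are purely hypothetical stand-ins; the actual element and attribute names are those of the xml structure described earlier in this document.

    # Sketch of the final pass: one SYSTEM element per partition.
    foreach my $partition (@partitions) {
        emit_system_open($partition->{system_host});    # open the SYSTEM element
        foreach my $interface (@{ $partition->{interfaces} }) {
            emit_interface($interface);                 # INTERFACE element: name, IPs, PTR data
        }
        emit_other_records($partition);                 # remaining data gathered in stage 1
        emit_system_close();                            # close the SYSTEM element
    }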

Current Limitations with dns2xml

Generating BIND db Files using xml2dns

Now that the dns data is contained in an xml file (which will eventually be generated by Ganymede itself), the obvious next step is to take that xml and use it to generate dns db files representing that data.

Converting from xml to db files is much more straightforward than trying to convert existing db files into xml. The first step is to start xml2dns with the input xml file, a directory to output the dns db files to (a test directory is highly recommended for now...), and a 'verbosity' level. The first two options should be self-explanatory. The verbosity option may need a little more discussion.

Verbosity

The verbosity option is designed to allow varying degrees of shorthand in the resulting db files. For example, it is common in BIND db files to use the '@' symbol to represent the current zone. Another common shortcut is to write a host name only on the first line on which it is used; it is then the implied host for all subsequent resource records until it is explicitly changed. Probably the most common shortcut of all is to drop the zone portion of the host name completely. The current zone is automatically added to all hosts in the file that do not end in a period.

Though these shortcuts are designed to reduce keystrokes on the part of the administrator, and thus shouldn't matter too much in an automated system, the shorthand can make it a little easier for humans to read the files.

The following shows the results of 3 different runs utilizing more shorthand each time:

[add examples here]
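
Until those examples are filled in, here is a purely illustrative (hand-written, not actual xml2dns output) rendering of a few hypothetical records in the ok.com. zone, written three times with progressively more shorthand:

    ; fully qualified, no shorthand
    ok.com.          IN  NS  ns1.ok.com.
    www.ok.com.      IN  A   10.0.1.1
    www.ok.com.      IN  MX  10 mail.ok.com.

    ; repeated owner names left blank
    ok.com.          IN  NS  ns1.ok.com.
    www.ok.com.      IN  A   10.0.1.1
                     IN  MX  10 mail.ok.com.

    ; zone portion dropped, '@' for the zone itself
    @                IN  NS  ns1
    www              IN  A   10.0.1.1
                     IN  MX  10 mail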

Implicit Values

In addition, xml2dns applies logic to deal with implicit values encoded in the xml. For instance, it recognizes that if there are no mx host elements listed for a particular host, then the default mx hosts for that particular zone should be used.
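
As a sketch of that kind of rule (the variable names here are hypothetical, not the real xml2dns internals):

    # If the xml listed no mx hosts for this host, fall back to the zone defaults.
    my @mx_hosts = @{ $host->{mx} || [] };
    @mx_hosts = @{ $zone_defaults{$zone}{mx} || [] } unless @mx_hosts;

    foreach my $mx (@mx_hosts) {
        printf "%-20s IN  MX  %d %s\n", $host->{name}, $mx->{priority}, $mx->{host};
    }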

Alluded to earlier, but skipped over then, is the "PTRTYPE" attribute of the SYSTEM element. This functions as a shorthand means of expressing some common default PTR record configurations. The values that the PTRTYPE attribute can take, and what each means, are given below.

PTRTYPEs:

    0  -  the only PTR records generated are those specifically listed in the <PTR> fields of the <IP> elements.
    1  -  every interface will have a corresponding PTR record pointing to the DNS entry of the system.
    2  -  every interface will have corresponding PTR record(s) pointing to the DNS entries of that interface.

Here are some xml fragments and the resulting db files that might help to illustrate the use of the PTRTYPE attribute:

[PTRTYPE examples here]
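
As a purely illustrative placeholder until then (hand-written, not actual output): suppose a SYSTEM x.ok.com. has a named interface y.ok.com. on 10.0.1.1, and no PTR entries are listed explicitly in the xml. The reverse data generated for 10.0.1.1 under each PTRTYPE would be:

    PTRTYPE 0:  (no PTR record is generated, since none were listed explicitly)
    PTRTYPE 1:  1.1.0.10.in-addr.arpa.    IN    PTR   x.ok.com.
    PTRTYPE 2:  1.1.0.10.in-addr.arpa.    IN    PTR   y.ok.com.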


Brian O'Mara
Last modified: Thu Jul 20 17:04:18 CDT 2000