CWB V3 code architecture

This document is a rough guide to the architecture of the CWB source code.

The CL library

CL stands for "Corpus Library". This library contains the most basic functions for CWB and CQP. Things in the CL only depend on other things in the CL.

Note that the "dependencies" given below are based on which modules #include each other's headers. Some, e.g. attributes.c and cdaccess.c, are mutually dependent in this sense.
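To make this notion of "dependency" concrete, here is a minimal sketch in C of how one might check whether one module's source text #includes another module's header, and whether two modules are mutually dependent. The function names (includes_header, mutually_dependent) are hypothetical and are not part of the CL. The check is deliberately crude: it looks only for quoted #include directives and does not handle comments or conditional compilation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `source` contains a directive of the form
 *     #include "header"    or    #include "some/dir/header"
 * and 0 otherwise. Angle-bracket (system) includes are ignored.
 * Hypothetical helper for illustration; not a CL function. */
int includes_header(const char *source, const char *header)
{
    assert(source != NULL && header != NULL);

    const char *p = source;
    size_t hlen = strlen(header);

    while ((p = strstr(p, "#include")) != NULL) {
        p += strlen("#include");
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '"') {
            const char *start = p + 1;
            const char *end = strchr(start, '"');
            if (end == NULL)
                break;                      /* unterminated directive */
            size_t len = (size_t)(end - start);
            /* Match the final path component against `header`. */
            if (len >= hlen
                && strncmp(end - hlen, header, hlen) == 0
                && (len == hlen || *(end - hlen - 1) == '/'))
                return 1;
            p = end + 1;
        }
    }
    return 0;
}

/* Two modules are mutually dependent (in the sense used in this
 * document) if each one's source includes the other's header. */
int mutually_dependent(const char *src_a, const char *hdr_a,
                       const char *src_b, const char *hdr_b)
{
    return includes_header(src_a, hdr_b) && includes_header(src_b, hdr_a);
}
```

Given the text of two source files that each contain an #include of the other's header, mutually_dependent() returns 1. If only one of them includes the other, it returns 0, which is an ordinary one-way dependency.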

cl/cl.h

cl/attributes.h ; cl/attributes.c

cl/binsert.h ; cl/binsert.c

cl/bitfields.h ; cl/bitfields.c

cl/bitio.h ; cl/bitio.c

cl/cdaccess.h ; cl/cdaccess.c

cl/class-mapping.h ; cl/class-mapping.c

cl/compression.h ; cl/compression.c

cl/corpus.h ; cl/corpus.c

cl/dl_stub.c

cl/endian.h ; cl/endian.c

cl/fileutils.h ; cl/fileutils.c

cl/globals.h ; cl/globals.c

cl/lexhash.h ; cl/lexhash.c

cl/list.h ; cl/list.c

cl/macros.h ; cl/macros.c

cl/makecomps.h ; cl/makecomps.c

cl/registry.l ; cl/registry.y

cl/registry.tab.h

cl/regopt.h ; cl/regopt.c

cl/special-chars.h ; cl/special-chars.c

cl/storage.h ; cl/storage.c

cl/Makefile

CQi - Corpus Query interface

This is the "cqpserver" program and some modules that it depends on.

CQi/CQi.h

This file #defines all the CQI_* constants; there are no function prototypes or data structures here.

This part of CWB depends (a) on the CL library and (b) on CQP.

CQi/cqpserver.c

CQi/auth.h ; CQi/auth.c

CQi/server.h ; CQi/server.c

CQP (query processor and interactive environment)

Dependencies on the CL in this directory are not noted unless especially relevant, since essentially everything here depends on the CL in one way or another. Interdependencies between different CQP modules are not noted either.

cqp/ascii-print.c ; cqp/ascii-print.h

cqp/attlist.c ; cqp/attlist.h

cqp/builtins.c ; cqp/builtins.h

cqp/concordance.c ; cqp/concordance.h

cqp/context_descriptor.c ; cqp/context_descriptor.h

cqp/corpmanag.c ; cqp/corpmanag.h

cqp/cqp.c ; cqp/cqp.h

cqp/cqpcl.c

cqp/dummy_auth.c

cqp/eval.c ; cqp/eval.h

cqp/groups.c ; cqp/groups.h

cqp/hash.c ; cqp/hash.h

cqp/html-print.c ; cqp/html-print.h

cqp/latex-print.c ; cqp/latex-print.h

cqp/llquery.c

cqp/macro.c ; cqp/macro.h

cqp/matchlist.c ; cqp/matchlist.h

cqp/options.c ; cqp/options.h

cqp/output.c ; cqp/output.h

cqp/parser.l ; cqp/parser.y

cqp/parse_actions.c ; cqp/parse_actions.h

cqp/paths.c ; cqp/paths.h

cqp/print-modes.c ; cqp/print-modes.h

cqp/print_align.c ; cqp/print_align.h

cqp/ranges.c ; cqp/ranges.h

cqp/regex2dfa.c ; cqp/regex2dfa.h

cqp/sgml-print.c ; cqp/sgml-print.h

cqp/symtab.c ; cqp/symtab.h

cqp/table.c ; cqp/table.h

cqp/targets.c ; cqp/targets.h

cqp/tree.c ; cqp/tree.h

cqp/treemacros.h

cqp/variables.c ; cqp/variables.h

cqp/Makefile

Command-line utilities

Most of these files each contain the code for a single program, one of the non-interactive components of CWB. These files do not usually have headers, since the functions in them are for that program alone.

These utilities are used chiefly for corpus setup, but also for a range of administration tasks.

As a general rule, the utilities depend on the CL library. Most of them #include cl/cl.h but some #include other headers from the CL library.

utils/barlib.c ; utils/barlib.h

utils/feature_maps.c ; utils/feature_maps.h

utils/cwb-align-encode.c

utils/cwb-align-show.c

utils/cwb-align.c

utils/cwb-atoi.c

utils/cwb-compress-rdx.c

utils/cwb-decode-nqrfile.c

utils/cwb-decode.c

utils/cwb-describe-corpus.c

utils/cwb-encode.c

utils/cwb-huffcode.c

utils/cwb-itoa.c

utils/cwb-lexdecode.c

utils/cwb-makeall.c

utils/cwb-s-decode.c

utils/cwb-s-encode.c

utils/cwb-scan-corpus.c

utils/Makefile

Other directories within the CWB root directory

config

The subdirectories here contain chunks of makefile for use when compiling CWB on different operating systems.

doc

This contains documentation of the CWB code (note: not user documentation for CWB/CQP), including this file!

editline

This contains a (slightly patched) version of the Editline library, on which earlier versions of CQP were dependent. Now that CQP has been backported to GNU Readline in CWB 3.2.4+, the directory is no longer needed and will be deleted in a future check-in.

instutils

This directory contains shell scripts (sh) for configuring / installing CWB.

man

This contains the *.pod source files for the man entries for cqp and the CWB command-line utilities.

mingw-libgnurx-2.5.1

This contains an internal copy of the source code for the libregex needed to give CWB POSIX regular expression capability under Windows (with MinGW). It comes from here:
https://sourceforge.net/project/shownotes.php?release_id=140957
To quote the release notes, "This is a port of the GNU regex components from glibc, ported for use in native Win32 applications by Tor Lillqvist." There is a binary version, but for cross-compilation it seemed like a better idea to have a copy of the source internal to the CWB tree.

Global variables in CL

(This is just an idea --- useful? Or overkill? -- AH)

Name | Type | Defined in | Declared extern in | What is it?
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@

Global variables in CQP

(This is just an idea --- useful? Or overkill? -- AH)

Name | Type | Defined in | Declared extern in | What is it?
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@
@@@@@@@@