cwb-align.c File Reference

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <math.h>
#include "../cl/cl.h"
#include "feature_maps.h"

Defines

Functions

Variables


Define Documentation

#define DEFAULT_CONFIG_LINES   4

number of config lines in the default config

Referenced by print_usage().


Function Documentation

int do_alignment ( FMS  fms,
int  if1,
int  il1,
int  if2,
int  il2,
FILE *  outfile 
)

Actually does the alignment.

This function runs a best_path alignment on sentence regions [f1,l1]x[f2,l2] and writes the result to {outfile} (in .align format).

Usage: steps = do_alignment(FMS, f1, l1, f2, l2, outfile);

Parameters:
fms The feature map to use in best_path alignment.
if1 First cpos in source corpus.
il1 Last cpos in source corpus.
if2 First cpos in target corpus.
il2 Last cpos in target corpus.
outfile File handle to print the alignment lines to.

References beam_width, best_path(), feature_match(), print_align_line(), split_factor, and verbose.

Referenced by main().

int main ( int  argc,
char *  argv[] 
)
int parse_args ( int  ac,
char *  av[],
int  min_args 
)

Parses the program's commandline arguments.

Usage: optindex = parse_args(argc, argv, required_arguments);

Parameters:
ac The program's argc
av The program's argv
min_args Minimum number of arguments to be parsed.
Returns:
The value of optind after parsing, i.e. the index of the first non-option argument in argv[].

References beam_width, outfile_name, prealign_has_values, prealign_name, print_usage(), progname, registry_directory, split_factor, verbose, and word_name.

void print_align_line ( FILE *  fd,
int  f1,
int  l1,
int  f2,
int  l2,
int  quality 
)

Prints an alignment line.

This function writes the given information to the specified file handle as a .align format line.

A .align line looks like this: {f1} {l1} {f2} {l2} {type} [{quality}]; e.g. "140 169 137 180 1:2" means that corpus (position) ranges [140,169] and [137,180] form a 1:2 alignment pair.

Usage: print_align_line(fd, f1, l1, f2, l2, quality);

Parameters:
fd File handle to print to.
f1 First cpos in source corpus.
l1 Last cpos in source corpus.
f2 First cpos in target corpus.
l2 Last cpos in target corpus.
quality Quality of the alignment.

References cl_struc2cpos.

Referenced by do_alignment().
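The .align line layout described above can be sketched as a small formatting helper. This is not the documented print_align_line(): the helper name, the explicit n1/n2 arguments for the alignment type, the use of single spaces as separators and the convention that a negative quality omits the optional field are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Formats one .align-style line into buf and returns what snprintf()
 * returns. n1:n2 is the alignment type (number of regions on the
 * source and target side); a negative quality leaves the field out. */
int sketch_format_align_line(char *buf, size_t size,
                             int f1, int l1, int f2, int l2,
                             int n1, int n2, int quality)
{
  assert(buf != NULL && size > 0);
  if (quality >= 0)
    return snprintf(buf, size, "%d %d %d %d %d:%d %d\n",
                    f1, l1, f2, l2, n1, n2, quality);
  return snprintf(buf, size, "%d %d %d %d %d:%d\n",
                  f1, l1, f2, l2, n1, n2);
}
```

Called with (140, 169, 137, 180, 1, 2, -1), the helper produces the example line quoted in the description, followed by a newline.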

void print_usage ( void   ) 

Prints a message describing how to use the program to STDERR and then exits.

References default_config, DEFAULT_CONFIG_LINES, and progname.


Variable Documentation

int beam_width = 50

best path search beam width

Referenced by BAR_write(), do_alignment(), and parse_args().

Pointer to configuration strings.

Set initially to default_config ; should be reset to the {config} part of argv[], if configuration is specified on the command line.

Referenced by main().

int config_lines = DEFAULT_CONFIG_LINES

Number of lines in the configuration strings array.

Referenced by main().

corpus handle: source corpus

char* corpus1_name

name of the source corpus

corpus handle: target corpus

char* corpus2_name

name of the target corpus

char* default_config[DEFAULT_CONFIG_LINES]
Initial value:
 {
  "-C:1",
  "-S:50:0.4",
  "-3:3",
  "-4:4"
}

Set of strings containing default configuration options.

Notes on interpreting the lines (in order):

  • character count
  • shared tokens with frequency ratio >= 1/2
  • trigrams get 3 units
  • 4-grams get 2*3 + 4 = 10 units

Referenced by print_usage().
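To make the shape of these configuration lines concrete, the following sketch splits a line such as "-S:50:0.4" into its fields. The struct, the function name and the field interpretation (first number = weight in units, second number of an -S line = a threshold) are assumptions based on the notes above, not the parser used by the feature map code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parsed form of one configuration line. */
typedef struct {
  char kind;         /* 'C' character count, 'S' shared tokens, 'N' n-gram */
  int n;             /* n-gram length (kind 'N' only), else 0 */
  int weight;        /* units assigned to the feature (assumed meaning) */
  double threshold;  /* second numeric field of an -S line, else 0 */
} SketchConfigLine;

/* Returns 1 if the line matches one of the three shapes, 0 otherwise. */
int sketch_parse_config_line(const char *line, SketchConfigLine *out)
{
  int n, weight;
  double threshold;
  char tail;         /* catches trailing characters after the last field */

  assert(line != NULL && out != NULL);
  out->kind = 0; out->n = 0; out->weight = 0; out->threshold = 0.0;

  if (sscanf(line, "-C:%d%c", &weight, &tail) == 1) {
    out->kind = 'C'; out->weight = weight;
    return 1;
  }
  if (sscanf(line, "-S:%d:%lf%c", &weight, &threshold, &tail) == 2) {
    out->kind = 'S'; out->weight = weight; out->threshold = threshold;
    return 1;
  }
  if (sscanf(line, "-%d:%d%c", &n, &weight, &tail) == 2 && n > 0) {
    out->kind = 'N'; out->n = n; out->weight = weight;
    return 1;
  }
  return 0;
}
```

Under this reading, "-3:3" and "-4:4" give trigrams 3 units and 4-grams 4 units; the total of 10 units for a 4-gram in the notes presumably arises because a 4-gram also contains two trigrams (2*3 + 4).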

char outfile_name[1024] = "out.align"

name of the output file

Referenced by main(), and parse_args().

int pre1 = 0

number of pre-alignment regions (source corpus)

Referenced by main().

int pre2 = 0

number of pre-alignment regions (target corpus)

Referenced by main().

pre-alignment attribute (source) if given

pre-alignment attribute (target)

boolean: if 1, regions with same ID values are pre-aligned

Referenced by main(), and parse_args().

char prealign_name[1024] = ""

pre-alignment given by structural attribute

Referenced by main(), and parse_args().

char* progname

Name of the program (from the shell).

char* registry_directory = NULL

string containing location of the registry directory.

Referenced by main(), and parse_args().

sentence attribute handle: source

sentence attribute handle: target

char* s_name

name of the S-attribute containing sentence boundaries

Referenced by main().

int size1

size of source corpus in sentences

Referenced by main().

int size2

size of target corpus in sentences

Referenced by main().

double split_factor = 1.2

2:2 alignment split factor

Referenced by do_alignment(), and parse_args().

int verbose = 0

controls printing of some extra progress info

word attribute handle: source

Referenced by create_feature_maps().

word attribute handle: target

Referenced by create_feature_maps().

char word_name[1024] = "word"

name of the word attribute (default: word)

int ws1

size of source corpus in words (i.e. corpus positions)

Referenced by main().

int ws2

size of target corpus in words (i.e. corpus positions)

Referenced by main().


Generated on Sun Feb 28 18:08:04 2010 for CWB by  doxygen 1.6.1