feature_maps.c File Reference

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include "feature_maps.h"
#include "barlib.h"


Function Documentation

void best_path ( FMS  fms,
int  f1,
int  l1,
int  f2,
int  l2,
int  beam_width,
int  verbose,
int *  steps,
int **  out1,
int **  out2,
int **  out_quality 
)

Finds the best alignment path for the given regions of sentences in the source and target corpora.

This function does a beamed dynamic programming search for the best path aligning the sentence regions (f1,l1) in the source corpus and (f2,l2) in the target corpus.

Allowed alignments are 1:0 0:1 1:1 2:1 1:2.
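The list of allowed alignments can be read as a small table of (source sentences : target sentences) steps. The following is a minimal sketch of such a table and a membership test; the names allowed_steps and is_allowed_step are hypothetical and do not come from feature_maps.c, and the real search may represent its steps differently.

```c
#include <stddef.h>

/* Hypothetical table of permitted steps, copied from the list above:
 * each row is {source sentences consumed, target sentences consumed}. */
static const int allowed_steps[][2] = {
  {1, 0}, {0, 1}, {1, 1}, {2, 1}, {1, 2}
};

/* Returns 1 if an n_source:n_target step is in the table, 0 otherwise. */
int is_allowed_step(int n_source, int n_target)
{
  size_t i;
  for (i = 0; i < sizeof allowed_steps / sizeof allowed_steps[0]; i++) {
    if (allowed_steps[i][0] == n_source && allowed_steps[i][1] == n_target)
      return 1;
  }
  return 0;
}
```

For instance, is_allowed_step(2, 1) is true, while is_allowed_step(2, 2) is false because 2:2 is not among the listed alignments.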

The results are returned in the vectors out1 and out2, each of which holds a number of valid entries (alignment points) equal to the value stored in steps.

Alignment points are given as sentence numbers and correspond to the start points of the sentences. At the end-of-region alignment point, the sentence numbers are l1 + 1 and l2 + 1; the caller must allow for this when l1 (or l2) is the last sentence in the corpus, because that number then lies beyond the end of the corpus.

The similarity measures of aligned regions are returned in the vector out_quality.

Memory allocated for the return vectors (out1, out2, out_quality) is managed by best_path() and must not be freed by the caller. Calling best_path() overwrites the results of the previous search.
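Because the return vectors are overwritten by the next call, a caller that needs to keep a result has to copy it first. The sketch below shows one way to do that; copy_int_vector is a hypothetical helper, not part of feature_maps.c, and the copy (unlike the original vectors) is owned by the caller and must be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the first n entries of vec.
 * The caller owns the copy and must free() it.
 * Returns NULL if vec is NULL, n is 0, or allocation fails. */
int *copy_int_vector(const int *vec, int n)
{
  int *copy;
  assert(n >= 0);
  if (vec == NULL || n == 0)
    return NULL;
  copy = malloc((size_t)n * sizeof *copy);
  if (copy != NULL)
    memcpy(copy, vec, (size_t)n * sizeof *copy);
  return copy;
}
```

A caller would typically copy out1, out2 and out_quality (each with the count stored in steps) before calling best_path() again.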

Example usage:

best_path(FMS, f1, l1, f2, l2, beam_width, 0/1, &steps, &out1, &out2, &out_quality);
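To illustrate how the returned alignment points might be consumed, here is a minimal sketch that derives the size of each aligned region from consecutive points. It assumes that the region starting at point i ends just before point i + 1, which is an interpretation of the description above rather than something confirmed by the source; bead_size and print_beads are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of source and target sentences in the region
 * that starts at alignment point i, ASSUMING it ends just before point i + 1. */
void bead_size(const int *out1, const int *out2, int steps, int i,
               int *n_source, int *n_target)
{
  assert(i >= 0 && i + 1 < steps);
  *n_source = out1[i + 1] - out1[i];
  *n_target = out2[i + 1] - out2[i];
}

/* Prints one line per region and returns the number of regions printed.
 * The last alignment point (end of region) only closes the final region. */
int print_beads(FILE *fp, const int *out1, const int *out2, int steps)
{
  int i, n1, n2, beads = 0;
  for (i = 0; i + 1 < steps; i++) {
    bead_size(out1, out2, steps, i, &n1, &n2);
    fprintf(fp, "source from %d, target from %d: %d:%d\n",
            out1[i], out2[i], n1, n2);
    beads++;
  }
  return beads;
}
```

With made-up vectors out1 = {0, 1, 3, 4} and out2 = {0, 1, 2, 4} and steps = 4, this reports three regions of types 1:1, 2:1 and 1:2. Note that the loop never dereferences a sentence at the final point, which sidesteps the l1 + 1 / l2 + 1 caveat.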

Parameters:
fms The FMS to use.
f1 Index of first sentence in source region.
l1 Index of last sentence in source region.
f2 Index of first sentence in target region.
l2 Index of last sentence in target region.
beam_width Parameter for the beam search.
verbose Boolean: iff true, prints progress messages on STDOUT.
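steps Receives the number of valid alignment points in out1 and out2.
out1 Receives a pointer to the vector of alignment points (sentence numbers) in the source corpus.
out2 Receives a pointer to the vector of alignment points (sentence numbers) in the target corpus.
out_quality Receives a pointer to the vector of similarity measures of the aligned regions.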

References BAR_delete(), BAR_new(), BAR_read(), BAR_write(), and feature_match().

Referenced by do_alignment().

void check_fvectors ( FMS  fms  ) 

Prints a message about the vector stack of the given FMS.

If it finds a non-zero count, it prints a message to STDERR. Otherwise, it prints a message to STDOUT giving the number of feature vectors.
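The check described above can be pictured as a walk over a linked list of count vectors. The following is a sketch under stated assumptions: vec_node is a hypothetical stand-in for vstack_t (only the fcount and next members listed in the references are mirrored), and the messages and return values are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for vstack_t: one count vector plus a link. */
typedef struct vec_node {
  int *fcount;
  struct vec_node *next;
} vec_node;

/* Walks the stack. Returns the number of vectors if every count is zero;
 * on the first non-zero count, reports it on stderr and returns -1. */
int check_vectors(const vec_node *stack, int n_features)
{
  int n = 0, i;
  const vec_node *node;
  assert(n_features >= 0);
  for (node = stack; node != NULL; node = node->next) {
    for (i = 0; i < n_features; i++) {
      if (node->fcount[i] != 0) {
        fprintf(stderr, "non-zero count in vector #%d, feature %d\n", n, i);
        return -1;
      }
    }
    n++;
  }
  printf("%d feature vectors, all counts zero\n", n);
  return n;
}
```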

Parameters:
fms The FMS to check.

References vstack_t::fcount, feature_maps_t::n_features, vstack_t::next, and feature_maps_t::vstack.

FMS create_feature_maps ( char **  config,
int  config_lines,
Attribute w_attr1,
Attribute w_attr2,
Attribute s_attr1,
Attribute s_attr2 
)

Creates feature maps for a source/target corpus pair.

Example usage:

FMS = create_feature_maps(config_data, nr_of_config_lines, source_word, target_word, source_s, target_s);

Parameters:
config pointer to a list of strings representing the feature map configuration.
config_lines the number of configuration items stored in config.
w_attr1 The p-attribute in the first corpus to link.
w_attr2 The p-attribute in the second corpus to link.
s_attr1 The s-attribute in the first corpus to link.
s_attr2 The s-attribute in the second corpus to link.
Returns:
the new FMS object.

References feature_maps_t::att1, feature_maps_t::att2, char_map, char_map_range, cl_id2freq, cl_id2str, cl_id2strlen, cl_max_id, cl_str2id, feature_maps_t::fweight, init_char_map(), feature_maps_t::n_features, feature_maps_t::s1, feature_maps_t::s2, feature_maps_t::vstack, feature_maps_t::w2f1, feature_maps_t::w2f2, word1, and word2.

Referenced by main().

int feature_match ( feature_maps_t fms,
int  f1,
int  l1,
int  f2,
int  l2 
)

Usage: Sim = feature_match(FMS, source_first, source_last, target_first, target_last);

Compute similarity measure for source and target regions, where *_first and *_last specify the index of the first and last sentence in a region.

Parameters:
fms The feature map.
f1 Index of first sentence in source region.
l1 Index of last sentence in source region.
f2 Index of first sentence in target region.
l2 Index of last sentence in target region.
Returns:
The similarity measure.

References feature_maps_t::att1, feature_maps_t::att2, feature_maps_t::fweight, get_bounds_of_nth_struc(), get_fvector(), get_id_at_position(), release_fvector(), feature_maps_t::s1, feature_maps_t::s2, feature_maps_t::w2f1, and feature_maps_t::w2f2.

Referenced by best_path(), and do_alignment().

int* get_fvector ( FMS  fms  ) 

Feature count vector handling (used internally by feature_match).

References vstack_t::fcount, feature_maps_t::n_features, vstack_t::next, and feature_maps_t::vstack.

Referenced by feature_match().

void init_char_map (  ) 

Initialises char_map (q.v.).

See also:
char_map

References char_map, and char_map_range.

Referenced by create_feature_maps().

void release_fvector ( int *  fvector,
FMS  fms 
)

Inserts a new vstack_t at the start of the vstack member of the given FMS.

{That's what it looks like it does, not sure how the function name fits with that... ???? - AH}
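One reading that would reconcile the name with the behaviour is that the vstack acts as a free list of reusable count vectors: "releasing" a vector pushes it back onto the front of the list so that a later get_fvector() call can hand it out again. The sketch below illustrates that pattern only; pool, pool_node, pool_get and pool_release are hypothetical names, and nothing here is taken from the actual implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for vstack_t and the relevant FMS members. */
typedef struct pool_node {
  int *fcount;              /* a count vector of n_features entries */
  struct pool_node *next;
} pool_node;

typedef struct {
  int n_features;
  pool_node *free_list;     /* vectors currently available for reuse */
} pool;

/* Hands out a vector: reuses the front of the free list if there is one,
 * otherwise allocates a fresh zeroed vector. Returns NULL on failure. */
int *pool_get(pool *p)
{
  pool_node *node = p->free_list;
  int *v;
  if (node != NULL) {
    p->free_list = node->next;
    v = node->fcount;
    free(node);
    return v;
  }
  return calloc((size_t)p->n_features, sizeof(int));
}

/* "Releases" a vector by inserting a new node at the start of the free list.
 * The caller is assumed to have reset all counts to zero beforehand.
 * Returns 1 on success, 0 if the node could not be allocated. */
int pool_release(pool *p, int *v)
{
  pool_node *node = malloc(sizeof *node);
  if (node == NULL)
    return 0;
  node->fcount = v;
  node->next = p->free_list;
  p->free_list = node;
  return 1;
}
```

Under this reading, a non-zero count found by a later consistency check would indicate a vector that was released without being cleared.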

References feature_maps_t::vstack.

Referenced by feature_match().

void show_features ( FMS  fms,
int  which,
char *  word 
)

Prints the features in an FMS to STDOUT.

Usage: show_features(FMS, 1/2, "word");

This will print all features listed in FMS for the token "word"; "word" is looked up in the source corpus if the 2nd argument == 1, and in the target corpus otherwise.

Parameters:
fms The FMS to print from.
which Which corpus to look up? (See function description)
word The token to look up.

References feature_maps_t::att1, feature_maps_t::att2, feature_maps_t::fweight, get_id_of_string(), feature_maps_t::w2f1, and feature_maps_t::w2f2.


Variable Documentation

unsigned char char_map[256]

a character map for accented characters.

Referenced by create_feature_maps(), and init_char_map().

int char_map_range = 0

the top of the range of char_map's outputs

See also:
char_map

Referenced by create_feature_maps(), and init_char_map().


Generated on Sun Feb 28 18:08:04 2010 for CWB by  doxygen 1.6.1