#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include "feature_maps.h"
#include "barlib.h"
void best_path(FMS fms,
               int f1,
               int l1,
               int f2,
               int l2,
               int beam_width,
               int verbose,
               int *steps,
               int **out1,
               int **out2,
               int **out_quality)
Finds the best alignment path for the given regions of sentences in source and target corpus.
This function does a beamed dynamic programming search for the best path aligning the sentence regions (f1,l1) in the source corpus and (f2,l2) in the target corpus.
Allowed alignments are 1:0 0:1 1:1 2:1 1:2.
The results are returned in the vectors out1 and out2, each of which contains *steps valid entries (alignment points).
Alignment points are given as sentence numbers and correspond to the start points of the sentences. At the end-of-region alignment point, sentence numbers will be l1 + 1 and l2 + 1, which must be considered by the caller if l1 (or l2) is the last sentence in the corpus!
The similarity measures of aligned regions are returned in the vector out_quality.
Memory allocated for the return vectors (out1, out2, out_quality) is managed by best_path() and must not be freed by the caller. Calling best_path() overwrites the results of the previous search.
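The recurrence behind such a search can be sketched as follows. This is a minimal illustration, not the implementation of best_path(): it leaves out the beam pruning entirely, and toy_sim() (a length-difference score) and toy_align() are hypothetical names standing in for feature_match() and the real search. It only shows how the five allowed alignments (1:0, 0:1, 1:1, 2:1, 1:2) act as moves in a dynamic programming table and how alignment points, including the end-of-region point, fall out of the backtrace.

```c
#include <assert.h>
#include <limits.h>

#define MAX_N 64   /* toy size limit for this sketch */

/* Hypothetical stand-in for a similarity measure: scores a bead of n source
 * and m target sentences by how closely their total lengths agree. */
static int toy_sim(const int *len1, int a, int n, const int *len2, int b, int m)
{
  int s1 = 0, s2 = 0, k;
  for (k = 0; k < n; k++) s1 += len1[a + k];
  for (k = 0; k < m; k++) s2 += len2[b + k];
  return -(s1 > s2 ? s1 - s2 : s2 - s1);
}

/* Unbeamed DP over the moves 1:0, 0:1, 1:1, 2:1, 1:2.
 * Writes the start point of every bead, followed by the end point (n1, n2),
 * into path1/path2 (room for n1 + n2 + 1 entries each) and returns the
 * number of points written. */
int toy_align(const int *len1, int n1, const int *len2, int n2,
              int *path1, int *path2)
{
  static const int step[5][2] = {{1, 0}, {0, 1}, {1, 1}, {2, 1}, {1, 2}};
  static int best[MAX_N + 1][MAX_N + 1], from[MAX_N + 1][MAX_N + 1];
  int i, j, k, n = 0;

  assert(n1 >= 0 && n1 <= MAX_N && n2 >= 0 && n2 <= MAX_N);
  for (i = 0; i <= n1; i++)
    for (j = 0; j <= n2; j++) {
      best[i][j] = (i == 0 && j == 0) ? 0 : INT_MIN;
      from[i][j] = -1;
      for (k = 0; k < 5; k++) {
        int pi = i - step[k][0], pj = j - step[k][1];
        if (pi < 0 || pj < 0 || best[pi][pj] == INT_MIN)
          continue;
        int s = best[pi][pj]
              + toy_sim(len1, pi, step[k][0], len2, pj, step[k][1]);
        if (s > best[i][j]) {
          best[i][j] = s;
          from[i][j] = k;
        }
      }
    }

  /* Backtrace from the end point to (0, 0), then reverse into forward order. */
  for (i = n1, j = n2;;) {
    path1[n] = i;
    path2[n] = j;
    n++;
    k = from[i][j];
    if (k < 0)
      break;
    i -= step[k][0];
    j -= step[k][1];
  }
  for (k = 0; k < n / 2; k++) {
    int t1 = path1[k], t2 = path2[k];
    path1[k] = path1[n - 1 - k];
    path2[k] = path2[n - 1 - k];
    path1[n - 1 - k] = t1;
    path2[n - 1 - k] = t2;
  }
  return n;
}
```

A beam would presumably restrict which cells of the table are kept at each stage; how best_path() does this is not stated here, so the sketch fills the whole table.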
Example usage:
best_path(FMS, f1, l1, f2, l2, beam_width, 0/1, &steps, &out1, &out2, &out_quality);
fms | The FMS to use.
f1 | Index of first sentence in source region.
l1 | Index of last sentence in source region.
f2 | Index of first sentence in target region.
l2 | Index of last sentence in target region.
beam_width | Parameter for the beam search.
verbose | Boolean: if true, prints progress messages on STDOUT.
steps | Output: receives the number of alignment points (see function description).
out1 | Output: receives the vector of alignment points in the source corpus (see function description).
out2 | Output: receives the vector of alignment points in the target corpus (see function description).
out_quality | Output: receives the vector of similarity measures of the aligned regions (see function description).
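Because alignment points are start points, a caller has to look at two consecutive points to recover one aligned region. The sketch below shows one way to do that. The helper names bead_sizes() and print_beads() are hypothetical, and the code assumes that the last of the *steps entries is the end-of-region point (l1 + 1, l2 + 1); it works on plain arrays so that it does not depend on the corpus itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: size of the bead that starts at alignment point i.
 * Returns 0 if i is the last point (no bead starts there), 1 otherwise. */
int bead_sizes(const int *out1, const int *out2, int steps, int i,
               int *n_src, int *n_trg)
{
  assert(out1 != NULL && out2 != NULL);
  if (i < 0 || i + 1 >= steps)
    return 0;
  *n_src = out1[i + 1] - out1[i];   /* source sentences in this bead */
  *n_trg = out2[i + 1] - out2[i];   /* target sentences in this bead */
  return 1;
}

/* Prints each bead as inclusive sentence ranges plus its n:m type and
 * returns the number of beads. An empty side (1:0 or 0:1) shows up as a
 * range whose end is one less than its start. The final point is only
 * used as an upper bound, never as a sentence index. */
int print_beads(const int *out1, const int *out2, int steps)
{
  int i, n_src, n_trg, beads = 0;
  for (i = 0; bead_sizes(out1, out2, steps, i, &n_src, &n_trg); i++) {
    printf("[%d,%d] x [%d,%d] (%d:%d)\n",
           out1[i], out1[i + 1] - 1, out2[i], out2[i + 1] - 1, n_src, n_trg);
    beads++;
  }
  return beads;
}
```

Since the vectors belong to best_path(), a caller that needs the points after the next search would have to copy them first.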
References BAR_delete(), BAR_new(), BAR_read(), BAR_write(), and feature_match().
Referenced by do_alignment().
void check_fvectors(FMS fms)
Prints a message about the vector stack of the given FMS.
If it finds a non-zero count, it prints a message to STDERR. Otherwise, it prints a message to STDOUT giving the number of feature vectors.
fms | The FMS to check.
References vstack_t::fcount, feature_maps_t::n_features, vstack_t::next, and feature_maps_t::vstack.
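A check of this kind might look roughly like the sketch below. The type toy_vnode and the function toy_check_vectors() are hypothetical; they only mirror the members named in the references (fcount, next, n_features), and the return value is added here for testability, since the documented function returns void.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a vector stack node. */
typedef struct toy_vnode {
  const int *fcount;              /* feature counts, n_features entries */
  const struct toy_vnode *next;   /* next vector on the stack, or NULL */
} toy_vnode;

/* Walks the stack. Returns -1 (after a message on stderr) as soon as a
 * non-zero count is found; otherwise reports and returns the number of
 * vectors seen. */
int toy_check_vectors(const toy_vnode *stack, int n_features)
{
  int i, n = 0;
  assert(n_features >= 0);
  for (; stack != NULL; stack = stack->next) {
    for (i = 0; i < n_features; i++)
      if (stack->fcount[i] != 0) {
        fprintf(stderr, "non-zero count in feature vector %d\n", n);
        return -1;
      }
    n++;
  }
  printf("all %d feature vectors are zeroed\n", n);
  return n;
}
```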
FMS create_feature_maps(char **config,
                        int config_lines,
                        Attribute *w_attr1,
                        Attribute *w_attr2,
                        Attribute *s_attr1,
                        Attribute *s_attr2)
Creates feature maps for a source/target corpus pair.
Example usage:
FMS = create_feature_maps(config_data, nr_of_config_lines, source_word, target_word, source_s, target_s);
config | Pointer to a list of strings representing the feature map configuration.
config_lines | The number of configuration items stored in config.
w_attr1 | The p-attribute in the first corpus to link.
w_attr2 | The p-attribute in the second corpus to link.
s_attr1 | The s-attribute in the first corpus to link.
s_attr2 | The s-attribute in the second corpus to link.
References feature_maps_t::att1, feature_maps_t::att2, char_map, char_map_range, cl_id2freq, cl_id2str, cl_id2strlen, cl_max_id, cl_str2id, feature_maps_t::fweight, init_char_map(), feature_maps_t::n_features, feature_maps_t::s1, feature_maps_t::s2, feature_maps_t::vstack, feature_maps_t::w2f1, feature_maps_t::w2f2, word1, and word2.
Referenced by main().
int feature_match(feature_maps_t *fms,
                  int f1,
                  int l1,
                  int f2,
                  int l2)
Computes a similarity measure for the given source and target regions.
Usage: Sim = feature_match(FMS, source_first, source_last, target_first, target_last);
Here *_first and *_last specify the index of the first and last sentence in a region.
fms | The feature map.
f1 | Index of first sentence in source region.
l1 | Index of last sentence in source region.
f2 | Index of first sentence in target region.
l2 | Index of last sentence in target region.
References feature_maps_t::att1, feature_maps_t::att2, feature_maps_t::fweight, get_bounds_of_nth_struc(), get_fvector(), get_id_at_position(), release_fvector(), feature_maps_t::s1, feature_maps_t::s2, feature_maps_t::w2f1, and feature_maps_t::w2f2.
Referenced by best_path() and do_alignment().
int *get_fvector(FMS fms)
Feature count vector handling (used internally by feature_match).
References vstack_t::fcount, feature_maps_t::n_features, vstack_t::next, and feature_maps_t::vstack.
Referenced by feature_match().
void init_char_map()
Initialises char_map, q.v.
References char_map and char_map_range.
Referenced by create_feature_maps().
void release_fvector(int *fvector,
                     FMS fms)
Inserts a new vstack_t at the start of the vstack member of the given FMS.
{That's what it looks like it does, not sure how the function name fits with that... ???? - AH}
References feature_maps_t::vstack.
Referenced by feature_match().
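One reading that would reconcile the name with the behaviour is that vstack serves as a pool of idle feature count vectors: get_fvector() takes a vector from the pool and release_fvector() hands it back by pushing a new node onto the stack. Whether that is what the real code does is an assumption; the sketch below only illustrates the pattern, with hypothetical types (toy_vstack, toy_fms) and function names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; the real vstack_t and feature_maps_t may differ. */
typedef struct toy_vstack {
  int *fcount;                 /* an idle feature count vector */
  struct toy_vstack *next;
} toy_vstack;

typedef struct {
  int n_features;              /* length of every vector */
  toy_vstack *vstack;          /* stack of idle vectors */
} toy_fms;

/* Pops an idle vector off the stack, or allocates a zeroed one if the
 * stack is empty. */
int *toy_get_fvector(toy_fms *fms)
{
  toy_vstack *top = fms->vstack;
  int *v;
  assert(fms->n_features > 0);
  if (top == NULL) {
    v = calloc((size_t)fms->n_features, sizeof *v);
    assert(v != NULL);
    return v;
  }
  v = top->fcount;
  fms->vstack = top->next;
  free(top);
  return v;
}

/* Pushes the vector back onto the stack by inserting a new node at the
 * start. The caller is assumed to have reset all counts to zero first. */
void toy_release_fvector(int *fvector, toy_fms *fms)
{
  toy_vstack *node = malloc(sizeof *node);
  assert(node != NULL);
  node->fcount = fvector;
  node->next = fms->vstack;
  fms->vstack = node;
}
```

Under this reading, a check that every vector on the stack is zeroed is a way of confirming that callers cleaned up before releasing.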
void show_features(FMS fms,
                   int which,
                   char *word)
Prints the features in an FMS to STDOUT.
Usage: show_features(FMS, 1/2, "word");
This will print all features listed in FMS for the token "word"; "word" is looked up in the source corpus if the 2nd argument == 1, and in the target corpus otherwise.
fms | The FMS to print from.
which | Which corpus to look up? (See function description)
word | The token to look up.
References feature_maps_t::att1, feature_maps_t::att2, feature_maps_t::fweight, get_id_of_string(), feature_maps_t::w2f1, and feature_maps_t::w2f2.
unsigned char char_map[256]
A character map for accented characters.
Referenced by create_feature_maps() and init_char_map().
int char_map_range = 0
The top of the range of char_map's outputs.
Referenced by create_feature_maps() and init_char_map().
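To make the relationship between the two variables concrete, here is a small sketch of a byte-indexed map and its output range. Everything in it is an assumption made for illustration: the names carry a toy_ prefix, the input is taken to be ISO-8859-1, letters are folded to codes 1..26 regardless of case, a few accented forms of "a" and "e" share the code of their base letter, and the range is taken to mean one past the largest code. The actual codes assigned by init_char_map() may be quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

unsigned char toy_char_map[256];
int toy_char_map_range = 0;

/* Gives every byte in `from` the code already assigned to `to`. */
static void toy_fold(const unsigned char *from, size_t n, unsigned char to)
{
  size_t k;
  for (k = 0; k < n; k++)
    toy_char_map[from[k]] = toy_char_map[to];
}

void toy_init_char_map(void)
{
  /* ISO-8859-1 code points (assumed encoding). */
  static const unsigned char a_like[] =
    {0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xE0, 0xE1, 0xE2, 0xE3, 0xE4};
  static const unsigned char e_like[] =
    {0xC8, 0xC9, 0xCA, 0xCB, 0xE8, 0xE9, 0xEA, 0xEB};
  int c;

  memset(toy_char_map, 0, sizeof toy_char_map);   /* 0 = any other byte */
  for (c = 'a'; c <= 'z'; c++) {
    toy_char_map[c] = (unsigned char)(c - 'a' + 1);
    toy_char_map[c - 'a' + 'A'] = toy_char_map[c];   /* case folding */
  }
  toy_fold(a_like, sizeof a_like, 'a');
  toy_fold(e_like, sizeof e_like, 'e');
  toy_char_map_range = 27;   /* outputs are 0..26 in this sketch */
}
```

Knowing the size of the output range lets a caller size tables indexed by mapped characters, or by combinations of them, without scanning the map.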