The words you're looking up. This simple criterion is perfectly adequate, as long as there is relatively few documents satisfying it - that is, if you know some rare words to look up (e. g. the name of the product you want). Such simple queries can be specified as command-line parameters for grazer.
If you don't know any rare words, you must search for a pattern of common ones. That can be as simple as searching for a whole name (query for "Buffalo Bill" gets a lot more descriptive when you specify that both words must be present and the first must immediately precede the second) or as complicated as extracting characteristic words and their patterns from known documents of the kind you're interested in and searching for these (truth be told, the performance of these complicated approaches is not what I would like it to be, but I'm working on it :-) ). Conformance to patterns is quantified by a hardcoded set of metrics and customized by assigning relative weights to these metrics (i. e. when searching for "Buffalo Bill", you would assign positive weights to existence of all words in a document, their order and locality and zero to everything else, like this). Weights should be numbers from the interval <0, 1> whose sum is 1. Set of weights (with one additional parameter) defines soft operator, which maps list of words to list of documents containing them, sorted by their relevance. Default soft operator is unnamed, defined in locus.opt (if it's not, grazer uses defaults as its attributes) and it's the one used on queries from command line. You can define additional, named soft operators in user-defined objects file.
Many documents stored in databases have general structure - for example e-mail messages contain author's name, subject, date etc. If you describe this structure before indexing them, you can restrict your queries to document parts (i. e. search for messages whose subject contains "engineering").
Of course, you may want to search for more than one pattern in one query (for example not only for "Buffalo Bill", but for "William Cody" as well). You can do that by composing soft operators in the query file with relational operators - the standard & and |. Composition of relevance follows rules of fuzzy arithmetics.
grazer gets all the documents (in a specified interval) containing at least one word from the query. Found documents are sorted by their relevance and those with relevance lower than a threshold (value of relevance_threshold parameter of the operator) are dropped. Document's relevance is computed as follows:
Then, from these, normalized metrics (numbers in the interval <0, 1>, the bigger the better correlation):
Buffalo Bill | William Codyis legal and equivalent to
(Buffalo Bill) | (William Cody)Named operators must have parameters in parentheses - for example
name(Buffalo Bill)Named operators can be restricted to some document parts by listing these parts in square brackets after operator's name - for example
name[from](Linus)Word "plain" refers to default part - for example
name[title plain](ISO)Operator parameters containing characters other than letters must be in double quotes - for example
name("O'Brien")Quoted operator parameter containing whitespace characters is broken into multiple ones - for example
"o la la"is equivalent to
o la
Note that only relational operators can be applied recursively - soft operators can't.