This page presents my pet project: the fulltext database locus. I wanted
a fulltext database which would be:
Personal but not lightweight
locus must run on hardware I can afford. It's decidedly non-distributed and
has no pretensions to replace Internet search engines like
Altavista. On the other hand,
I want to index all documents which fit on my disk, including CDs (e.g. with
texts of Project Gutenberg) and I'll
tolerate slower indexing and higher disk usage
(30-70 percent of source text size for indexes) than
Glimpse or
Swish as a tradeoff for
larger maximum database size and more focused search. locus was tested on
400MB in 1200 documents and can find uncommon words (i.e. the kind of words
you would normally use to search for something) in under ten seconds.
Smart but not programmer-hostile
The ideal of fulltext search is clear: you just type in a few words and the
program finds what you meant to search for. The problem is, it doesn't
always work that way. So locus gives you the choice: if you just type in
a few words, it uses a relatively complicated search
algorithm trying to find the best match. When you're not satisfied,
you can see why it found what it found and tweak parameters to your heart's
content and beyond, using a simple query language. locus can search for
phrases - not just on one line with exactly matching spaces, like grep,
but for words near each other - as well as topics (get a word, find fifty
associations in your thesaurus and search for these). Simple
stemming
is also supported.
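To make the difference between grep-style matching and "words near each other" concrete, here is a minimal sketch in Python. It is not the algorithm locus uses (this page does not describe it); the function names, the tokenizer and the `window` parameter are all assumptions made for illustration. It reports a match when every query word occurs inside a span of at most `window` consecutive words, regardless of line breaks or spacing.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens):
    """Map each word to the list of token positions where it occurs."""
    index = {}
    for pos, word in enumerate(tokens):
        index.setdefault(word, []).append(pos)
    return index


def near(text, words, window=5):
    """Return True if all query words fit inside `window` consecutive tokens.

    Hypothetical helper, not part of locus. It tries every combination of
    occurrences, which is fine for a demonstration but too slow for a real
    index with many hits per word.
    """
    index = positions(tokenize(text))
    hit_lists = [index.get(w.lower(), []) for w in words]
    if not all(hit_lists):
        return False  # at least one query word is missing entirely
    return any(max(combo) - min(combo) < window
               for combo in product(*hit_lists))


if __name__ == "__main__":
    sample = "The quick brown\n fox   jumps over the lazy dog"
    print(near(sample, ["quick", "fox"], window=3))
    print(near(sample, ["quick", "dog"], window=3))
```

Because the text is tokenized before matching, the line break and the extra spaces in the sample do not matter, which is exactly what a literal grep pattern cannot offer.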
Universal back end for any front end
I don't like creating GUIs, and GUIs I do create tend to look awful even to
me (not to mention others). So I decided to concentrate my work on locus on
the back end. But of course, to use a program, one needs an interface...
You can specify queries on the command line and read results from standard
output (or redirect them to a file), and
if you want anything fancier, set up your own frontend.
grazer output is quite flexible - for example, you can
output HTML and query locus
databases through your browser.
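As a rough idea of what "your own frontend" could look like, the Python sketch below runs any command, reads its standard output and wraps each line in an HTML list. Everything in it is an assumption: the function name is made up, and the command to run is left entirely to the caller, because the actual query syntax is covered in the options documentation rather than here. The demo uses a stand-in command instead of a real query.

```python
import html
import subprocess
import sys


def results_to_html(command, title="Search results"):
    """Run `command` and turn each non-empty line of its stdout into a list item.

    `command` is an argument list as accepted by subprocess.run; what goes
    into it is up to the caller (hypothetical helper, not part of locus).
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    items = "\n".join(
        f"<li>{html.escape(line)}</li>"
        for line in completed.stdout.splitlines()
        if line.strip()
    )
    return (
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body><ul>\n{items}\n</ul></body></html>"
    )


if __name__ == "__main__":
    # Stand-in for a real query command: prints two fake result lines.
    fake_query = [sys.executable, "-c", "print('a.txt'); print('<b>.txt')"]
    print(results_to_html(fake_query))
```

Escaping each line keeps file names or text fragments containing `<` or `&` from breaking the generated page.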
Interested?
If you think you might use something like locus,
you have the Linux source.
Do let me know at
vbar@comp.cz
how you liked it. Now there's also a mailing
list for locus:
send an (empty) mail message
to subscribe. If you have any questions, problems and/or
suggestions about getting, installing, understanding, using and/or extending
locus, you may want to see the FAQ before mailing me.
You can also take
a look at the
available options to see all the exciting
possibilities (well, all the exciting possibilities I cared to document -
but there are enough of them).
Your distribution contains just a (forever unfinished) core of locus.
The newest version is always (well, modulo connection problems) available
at the locus homepage.
Some additional code and data files for special uses are
here, yet more are available upon request.
Last modified 02 Jan 99.