Briefly, combineINIT initializes the SQL database and the job-specific configuration directory. combineCtrl controls a Combine crawling job (start, stop, etc.) and prints some statistics. combineExport exports records in various XML formats, and combineUtil provides various utility operations on the Combine database.
Detailed dependency information can be found in the 'Gory details' section.
If a topic definition filename is given, focused crawling using this topic definition is enabled by default. Otherwise focused crawling is disabled, and Combine works as a general crawler.
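The effect of this switch can be illustrated with a minimal sketch. It is not Combine's implementation: the function name should_keep, the topic_terms parameter, and the simple word-overlap test are all assumptions made for illustration, standing in for whatever relevance test the topic definition actually drives.

```python
from typing import Optional, Set


def should_keep(page_text: str, topic_terms: Optional[Set[str]]) -> bool:
    """Decide whether a fetched page is kept (hypothetical helper).

    topic_terms is None when no topic definition was given; otherwise it is
    a set of lowercase terms assumed to come from the topic definition.
    """
    if topic_terms is None:
        # No topic definition: general crawling, every page is kept.
        return True
    words = set(page_text.lower().split())
    # Focused crawling: keep only pages that share at least one topic term.
    return bool(words & topic_terms)
```

Passing None models the general crawler, while passing a term set models a focused crawl that discards off-topic pages.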
Implements various control functions for administering a crawling job, such as starting and stopping crawlers, injecting URLs into the crawl queue, scheduling newly found links for crawling, controlling scheduling, etc.
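To make the idea of a crawl queue concrete, here is a small sketch of injecting URLs and scheduling newly found links. It is only an in-memory illustration under assumed names (CrawlQueue, inject, next_url); Combine itself keeps its state in the SQL database, and nothing here reflects its actual schema or scheduling policy.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Set


class CrawlQueue:
    """Hypothetical in-memory crawl queue with duplicate suppression."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()  # URLs waiting to be crawled
        self._seen: Set[str] = set()         # every URL ever scheduled

    def inject(self, urls: Iterable[str]) -> int:
        """Schedule URLs not seen before; return how many were added."""
        added = 0
        for url in urls:
            if url not in self._seen:
                self._seen.add(url)
                self._pending.append(url)
                added += 1
        return added

    def next_url(self) -> Optional[str]:
        """Hand out the oldest pending URL, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)
```

Seed URLs and links extracted from crawled pages can both go through inject, so a link that was already scheduled is not queued a second time.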
This is the preferred way of controlling a crawl job.
The alvis profile format is defined by the Alvis Enriched Document XML Schema.
For convenience, the -xsltscript switch makes it possible to filter the output using an XSLT script. The script is fed a record in the combine profile format, and the result is exported.
The main crawler-specific library components are collected in the Combine:: Perl namespace.
root 2006-12-07