Which topic classification PlugIn module (algorithm) to use. Combine::Check_record and Combine::PosCheck_record are included by default. New SVM classifier: Combine::classifySVM. See classifyPlugInTemplate.pm and the documentation to write your own.
Allow crawling of URLs or hostnames that match these regular expressions. Use either URL or HOST: (note the ’:’) to match a regular expression against either the full URL or the HOST part of a URL.
Used by:
selurl.pm.svn-base; selurl.pm
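The URL / HOST: matching described above can be sketched in Python as follows. This is only an illustration, not the selurl.pm implementation: the line layout (keyword, whitespace, regular expression), the '#' comment convention, and the function names parse_rules and matches_any are assumptions, and Python regular expressions differ from Perl's in some details.

```python
import re
from urllib.parse import urlsplit


def parse_rules(lines):
    """Parse 'URL <regex>' / 'HOST: <regex>' lines into (scope, regex) pairs.

    The line layout is an assumption made for this sketch.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"malformed rule: {line!r}")
        keyword, pattern = parts
        if keyword == "HOST:":
            rules.append(("host", re.compile(pattern)))
        elif keyword == "URL":
            rules.append(("url", re.compile(pattern)))
        else:
            raise ValueError(f"unknown rule keyword: {keyword!r}")
    return rules


def matches_any(url, rules):
    """Return True if any rule matches the full URL or its host part."""
    host = urlsplit(url).hostname or ""
    for scope, regex in rules:
        target = host if scope == "host" else url
        if regex.search(target):
            return True
    return False
```

With the rules "HOST: \.example\.org$" and "URL /docs/", a URL on www.example.org matches through its host, and any URL containing /docs/ matches through the full-URL rule.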
9.2.2 binext
Description:
Extensions of binary files
Used by:
UA.pm; UA.pm.svn-base
9.2.3 converters
Description:
Configure which converters can be used to produce an XWI object. Format: one line per entry; each entry consists of 3 ’;’-separated fields:
mime-type ; External converter command ; Internal converter
Entries are processed in order and the first match is executed. External converters must be found via PATH and be executable to be considered a match. The external converter command should take a filename as parameter and convert that file; the result should be written to STDOUT.
Used by:
UA.pm; combine; UA.pm.svn-base; combine.svn-base
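The first-match selection over converter entries can be sketched as below. This is a minimal Python illustration under stated assumptions, not the code in UA.pm: the function names are hypothetical, the MIME type is compared as an exact string, and an entry with an empty external-command field is assumed to rely on its internal converter alone.

```python
import shutil


def parse_converters(lines):
    """Parse 'mime-type ; external command ; internal converter' lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 3:
            raise ValueError(f"expected 3 ';'-separated fields: {line!r}")
        entries.append(tuple(fields))
    return entries


def select_converter(mime_type, entries, which=shutil.which):
    """Return (external, internal) for the first matching entry, else None.

    An entry with an external command only matches if that program is
    found on PATH and is executable, which is what shutil.which checks.
    """
    for entry_type, external, internal in entries:
        if entry_type != mime_type:
            continue
        if external:
            program = external.split()[0]
            if which(program) is None:
                continue  # external converter unavailable: not a match
        return external, internal
    return None
```

The which parameter exists only so that the PATH lookup can be replaced in tests; by default the real PATH is searched.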
9.2.4 exclude
Description:
Exclude URLs or hostnames that match these regular expressions. The default patterns cover: CGI and maps; binary files; unparsable documents; images; other binary formats. More excludes are kept in the file config_exclude (automatically updated by other programs).
Used by:
selurl.pm.svn-base; selurl.pm
9.2.5 serveralias
Description:
The list of server names that are aliases is kept in the file ./config_serveralias (automatically updated by other programs). Use one server per line. Example: the line
www.100topwetland.com www.100wetland.com
means that www.100wetland.com is replaced by www.100topwetland.com during URL normalization.
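The alias replacement from the example can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the first name on a line is assumed to be the preferred server with all following names as its aliases, and any user information in the URL is dropped for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit


def parse_aliases(lines):
    """Map each alias to the first (preferred) server name on its line."""
    alias_to_canonical = {}
    for line in lines:
        names = line.split()
        if len(names) < 2 or names[0].startswith("#"):
            continue  # skip blank, comment, or single-name lines
        canonical, *aliases = names
        for alias in aliases:
            alias_to_canonical[alias.lower()] = canonical.lower()
    return alias_to_canonical


def normalize_host(url, alias_to_canonical):
    """Replace an aliased host name in url with its preferred name."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the host
    canonical = alias_to_canonical.get(host)
    if canonical is None:
        return url
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Applied to the example line, a URL on www.100wetland.com is rewritten to the same path and query on www.100topwetland.com, while URLs on other hosts are returned unchanged.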
9.2.6 sessionids
Description:
Patterns to recognize and remove session IDs in URLs
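Session-ID removal of this kind can be sketched in Python as below. The two patterns are illustrative assumptions (a ;jsessionid= path parameter and a PHPSESSID query parameter) and are not taken from the Combine default configuration; the function name strip_session_ids is likewise hypothetical.

```python
import re

# Illustrative patterns only; the patterns shipped with Combine may differ.
SESSION_ID_PATTERNS = [
    # ';jsessionid=...' path parameter
    re.compile(r";jsessionid=[^?#&/]*", re.IGNORECASE),
    # 'PHPSESSID=...' query parameter, plus the '&' that follows it, if any
    re.compile(r"(?<=[?&])PHPSESSID=[^&#]*&?", re.IGNORECASE),
]


def strip_session_ids(url):
    """Remove recognized session IDs from url and tidy leftover separators."""
    for pattern in SESSION_ID_PATTERNS:
        url = pattern.sub("", url)
    # Drop a '?' or '&' left dangling at the end of the query string.
    return re.sub(r"[?&]+(?=#|$)", "", url)
```

Removing session IDs before comparing URLs keeps the same page from being recorded under many different URLs.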
9.2.7 url
Description:
url is just a container for all URL-related configuration patterns