SAP NetWeaver '04

com.sapportals.wcm.service.xcrawler
Interface IXCrawlerParameters

[contained in: com.sap.km.cm.service.base.par - km.shared.service.xcrawler_api.jar]
public interface IXCrawlerParameters

Parameters determining the behaviour of a crawl.

Copyright (c) SAP AG 2003


Inner Class Summary
static class IXCrawlerParameters.LogLevel
          Log levels for crawler log files
static class IXCrawlerParameters.ModificationCheckMode
          Modes for checking whether a resource was modified
 
Method Summary
 boolean getCrawlHidden()
          Check whether hidden resources are included in the crawl.
 boolean getCrawlSystem()
          Check whether system resources are included in the crawl.
 boolean getCrawlVersions()
          Check whether versions of resources are included in the crawl.
 java.lang.String getDescription()
          Get the description of the parameter set.
 long getDocumentTimeoutInSeconds()
          Get the document timeout in seconds.
 int getErrorCacheCapacity()
          Get the capacity of the cache for the error-set.
 IPropertyName getExcludedHrefPropertyName()
          Get the name of the property that holds the robot-rule-restricted HREFs of a resource from a web-repository.
 boolean getFindAllDocsInDepth()
          Check whether resources are found on the shortest possible path (there may be multiple paths in a web-repository).
 int getFinishedCacheCapacity()
          Get the capacity of the cache for the finished-set.
 boolean getFollowLinks()
          Check whether links are followed.
 int getFoundCacheCapacity()
          Get the capacity of the cache for the found-set.
 IPropertyName getHrefPropertyName()
          Get the name of the property which holds the HREFs of a resource from a web-repository.
 java.lang.String getLogFilePath()
          Get the path to the crawler log file.
 int getMaxBacklogFiles()
          Get the maximum number of old crawler log files.
 int getMaxDepth()
          Get the maximum depth of the crawl process (0 is unlimited).
 long getMaxLogFileSizeInBytes()
          Get the maximum size of the crawler log file in bytes.
 IXCrawlerParameters.LogLevel getMaxLogLevel()
          Get the maximum log level.
 IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
          Get the mode for checking whether a resource was modified.
 int getOldCacheCapacity()
          Get the capacity of the cache for the old-set.
 int getPostprocessedCacheCapacity()
          Get the capacity of the cache for the postprocessed-set.
 int getPostprocessingCacheCapacity()
          Get the capacity of the cache for the postprocessing-set.
 int getProviderCount()
          Get the number of provider threads.
 int getProvidingCacheCapacity()
          Get the capacity of the cache for the providing-set.
 long getRequestDelayInMilliseconds()
          Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistence layer (e.g. database) or channel (e.g. network).
 boolean getRespectRobots()
          Check whether the robot-rules of web-servers are respected.
 com.sapportals.wcm.service.resourcefilter.IResourceFilter[] getResultFilters()
          Get the resource filters which are applied to the result of the crawl but do not narrow the scope.
 int getRetrieverCount()
          Get the number of retriever threads.
 int getRetrievingCacheCapacity()
          Get the capacity of the cache for the retrieving-set.
 com.sapportals.wcm.service.resourcefilter.IResourceFilter[] getScopeFilters()
          Get the resource filters which narrow the scope of the crawl.
 long getSleepDistanceInMilliseconds()
          Get the number of milliseconds between two sleep-periods of a crawler-thread.
 long getSleepDurationInMilliseconds()
          Get the duration of a sleep-period of a crawler-thread in milliseconds.
 boolean getTest()
          Check whether the crawler runs in test-mode (no passing of results to the result receivers).
 int getTodoCacheCapacity()
          Get the capacity of the cache for the todo-set.
 boolean getUseChecksum()
          Check whether a checksum is used to determine whether a resource has changed.
 boolean getUseETag()
          Check whether the ETag is used to determine whether a resource has changed.
 

Method Detail

getDescription

public java.lang.String getDescription()
Get the description of the parameter set.
Returns:
the description of the parameter set

getMaxDepth

public int getMaxDepth()
Get the maximum depth of the crawl process (0 is unlimited).
Returns:
the maximum depth of the crawl process
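
The "0 is unlimited" convention is easy to get wrong in a comparison, so here is a minimal sketch of a depth check. `DepthLimit` and `withinDepth` are hypothetical names that are not part of the xcrawler API, and treating the limit as inclusive is an assumption, since this page does not say how depth is counted.

```java
// Hypothetical helper for illustration; not part of the xcrawler API.
final class DepthLimit {
    private DepthLimit() {}

    /**
     * @param maxDepth value as returned by getMaxDepth()
     * @param depth    depth of the candidate resource
     * @return true if the resource may still be crawled
     */
    static boolean withinDepth(int maxDepth, int depth) {
        if (maxDepth == 0) {
            return true; // 0 is documented as "unlimited"
        }
        return depth <= maxDepth; // assumption: the limit is inclusive
    }

    public static void main(String[] args) {
        System.out.println(withinDepth(0, 1000)); // unlimited crawl
        System.out.println(withinDepth(3, 3));    // exactly at the limit
        System.out.println(withinDepth(3, 4));    // beyond the limit
    }
}
```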

getRetrieverCount

public int getRetrieverCount()
Get the number of retriever threads.
Returns:
the number of retriever threads

getProviderCount

public int getProviderCount()
Get the number of provider threads.
Returns:
the number of provider threads

getUseChecksum

public boolean getUseChecksum()
Check whether a checksum is used to determine whether a resource has changed.
Returns:
true iff a checksum is used to determine whether a resource has changed

getUseETag

public boolean getUseETag()
Check whether the ETag is used to determine whether a resource has changed.
Returns:
true iff the ETag is used to determine whether a resource has changed
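
How the crawler combines the checksum and ETag flags is not described on this page. The following is only a sketch of one plausible reading, in which a resource counts as changed when any enabled indicator differs. `ChangeCheck`, `checksum` and `hasChanged` are hypothetical names, and CRC32 is an arbitrary stand-in because the real checksum algorithm is not specified.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration; the crawler's real comparison logic is not documented here.
final class ChangeCheck {
    private ChangeCheck() {}

    /** CRC32 is an arbitrary choice for this sketch. */
    static long checksum(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        return crc.getValue();
    }

    /**
     * Assumption: a resource is "changed" if any enabled indicator differs.
     * If no indicator is enabled, this sketch reports "unchanged".
     */
    static boolean hasChanged(boolean useETag, boolean useChecksum,
                              String oldETag, String newETag,
                              long oldChecksum, long newChecksum) {
        if (useETag && oldETag != null && newETag != null && !oldETag.equals(newETag)) {
            return true;
        }
        if (useChecksum && oldChecksum != newChecksum) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long before = checksum("v1".getBytes(StandardCharsets.UTF_8));
        long after = checksum("v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(hasChanged(false, true, null, null, before, after));
    }
}
```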

getFollowLinks

public boolean getFollowLinks()
Check whether links are followed.
Returns:
true iff links are followed

getCrawlVersions

public boolean getCrawlVersions()
Check whether versions of resources are included in the crawl.
Returns:
true iff versions of resources are included in the crawl

getCrawlHidden

public boolean getCrawlHidden()
Check whether hidden resources are included in the crawl.
Returns:
true iff hidden resources are included in the crawl

getCrawlSystem

public boolean getCrawlSystem()
Check whether system resources are included in the crawl.
Returns:
true iff system resources are included in the crawl

getModificationCheckMode

public IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
Get the mode for checking whether a resource was modified.
Returns:
the mode for checking whether a resource was modified

getRequestDelayInMilliseconds

public long getRequestDelayInMilliseconds()
Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistence layer (e.g. database) or channel (e.g. network).
Returns:
the number of milliseconds every crawler thread waits after retrieving a resource from a repository
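
The throttling described above can be pictured as a pause after each retrieval. This is a minimal sketch under that assumption, not the crawler's actual thread code. `ThrottledRetriever` and `retrieveAll` are hypothetical names, and the repository access is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a per-request delay; not part of the xcrawler API.
final class ThrottledRetriever {
    private ThrottledRetriever() {}

    /**
     * @param uris               resources to retrieve (placeholder identifiers)
     * @param requestDelayMillis value as returned by getRequestDelayInMilliseconds()
     */
    static List<String> retrieveAll(List<String> uris, long requestDelayMillis) {
        List<String> retrieved = new ArrayList<>();
        for (String uri : uris) {
            retrieved.add(uri); // placeholder for the actual repository access
            if (requestDelayMillis > 0) {
                try {
                    Thread.sleep(requestDelayMillis); // wait after each retrieval
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    break;
                }
            }
        }
        return retrieved;
    }

    public static void main(String[] args) {
        System.out.println(retrieveAll(Arrays.asList("/a", "/b", "/c"), 10L));
    }
}
```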

getFindAllDocsInDepth

public boolean getFindAllDocsInDepth()
Check whether resources are found on the shortest possible path (there may be multiple paths in a web-repository).
Returns:
true iff resources are found on the shortest possible path

getRespectRobots

public boolean getRespectRobots()
Check whether the robot-rules of web-servers are respected.
Returns:
true iff the robot-rules of web-servers are respected

getTest

public boolean getTest()
Check whether the crawler runs in test-mode (no passing of results to the result receivers).
Returns:
true iff the crawler runs in test-mode

getScopeFilters

public com.sapportals.wcm.service.resourcefilter.IResourceFilter[] getScopeFilters()
Get the resource filters which narrow the scope of the crawl.
Returns:
the resource filters which narrow the scope of the crawl

getResultFilters

public com.sapportals.wcm.service.resourcefilter.IResourceFilter[] getResultFilters()
Get the resource filters which are applied to the result of the crawl but do not narrow the scope.
Returns:
the resource filters which are applied to the result of the crawl but do not narrow the scope
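
The difference between scope filters and result filters can be sketched with a small in-memory tree. In this sketch a resource rejected by the scope filter is neither delivered nor expanded, while a resource rejected by the result filter is still expanded. `FilterSketch`, its methods and the use of `java.util.function.Predicate` in place of `IResourceFilter` are all assumptions made for illustration; the real filter interface is not shown on this page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration; Predicate<String> stands in for IResourceFilter.
final class FilterSketch {
    private FilterSketch() {}

    static List<String> crawl(String start, Map<String, List<String>> children,
                              Predicate<String> scope, Predicate<String> result) {
        List<String> delivered = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        while (!todo.isEmpty()) {
            String current = todo.poll();
            if (!visited.add(current)) {
                continue; // already handled
            }
            if (!scope.test(current)) {
                continue; // out of scope: not delivered and not expanded
            }
            if (result.test(current)) {
                delivered.add(current); // passes the result filter
            }
            // Children are queued even if the result filter rejected the parent.
            todo.addAll(children.getOrDefault(current, Collections.emptyList()));
        }
        return delivered;
    }

    static Map<String, List<String>> sampleTree() {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("/docs", "/tmp"));
        tree.put("/docs", Arrays.asList("/docs/a.txt", "/docs/b.bak"));
        tree.put("/tmp", Arrays.asList("/tmp/x.txt"));
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(crawl("/", sampleTree(),
                path -> !path.startsWith("/tmp"), // scope: skip /tmp entirely
                path -> path.endsWith(".txt")));  // result: deliver only .txt
    }
}
```

With the sample tree, the folder `/docs` fails the result filter but is still traversed, whereas everything under `/tmp` is never reached.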

getHrefPropertyName

public IPropertyName getHrefPropertyName()
Get the name of the property which holds the HREFs of a resource from a web-repository.
Returns:
the name of the property which holds the HREFs of a resource from a web-repository

getExcludedHrefPropertyName

public IPropertyName getExcludedHrefPropertyName()
Get the name of the property that holds the robot-rule-restricted HREFs of a resource from a web-repository.
Returns:
the name of the property that holds the robot-rule-restricted HREFs of a resource from a web-repository

getTodoCacheCapacity

public int getTodoCacheCapacity()
Get the capacity of the cache for the todo-set.
Returns:
the capacity of the cache for the todo-set

getRetrievingCacheCapacity

public int getRetrievingCacheCapacity()
Get the capacity of the cache for the retrieving-set.
Returns:
the capacity of the cache for the retrieving-set

getFoundCacheCapacity

public int getFoundCacheCapacity()
Get the capacity of the cache for the found-set.
Returns:
the capacity of the cache for the found-set

getProvidingCacheCapacity

public int getProvidingCacheCapacity()
Get the capacity of the cache for the providing-set.
Returns:
the capacity of the cache for the providing-set

getFinishedCacheCapacity

public int getFinishedCacheCapacity()
Get the capacity of the cache for the finished-set.
Returns:
the capacity of the cache for the finished-set

getOldCacheCapacity

public int getOldCacheCapacity()
Get the capacity of the cache for the old-set.
Returns:
the capacity of the cache for the old-set

getPostprocessingCacheCapacity

public int getPostprocessingCacheCapacity()
Get the capacity of the cache for the postprocessing-set.
Returns:
the capacity of the cache for the postprocessing-set

getPostprocessedCacheCapacity

public int getPostprocessedCacheCapacity()
Get the capacity of the cache for the postprocessed-set.
Returns:
the capacity of the cache for the postprocessed-set

getErrorCacheCapacity

public int getErrorCacheCapacity()
Get the capacity of the cache for the error-set.
Returns:
the capacity of the cache for the error-set

getSleepDistanceInMilliseconds

public long getSleepDistanceInMilliseconds()
Get the number of milliseconds between two sleep-periods of a crawler-thread.
Returns:
the number of milliseconds between two sleep-periods of a crawler-thread

getSleepDurationInMilliseconds

public long getSleepDurationInMilliseconds()
Get the duration of a sleep-period of a crawler-thread in milliseconds.
Returns:
the duration of a sleep-period of a crawler-thread in milliseconds
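
The two sleep parameters together define a work/sleep cycle. The sketch below assumes that the distance is measured from the end of one sleep-period to the start of the next, which this page does not state explicitly. `SleepSchedule` and its methods are hypothetical names.

```java
// Hypothetical illustration of the sleep cycle; not part of the xcrawler API.
final class SleepSchedule {
    private SleepSchedule() {}

    /** True once the working interval since the last sleep-period has elapsed. */
    static boolean sleepDue(long nowMillis, long lastSleepEndMillis, long sleepDistanceMillis) {
        return nowMillis - lastSleepEndMillis >= sleepDistanceMillis;
    }

    /** Fraction of each cycle a thread spends working (1.0 if both values are 0). */
    static double activeFraction(long sleepDistanceMillis, long sleepDurationMillis) {
        long cycle = sleepDistanceMillis + sleepDurationMillis;
        return cycle == 0 ? 1.0 : (double) sleepDistanceMillis / cycle;
    }

    public static void main(String[] args) {
        // 9 s of work followed by 1 s of sleep
        System.out.println(activeFraction(9_000L, 1_000L));
        System.out.println(sleepDue(10_000L, 1_000L, 9_000L));
    }
}
```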

getMaxLogFileSizeInBytes

public long getMaxLogFileSizeInBytes()
Get the maximum size of the crawler log file in bytes.
Returns:
the maximum size of the crawler log file in bytes

getMaxBacklogFiles

public int getMaxBacklogFiles()
Get the maximum number of old crawler log files.
Returns:
the maximum number of old crawler log files

getLogFilePath

public java.lang.String getLogFilePath()
Get the path to the crawler log file.
Returns:
the path to the crawler log file (may return null)
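
Because the log file path may be null, callers should guard against it before opening a file. The sketch below shows that guard together with simple checks built on the size and backlog limits. `LogSettings`, its methods and the fallback path are hypothetical, and the rotation rule (rotate once the size reaches the limit, delete backlog files above the maximum) is an assumption rather than documented crawler behaviour.

```java
// Hypothetical illustration; not part of the xcrawler API.
final class LogSettings {
    private LogSettings() {}

    /** Use the configured path unless getLogFilePath() returned null. */
    static String resolveLogPath(String configuredPath, String fallbackPath) {
        return configuredPath != null ? configuredPath : fallbackPath;
    }

    /** Assumption: the log is rotated once it reaches getMaxLogFileSizeInBytes(). */
    static boolean exceedsLimit(long currentSizeBytes, long maxLogFileSizeInBytes) {
        return currentSizeBytes >= maxLogFileSizeInBytes;
    }

    /** Assumption: backlog files beyond getMaxBacklogFiles() are deleted. */
    static int backlogFilesToDelete(int existingBacklogFiles, int maxBacklogFiles) {
        return Math.max(0, existingBacklogFiles - maxBacklogFiles);
    }

    public static void main(String[] args) {
        // "crawler.log" is a made-up fallback name for this sketch.
        System.out.println(resolveLogPath(null, "crawler.log"));
        System.out.println(exceedsLimit(2_048L, 1_024L));
        System.out.println(backlogFilesToDelete(7, 5));
    }
}
```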

getMaxLogLevel

public IXCrawlerParameters.LogLevel getMaxLogLevel()
Get the maximum log level.
Returns:
the maximum log level

getDocumentTimeoutInSeconds

public long getDocumentTimeoutInSeconds()
Get the document timeout in seconds.
Returns:
the document timeout in seconds

Copyright © 2004 by SAP AG. All Rights Reserved.