CAcInvertedFile Class Reference



An accessor to an inverted file.

#include <../libInvertedFile/include/CAcSQLInvertedFile.h>

Inherits: CAcURL2FTS

Public Members

Protected Members


Detailed Description

An accessor to an inverted file. At present this access is done "by hand", which is not really efficient; however, we plan to move to memory-mapped files.


bool operator()()

Tests whether the inverted file is correctly constructed

CAcInvertedFile(const CXMLElement& inCollectionElement)

This opens an existing inverted file and then initializes this structure. After that, it is fully usable.

As a parameter it takes an XMLElement which contains a "collection" element and its content.

If the attribute vi-generate-inverted-file is true, a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.

The REAL constructor.

bool init(bool)

Called by the constructors

~CAcInvertedFile()

Destructor

string IDToURL(TID inID)

Translate a DocumentID to a URL (for output)

TID URLToID(const string& inURL)

Translate a URL to its document ID
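
The two translations above can be pictured as a pair of lookup tables kept in sync. The following is only a sketch under stated assumptions: DocID stands in for TID, and UrlTable, add, idToUrl and urlToId are hypothetical names that do not appear in this class. How the real class stores the mapping is not documented here.

```cpp
#include <map>
#include <string>

// Stand-in for TID; the real typedef is not shown in this reference.
using DocID = unsigned int;

// Hypothetical two-way table between document IDs and URLs.
class UrlTable {
public:
    // Register one document under both lookup directions.
    void add(DocID id, const std::string& url) {
        idToUrl_[id] = url;
        urlToId_[url] = id;
    }

    // Returns an empty string for an unknown ID (a choice made for this sketch).
    std::string idToUrl(DocID id) const {
        std::map<DocID, std::string>::const_iterator it = idToUrl_.find(id);
        return it == idToUrl_.end() ? std::string() : it->second;
    }

    // Returns false and leaves outId untouched for an unknown URL.
    bool urlToId(const std::string& url, DocID& outId) const {
        std::map<std::string, DocID>::const_iterator it = urlToId_.find(url);
        if (it == urlToId_.end()) return false;
        outId = it->second;
        return true;
    }

private:
    std::map<DocID, std::string> idToUrl_;
    std::map<std::string, DocID> urlToId_;
};
```

Keeping both maps costs memory twice, but makes each direction a single logarithmic lookup.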

CDocumentFrequencyList* FeatureToList(TFeatureID)

List of documents containing the feature

CDocumentFrequencyList* URLToFeatureList(string inURL)

List of features contained by a document

CDocumentFrequencyList* DIDToFeatureList(TID inDID)

List of features contained by a document with ID inDID

double FeatureToCollectionFrequency(TFeatureID)

Collection frequency for a given feature
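
As an illustration of what a collection frequency can look like, here is a minimal sketch. It assumes the common definition "share of documents in the collection that contain the feature"; this reference does not state the exact formula used by the class. DocID, FeatureID, Postings and collectionFrequency are names made up for the example.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Stand-ins for TID and TFeatureID; the real typedefs are not shown here.
using DocID = unsigned int;
using FeatureID = unsigned int;

// Hypothetical in-memory table: feature -> set of documents containing it.
using Postings = std::map<FeatureID, std::set<DocID> >;

// Assumed definition: fraction of all documents that contain the feature.
double collectionFrequency(const Postings& postings,
                           FeatureID feature,
                           std::size_t documentCount) {
    if (documentCount == 0) return 0.0;           // empty collection
    Postings::const_iterator it = postings.find(feature);
    if (it == postings.end()) return 0.0;         // feature never occurs
    return static_cast<double>(it->second.size()) /
           static_cast<double>(documentCount);
}
```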

unsigned int getFeatureDescription(TID inFeatureID)

What kind of feature is the feature with ID inFeatureID?

double DIDToMaxDocumentFrequency(TID)

Returns the maximum document frequency for one document ID

double DIDToDFSquareSum(TID)

Returns the document-frequency square sum for a given document ID
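
The two per-document statistics above can be computed in one pass over a document's feature list. This is a sketch only: FeatureEntry, maxDocumentFrequency and dfSquareSum are hypothetical names, and it is an assumption that the class derives these values this way rather than reading them from stored document information.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical entry of a document's feature list: a feature and its
// document frequency within that one document.
struct FeatureEntry {
    unsigned int featureID;
    double documentFrequency;
};

// Largest document frequency in the list (0 for an empty list).
double maxDocumentFrequency(const std::vector<FeatureEntry>& features) {
    double best = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i)
        best = std::max(best, features[i].documentFrequency);
    return best;
}

// Sum of squared document frequencies over the list.
double dfSquareSum(const std::vector<FeatureEntry>& features) {
    double sum = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i)
        sum += features[i].documentFrequency * features[i].documentFrequency;
    return sum;
}
```

The square root of the second value is the Euclidean length of the frequency vector, which is the kind of quantity typically used for normalizing scores.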

double DIDToSquareDFLogICFSum(TID)

Returns the value of this function (going by its name, a sum involving squared DF and log ICF) for a given document ID

bool generateInvertedFile()

Generates an inverted file if there is none. Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping is the result, virtually halting the inverted file creation.
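
The in-memory approach can be sketched as follows. This is not the code of generateInvertedFile(); it only illustrates the idea, and DocID, FeatureID, Posting, DocumentFeatures and invertInMemory are names invented for the example. The whole result lives in one map, which is exactly why the method degrades once the data no longer fits into memory.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-ins for TID and TFeatureID.
using DocID = unsigned int;
using FeatureID = unsigned int;

// One entry of a feature's list: a document and the feature's frequency in it.
struct Posting {
    DocID document;
    double frequency;
};

// Hypothetical input: per document, its list of (feature, frequency) pairs.
using DocumentFeatures =
    std::map<DocID, std::vector<std::pair<FeatureID, double> > >;

// Invert everything in memory: feature -> list of documents containing it.
std::map<FeatureID, std::vector<Posting> >
invertInMemory(const DocumentFeatures& documents) {
    std::map<FeatureID, std::vector<Posting> > inverted;
    for (DocumentFeatures::const_iterator d = documents.begin();
         d != documents.end(); ++d) {
        for (std::size_t i = 0; i < d->second.size(); ++i) {
            Posting p = {d->first, d->second[i].second};
            inverted[d->second[i].first].push_back(p);
        }
    }
    return inverted;
}
```

Because the input map is traversed in ascending document order, each feature's list comes out sorted by document ID.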

bool newGenerateInvertedFile()

Generates an inverted file if there is none, employing the two-way merge method described in "Managing Gigabytes", chapter 5.2, Sort-based inversion (page 181).
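
The general shape of sort-based inversion with two-way merging can be sketched as below. This is a simplified illustration, not the code of newGenerateInvertedFile(): the book's method keeps its sorted runs in temporary files on disk, whereas this sketch keeps them in vectors. Triple, tripleLess, sortBasedInversion and the runSize parameter are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <vector>

// One (feature, document, frequency) record.
struct Triple {
    unsigned int feature;
    unsigned int document;
    double frequency;
};

// Order by feature first, then by document.
inline bool tripleLess(const Triple& a, const Triple& b) {
    return std::tie(a.feature, a.document) < std::tie(b.feature, b.document);
}

// Cut the records into runs of at most runSize (the part assumed to fit in
// memory), sort each run, then merge runs two at a time until one remains.
std::vector<Triple> sortBasedInversion(const std::vector<Triple>& records,
                                       std::size_t runSize) {
    if (runSize == 0) runSize = 1;
    std::vector<std::vector<Triple> > runs;
    for (std::size_t i = 0; i < records.size(); i += runSize) {
        std::size_t end = std::min(records.size(), i + runSize);
        std::vector<Triple> run(records.begin() + i, records.begin() + end);
        std::sort(run.begin(), run.end(), tripleLess);
        runs.push_back(run);
    }
    while (runs.size() > 1) {
        std::vector<std::vector<Triple> > next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            std::vector<Triple> merged;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged), tripleLess);
            next.push_back(merged);
        }
        if (runs.size() % 2 == 1) next.push_back(runs.back());  // odd run carries over
        runs.swap(next);
    }
    return runs.empty() ? std::vector<Triple>() : runs.front();
}
```

In the final sorted sequence all records of one feature are adjacent, so each feature's document list can be written out in a single sequential pass.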

bool checkConsistency()

Check the consistency of the inverted file system accessed

bool findWithinStream(TID inFeatureID, TID inDocumentID, double inDocumentFrequency)

Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and

TID getMaximumFeatureID()

Returns the maximum feature ID arising in this file. This is interesting for browsing

list<TID>* getAllFeatureIDs()

Gets a list of all features contained in this. This function is necessary because in the present system only about 50 percent of the features are really used.

A feature is considered used if it arises in mIDToOffset.
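
Given that rule, collecting the used features amounts to listing the keys of the offset map. A minimal sketch, assuming the map behaves like a std::map from feature ID to file offset (the actual CIDToOffset type is not documented here); FeatureID, OffsetMap and allFeatureIDs are names invented for the example, and the list is returned by value rather than as a pointer.

```cpp
#include <list>
#include <map>

// Stand-in for TID.
using FeatureID = unsigned int;

// Stand-in for CIDToOffset: feature ID -> offset in the inverted file.
using OffsetMap = std::map<FeatureID, long>;

// A feature counts as used if it has an entry in the offset map.
std::list<FeatureID> allFeatureIDs(const OffsetMap& idToOffset) {
    std::list<FeatureID> ids;
    for (OffsetMap::const_iterator it = idToOffset.begin();
         it != idToOffset.end(); ++it) {
        ids.push_back(it->first);
    }
    return ids;
}
```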

TID mMaximumFeatureID

The maximum feature ID arising in this file

mutable ifstream mOffsetFile

Feature -> Offset in inverted file

ifstream mFeatureDescriptionFile

File of feature descriptions

string mInvertedFileName

Name of the inverted file

string mOffsetFileName

Name of the Offset file

string mFeatureDescriptionFileName

Name for the file with the feature description

CIDToOffset mIDToOffset

Map from feature ID to the offset for this feature

CADIHash mDocumentInformation

Additional information about the document, e.g. the Euclidean length of the feature list.

void writeOffsetFileElement(TID inFeatureID, int inPosition, ostream& inOpenOffsetFile)

Add a (feature ID, offset) pair to the open offset file (helper function for inverted file construction)

CDocumentFrequencyList* getFeatureFile(string inFileName)

Loads a *.fts file and returns the feature list


Documentation generated by muellerw@pc7170 on Sun Oct 8 16:04:40 CEST 2000
Kdoc