This class provides basic statistics for the indexed
collection of documents, such as the average length of documents,
or the total number of documents in the collection.
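
As an illustration only, the statistics described above could be held in a small immutable class like the one below. This is a minimal sketch, not the actual class: the name `SimpleCollectionStatistics` and its accessors are hypothetical, and it assumes the average document length is the total token count divided by the number of documents.

```java
// Hypothetical sketch; the names here are assumptions, not the real API.
final class SimpleCollectionStatistics {
    private final int numberOfDocuments;
    private final long numberOfTokens;

    SimpleCollectionStatistics(int numberOfDocuments, long numberOfTokens) {
        if (numberOfDocuments < 0 || numberOfTokens < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        this.numberOfDocuments = numberOfDocuments;
        this.numberOfTokens = numberOfTokens;
    }

    /** Total number of documents in the collection. */
    int getNumberOfDocuments() {
        return numberOfDocuments;
    }

    /** Total number of tokens over all documents. */
    long getNumberOfTokens() {
        return numberOfTokens;
    }

    /** Average document length in tokens; 0.0 for an empty collection. */
    double getAverageDocumentLength() {
        return numberOfDocuments == 0 ? 0.0 : (double) numberOfTokens / numberOfDocuments;
    }
}
```

For example, a collection of 4 documents holding 10 tokens in total has an average document length of 2.5 under this assumption.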
A static access method that avoids having to instantiate a comparator.
It has the same parameters, return value and implementation as compare(Object,Object).
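
The pattern can be sketched as follows, with the instance method delegating to the static one so that both share one implementation. The class name `StringOrderComparator`, the method name `compareStatic` and the choice of comparing strings are assumptions made for illustration; the original does not say what is being compared.

```java
import java.util.Comparator;

// Hypothetical comparator; the comparison logic is an assumption.
final class StringOrderComparator implements Comparator<Object> {

    /** Instance method required by Comparator; delegates to the static one. */
    @Override
    public int compare(Object a, Object b) {
        return compareStatic(a, b);
    }

    /** Static access method: same parameters and return value as compare. */
    static int compareStatic(Object a, Object b) {
        return ((String) a).compareTo((String) b);
    }
}
```

Callers that only need a single comparison can then write `StringOrderComparator.compareStatic(x, y)` without allocating a comparator object.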
Iterates through the documents of the given collection and
creates the direct index, the document index and the lexicon, using
block information and, where available, field information.
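
A rough in-memory sketch of these three structures is shown below. It assumes a block size of one token, so that a block id equals the token position, and it leaves out fields entirely. The class `DirectIndexSketch` and its members are hypothetical and do not reflect the on-disk formats of the real builder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; the real structures are files on disk.
final class DirectIndexSketch {
    /** Lexicon: term -> term id, assigned in order of first occurrence. */
    final Map<String, Integer> lexicon = new LinkedHashMap<>();
    /** Document index: document id -> document length in tokens. */
    final List<Integer> documentLengths = new ArrayList<>();
    /** Direct index: document id -> (term id -> block ids where it occurs). */
    final List<Map<Integer, List<Integer>>> directIndex = new ArrayList<>();

    /** Indexes one document given as its sequence of terms. */
    void indexDocument(List<String> terms) {
        Map<Integer, List<Integer>> postings = new TreeMap<>();
        for (int position = 0; position < terms.size(); position++) {
            String term = terms.get(position);
            Integer termId = lexicon.get(term);
            if (termId == null) {
                termId = lexicon.size();
                lexicon.put(term, termId);
            }
            // Assumption: one token per block, so block id == position.
            postings.computeIfAbsent(termId, k -> new ArrayList<>()).add(position);
        }
        directIndex.add(postings);
        documentLengths.add(terms.size());
    }
}
```

The term frequency within a document is then the size of the block list stored for that term id.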
Builds the inverted file and lexicon file for the given collections.
Loops through each document in each of the collections,
extracting terms and pushing these through the Term Pipeline
(e.g. stemming, stopword removal, lowercasing).
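
The idea of pushing terms through a pipeline can be sketched as a chain of stages, each of which either transforms a term or drops it. Everything below is an assumption for illustration: the class `TermPipelineSketch`, the tiny stopword list, and the toy "strip a trailing s" stemmer, which stands in for a real stemming algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical pipeline; stage order and contents are assumptions.
final class TermPipelineSketch {
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "of", "and"));

    /** Each stage returns the processed term, or null to drop it. */
    private static final List<Function<String, String>> STAGES =
            Arrays.<Function<String, String>>asList(
                    t -> t.toLowerCase(Locale.ROOT),                  // lowercasing
                    t -> STOPWORDS.contains(t) ? null : t,            // stopword removal
                    t -> t.length() > 3 && t.endsWith("s")            // toy stemmer
                            ? t.substring(0, t.length() - 1) : t);

    /** Runs every token through all stages and keeps the survivors. */
    static List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String term = token;
            for (Function<String, String> stage : STAGES) {
                term = stage.apply(term);
                if (term == null) {
                    break; // term was dropped by this stage
                }
            }
            if (term != null) {
                out.add(term);
            }
        }
        return out;
    }
}
```

With these stages, the tokens `The`, `Documents`, `of`, `Indexing` come out as `document` and `indexing`; the surviving terms are what an indexer would then add to the lexicon and inverted file.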
Creates the lexicon index file, which maps each term id to the
offset of its entry in the lexicon, so that the term
information can be retrieved directly from the
term identifier.
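
The mapping can be sketched as an array of offsets indexed by term id. The example assumes, purely for illustration, that lexicon entries are variable-length strings written with `DataOutputStream.writeUTF` and that term ids are consecutive integers starting at 0; the real lexicon entry layout is not described here, and `LexiconIndexSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; the entry format is an assumption.
final class LexiconIndexSketch {

    /** Writes one entry per term and returns offsets[termId] into the lexicon. */
    static long[] buildOffsets(String[] termsById, ByteArrayOutputStream lexicon) {
        try {
            DataOutputStream out = new DataOutputStream(lexicon);
            long[] offsets = new long[termsById.length];
            for (int termId = 0; termId < termsById.length; termId++) {
                offsets[termId] = out.size(); // bytes written so far
                out.writeUTF(termsById[termId]);
            }
            out.flush();
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the entry for a term id by jumping straight to its offset. */
    static String lookup(byte[] lexicon, long[] offsets, int termId) {
        int start = (int) offsets[termId];
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(lexicon, start, lexicon.length - start))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the offsets array is addressed by term id, a lookup needs no search over the lexicon, which is the purpose of keeping a separate lexicon index.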