java.lang.Object
  org.terrier.structures.CollectionStatistics
public class CollectionStatistics
This class provides basic statistics for the indexed
collection of documents, such as the average length of documents,
or the total number of documents in the collection.
After indexing, statistics are saved in the PREFIX.log file, along
with the classes that should be used for the Lexicon, the DocumentIndex,
the DirectIndex and the InvertedIndex. This means that an index knows
how it was built and how it should be opened again.
Field Summary | |
---|---|
protected double |
averageDocumentLength
The average length of a document in the collection. |
protected double[] |
avgFieldLengths
Average length of each field |
protected long[] |
fieldTokens
number of tokens in each field |
protected int |
numberOfDocuments
The total number of documents in the collection. |
protected int |
numberOfFields
Number of fields used to index |
protected long |
numberOfPointers
The total number of pointers in the inverted file. |
protected long |
numberOfTokens
The total number of tokens in the collection. |
protected int |
numberOfUniqueTerms
The total number of unique terms in the collection. |
Constructor Summary | |
---|---|
CollectionStatistics(int numDocs,
int numTerms,
long numTokens,
long numPointers,
long[] _fieldTokens)
Constructs an instance of the class with the given collection statistics. |
Method Summary | |
---|---|
void |
addStatistics(CollectionStatistics cs)
Increments these statistics by those of the given CollectionStatistics. |
double |
getAverageDocumentLength()
Returns the documents' average length. |
double[] |
getAverageFieldLengths()
Returns the average length of each field in tokens |
long[] |
getFieldTokens()
Returns the length of each field in tokens |
int |
getNumberOfDocuments()
Returns the total number of documents in the collection. |
int |
getNumberOfFields()
Returns the number of fields being used to index |
long |
getNumberOfPointers()
Returns the total number of pointers in the collection. |
long |
getNumberOfTokens()
Returns the total number of tokens in the collection. |
int |
getNumberOfUniqueTerms()
Returns the total number of unique terms in the lexicon. |
protected void |
relcaluateAverageLengths()
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int numberOfFields
protected long[] fieldTokens
protected double[] avgFieldLengths
protected int numberOfDocuments
protected long numberOfTokens
protected long numberOfPointers
protected int numberOfUniqueTerms
protected double averageDocumentLength
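This page does not spell out how the two average fields relate to the count fields. The following is a minimal sketch, assuming that each average is a plain ratio over numberOfDocuments and that an empty collection yields zero. The class AverageLengthSketch and its methods are hypothetical stand-ins for illustration, not Terrier code.

```java
import java.util.Arrays;

// Hypothetical stand-in, not part of Terrier; parameter names mirror the fields above.
class AverageLengthSketch {

    // Assumption: average document length = total tokens / number of documents.
    static double averageDocumentLength(long numberOfTokens, int numberOfDocuments) {
        if (numberOfDocuments == 0) {
            return 0.0d; // assumption: report zero for an empty collection
        }
        return (double) numberOfTokens / numberOfDocuments;
    }

    // Assumption: average field length = tokens in that field / number of documents.
    static double[] averageFieldLengths(long[] fieldTokens, int numberOfDocuments) {
        double[] averages = new double[fieldTokens.length];
        if (numberOfDocuments == 0) {
            return averages; // all zeros for an empty collection
        }
        for (int i = 0; i < fieldTokens.length; i++) {
            averages[i] = (double) fieldTokens[i] / numberOfDocuments;
        }
        return averages;
    }

    public static void main(String[] args) {
        long[] fieldTokens = {200L, 800L};
        System.out.println(averageDocumentLength(1000L, 4)); // prints 250.0
        System.out.println(Arrays.toString(averageFieldLengths(fieldTokens, 4))); // prints [50.0, 200.0]
    }
}
```

Under these assumptions the averages are fully determined by the counts, which would explain why the constructor takes only the counts and the per-field token totals.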
Constructor Detail |
---|
public CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers, long[] _fieldTokens)
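Parameters:
numDocs - the total number of documents in the collection
numTerms - the total number of unique terms in the collection
numTokens - the total number of tokens in the collection
numPointers - the total number of pointers in the inverted file
_fieldTokens - the number of tokens in each field
Method Detail |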
---|
protected void relcaluateAverageLengths()
public double getAverageDocumentLength()
public int getNumberOfDocuments()
public long getNumberOfPointers()
public long getNumberOfTokens()
public int getNumberOfUniqueTerms()
public int getNumberOfFields()
public long[] getFieldTokens()
public double[] getAverageFieldLengths()
public void addStatistics(CollectionStatistics cs)
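The summary for addStatistics says only that the statistics are incremented. One plausible reading is sketched below with a hypothetical MergeableStats class: the additive counts are summed and the average is recomputed afterwards. Unique term counts cannot simply be added, because two collections may share terms, so the sketch keeps the larger value as a lower bound. All of this is an assumption for illustration and may differ from what Terrier actually does.

```java
// Hypothetical stand-in, not part of Terrier; field names mirror the class above.
class MergeableStats {
    int numberOfDocuments;
    int numberOfUniqueTerms;
    long numberOfTokens;
    long numberOfPointers;
    long[] fieldTokens;
    double averageDocumentLength;

    MergeableStats(int numDocs, int numTerms, long numTokens,
                   long numPointers, long[] fieldTokens) {
        this.numberOfDocuments = numDocs;
        this.numberOfUniqueTerms = numTerms;
        this.numberOfTokens = numTokens;
        this.numberOfPointers = numPointers;
        this.fieldTokens = fieldTokens.clone(); // defensive copy
        recalculateAverage();
    }

    // Assumed merge semantics: sum the additive counts, then refresh the average.
    void add(MergeableStats other) {
        if (other.fieldTokens.length != fieldTokens.length) {
            throw new IllegalArgumentException("field count mismatch");
        }
        numberOfDocuments += other.numberOfDocuments;
        numberOfTokens += other.numberOfTokens;
        numberOfPointers += other.numberOfPointers;
        // Assumption: vocabularies may overlap, so keep the larger count as a lower bound.
        numberOfUniqueTerms = Math.max(numberOfUniqueTerms, other.numberOfUniqueTerms);
        for (int i = 0; i < fieldTokens.length; i++) {
            fieldTokens[i] += other.fieldTokens[i];
        }
        recalculateAverage();
    }

    private void recalculateAverage() {
        averageDocumentLength = numberOfDocuments == 0
                ? 0.0d
                : (double) numberOfTokens / numberOfDocuments;
    }

    public static void main(String[] args) {
        MergeableStats a = new MergeableStats(2, 10, 100L, 40L, new long[]{30L, 70L});
        MergeableStats b = new MergeableStats(3, 15, 400L, 90L, new long[]{100L, 300L});
        a.add(b);
        System.out.println(a.numberOfDocuments);     // prints 5
        System.out.println(a.averageDocumentLength); // prints 100.0
    }
}
```

Recomputing the average from the merged totals, rather than averaging the two averages, keeps the result correct when the two collections differ in size.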