public class CollectionStatistics extends Object implements Serializable, org.apache.hadoop.io.Writable
Modifier and Type | Field and Description |
---|---|
protected double | averageDocumentLength: The average length of a document in the collection. |
protected double[] | avgFieldLengths: Average length of each field. |
protected long[] | fieldTokens: Number of tokens in each field. |
protected int | numberOfDocuments: The total number of documents in the collection. |
protected int | numberOfFields: Number of fields used to index. |
protected long | numberOfPointers: The total number of pointers in the inverted file. |
protected long | numberOfTokens: The total number of tokens in the collection. |
protected int | numberOfUniqueTerms: The total number of unique terms in the collection. |
Constructor and Description |
---|
CollectionStatistics() |
CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers, long[] _fieldTokens): Constructs an instance of the class with the given statistics. |
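The relationship between the token counts and the average-length fields can be sketched as follows. This is a minimal illustration, not the Terrier implementation: the class `AverageLengthSketch` is hypothetical, and it assumes that each average is the corresponding token count divided by the number of documents, with 0 returned for an empty collection.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; this is not the Terrier class.
final class AverageLengthSketch {
    final int numberOfDocuments;
    final long numberOfTokens;
    final long[] fieldTokens;

    AverageLengthSketch(int numDocs, long numTokens, long[] fieldTokens) {
        this.numberOfDocuments = numDocs;
        this.numberOfTokens = numTokens;
        this.fieldTokens = fieldTokens.clone();
    }

    // Assumed definition: total tokens divided by the document count.
    double averageDocumentLength() {
        return numberOfDocuments == 0 ? 0.0d : (double) numberOfTokens / numberOfDocuments;
    }

    // Assumed definition: per-field tokens divided by the document count.
    double[] averageFieldLengths() {
        double[] averages = new double[fieldTokens.length];
        for (int i = 0; i < fieldTokens.length; i++) {
            averages[i] = numberOfDocuments == 0 ? 0.0d : (double) fieldTokens[i] / numberOfDocuments;
        }
        return averages;
    }

    public static void main(String[] args) {
        AverageLengthSketch s = new AverageLengthSketch(4, 1000L, new long[] {200L, 800L});
        System.out.println(s.averageDocumentLength());                // prints 250.0
        System.out.println(Arrays.toString(s.averageFieldLengths())); // prints [50.0, 200.0]
    }
}
```

With the real class, the equivalent values would be read through getAverageDocumentLength() and getAverageFieldLengths().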
Modifier and Type | Method and Description |
---|---|
void | addStatistics(CollectionStatistics cs): Increments the statistics by the amounts in the specified CollectionStatistics. |
double | getAverageDocumentLength(): Returns the documents' average length. |
double[] | getAverageFieldLengths(): Returns the average length of each field in tokens. |
long[] | getFieldTokens(): Returns the length of each field in tokens. |
int | getNumberOfDocuments(): Returns the total number of documents in the collection. |
int | getNumberOfFields(): Returns the number of fields being used to index. |
long | getNumberOfPointers(): Returns the total number of pointers in the collection. |
long | getNumberOfTokens(): Returns the total number of tokens in the collection. |
int | getNumberOfUniqueTerms(): Returns the total number of unique terms in the lexicon. |
void | readFields(DataInput in) |
protected void | relcaluateAverageLengths() |
String | toString(): Returns a concrete representation of an index's statistics. |
void | write(DataOutput out) |
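One plausible reading of addStatistics(CollectionStatistics cs) is sketched below for two partial collections. The class `MergeStatsSketch` is hypothetical and the merge rules are assumptions: document, token and pointer counts are summed, per-field token counts are summed element-wise, and the unique term count takes the larger of the two values, because vocabularies overlap and the true merged count cannot be derived from the two totals alone.

```java
// Hypothetical stand-in for illustration only; this is not the Terrier class.
final class MergeStatsSketch {
    int numberOfDocuments;
    int numberOfUniqueTerms;
    long numberOfTokens;
    long numberOfPointers;
    final long[] fieldTokens;

    MergeStatsSketch(int numDocs, int numTerms, long numTokens, long numPointers, long[] fieldTokens) {
        this.numberOfDocuments = numDocs;
        this.numberOfUniqueTerms = numTerms;
        this.numberOfTokens = numTokens;
        this.numberOfPointers = numPointers;
        this.fieldTokens = fieldTokens.clone();
    }

    // Folds the counts of 'other' into this instance (assumed merge rules).
    void add(MergeStatsSketch other) {
        if (other.fieldTokens.length != fieldTokens.length) {
            throw new IllegalArgumentException("field counts differ");
        }
        numberOfDocuments += other.numberOfDocuments;
        numberOfTokens += other.numberOfTokens;
        numberOfPointers += other.numberOfPointers;
        // Assumption: keep the larger value, which is only a lower bound on the merged vocabulary.
        numberOfUniqueTerms = Math.max(numberOfUniqueTerms, other.numberOfUniqueTerms);
        for (int i = 0; i < fieldTokens.length; i++) {
            fieldTokens[i] += other.fieldTokens[i];
        }
    }

    // Recomputed on demand from the merged totals.
    double averageDocumentLength() {
        return numberOfDocuments == 0 ? 0.0d : (double) numberOfTokens / numberOfDocuments;
    }
}
```

Averages are recomputed from the merged totals rather than averaged themselves, since averaging two averages would weight collections of different sizes equally.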
protected int numberOfFields
protected long[] fieldTokens
protected double[] avgFieldLengths
protected int numberOfDocuments
protected long numberOfTokens
protected long numberOfPointers
protected int numberOfUniqueTerms
protected double averageDocumentLength
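public CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers, long[] _fieldTokens)
Parameters:
numDocs - the total number of documents in the collection
numTerms - the total number of unique terms in the collection
numTokens - the total number of tokens in the collection
numPointers - the total number of pointers in the inverted file
_fieldTokens - the number of tokens in each field
public CollectionStatistics()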
protected void relcaluateAverageLengths()
public String toString()
public double getAverageDocumentLength()
public int getNumberOfDocuments()
public long getNumberOfPointers()
public long getNumberOfTokens()
public int getNumberOfUniqueTerms()
public int getNumberOfFields()
public long[] getFieldTokens()
public double[] getAverageFieldLengths()
public void addStatistics(CollectionStatistics cs)
public void readFields(DataInput in) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Throws: IOException
public void write(DataOutput out) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Throws: IOException
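The write(DataOutput) and readFields(DataInput) pair follows the usual Writable pattern: fields are written in a fixed order and read back in the same order. The sketch below shows that pattern with plain java.io streams and no Hadoop dependency. The class `WritableStatsSketch`, its `roundTrip` helper and the field order are assumptions made for illustration; the byte layout actually used by CollectionStatistics is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration only; this is not the Terrier class.
final class WritableStatsSketch {
    int numberOfDocuments;
    int numberOfUniqueTerms;
    long numberOfTokens;
    long numberOfPointers;
    long[] fieldTokens = new long[0];

    // Writes every field in a fixed order (assumed layout).
    void write(DataOutput out) throws IOException {
        out.writeInt(numberOfDocuments);
        out.writeInt(numberOfUniqueTerms);
        out.writeLong(numberOfTokens);
        out.writeLong(numberOfPointers);
        out.writeInt(fieldTokens.length); // length prefix so the reader can size the array
        for (long tokens : fieldTokens) {
            out.writeLong(tokens);
        }
    }

    // Reads the fields back in exactly the order write() produced them.
    void readFields(DataInput in) throws IOException {
        numberOfDocuments = in.readInt();
        numberOfUniqueTerms = in.readInt();
        numberOfTokens = in.readLong();
        numberOfPointers = in.readLong();
        fieldTokens = new long[in.readInt()];
        for (int i = 0; i < fieldTokens.length; i++) {
            fieldTokens[i] = in.readLong();
        }
    }

    // Serialises to an in-memory buffer and deserialises into a fresh instance.
    static WritableStatsSketch roundTrip(WritableStatsSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            WritableStatsSketch copy = new WritableStatsSketch();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Derived values such as the averages are left out of the sketched stream because they can be recomputed from the counts after reading.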
Terrier 4.0. Copyright © 2004-2014 University of Glasgow