public class CollectionStatistics extends Object implements Serializable, org.apache.hadoop.io.Writable
Modifier and Type | Field and Description |
---|---|
protected double | averageDocumentLength: The average length of a document in the collection. |
protected double[] | avgFieldLengths: Average length of each field. |
protected long[] | fieldTokens: Number of tokens in each field. |
protected int | numberOfDocuments: The total number of documents in the collection. |
protected int | numberOfFields: Number of fields used to index. |
protected long | numberOfPointers: The total number of pointers in the inverted file. |
protected long | numberOfTokens: The total number of tokens in the collection. |
protected int | numberOfUniqueTerms: The total number of unique terms in the collection. |
Constructor and Description |
---|
CollectionStatistics() |
CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers, long[] _fieldTokens): Constructs an instance of the class with the given collection statistics. |
Modifier and Type | Method and Description |
---|---|
void | addStatistics(CollectionStatistics cs): Increments the statistics by the specified amount. |
double | getAverageDocumentLength(): Returns the documents' average length. |
double[] | getAverageFieldLengths(): Returns the average length of each field in tokens. |
long[] | getFieldTokens(): Returns the length of each field in tokens. |
int | getNumberOfDocuments(): Returns the total number of documents in the collection. |
int | getNumberOfFields(): Returns the number of fields being used to index. |
long | getNumberOfPointers(): Returns the total number of pointers in the collection. |
long | getNumberOfTokens(): Returns the total number of tokens in the collection. |
int | getNumberOfUniqueTerms(): Returns the total number of unique terms in the lexicon. |
void | readFields(DataInput in) |
protected void | relcaluateAverageLengths() |
String | toString(): Returns a concrete representation of an index's statistics. |
void | write(DataOutput out) |
protected int numberOfFields
protected long[] fieldTokens
protected double[] avgFieldLengths
protected int numberOfDocuments
protected long numberOfTokens
protected long numberOfPointers
protected int numberOfUniqueTerms
protected double averageDocumentLength
public CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers, long[] _fieldTokens)
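Parameters:
numDocs - the total number of documents in the collection
numTerms - the total number of unique terms in the collection
numTokens - the total number of tokens in the collection
numPointers - the total number of pointers in the inverted file
_fieldTokens - the number of tokens in each field
public CollectionStatistics()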
protected void relcaluateAverageLengths()
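This page does not say how the average lengths are derived from the raw counts. The following is a minimal sketch, assuming each average is the corresponding token count divided by numberOfDocuments and that an empty collection yields zeros. The class AverageLengthSketch and its method names are hypothetical stand-ins for illustration, not Terrier code.

```java
import java.util.Arrays;

/** Hypothetical stand-in, not part of Terrier: derives average lengths from raw counts. */
class AverageLengthSketch {

    /** Average document length in tokens; assumed to be 0 for an empty collection. */
    static double averageDocumentLength(long numberOfTokens, int numberOfDocuments) {
        if (numberOfDocuments == 0) {
            return 0.0d;
        }
        return (double) numberOfTokens / numberOfDocuments;
    }

    /** Average length of each field in tokens; all zeros for an empty collection. */
    static double[] averageFieldLengths(long[] fieldTokens, int numberOfDocuments) {
        double[] averages = new double[fieldTokens.length]; // initialised to 0.0
        if (numberOfDocuments == 0) {
            return averages;
        }
        for (int i = 0; i < fieldTokens.length; i++) {
            averages[i] = (double) fieldTokens[i] / numberOfDocuments;
        }
        return averages;
    }

    public static void main(String[] args) {
        long[] fieldTokens = {200L, 800L}; // e.g. a title field and a body field
        System.out.println(averageDocumentLength(1000L, 4));
        System.out.println(Arrays.toString(averageFieldLengths(fieldTokens, 4)));
    }
}
```

Guarding the zero-document case avoids a NaN average, which matters when a freshly created, empty index is queried for its statistics.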
public String toString()
public double getAverageDocumentLength()
public int getNumberOfDocuments()
public long getNumberOfPointers()
public long getNumberOfTokens()
public int getNumberOfUniqueTerms()
public int getNumberOfFields()
public long[] getFieldTokens()
public double[] getAverageFieldLengths()
public void addStatistics(CollectionStatistics cs)
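To illustrate what incrementing one set of statistics by another might involve, here is a rough sketch using a hypothetical stand-in class, MergeStatsSketch. It assumes that document, token and pointer counts are additive, and that the unique-term count is not additive because two collections usually share vocabulary. Taking the maximum of the two term counts is an assumption made for this sketch; the page does not document the behaviour of addStatistics in that detail.

```java
/** Hypothetical stand-in, not part of Terrier: merges the counts of two collections. */
class MergeStatsSketch {
    int numberOfDocuments;
    int numberOfUniqueTerms;
    long numberOfTokens;
    long numberOfPointers;
    long[] fieldTokens;

    MergeStatsSketch(int numDocs, int numTerms, long numTokens, long numPointers, long[] fieldTokens) {
        this.numberOfDocuments = numDocs;
        this.numberOfUniqueTerms = numTerms;
        this.numberOfTokens = numTokens;
        this.numberOfPointers = numPointers;
        this.fieldTokens = fieldTokens.clone(); // defensive copy
    }

    /** Adds the counts of {@code other} into this object. */
    void add(MergeStatsSketch other) {
        numberOfDocuments += other.numberOfDocuments;
        numberOfTokens += other.numberOfTokens;
        numberOfPointers += other.numberOfPointers;
        // Assumption: vocabularies overlap, so the larger count is used as a lower bound.
        numberOfUniqueTerms = Math.max(numberOfUniqueTerms, other.numberOfUniqueTerms);
        int fields = Math.min(fieldTokens.length, other.fieldTokens.length);
        for (int i = 0; i < fields; i++) {
            fieldTokens[i] += other.fieldTokens[i];
        }
    }

    /** Average document length recomputed from the merged counts. */
    double averageDocumentLength() {
        return numberOfDocuments == 0 ? 0.0d : (double) numberOfTokens / numberOfDocuments;
    }
}
```

Averages are recomputed from the merged totals rather than averaged directly, since averaging two averages would weight a small collection as heavily as a large one.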
public void readFields(DataInput in) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Throws: IOException
public void write(DataOutput out) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Throws: IOException
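The Writable contract pairs write(DataOutput) with readFields(DataInput): whatever one writes, the other must read back in the same order. The sketch below shows that round trip using only java.io, so it runs without Hadoop on the classpath. The class WritableStatsSketch, its roundTrip helper and the field order are assumptions for illustration and do not describe the byte layout CollectionStatistics actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in, not part of Terrier: Writable-style serialisation of the counts. */
class WritableStatsSketch {
    int numberOfDocuments;
    int numberOfUniqueTerms;
    long numberOfTokens;
    long numberOfPointers;
    long[] fieldTokens = new long[0];

    /** Writes the fields in a fixed order (this order is an assumption of the sketch). */
    void write(DataOutput out) throws IOException {
        out.writeInt(numberOfDocuments);
        out.writeInt(numberOfUniqueTerms);
        out.writeLong(numberOfTokens);
        out.writeLong(numberOfPointers);
        out.writeInt(fieldTokens.length); // length prefix for the array
        for (long tokens : fieldTokens) {
            out.writeLong(tokens);
        }
    }

    /** Reads the fields back in exactly the order write() emitted them. */
    void readFields(DataInput in) throws IOException {
        numberOfDocuments = in.readInt();
        numberOfUniqueTerms = in.readInt();
        numberOfTokens = in.readLong();
        numberOfPointers = in.readLong();
        fieldTokens = new long[in.readInt()];
        for (int i = 0; i < fieldTokens.length; i++) {
            fieldTokens[i] = in.readLong();
        }
    }

    /** Serialises to an in-memory buffer and deserialises into a fresh object. */
    static WritableStatsSketch roundTrip(WritableStatsSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            WritableStatsSketch copy = new WritableStatsSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Derived values such as the average lengths need not be serialised in a design like this, since they can be recomputed from the counts after readFields completes.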
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow