Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures
Class CollectionStatistics

java.lang.Object
  extended by uk.ac.gla.terrier.structures.CollectionStatistics

public class CollectionStatistics
extends java.lang.Object

This class provides basic statistics for the indexed collection of documents, such as the average length of documents, or the total number of documents in the collection.
After indexing, the statistics are saved in the PREFIX.log file, along with the names of the classes that should be used for the Lexicon, the DocumentIndex, the DirectIndex and the InvertedIndex. This means that an index records how it was built and how it should be opened again.

Version:
$Revision: 1.26 $
Author:
Gianni Amati, Vassilis Plachouras, Craig Macdonald

Constructor Summary
CollectionStatistics()
           
CollectionStatistics(int numDocs, int numTerms, long numTokens, long numPointers)
           
CollectionStatistics(java.lang.String filename)
           
CollectionStatistics(java.lang.String Path, java.lang.String Prefix)
           
 
Method Summary
static void createCollectionStatistics(int docs, long tokens, int terms, long pointers, java.lang.String[] classes)
           
static void createCollectionStatistics(java.lang.String filename, int docs, long tokens, int terms, long pointers, java.lang.String[] classes)
          Stores the given collection statistics in a file with a standard name.
static void createCollectionStatistics(java.lang.String Path, java.lang.String Prefix, int docs, long tokens, int terms, long pointers, java.lang.String[] classes)
           
 double getAverageDocumentLength()
          Returns the documents' average length.
 java.lang.String[] getClasses()
          Returns the classes line given in the log file.
 int getNumberOfDocuments()
          Returns the total number of documents in the collection.
 long getNumberOfPointers()
          Returns the total number of pointers in the collection.
 long getNumberOfTokens()
          Returns the total number of tokens in the collection.
 int getNumberOfUniqueTerms()
          Returns the total number of unique terms in the lexicon.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CollectionStatistics

public CollectionStatistics(int numDocs,
                            int numTerms,
                            long numTokens,
                            long numPointers)

CollectionStatistics

public CollectionStatistics()
                     throws java.io.IOException
Throws:
java.io.IOException

CollectionStatistics

public CollectionStatistics(java.lang.String Path,
                            java.lang.String Prefix)
                     throws java.io.IOException
Throws:
java.io.IOException

CollectionStatistics

public CollectionStatistics(java.lang.String filename)
                     throws java.io.IOException
Throws:
java.io.IOException

Method Detail

createCollectionStatistics

public static void createCollectionStatistics(java.lang.String Path,
                                              java.lang.String Prefix,
                                              int docs,
                                              long tokens,
                                              int terms,
                                              long pointers,
                                              java.lang.String[] classes)

createCollectionStatistics

public static void createCollectionStatistics(int docs,
                                              long tokens,
                                              int terms,
                                              long pointers,
                                              java.lang.String[] classes)

createCollectionStatistics

public static void createCollectionStatistics(java.lang.String filename,
                                              int docs,
                                              long tokens,
                                              int terms,
                                              long pointers,
                                              java.lang.String[] classes)
Stores the given collection statistics in a file with a standard name.

Parameters:
docs - The number of documents in the collection
tokens - The number of tokens in the collection
terms - The number of terms in the collection
pointers - The number of pointers in the collection
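
The save-and-reload idea can be sketched as follows. This is a minimal illustration, not the Terrier implementation: the class StatsFileSketch, its methods, the placeholder class names and the two-line file layout are all hypothetical, because this page does not describe the actual format of the PREFIX.log file. The sketch keeps the parameter order of createCollectionStatistics (docs, tokens, terms, pointers), which differs from the four-argument constructor (numDocs, numTerms, numTokens, numPointers).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the Terrier implementation. The two-line file
// layout used here is invented for illustration only.
final class StatsFileSketch {

    // Same parameter order as createCollectionStatistics(filename, ...):
    // docs, tokens, terms, pointers, classes.
    static void write(Path file, int docs, long tokens, int terms,
                      long pointers, String[] classes) {
        List<String> lines = Arrays.asList(
                docs + " " + tokens + " " + terms + " " + pointers,
                String.join(" ", classes));
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the numeric line back as {docs, tokens, terms, pointers}.
    static long[] readNumbers(Path file) {
        try {
            String[] fields = Files.readAllLines(file, StandardCharsets.UTF_8)
                    .get(0).split(" ");
            long[] values = new long[fields.length];
            for (int i = 0; i < fields.length; i++) {
                values[i] = Long.parseLong(fields[i]);
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes to a temporary file, reads it back, and returns the parsed numbers.
    static long[] roundTrip(int docs, long tokens, int terms, long pointers) {
        try {
            Path tmp = Files.createTempFile("stats-sketch", ".log");
            try {
                // Placeholder class names; not real Terrier class names.
                write(tmp, docs, tokens, terms, pointers,
                        new String[] {"ExampleLexicon", "ExampleDocumentIndex"});
                return readNumbers(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checked I/O exceptions are wrapped in UncheckedIOException only to keep the sketch short; the constructors documented above declare java.io.IOException instead.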

getAverageDocumentLength

public double getAverageDocumentLength()
Returns the documents' average length.

Returns:
the average length of the documents in the collection.
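
This page does not state how the average is derived. A minimal sketch, assuming it is the total number of tokens divided by the number of documents; the class AverageLengthSketch and its handling of an empty collection are hypothetical and not part of Terrier:

```java
// Hypothetical stand-in, not the Terrier class: illustrates one plausible
// way an average document length can be derived from collection totals.
final class AverageLengthSketch {

    // Assumption: average length = total tokens / number of documents.
    // Returns 0.0 for an empty collection to avoid dividing by zero.
    static double averageDocumentLength(long numTokens, int numDocs) {
        if (numDocs <= 0) {
            return 0.0d;
        }
        // Cast before dividing so the result is not truncated to a whole number.
        return (double) numTokens / (double) numDocs;
    }

    public static void main(String[] args) {
        System.out.println(averageDocumentLength(10L, 4)); // prints 2.5
    }
}
```

The token count is a long and the document count an int, matching the return types of getNumberOfTokens() and getNumberOfDocuments().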

getNumberOfDocuments

public int getNumberOfDocuments()
Returns the total number of documents in the collection.

Returns:
the total number of documents in the collection

getNumberOfPointers

public long getNumberOfPointers()
Returns the total number of pointers in the collection.

Returns:
the total number of pointers in the collection

getNumberOfTokens

public long getNumberOfTokens()
Returns the total number of tokens in the collection.

Returns:
the total number of tokens in the collection

getNumberOfUniqueTerms

public int getNumberOfUniqueTerms()
Returns the total number of unique terms in the lexicon.

Returns:
the total number of unique terms in the lexicon

getClasses

public java.lang.String[] getClasses()
Returns the classes line given in the log file. Used by the Index to determine which classes it should load for this Index.
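
Loading classes from an array of names, as returned by getClasses(), could look roughly like the sketch below. How Terrier actually instantiates its structure classes is not shown on this page; the class ClassLoadingSketch, the method instantiateAll and the use of no-argument constructors are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Terrier code: instantiates classes from an array
// of fully qualified names, as an index loader might do with getClasses().
final class ClassLoadingSketch {

    // Assumption: each named class has a public no-argument constructor.
    static List<Object> instantiateAll(String[] classNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : classNames) {
            try {
                instances.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load class " + name, e);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // JDK classes stand in for the structure classes named in the log file.
        String[] names = {"java.util.ArrayList", "java.lang.StringBuilder"};
        for (Object o : instantiateAll(names)) {
            System.out.println(o.getClass().getName());
        }
    }
}
```

Storing class names rather than hard-coding them is what lets an index be reopened with the same structure classes it was built with.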



Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow