Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures
Class DocumentIndex

java.lang.Object
  extended by uk.ac.gla.terrier.structures.DocumentIndex
Direct Known Subclasses:
DocumentIndexEncoded, DocumentIndexInMemory

public class DocumentIndex
extends java.lang.Object

This class provides an interface for accessing the document index file. Each entry in the document index consists of a document id, a document number (docno), and the document length, that is, the number of terms in the document.

Version:
$Revision: 1.39 $
Author:
Gianni Amati, Vassilis Plachouras

Field Summary
static int entryLength
          The length in bytes of an entry in the file.
 
Constructor Summary
DocumentIndex()
          A default constructor for the class.
DocumentIndex(java.lang.String filename)
          A constructor of a document index from a given filename.
DocumentIndex(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 void close()
          Closes the random access file.
 FilePosition getDirectIndexEndOffset()
          Returns the ending offset of the document's entry in the direct index.
 FilePosition getDirectIndexStartOffset()
          Returns the starting offset of the document's entry in the direct index.
 int getDocumentId(java.lang.String docno)
          Returns the document's id for the given docno.
 int getDocumentLength(int i)
          Returns the length of the i-th document.
 int getDocumentLength(java.lang.String docno)
          Returns the length of the document with the given docno.
 java.lang.String getDocumentNumber(int i)
          Returns the docno of the i-th document.
 int getNumberOfDocuments()
          Returns the number of documents in the collection.
static void main(java.lang.String[] args)
           
 void print()
          Prints out to the standard error stream the contents of the document index file.
 boolean seek(int i)
          Seeks the i-th entry in the document index.
 boolean seek(java.lang.String docno)
          Seeks the document index entry for the given document number.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

entryLength

public static final int entryLength
The length in bytes of an entry in the file. It is 2*sizeof(int) plus the length of the stored document number plus sizeof(long) plus sizeof(byte).
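
The arithmetic above can be sketched as follows. This is only an illustration: the docno length of 20 bytes is an assumed value (the actual stored length is not given here), and the class and method names are hypothetical. The offset calculation additionally assumes that entries are stored back to back from the start of the file, which this page does not state.

```java
// Sketch only: not part of Terrier. The docno length is an assumed value.
final class EntryLayoutSketch {

    // 2*sizeof(int) + docno length + sizeof(long) + sizeof(byte), as described above.
    static int entryLength(int docnoLength) {
        return 2 * Integer.BYTES + docnoLength + Long.BYTES + Byte.BYTES;
    }

    // Byte offset of the i-th entry, ASSUMING fixed-length entries with no file header.
    static long entryOffset(int i, int docnoLength) {
        return (long) i * entryLength(docnoLength);
    }

    public static void main(String[] args) {
        int docnoLength = 20; // assumption for illustration
        System.out.println("entry length: " + entryLength(docnoLength));
        System.out.println("offset of entry 3: " + entryOffset(3, docnoLength));
    }
}
```

With the assumed 20-byte docno, an entry occupies 8 + 20 + 8 + 1 = 37 bytes.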

Constructor Detail

DocumentIndex

public DocumentIndex()
A default constructor for the class.


DocumentIndex

public DocumentIndex(java.lang.String filename)
A constructor of a document index from a given filename.
The name of the document pointers file is obtained by replacing the extension of the document index filename with the appropriate default extension.
The given filename should therefore include an extension.

Parameters:
filename - String the filename of the document index, with an extension

DocumentIndex

public DocumentIndex(java.lang.String path,
                     java.lang.String prefix)

Method Detail

close

public void close()
Closes the random access file.


print

public void print()
Prints out to the standard error stream the contents of the document index file.


getDocumentId

public int getDocumentId(java.lang.String docno)
Returns the document's id for the given docno.

Parameters:
docno - java.lang.String The document's number
Returns:
int The document's id, or -1 if docno was not found in the index.

getDocumentLength

public int getDocumentLength(int i)
Returns the length of the i-th document.

Parameters:
i - the index of the document.
Returns:
the length of the i-th document, or -1 if the docid i wasn't found in the index.

getDocumentLength

public int getDocumentLength(java.lang.String docno)
Returns the length of the document with the given docno. Creation date: (29/05/2003 10:56:49)

Parameters:
docno - java.lang.String The document's number
Returns:
int The document's length, or -1 if the docno wasn't found in the index.

getDocumentNumber

public java.lang.String getDocumentNumber(int i)
Returns the docno of the i-th document.

Parameters:
i - the index of the document.
Returns:
the document number of the i-th document, or null if the docid i wasn't found in the index.
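
The lookup methods above signal a missing document with a sentinel (-1 or null) rather than an exception, so callers need to check the result before using it. The sketch below illustrates that contract with a small in-memory stand-in; it is not the Terrier class. The stand-in class, the docnos, the lengths, and the assumption that document ids are zero-based positions are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory stand-in that mimics the documented return values.
final class DocLookupSketch {
    private final String[] docnos;
    private final int[] lengths;
    private final Map<String, Integer> idByDocno = new HashMap<>();

    DocLookupSketch(String[] docnos, int[] lengths) {
        this.docnos = docnos;
        this.lengths = lengths;
        for (int i = 0; i < docnos.length; i++) {
            idByDocno.put(docnos[i], i); // assumption: id == position
        }
    }

    // Returns the id for the docno, or -1 if it is unknown.
    int getDocumentId(String docno) {
        Integer id = idByDocno.get(docno);
        return id == null ? -1 : id;
    }

    // Returns the length of the i-th document, or -1 if i is out of range.
    int getDocumentLength(int i) {
        return (i >= 0 && i < lengths.length) ? lengths[i] : -1;
    }

    // Returns the docno of the i-th document, or null if i is out of range.
    String getDocumentNumber(int i) {
        return (i >= 0 && i < docnos.length) ? docnos[i] : null;
    }

    // Caller-side pattern: test the sentinel before using the id.
    static String describe(DocLookupSketch index, String docno) {
        int id = index.getDocumentId(docno);
        if (id == -1) {
            return docno + ": not in index";
        }
        return docno + ": id=" + id + ", length=" + index.getDocumentLength(id);
    }

    public static void main(String[] args) {
        DocLookupSketch index = new DocLookupSketch(
                new String[] {"DOC-A", "DOC-B"}, new int[] {120, 340});
        System.out.println(describe(index, "DOC-B"));
        System.out.println(describe(index, "DOC-Z"));
    }
}
```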

getDirectIndexEndOffset

public FilePosition getDirectIndexEndOffset()
Returns the ending offset of the document's entry in the direct index.

Returns:
FilePosition an offset in the direct index.

getNumberOfDocuments

public int getNumberOfDocuments()
Returns the number of documents in the collection.

Returns:
the number of documents in the collection.

getDirectIndexStartOffset

public FilePosition getDirectIndexStartOffset()
Returns the starting offset of the document's entry in the direct index.

Returns:
FilePosition an offset in the direct index.

seek

public boolean seek(int i)
             throws java.io.IOException
Seeks the i-th entry in the document index.

Parameters:
i - the document id.
Returns:
boolean true if it was found, otherwise it returns false.
Throws:
java.io.IOException

seek

public boolean seek(java.lang.String docno)
             throws java.io.IOException
Seeks the document index entry for the given document number.

Parameters:
docno - java.lang.String the document's number
Returns:
true if the document was found, false otherwise.
Throws:
java.io.IOException - if the file cannot be read
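
Because getDirectIndexStartOffset() and getDirectIndexEndOffset() take no arguments, one plausible reading is that they report on the entry most recently selected by seek. The sketch below models that cursor-style usage with a hypothetical stand-in class; it is an assumption about the calling pattern, not a description of the Terrier implementation. Offsets are simplified to plain long values instead of FilePosition, the offset values are arbitrary, and the stand-in seek does not throw java.io.IOException as the real method may.

```java
// Sketch only: a cursor-style stand-in; all names and values are hypothetical.
final class SeekCursorSketch {
    private final long[] startOffsets;
    private final long[] endOffsets;
    private int current = -1; // no entry selected yet

    SeekCursorSketch(long[] startOffsets, long[] endOffsets) {
        this.startOffsets = startOffsets;
        this.endOffsets = endOffsets;
    }

    // Selects the i-th entry; returns false and keeps the old position if i is out of range.
    boolean seek(int i) {
        if (i < 0 || i >= startOffsets.length) {
            return false;
        }
        current = i;
        return true;
    }

    // Both getters refer to the entry selected by the last successful seek.
    long getDirectIndexStartOffset() {
        return startOffsets[current];
    }

    long getDirectIndexEndOffset() {
        return endOffsets[current];
    }

    // Caller-side pattern: read the offsets only after seek reports success.
    static String describeEntry(SeekCursorSketch index, int i) {
        if (!index.seek(i)) {
            return "doc " + i + ": not found";
        }
        return "doc " + i + ": " + index.getDirectIndexStartOffset()
                + ".." + index.getDirectIndexEndOffset();
    }

    public static void main(String[] args) {
        SeekCursorSketch index = new SeekCursorSketch(
                new long[] {0, 40, 95}, new long[] {40, 95, 130});
        System.out.println(describeEntry(index, 1));
        System.out.println(describeEntry(index, 7));
    }
}
```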

main

public static void main(java.lang.String[] args)

Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow