Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class DocumentIndexEncoded

java.lang.Object
  extended by uk.ac.gla.terrier.structures.DocumentIndex
      extended by uk.ac.gla.terrier.structures.DocumentIndexEncoded
All Implemented Interfaces:
Closeable, IndexConfigurable
Direct Known Subclasses:
DocIndexEncodedHash

public class DocumentIndexEncoded
extends DocumentIndex

A document index class which reads the .docid file and keeps its contents in an array of bytes in memory. This class reduces the memory overhead incurred by the class DocumentIndexInMemory by decoding the information on the fly.
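
The idea of keeping entries encoded in a byte array and decoding them only on access can be sketched as follows. This is a minimal illustration, not the Terrier implementation: the class name EncodedDocTable, the 8-byte docno width, and the record layout (a 4-byte length followed by a zero-padded docno) are all assumptions made for the example, and the actual .docid format may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: the record layout is an assumption,
// not the actual .docid file format.
class EncodedDocTable {
    static final int DOCNO_BYTES = 8;                 // assumed fixed docno width
    static final int RECORD_BYTES = 4 + DOCNO_BYTES;  // int length + docno

    private final byte[] data;  // all records stored back to back
    private final int count;    // number of records

    EncodedDocTable(int[] lengths, String[] docnos) {
        count = lengths.length;
        ByteBuffer buf = ByteBuffer.allocate(count * RECORD_BYTES);
        for (int i = 0; i < count; i++) {
            buf.putInt(lengths[i]);
            byte[] padded = new byte[DOCNO_BYTES];    // zero-padded docno
            byte[] raw = docnos[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, DOCNO_BYTES));
            buf.put(padded);
        }
        data = buf.array();
    }

    // Decodes the 4-byte length of the record on demand.
    int getDocumentLength(int docid) {
        return ByteBuffer.wrap(data, docid * RECORD_BYTES, 4).getInt();
    }

    // Decodes the docno on demand, stopping at the zero padding.
    String getDocumentNumber(int docid) {
        int start = docid * RECORD_BYTES + 4;
        int len = 0;
        while (len < DOCNO_BYTES && data[start + len] != 0) {
            len++;
        }
        return new String(data, start, len, StandardCharsets.US_ASCII);
    }

    int getNumberOfDocuments() {
        return count;
    }
}
```

Compared with holding one object per document, a single byte array avoids per-object overhead, at the cost of a small amount of decoding work on each access.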

Version:
$Revision: 1.33 $
Author:
Vassilis Plachouras, Craig Macdonald

Field Summary
 
Fields inherited from class uk.ac.gla.terrier.structures.DocumentIndex
entryLength
 
Constructor Summary
DocumentIndexEncoded()
           
DocumentIndexEncoded(java.lang.String filename)
          A constructor for DocumentIndexEncoded that specifies the file to open.
DocumentIndexEncoded(java.lang.String path, java.lang.String prefix)
          The default constructor for DocumentIndexEncoded.
 
Method Summary
 FilePosition getDirectIndexEndOffset()
          Returns the ending offset of the current document's entry in the direct index.
 FilePosition getDirectIndexStartOffset()
          Returns the starting offset of the current document's entry in the direct index.
 int getDocumentId(java.lang.String docno)
          Returns the id of a document with a given document number.
 int getDocumentLength(int docid)
          Returns the length of a document with a given id.
 int getDocumentLength(java.lang.String docno)
          Returns the length of the document with a given document number.
 java.lang.String getDocumentNumber(int docid)
          Returns the number of a document with a given id.
 int getNumberOfDocuments()
          Returns the number of documents in the document index.
 void loadIntoMemory(java.io.DataInputStream dis, int numOfEntries)
          Loads the data from the file into memory.
static void main(java.lang.String[] args)
          A main method for testing the DocumentIndexEncoded class.
 void print()
          Prints to the standard error the document index structure, which is loaded into memory.
 boolean seek(int i)
          Overrides the seek(int docid) method of the DocumentIndex class.
 boolean seek(java.lang.String docno)
          Overrides the seek(String s) method of the super class.
 void setDocnoEntryLength(int l)
          Sets the length of docnos in the index file.
 
Methods inherited from class uk.ac.gla.terrier.structures.DocumentIndex
close, setIndex
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentIndexEncoded

public DocumentIndexEncoded(java.lang.String path,
                            java.lang.String prefix)
The default constructor for DocumentIndexEncoded. Opens the document index file and reads its contents into memory.


DocumentIndexEncoded

public DocumentIndexEncoded()

DocumentIndexEncoded

public DocumentIndexEncoded(java.lang.String filename)
A constructor for DocumentIndexEncoded that specifies the file to open. Opens the document index file and reads its contents into memory. For the document pointers file, the extension of the document index file is replaced with the appropriate default extension.

Parameters:
filename - String The filename of the document index file.
Method Detail

setDocnoEntryLength

public void setDocnoEntryLength(int l)
Sets the length of docnos in the index file.

Overrides:
setDocnoEntryLength in class DocumentIndex

print

public void print()
Prints to the standard error the document index structure, which is loaded into memory.

Overrides:
print in class DocumentIndex

getDocumentId

public int getDocumentId(java.lang.String docno)
Returns the id of a document with a given document number.

Overrides:
getDocumentId in class DocumentIndex
Parameters:
docno - java.lang.String The document's number
Returns:
int The document's id, or a negative number if a document with the given number doesn't exist.
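
The contract above (a non-negative id on success, a negative number otherwise) can be illustrated with a small stand-in. This sketch assumes that docnos are stored in lexicographically sorted order so that a binary search applies; the documentation above does not state how the lookup is performed, and the class name DocnoLookup is hypothetical.

```java
// Hypothetical stand-in for a docno lookup. Assumes docnos are sorted
// lexicographically, which the documentation above does not state.
class DocnoLookup {
    private final String[] sortedDocnos; // position in this array == docid

    DocnoLookup(String[] sortedDocnos) {
        this.sortedDocnos = sortedDocnos;
    }

    /** Returns the docid, or -1 if no document has the given docno. */
    int getDocumentId(String docno) {
        int lo = 0;
        int hi = sortedDocnos.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = sortedDocnos[mid].compareTo(docno);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1; // negative result signals "not found"
    }
}
```

Callers should test the result with `id < 0` rather than comparing against a specific value, since only "a negative number" is promised.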

getDocumentLength

public int getDocumentLength(int docid)
Returns the length of a document with a given id.

Overrides:
getDocumentLength in class DocumentIndex
Parameters:
docid - the document's id
Returns:
int The document's length

getDocumentLength

public int getDocumentLength(java.lang.String docno)
Returns the length of the document with a given document number.

Overrides:
getDocumentLength in class DocumentIndex
Parameters:
docno - java.lang.String The document's number
Returns:
int The document's length

getDocumentNumber

public java.lang.String getDocumentNumber(int docid)
Returns the number of a document with a given id.

Overrides:
getDocumentNumber in class DocumentIndex
Parameters:
docid - int The document's id
Returns:
java.lang.String The document's number

getDirectIndexEndOffset

public FilePosition getDirectIndexEndOffset()
Returns the ending offset of the current document's entry in the direct index.

Overrides:
getDirectIndexEndOffset in class DocumentIndex
Returns:
FilePosition an offset in the direct index.

getNumberOfDocuments

public int getNumberOfDocuments()
Returns the number of documents in the document index.

Overrides:
getNumberOfDocuments in class DocumentIndex
Returns:
int the number of documents in the document index.

getDirectIndexStartOffset

public FilePosition getDirectIndexStartOffset()
Returns the starting offset of the current document's entry in the direct index.

Overrides:
getDirectIndexStartOffset in class DocumentIndex
Returns:
FilePosition an offset in the direct index.

loadIntoMemory

public void loadIntoMemory(java.io.DataInputStream dis,
                           int numOfEntries)
                    throws java.io.IOException
Loads the data from the file into memory.

Parameters:
dis - java.io.DataInputStream The input stream from which the data are read
numOfEntries - int The number of entries to read
Throws:
java.io.IOException - An input/output exception is thrown if there is any error while reading from disk.
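
Reading a known number of fixed-length entries from a DataInputStream into one byte array might look like the sketch below. It is an illustration under assumptions: the class name FixedRecordLoader is hypothetical, the entry length is passed as a parameter here (the documented method takes only the stream and the entry count), and the array is returned rather than stored in a field.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper; signature differs from the documented method,
// which takes (dis, numOfEntries) and returns void.
class FixedRecordLoader {
    /**
     * Reads numOfEntries entries of entryLength bytes each into a single array.
     * Assumes every entry has the same length.
     */
    static byte[] loadIntoMemory(DataInputStream dis, int numOfEntries, int entryLength)
            throws IOException {
        byte[] buffer = new byte[numOfEntries * entryLength];
        // readFully blocks until the array is filled and throws
        // EOFException (an IOException) if the stream ends early.
        dis.readFully(buffer);
        return buffer;
    }
}
```

Using readFully rather than read avoids silently accepting a partially filled buffer when the underlying stream returns fewer bytes than requested.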

seek

public boolean seek(int i)
Overrides the seek(int docid) method of the DocumentIndex class.

Overrides:
seek in class DocumentIndex
Parameters:
i - the docid of the document we are looking for.
Returns:
boolean true if it was found, otherwise it returns false.
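
The seek-then-read pattern, in which seek selects the current document and the offset getters then refer to that document, can be sketched as follows. This is a simplified illustration: the class name OffsetCursor is hypothetical, offsets are plain long values rather than FilePosition objects, and it assumes each entry starts where the previous one ends.

```java
// Hypothetical cursor. Offsets are plain byte positions here, whereas the
// documented methods return FilePosition objects.
class OffsetCursor {
    private final long[] endOffsets; // assumed: one exclusive end offset per docid
    private int current = -1;        // no current document until seek succeeds

    OffsetCursor(long[] endOffsets) {
        this.endOffsets = endOffsets;
    }

    /** Selects the current document; a failed seek leaves the cursor unchanged. */
    boolean seek(int docid) {
        if (docid < 0 || docid >= endOffsets.length) {
            return false;
        }
        current = docid;
        return true;
    }

    long getStartOffset() {
        requireCurrent();
        // Assumed: an entry starts where the previous entry ends.
        return current == 0 ? 0L : endOffsets[current - 1];
    }

    long getEndOffset() {
        requireCurrent();
        return endOffsets[current];
    }

    private void requireCurrent() {
        if (current < 0) {
            throw new IllegalStateException("call seek successfully first");
        }
    }
}
```

A caller would check the boolean result of seek before reading the offsets, since the getters are only meaningful for a document that was found.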

seek

public boolean seek(java.lang.String docno)
Overrides the seek(String s) method of the super class.

Overrides:
seek in class DocumentIndex
Parameters:
docno - String the document number of the document we are seeking.
Returns:
boolean false if the given docno could not be found in the DocumentIndex, otherwise true.

main

public static void main(java.lang.String[] args)
A main method for testing the DocumentIndexEncoded class.
The first command-line argument is the filename of the document index file. It is followed by one of the supported options. For example:
java -cp ... uk.ac.gla.terrier.structures.DocumentIndexEncoded filename --docno 1023
This will return the document number of the document with id 1023.

Parameters:
args - java.lang.String[] the command line parameters

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow