Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures
Class InvertedIndex

java.lang.Object
  extended by uk.ac.gla.terrier.structures.InvertedIndex
Direct Known Subclasses:
BlockInvertedIndex

public class InvertedIndex
extends java.lang.Object

This class implements the inverted index used for retrieval, optionally with field information.

Version:
$Revision: 1.34 $
Author:
Douglas Johnson, Vassilis Plachouras, Craig Macdonald

Field Summary
static double FIELD_LOAD_FACTOR
          Used during retrieval with fields to roughly estimate the initial size of the temporaryTerms ArrayList in getDocuments().
static double NORMAL_LOAD_FACTOR
          Used during retrieval to roughly estimate the initial size of the temporaryTerms ArrayList in getDocuments().
 
Constructor Summary
InvertedIndex(Lexicon lexicon)
          Creates an instance of the InvertedIndex class using the given lexicon.
InvertedIndex(Lexicon lexicon, java.lang.String filename)
          Creates an instance of the InvertedIndex class using the given lexicon and inverted file.
InvertedIndex(Lexicon lexicon, java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 void close()
          Closes the underlying bit file.
 BitFile getBitFile()
          Returns the underlying bit file, in order to make more efficient use of the bit file during assigning scores to the retrieved documents.
 int[][] getDocuments(int termid)
          Returns a two-dimensional array containing the document ids, term frequencies and field scores for the given term.
 int[][] getDocuments(int termid, int startDocid, int endDocid)
          Returns a two-dimensional array with five rows containing the document ids, the term frequencies, the field scores, the block frequencies and the block ids for the given term, restricted to a range of docids.
 int[][] getDocuments(LexiconEntry lEntry)
           
 int[][] getDocuments(long sOffset, byte sBitOffset, long eOffset, byte eBitOffset)
          Returns a two-dimensional array containing the document ids, term frequencies and field scores for the postings between the given start and end offsets.
 void print()
          Prints out the inverted index file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NORMAL_LOAD_FACTOR

public static final double NORMAL_LOAD_FACTOR
This is used during retrieval to roughly estimate the initial size of the temporaryTerms ArrayList in getDocuments(). The higher this value, the less likely it is that the ArrayList will have to be grown (growing is expensive); however, more memory may be used unnecessarily.

See Also:
Constant Field Values
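The trade-off described above can be illustrated with a small, self-contained sketch. It assumes that the load factor simply multiplies an expected number of entries to give the initial ArrayList capacity; the class name, the helper methods, the int[] element type and the value 1.5 are all hypothetical and are not taken from Terrier (the real constant values are listed under Constant Field Values).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pre-sizing a temporary list with a load factor.
// This is NOT Terrier's code; names and values are illustrative only.
class LoadFactorSizing {
    // Illustrative value only; the real NORMAL_LOAD_FACTOR may differ.
    static final double LOAD_FACTOR = 1.5;

    /** Returns an initial capacity of roughly expectedEntries * loadFactor. */
    static int initialCapacity(int expectedEntries, double loadFactor) {
        // Math.ceil avoids under-sizing when the product is fractional.
        return (int) Math.ceil(expectedEntries * loadFactor);
    }

    /** Creates an empty list whose backing array is already large enough. */
    static List<int[]> newTemporaryTerms(int expectedEntries) {
        // A larger capacity means fewer (expensive) internal array copies
        // as entries are added, at the cost of possibly unused memory.
        return new ArrayList<>(initialCapacity(expectedEntries, LOAD_FACTOR));
    }

    public static void main(String[] args) {
        List<int[]> temporaryTerms = newTemporaryTerms(100);
        System.out.println(initialCapacity(100, LOAD_FACTOR)); // 150
        System.out.println(temporaryTerms.size());             // 0
    }
}
```

Note that the capacity only reserves space: the list is still empty after construction, and it grows automatically if the estimate turns out to be too small.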

FIELD_LOAD_FACTOR

public static final double FIELD_LOAD_FACTOR
This is used during retrieval with fields to roughly estimate the initial size of the temporaryTerms ArrayList in getDocuments(). The higher this value, the less likely it is that the ArrayList will have to be grown (growing is expensive); however, more memory may be used unnecessarily.

See Also:
Constant Field Values
Constructor Detail

InvertedIndex

public InvertedIndex(Lexicon lexicon,
                     java.lang.String path,
                     java.lang.String prefix)

InvertedIndex

public InvertedIndex(Lexicon lexicon)
Creates an instance of the InvertedIndex class using the given lexicon.

Parameters:
lexicon - The lexicon used for retrieval

InvertedIndex

public InvertedIndex(Lexicon lexicon,
                     java.lang.String filename)
Creates an instance of the InvertedIndex class using the given lexicon and inverted file.

Parameters:
lexicon - The lexicon used for retrieval
filename - The name of the inverted file
Method Detail

print

public void print()
Prints out the inverted index file.


getDocuments

public int[][] getDocuments(LexiconEntry lEntry)

getDocuments

public int[][] getDocuments(int termid)
Returns a two-dimensional array containing the document ids, term frequencies and field scores for the given term.

Parameters:
termid - the identifier of the term whose documents we are looking for.
Returns:
int[][] a two-dimensional [3][n] array containing the n document identifiers, term frequencies and field scores. If fields are not enabled, the array is [2][n].
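A caller typically reads the returned array column by column: index i across the rows describes one posting. The sketch below uses a hand-made array rather than a real index, so it runs without Terrier; the class and helper names are hypothetical, and the field-score values are arbitrary placeholders.

```java
// Hypothetical sketch: consuming the array shape returned by getDocuments(int termid).
// The data is hand-made for illustration, not read from a real index.
class PostingsReader {
    /** Sums the term frequencies stored in row 1. */
    static long totalTermFrequency(int[][] postings) {
        long total = 0;
        for (int tf : postings[1]) {
            total += tf;
        }
        return total;
    }

    /** A third row (field scores) is present only when fields are enabled. */
    static boolean hasFields(int[][] postings) {
        return postings.length >= 3;
    }

    public static void main(String[] args) {
        int[][] withFields = {
            {4, 9, 17}, // row 0: document identifiers
            {2, 1, 5},  // row 1: term frequencies
            {1, 0, 3}   // row 2: field scores (placeholder values)
        };
        int[][] withoutFields = {{4, 9, 17}, {2, 1, 5}}; // [2][n] layout

        // Column i across all rows describes the i-th posting.
        for (int i = 0; i < withFields[0].length; i++) {
            System.out.printf("doc=%d tf=%d field=%d%n",
                withFields[0][i], withFields[1][i], withFields[2][i]);
        }
        System.out.println(totalTermFrequency(withFields)); // 8
        System.out.println(hasFields(withoutFields));       // false
    }
}
```

Checking the number of rows, rather than assuming three, lets the same code handle indices built with and without fields.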

getDocuments

public int[][] getDocuments(long sOffset,
                            byte sBitOffset,
                            long eOffset,
                            byte eBitOffset)
Returns a two-dimensional array containing the document ids, term frequencies and field scores for the postings stored between the given start and end offsets.

Parameters:
sOffset - start byte of the postings in the inverted file
sBitOffset - start bit of the postings in the inverted file
eOffset - end byte of the postings in the inverted file
eBitOffset - end bit of the postings in the inverted file
Returns:
int[][] a two-dimensional [3][n] array containing the n document identifiers, term frequencies and field scores. If fields are not enabled, the array is [2][n].
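Each position in the bit-compressed inverted file is given as a byte offset plus a bit offset within that byte. The sketch below shows one way to combine such a pair into a single absolute bit index. It assumes that bit offsets run from 0 to 7 within a byte, and it deliberately does not assume whether the end position is inclusive or exclusive; the class and method names are hypothetical.

```java
// Hypothetical sketch: combining (byte offset, bit offset) pairs into absolute bit positions.
// Assumes bit offsets lie in 0..7 within a byte; not taken from Terrier's code.
class BitPosition {
    /** Returns byteOffset * 8 + bitOffset as one absolute bit index. */
    static long absoluteBit(long byteOffset, byte bitOffset) {
        if (bitOffset < 0 || bitOffset > 7) {
            throw new IllegalArgumentException("bit offset must be in 0..7");
        }
        return byteOffset * 8L + bitOffset;
    }

    /** True if the start position does not come after the end position. */
    static boolean isValidRange(long sOffset, byte sBitOffset,
                                long eOffset, byte eBitOffset) {
        return absoluteBit(sOffset, sBitOffset) <= absoluteBit(eOffset, eBitOffset);
    }

    public static void main(String[] args) {
        System.out.println(absoluteBit(2, (byte) 3));                // 19
        System.out.println(isValidRange(2, (byte) 3, 5, (byte) 0));  // true
    }
}
```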

getDocuments

public int[][] getDocuments(int termid,
                            int startDocid,
                            int endDocid)
Returns a two-dimensional array with five rows containing the document ids, the term frequencies, the field scores, the block frequencies and the block ids for the given term. The returned postings are restricted to documents within the specified range of docids.

Parameters:
termid - the id of the term whose documents we are looking for.
startDocid - The starting docid that will be returned.
endDocid - The last possible docid that will be returned.
Returns:
int[][] a two-dimensional [5][] array. The first four rows contain the document ids, term frequencies, field scores and block frequencies, with one entry per document; the last row contains the block identifiers and generally has a different length from the row of document identifiers.
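Because the block-identifier row has a different length from the other rows, a reader needs the block frequencies to find which block ids belong to which document. The sketch below assumes that row 4 concatenates each document's block ids in document order, with row 3 giving how many belong to each document, and reads both docid bounds as inclusive ("the last possible docid"). These layout details, the class and the helper names are assumptions for illustration, and the array is hand-made.

```java
import java.util.Arrays;

// Hypothetical sketch of the [5][] layout; assumes row 4 holds every document's
// block ids back to back, in document order, with row 3 giving the counts.
class BlockPostings {
    /** Returns the block ids of the i-th document in the postings. */
    static int[] blockIdsOf(int[][] postings, int i) {
        int start = 0;
        for (int j = 0; j < i; j++) {
            start += postings[3][j]; // skip the blocks of earlier documents
        }
        return Arrays.copyOfRange(postings[4], start, start + postings[3][i]);
    }

    /** Counts docids in [startDocid, endDocid], treating both bounds as inclusive. */
    static int countInRange(int[] docids, int startDocid, int endDocid) {
        int n = 0;
        for (int d : docids) {
            if (d >= startDocid && d <= endDocid) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[][] postings = {
            {4, 9, 17},        // row 0: document ids
            {2, 1, 3},         // row 1: term frequencies
            {0, 1, 0},         // row 2: field scores (placeholder values)
            {2, 1, 3},         // row 3: block frequencies
            {5, 8, 0, 1, 2, 7} // row 4: all block ids (longer than row 0)
        };
        System.out.println(Arrays.toString(blockIdsOf(postings, 2))); // [1, 2, 7]
        System.out.println(countInRange(postings[0], 5, 17));        // 2
    }
}
```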

close

public void close()
Closes the underlying bit file.


getBitFile

public BitFile getBitFile()
Returns the underlying bit file, so that it can be used more efficiently while scores are assigned to the retrieved documents.

Returns:
the underlying bit file


Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow