Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class InvertedIndex

java.lang.Object
  extended by uk.ac.gla.terrier.structures.InvertedIndex
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
BlockInvertedIndex

public class InvertedIndex
extends java.lang.Object
implements Closeable

This class implements the inverted index used for retrieval, optionally with field information.

Version:
$Revision: 1.40 $
Author:
Douglas Johnson, Vassilis Plachouras, Craig Macdonald

Field Summary
static double FIELD_LOAD_FACTOR
          Used during retrieval to estimate the initial capacity of the temporaryTerms ArrayList in getDocuments() when retrieving with fields.
static double NORMAL_LOAD_FACTOR
          Used during retrieval to estimate the initial capacity of the temporaryTerms ArrayList in getDocuments().
 
Constructor Summary
InvertedIndex(Lexicon lexicon)
          Creates an instance of the InvertedIndex class using the given lexicon.
InvertedIndex(Lexicon lexicon, java.lang.String filename)
          Creates an instance of the InvertedIndex class using the given lexicon and inverted file.
InvertedIndex(Lexicon lexicon, java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 void close()
          Closes the underlying bit file.
 BitInSeekable getBitFile()
          Returns the underlying bit file, allowing more efficient access to it when assigning scores to the retrieved documents.
 int[][] getDocuments(int termid)
          Returns a two-dimensional array containing the document ids, term frequencies and field scores for the given term.
 int[][] getDocuments(LexiconEntry lEntry)
           
 int[][] getDocuments(long sOffset, byte sBitOffset, long eOffset, byte eBitOffset, int df)
          Returns a two-dimensional array containing the document ids, term frequencies and field scores for the postings between the given start and end offsets.
 java.lang.String getInfo(int term)
          Returns the information for a posting list in string format.
 void print()
          Prints out the inverted index file.
 void reOpenLegacyBitFile()
          Forces the data structure to reopen the underlying bit file using the legacy implementation of BitFile (OldBitFile).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NORMAL_LOAD_FACTOR

public static final double NORMAL_LOAD_FACTOR
Used during retrieval to estimate the initial capacity of the temporaryTerms ArrayList in getDocuments(). The higher this value, the less likely the ArrayList will need to grow (growing is expensive); however, more memory may be allocated unnecessarily.

See Also:
Constant Field Values

FIELD_LOAD_FACTOR

public static final double FIELD_LOAD_FACTOR
Used during retrieval to estimate the initial capacity of the temporaryTerms ArrayList in getDocuments() when retrieving with fields. The higher this value, the less likely the ArrayList will need to grow (growing is expensive); however, more memory may be allocated unnecessarily.

See Also:
Constant Field Values
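A load factor of this kind typically scales an expected entry count to choose an ArrayList's initial capacity. The following is a minimal sketch under that assumption, not Terrier code: LoadFactorSizing, EXAMPLE_LOAD_FACTOR and the helper methods are hypothetical, and the value 1.5 is purely illustrative (it is not the value of NORMAL_LOAD_FACTOR or FIELD_LOAD_FACTOR, which are listed under Constant Field Values).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: sizing a temporary list from a load factor. */
class LoadFactorSizing {
    // Illustrative value only (assumption); NOT Terrier's NORMAL_LOAD_FACTOR or FIELD_LOAD_FACTOR.
    static final double EXAMPLE_LOAD_FACTOR = 1.5;

    /** Initial capacity guess for a list expected to hold about `expected` entries. */
    static int initialCapacity(int expected, double loadFactor) {
        // Round up so a fractional estimate still reserves enough slots.
        return (int) Math.ceil(expected * loadFactor);
    }

    /** Creates an empty list pre-sized with the example load factor. */
    static List<int[]> newTemporaryList(int expected) {
        // A larger factor reserves more slots up front: fewer grow-and-copy steps,
        // at the cost of possibly unused memory.
        return new ArrayList<int[]>(initialCapacity(expected, EXAMPLE_LOAD_FACTOR));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacity(100, EXAMPLE_LOAD_FACTOR)); // 150
    }
}
```

Pre-sizing only sets the backing array's capacity; the list still starts empty, and it grows automatically if the estimate turns out too small.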
Constructor Detail

InvertedIndex

public InvertedIndex(Lexicon lexicon,
                     java.lang.String path,
                     java.lang.String prefix)

InvertedIndex

public InvertedIndex(Lexicon lexicon)
Creates an instance of the InvertedIndex class using the given lexicon.

Parameters:
lexicon - The lexicon used for retrieval

InvertedIndex

public InvertedIndex(Lexicon lexicon,
                     java.lang.String filename)
Creates an instance of the InvertedIndex class using the given lexicon and inverted file.

Parameters:
lexicon - The lexicon used for retrieval
filename - The name of the inverted file
Method Detail

reOpenLegacyBitFile

public void reOpenLegacyBitFile()
                         throws java.io.IOException
Forces the data structure to reopen the underlying bit file using the legacy implementation of BitFile (OldBitFile).

Throws:
java.io.IOException

print

public void print()
Prints out the inverted index file.


getDocuments

public int[][] getDocuments(LexiconEntry lEntry)

getDocuments

public int[][] getDocuments(int termid)
Returns a two-dimensional array containing the document ids, term frequencies and field scores for the given term.

Parameters:
termid - the identifier of the term whose documents we are looking for.
Returns:
int[][] a two-dimensional [3][n] array containing the n document identifiers, term frequencies and field scores. If fields are not enabled, the array is [2][n].
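The sketch below shows one way a caller might walk the returned array, relying only on the layout documented above: row 0 holds document ids, row 1 term frequencies, and row 2 (present only when fields are enabled) field scores. PostingsReader and its methods are hypothetical helpers, and the arrays in main are made-up data rather than real index output.

```java
/** Hypothetical helper for reading the int[][] layout returned by getDocuments(int termid). */
class PostingsReader {

    /** True if the array carries a third row of field scores ([3][n] rather than [2][n]). */
    static boolean hasFields(int[][] postings) {
        return postings.length == 3;
    }

    /** Sum of the term frequencies (row 1) over all n postings. */
    static long totalFrequency(int[][] postings) {
        long total = 0;
        for (int tf : postings[1]) {
            total += tf;
        }
        return total;
    }

    /** One line per posting: document id, term frequency and, if present, field score. */
    static String describe(int[][] postings) {
        StringBuilder sb = new StringBuilder();
        int n = postings[0].length; // number of postings returned for the term
        for (int i = 0; i < n; i++) {
            sb.append("doc=").append(postings[0][i])
              .append(" tf=").append(postings[1][i]);
            if (hasFields(postings)) {
                sb.append(" fields=").append(postings[2][i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] noFields = { {3, 17, 42}, {1, 4, 2} };   // made-up [2][n] result
        int[][] withFields = { {3, 17}, {1, 4}, {0, 5} }; // made-up [3][n] result
        System.out.print(describe(noFields));
        System.out.print(describe(withFields));
    }
}
```

Checking the outer array length, rather than a separate configuration flag, keeps the reader correct for both the [2][n] and [3][n] shapes.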

getDocuments

public int[][] getDocuments(long sOffset,
                            byte sBitOffset,
                            long eOffset,
                            byte eBitOffset,
                            int df)
Returns a two-dimensional array containing the document ids, term frequencies and field scores for the postings between the given start and end offsets.

Parameters:
sOffset - start byte of the postings in the inverted file
sBitOffset - start bit of the postings in the inverted file
eOffset - end byte of the postings in the inverted file
eBitOffset - end bit of the postings in the inverted file
df - the document frequency of the term, i.e. the number of postings to read
Returns:
int[][] a two-dimensional [3][n] array containing the n document identifiers, term frequencies and field scores. If fields are not enabled, the array is [2][n].
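Each posting-list boundary above is given as a byte offset plus a bit offset within that byte. The sketch below converts such a pair to an absolute bit position, assuming position = byteOffset * 8 + bitOffset with bitOffset in 0..7; BitPosition and its methods are hypothetical. Whether the end bit itself belongs to the range is not stated on this page, so the sketch reports only the distance between the two positions.

```java
/** Hypothetical sketch of (byte offset, bit offset) addressing in a bit file. */
class BitPosition {

    /** Absolute bit position for a (byte, bit) pair; assumes bitOffset is in 0..7. */
    static long absoluteBit(long byteOffset, byte bitOffset) {
        if (bitOffset < 0 || bitOffset > 7) {
            throw new IllegalArgumentException("bit offset must be in 0..7: " + bitOffset);
        }
        return byteOffset * 8L + bitOffset;
    }

    /** Distance in bits from the start position to the end position. */
    static long bitDistance(long sOffset, byte sBitOffset, long eOffset, byte eBitOffset) {
        return absoluteBit(eOffset, eBitOffset) - absoluteBit(sOffset, sBitOffset);
    }

    public static void main(String[] args) {
        // Made-up values: postings start at byte 10, bit 3 and end at byte 12, bit 6.
        System.out.println(absoluteBit(10, (byte) 3));               // 83
        System.out.println(bitDistance(10, (byte) 3, 12, (byte) 6)); // 19
    }
}
```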

getInfo

public java.lang.String getInfo(int term)
Returns the information for a posting list in string format.


close

public void close()
Closes the underlying bit file.

Specified by:
close in interface Closeable
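Because the class implements Closeable, callers can guarantee the bit file is released even when retrieval fails. The sketch below illustrates the pattern with a hypothetical stand-in, StubIndex, which is not Terrier's InvertedIndex; it uses try-with-resources, which requires Java 7 or later (on older JVMs a try/finally block calling close() does the same job).

```java
import java.io.Closeable;

/** Hypothetical stand-in for a Closeable index; NOT Terrier's InvertedIndex. */
class StubIndex implements Closeable {
    boolean closed = false;

    /** Returns a made-up single posting for the term. */
    int[][] getDocuments(int termid) {
        if (closed) {
            throw new IllegalStateException("index already closed");
        }
        return new int[][] { {termid}, {1} };
    }

    @Override
    public void close() { // like InvertedIndex.close(), declares no checked exception
        closed = true;
    }
}

class CloseDemo {
    /** Looks up a term; close() runs automatically when the try block exits. */
    static int lookup(StubIndex index, int termid) {
        try (StubIndex idx = index) {
            return idx.getDocuments(termid)[0].length;
        }
    }

    public static void main(String[] args) {
        StubIndex index = new StubIndex();
        System.out.println(lookup(index, 5) + " " + index.closed); // 1 true
    }
}
```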

getBitFile

public BitInSeekable getBitFile()
Returns the underlying bit file, allowing more efficient access to it when assigning scores to the retrieved documents.

Returns:
the underlying bit file


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow