Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class BlockLexicon

java.lang.Object
  extended by uk.ac.gla.terrier.structures.Lexicon
      extended by uk.ac.gla.terrier.structures.BlockLexicon
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, Closeable
Direct Known Subclasses:
UTFBlockLexicon

public class BlockLexicon
extends Lexicon

A lexicon class that also records the number of different blocks in which each term appears. It is used only while the block inverted index is being created; once the block inverted index has been built, the block lexicon is transformed into a lexicon.

Version:
$Revision: 1.33 $
Author:
Douglas Johnson, Vassilis Plachouras

Field Summary
static int lexiconEntryLength
          The size in bytes of an entry in the lexicon file.
 
Constructor Summary
BlockLexicon()
          A default constructor.
BlockLexicon(java.lang.String lexiconName)
          Constructs an instance of BlockLexicon and opens the corresponding file.
BlockLexicon(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 boolean findTerm(int termId)
          Finds the term given its term code.
 boolean findTerm(java.lang.String _term)
          Performs a binary search in the lexicon in order to locate the given term.
 int getBlockFrequency()
          Returns the block frequency of the current term.
static int numberOfEntries(java.io.File f)
           
static int numberOfEntries(java.lang.String filename)
           
 boolean seekEntry(int i)
          Seeks the i-th entry of the lexicon.
 boolean updateEntry(int i, int frequency, long endOffset, byte endBitOffset)
          Deprecated. The BlockLexicon is used during indexing, but not during retrieval.
 
Methods inherited from class uk.ac.gla.terrier.structures.Lexicon
close, getEndBitOffset, getEndOffset, getIthLexiconEntry, getLexiconEntry, getLexiconEntry, getNt, getNumberOfLexiconEntries, getStartBitOffset, getStartOffset, getTerm, getTermId, getTF, iterator, print
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lexiconEntryLength

public static final int lexiconEntryLength
The size in bytes of an entry in the lexicon file. An entry consists of a string (the term), an int (termCode), an int (docf, the document frequency), an int (tf, the term frequency), a long (the offset in bytes of the end of the term's entry in the inverted file) and a byte (the offset in bits within the last byte of the term's entry in the inverted file).
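A minimal sketch of how the fields listed above add up to a fixed entry length and how one entry could be serialised in that order. It assumes a hypothetical fixed width of 20 bytes for the zero-padded term string (the real width is not given on this page); the class and method names are illustrative only, and the actual BlockLexicon entry may contain further fields, such as the block frequency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LexiconEntryLayout {
    // Hypothetical fixed width of the zero-padded term string (not taken from Terrier).
    static final int STRING_BYTE_LENGTH = 20;

    // term string + termCode + docf + tf + endOffset + endBitOffset
    static final int ENTRY_LENGTH = STRING_BYTE_LENGTH
            + Integer.BYTES     // termCode
            + Integer.BYTES     // docf (document frequency)
            + Integer.BYTES     // tf (term frequency)
            + Long.BYTES        // endOffset in bytes
            + Byte.BYTES;       // endBitOffset within the last byte

    // Serialises one entry in the field order listed in the description.
    static byte[] encode(String term, int termCode, int docf, int tf,
                         long endOffset, byte endBitOffset) {
        byte[] raw = term.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > STRING_BYTE_LENGTH) {
            throw new IllegalArgumentException("term longer than " + STRING_BYTE_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_LENGTH);
        buf.put(Arrays.copyOf(raw, STRING_BYTE_LENGTH)); // pads the term with zero bytes
        buf.putInt(termCode).putInt(docf).putInt(tf).putLong(endOffset).put(endBitOffset);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(ENTRY_LENGTH); // 41
        byte[] entry = encode("retrieval", 7, 3, 12, 1024L, (byte) 5);
        System.out.println(entry.length); // 41
    }
}
```

Under these assumptions every entry occupies 20 + 4 + 4 + 4 + 8 + 1 = 41 bytes, which is what makes direct addressing of the i-th entry possible.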

Constructor Detail

BlockLexicon

public BlockLexicon()
A default constructor.


BlockLexicon

public BlockLexicon(java.lang.String lexiconName)
Constructs an instance of BlockLexicon and opens the corresponding file.

Parameters:
lexiconName - the name of the lexicon file.

BlockLexicon

public BlockLexicon(java.lang.String path,
                    java.lang.String prefix)
Method Detail

findTerm

public boolean findTerm(int termId)
Finds the term given its term code.

Overrides:
findTerm in class Lexicon
Parameters:
termId - the term's id
Returns:
true if the term is found, false otherwise.

findTerm

public boolean findTerm(java.lang.String _term)
Performs a binary search in the lexicon in order to locate the given term. If the term is located, the properties termCharacters, documentFrequency, termFrequency, startOffset, startBitOffset, endOffset and endBitOffset contain the values related to the term.

Overrides:
findTerm in class Lexicon
Parameters:
_term - the term to search for.
Returns:
true if the term is found, and false otherwise.
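Because the entries have a fixed length and a binary search requires them to be sorted by term, the search can read the term of entry mid directly at byte offset mid * lexiconEntryLength. The following self-contained sketch illustrates this over an in-memory buffer rather than the lexicon file. The 20-byte term width, the class and method names, and returning the entry index (or -1) instead of a boolean are assumptions made for illustration, not Terrier's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LexiconBinarySearch {
    // Hypothetical layout: 20-byte zero-padded term, then termCode, docf, tf, endOffset, endBitOffset.
    static final int STRING_BYTE_LENGTH = 20;
    static final int ENTRY_LENGTH = STRING_BYTE_LENGTH + 3 * Integer.BYTES + Long.BYTES + Byte.BYTES;

    // Builds an in-memory lexicon from terms that are already in ascending order.
    static ByteBuffer build(String... sortedTerms) {
        ByteBuffer buf = ByteBuffer.allocate(sortedTerms.length * ENTRY_LENGTH);
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.US_ASCII);
            buf.put(Arrays.copyOf(raw, STRING_BYTE_LENGTH));
            buf.putInt(i);                                     // termCode (illustrative value)
            buf.putInt(0).putInt(0).putLong(0L).put((byte) 0); // statistics not needed here
        }
        return buf;
    }

    // Decodes the zero-padded term stored in entry i.
    static String termAt(ByteBuffer lexicon, int i) {
        byte[] raw = new byte[STRING_BYTE_LENGTH];
        ByteBuffer view = lexicon.duplicate(); // independent position, shared content
        view.position(i * ENTRY_LENGTH);
        view.get(raw);
        int len = 0;
        while (len < raw.length && raw[len] != 0) {
            len++;
        }
        return new String(raw, 0, len, StandardCharsets.US_ASCII);
    }

    // Binary search over the sorted entries; returns the entry index, or -1 if the term is absent.
    static int findTerm(ByteBuffer lexicon, String term) {
        int low = 0;
        int high = lexicon.capacity() / ENTRY_LENGTH - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = termAt(lexicon, mid).compareTo(term);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer lexicon = build("apple", "index", "lexicon", "retrieval", "term");
        System.out.println(findTerm(lexicon, "lexicon")); // 2
        System.out.println(findTerm(lexicon, "zebra"));   // -1
    }
}
```

Each probe costs one fixed-offset read, so a lookup touches about log2(n) entries; a real implementation would additionally load the remaining fields of the matching entry.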

getBlockFrequency

public int getBlockFrequency()
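Returns the block frequency of the current term, that is, the number of different blocks in which the term most recently located by findTerm or seekEntry appears.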

Returns:
the block frequency of the current term.

seekEntry

public boolean seekEntry(int i)
Seeks the i-th entry of the lexicon.

Overrides:
seekEntry in class Lexicon
Parameters:
i - the index of the entry to seek.
Returns:
true if the entry was found, false otherwise.
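With fixed-length entries, seeking the i-th entry reduces to reading at byte offset i * lexiconEntryLength. This sketch mirrors the boolean return value and the "current entry" fields described for findTerm, using an in-memory ByteBuffer in place of the lexicon file; the 20-byte term width, the field offsets, and all names are hypothetical.

```java
import java.nio.ByteBuffer;

public class LexiconSeek {
    // Hypothetical field offsets within an entry (20-byte term string first).
    static final int STRING_BYTE_LENGTH = 20;
    static final int TERM_CODE = STRING_BYTE_LENGTH;
    static final int DOCF = TERM_CODE + Integer.BYTES;
    static final int TF = DOCF + Integer.BYTES;
    static final int END_OFFSET = TF + Integer.BYTES;
    static final int END_BIT_OFFSET = END_OFFSET + Long.BYTES;
    static final int ENTRY_LENGTH = END_BIT_OFFSET + Byte.BYTES; // 41 bytes

    private final ByteBuffer lexicon; // stands in for the lexicon file

    // Values of the entry loaded by the last successful seekEntry call.
    int termCode;
    int docf;
    int tf;
    long endOffset;
    byte endBitOffset;

    LexiconSeek(ByteBuffer lexicon) {
        this.lexicon = lexicon;
    }

    // Loads entry i into the fields above; returns false if i is out of range.
    boolean seekEntry(int i) {
        if (i < 0 || i >= lexicon.capacity() / ENTRY_LENGTH) {
            return false;
        }
        int start = i * ENTRY_LENGTH; // fixed-length entries allow direct addressing
        termCode = lexicon.getInt(start + TERM_CODE);
        docf = lexicon.getInt(start + DOCF);
        tf = lexicon.getInt(start + TF);
        endOffset = lexicon.getLong(start + END_OFFSET);
        endBitOffset = lexicon.get(start + END_BIT_OFFSET);
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * ENTRY_LENGTH);
        for (int i = 0; i < 3; i++) {
            int start = i * ENTRY_LENGTH;
            buf.putInt(start + TERM_CODE, i);
            buf.putInt(start + TF, 100 * i);
        }
        LexiconSeek lex = new LexiconSeek(buf);
        System.out.println(lex.seekEntry(2) + " " + lex.tf); // true 200
        System.out.println(lex.seekEntry(3));                  // false
    }
}
```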

updateEntry

public boolean updateEntry(int i,
                           int frequency,
                           long endOffset,
                           byte endBitOffset)
Deprecated. The BlockLexicon is used during indexing, but not during retrieval.

Updates an entry already stored in the lexicon file: its term frequency, the offset in bytes of the end of the term's entry in the inverted file (endOffset), and the offset in bits within that last byte (endBitOffset) are overwritten. The entry is specified by its index.

Overrides:
updateEntry in class Lexicon
Parameters:
i - the index of the entry to update
frequency - the term's frequency
endOffset - the offset of the ending byte in the inverted file
endBitOffset - the offset in bits within the ending byte of the term's entry in the inverted file
Returns:
true if the entry was updated successfully, false otherwise.
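In-place updating relies on the same fixed-length addressing: the frequency, endOffset and endBitOffset fields of entry i sit at fixed offsets inside the entry, so they can be overwritten without touching the term or any other entry. A sketch under the same hypothetical layout (20-byte term, then termCode, docf, tf, endOffset, endBitOffset), with an in-memory buffer standing in for the lexicon file; the names and offsets are illustrative, not Terrier's.

```java
import java.nio.ByteBuffer;

public class LexiconUpdate {
    // Hypothetical layout: 20-byte term, termCode, docf, tf, endOffset, endBitOffset.
    static final int STRING_BYTE_LENGTH = 20;
    static final int TF = STRING_BYTE_LENGTH + 2 * Integer.BYTES; // after term, termCode, docf
    static final int END_OFFSET = TF + Integer.BYTES;
    static final int END_BIT_OFFSET = END_OFFSET + Long.BYTES;
    static final int ENTRY_LENGTH = END_BIT_OFFSET + Byte.BYTES;  // 41 bytes

    // Overwrites tf, endOffset and endBitOffset of entry i; term, termCode and docf are untouched.
    static boolean updateEntry(ByteBuffer lexicon, int i, int frequency,
                               long endOffset, byte endBitOffset) {
        if (i < 0 || i >= lexicon.capacity() / ENTRY_LENGTH) {
            return false; // no such entry
        }
        int start = i * ENTRY_LENGTH;
        lexicon.putInt(start + TF, frequency);
        lexicon.putLong(start + END_OFFSET, endOffset);
        lexicon.put(start + END_BIT_OFFSET, endBitOffset);
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer lexicon = ByteBuffer.allocate(2 * ENTRY_LENGTH);
        boolean ok = updateEntry(lexicon, 1, 42, 8192L, (byte) 3);
        System.out.println(ok + " " + lexicon.getInt(ENTRY_LENGTH + TF)); // true 42
        System.out.println(updateEntry(lexicon, 2, 1, 0L, (byte) 0));     // false
    }
}
```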

numberOfEntries

public static int numberOfEntries(java.io.File f)

numberOfEntries

public static int numberOfEntries(java.lang.String filename)
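The two numberOfEntries methods are not documented on this page. Assuming they count the entries in a lexicon file made of fixed-length records, the count is the file length divided by the entry length. The sketch below illustrates only that assumption, with a hypothetical entry length of 41 bytes and a hypothetical helper for creating a test file; it is not taken from Terrier's source.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class LexiconEntryCount {
    // Hypothetical entry length: 20-byte term + 4 + 4 + 4 + 8 + 1 bytes of numeric fields.
    static final int ENTRY_LENGTH = 41;

    // Assumed behaviour: number of whole entries; a trailing partial entry is ignored.
    static int numberOfEntries(File f) {
        return (int) (f.length() / ENTRY_LENGTH); // File.length() is 0 for a missing file
    }

    static int numberOfEntries(String filename) {
        return numberOfEntries(new File(filename));
    }

    // Hypothetical helper: creates a temporary file sized for the given number of entries
    // (the contents of the extended file are unspecified).
    static File createDummyLexicon(int entries) {
        try {
            File tmp = File.createTempFile("lexicon", ".lex");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.setLength((long) entries * ENTRY_LENGTH);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File lexicon = createDummyLexicon(5);
        System.out.println(numberOfEntries(lexicon.getPath())); // 5
    }
}
```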

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow