Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class UTFBlockLexicon

java.lang.Object
  extended by uk.ac.gla.terrier.structures.Lexicon
      extended by uk.ac.gla.terrier.structures.BlockLexicon
          extended by uk.ac.gla.terrier.structures.UTFBlockLexicon
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, Closeable

public class UTFBlockLexicon
extends BlockLexicon

A lexicon class that stores the number of distinct blocks in which a term appears, using UTF encoding of strings. It is used only while a UTF block inverted index is being created. Once that index has been created, the UTF block lexicon is transformed into a UTF lexicon.

Version:
$Revision: 1.16 $
Author:
Douglas Johnson, Vassilis Plachouras

Field Summary
static int lexiconEntryLength
          The size in bytes of an entry in the lexicon file.
 
Constructor Summary
UTFBlockLexicon()
          A default constructor.
UTFBlockLexicon(java.lang.String lexiconName)
          Constructs an instance of UTFBlockLexicon and opens the corresponding file.
UTFBlockLexicon(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 boolean findTerm(int termId)
          Finds the term given its term code.
 boolean findTerm(java.lang.String _term)
          Performs a binary search in the lexicon in order to locate the given term.
 int getBlockFrequency()
          Returns the block frequency of the current term.
static int numberOfEntries(java.io.File f)
          Returns the number of entries in the lexicon named by f.
static int numberOfEntries(java.lang.String filename)
          Returns the number of entries in the lexicon named by filename.
 boolean seekEntry(int i)
          Seeks the i-th entry of the lexicon.
 boolean updateEntry(int i, int frequency, long endOffset, byte endBitOffset)
          Deprecated. Block Lexicons are used during indexing, but not during retrieval.
 
Methods inherited from class uk.ac.gla.terrier.structures.Lexicon
close, getEndBitOffset, getEndOffset, getIthLexiconEntry, getLexiconEntry, getLexiconEntry, getNt, getNumberOfLexiconEntries, getStartBitOffset, getStartOffset, getTerm, getTermId, getTF, iterator, print
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lexiconEntryLength

public static final int lexiconEntryLength
The size in bytes of an entry in the lexicon file. An entry corresponds to a string, an int (termCode), an int (docf), an int (tf), a long (the offset of the end of the term's entry in bytes in the inverted file) and a byte (the offset in bits of the last byte of the term's entry in the inverted file).
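
The field list above can be summed to give an entry size. The following is a minimal sketch, not Terrier's code: the class name EntryLayoutSketch and the parameter termBytes (the number of bytes reserved for the encoded term) are assumptions, because this page does not state how many bytes the string occupies. Only the fields named in the description are counted.

```java
/** Hypothetical helper for illustration; not part of Terrier. */
final class EntryLayoutSketch {

    /**
     * Sums the sizes of the fields listed for lexiconEntryLength.
     * termBytes is an assumed, caller-supplied size for the encoded term.
     */
    static int entryLength(int termBytes) {
        if (termBytes < 0) {
            throw new IllegalArgumentException("termBytes must not be negative");
        }
        return termBytes
                + Integer.BYTES  // termCode
                + Integer.BYTES  // docf (document frequency)
                + Integer.BYTES  // tf (term frequency)
                + Long.BYTES     // end offset in bytes in the inverted file
                + Byte.BYTES;    // end offset in bits within the last byte
    }
}
```

With an assumed 20 bytes for the term, the listed fields add up to 20 + 4 + 4 + 4 + 8 + 1 = 41 bytes.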

Constructor Detail

UTFBlockLexicon

public UTFBlockLexicon()
A default constructor.


UTFBlockLexicon

public UTFBlockLexicon(java.lang.String path,
                       java.lang.String prefix)

UTFBlockLexicon

public UTFBlockLexicon(java.lang.String lexiconName)
Constructs an instance of UTFBlockLexicon and opens the corresponding file.

Parameters:
lexiconName - the name of the lexicon file.
Method Detail

findTerm

public boolean findTerm(int termId)
Finds the term given its term code.

Overrides:
findTerm in class BlockLexicon
Parameters:
termId - the term's id
Returns:
true if the term is found, false otherwise.

findTerm

public boolean findTerm(java.lang.String _term)
Performs a binary search in the lexicon in order to locate the given term. If the term is located, the properties termCharacters, documentFrequency, termFrequency, startOffset, startBitOffset, endOffset and endBitOffset contain the values related to the term.

Overrides:
findTerm in class BlockLexicon
Parameters:
_term - the term to search for.
Returns:
true if the term is found, and false otherwise.
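
A binary search over fixed-length entries sorted by term can be sketched as follows. This is an illustration under stated assumptions, not Terrier's implementation: the class FixedRecordSearchSketch, the 16-byte term width, the zero-byte padding, the UTF-8 encoding, and the in-memory ByteBuffer standing in for the lexicon file are all hypothetical. Entries are assumed to be sorted in String.compareTo order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; record layout and sizes are assumptions. */
final class FixedRecordSearchSketch {
    static final int TERM_BYTES = 16;                           // assumed term width
    static final int RECORD_BYTES = TERM_BYTES + Integer.BYTES; // term + id

    /** Packs already sorted terms into fixed-length records (padded term, then id). */
    static ByteBuffer pack(String[] sortedTerms) {
        ByteBuffer buf = ByteBuffer.allocate(sortedTerms.length * RECORD_BYTES);
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            if (raw.length > TERM_BYTES) {
                throw new IllegalArgumentException("term too long: " + sortedTerms[i]);
            }
            buf.position(i * RECORD_BYTES);
            buf.put(raw);                                  // rest of the field stays zero
            buf.putInt(i * RECORD_BYTES + TERM_BYTES, i);  // use the index as the id
        }
        return buf;
    }

    /** Reads the term of the i-th record: offset = i * RECORD_BYTES. */
    static String termAt(ByteBuffer buf, int i) {
        byte[] raw = new byte[TERM_BYTES];
        for (int b = 0; b < TERM_BYTES; b++) {
            raw[b] = buf.get(i * RECORD_BYTES + b);
        }
        int len = 0;
        while (len < TERM_BYTES && raw[len] != 0) {
            len++;                                         // stop at the padding
        }
        return new String(raw, 0, len, StandardCharsets.UTF_8);
    }

    /** Returns the index of the record holding term, or -1 if it is absent. */
    static int find(ByteBuffer buf, String term) {
        int lo = 0;
        int hi = buf.capacity() / RECORD_BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = term.compareTo(termAt(buf, mid));
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return -1;
    }
}
```

Because every record has the same length, the i-th record starts at byte i * RECORD_BYTES, so each probe of the search is a single positioned read rather than a scan.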

getBlockFrequency

public int getBlockFrequency()
Returns the block frequency of the current term.

Overrides:
getBlockFrequency in class BlockLexicon
Returns:
the block frequency of the current term.

seekEntry

public boolean seekEntry(int i)
Seeks the i-th entry of the lexicon.

Overrides:
seekEntry in class BlockLexicon
Parameters:
i - The index of the entry we are looking for.
Returns:
true if the entry was found, false otherwise.

updateEntry

public boolean updateEntry(int i,
                           int frequency,
                           long endOffset,
                           byte endBitOffset)
Deprecated. Block Lexicons are used during indexing, but not during retrieval.

Updates the term frequency, the end offset in bytes, and the end bit offset in the last byte for an entry already stored in the lexicon file. The entry is specified by its index.

Overrides:
updateEntry in class BlockLexicon
Parameters:
i - the i-th entry
frequency - the term's frequency
endOffset - the offset of the ending byte in the inverted file
endBitOffset - the offset in bits in the ending byte of the term's entry in the inverted file
Returns:
true if the information is updated properly, false otherwise.

numberOfEntries

public static int numberOfEntries(java.io.File f)
Returns the number of entries in the lexicon named by f.


numberOfEntries

public static int numberOfEntries(java.lang.String filename)
Returns the number of entries in the lexicon named by filename.
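
Since every entry has a fixed size in bytes, one plausible way to count entries is to divide the file length by the entry length. The sketch below assumes that approach; this page does not confirm how numberOfEntries is implemented, and the class EntryCountSketch, the method entriesIn, and the entryLength parameter are hypothetical.

```java
import java.io.File;

/** Hypothetical helper for illustration; not Terrier's implementation. */
final class EntryCountSketch {

    /** Number of whole fixed-length entries that fit in fileLengthBytes. */
    static int entriesIn(long fileLengthBytes, int entryLength) {
        if (entryLength <= 0) {
            throw new IllegalArgumentException("entryLength must be positive");
        }
        // Integer division discards any trailing partial entry.
        return (int) (fileLengthBytes / entryLength);
    }

    /** Same computation, taking the length from a file on disk. */
    static int entriesIn(File f, int entryLength) {
        // File.length() returns 0 for a file that does not exist.
        return entriesIn(f.length(), entryLength);
    }
}
```

For example, with an assumed entry length of 41 bytes, a file of 82 bytes would hold two entries.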


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow