Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class UTFBlockLexiconInputStream

java.lang.Object
  extended by uk.ac.gla.terrier.structures.LexiconInputStream
      extended by uk.ac.gla.terrier.structures.BlockLexiconInputStream
          extended by uk.ac.gla.terrier.structures.UTFBlockLexiconInputStream
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, Closeable

public class UTFBlockLexiconInputStream
extends BlockLexiconInputStream

An input stream for sequentially accessing the entries of a block lexicon.

Version:
$Revision: 1.17 $
Author:
Douglas Johnson, Vassilis Plachouras

Constructor Summary
UTFBlockLexiconInputStream()
          A default constructor.
UTFBlockLexiconInputStream(java.io.DataInput in)
          Read a lexicon from the specified input stream.
UTFBlockLexiconInputStream(java.io.File file)
          A constructor given the file.
UTFBlockLexiconInputStream(java.lang.String filename)
          A constructor given the filename.
 
Method Summary
 java.lang.String getTerm()
          Returns the string representation of the term.
 byte[] getTermCharacters()
          Returns the bytes of the String.
 int numberOfEntries()
          Returns the number of entries in the lexicon file.
 void print()
          Prints the contents of the lexicon file, for checking.
 int readNextEntry()
          Read the next lexicon entry.
 int readNextEntryBytes()
          Read the next lexicon entry, where the term is saved as a byte array.
 
Methods inherited from class uk.ac.gla.terrier.structures.BlockLexiconInputStream
getBlockFrequency
 
Methods inherited from class uk.ac.gla.terrier.structures.LexiconInputStream
close, getEndBitOffset, getEndOffset, getEntrySize, getNt, getNumberOfPointersRead, getNumberOfTermsRead, getNumberOfTokensRead, getStartBitOffset, getStartOffset, getTermId, getTF, iterator
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UTFBlockLexiconInputStream

public UTFBlockLexiconInputStream()
A default constructor.


UTFBlockLexiconInputStream

public UTFBlockLexiconInputStream(java.lang.String filename)
A constructor given the filename.

Parameters:
filename - java.lang.String the name of the lexicon file.

UTFBlockLexiconInputStream

public UTFBlockLexiconInputStream(java.io.File file)
A constructor given the file.

Parameters:
file - java.io.File the lexicon file.

UTFBlockLexiconInputStream

public UTFBlockLexiconInputStream(java.io.DataInput in)
Read a lexicon from the specified input stream.
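
Because this constructor accepts any java.io.DataInput, the same reading code can be fed from memory or from a file. The sketch below illustrates only that point using standard JDK classes; DataInputSourcesSketch and its firstInt helper are hypothetical names and are not part of Terrier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class DataInputSourcesSketch {

    // Hypothetical helper: reads one int through the DataInput interface,
    // without caring which concrete class supplies the bytes.
    static int firstInt(DataInput in) throws IOException {
        return in.readInt();
    }

    // DataInputStream over an in-memory buffer implements DataInput.
    static int fromMemory(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new DataOutputStream(buffer).writeInt(value);
            return firstInt(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // RandomAccessFile also implements DataInput.
    static int fromFile(int value) {
        try {
            File tmp = File.createTempFile("datainput-sketch", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.writeInt(value);
                raf.seek(0); // rewind before reading back
                return firstInt(raf);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromMemory(42));
        System.out.println(fromFile(42));
    }
}
```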

Method Detail

readNextEntry

public int readNextEntry()
                  throws java.io.IOException
Read the next lexicon entry.

Overrides:
readNextEntry in class BlockLexiconInputStream
Returns:
the number of bytes read, or -1 if the end of the file has been reached
Throws:
java.io.IOException - if an I/O error occurs
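
The return-value contract above (bytes read, or -1 at end of file) suggests a simple read loop. The following is a minimal sketch and not Terrier code: ToyLexiconStream is a hypothetical stand-in, and its record layout (a writeUTF term followed by four int fields) is an assumption made so that the example runs without the Terrier jar. With the real class, a loop of the same shape would presumably call readNextEntry() and then getTerm() and the inherited getters such as getTermId() and getTF().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LexiconReadLoopSketch {

    // Hypothetical stand-in for a lexicon input stream. The record layout
    // (writeUTF term + four ints) is an assumption made for this example.
    static class ToyLexiconStream {
        private final DataInput in;
        private String term;
        private int termId, nt, blockFrequency, tf;

        ToyLexiconStream(DataInput in) { this.in = in; }

        // Returns the number of bytes consumed, or -1 at end of input.
        int readNextEntry() throws IOException {
            try {
                term = in.readUTF();
            } catch (EOFException endOfInput) {
                return -1;
            }
            termId = in.readInt();
            nt = in.readInt();
            blockFrequency = in.readInt();
            tf = in.readInt();
            // 2-byte length prefix + encoded term + four 4-byte ints
            return 2 + modifiedUtf8Length(term) + 16;
        }

        String getTerm() { return term; }
        int getTF() { return tf; }

        // Encoded size of a string in the format used by writeUTF.
        private static int modifiedUtf8Length(String s) {
            int n = 0;
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                n += (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF ? 2 : 3);
            }
            return n;
        }
    }

    private static void writeEntry(DataOutputStream out, String term,
            int termId, int nt, int blockFrequency, int tf) throws IOException {
        out.writeUTF(term);
        out.writeInt(termId);
        out.writeInt(nt);
        out.writeInt(blockFrequency);
        out.writeInt(tf);
    }

    // Builds a two-entry toy lexicon in memory (values are made up).
    static byte[] sampleLexicon() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeEntry(out, "glasgow", 0, 2, 5, 7);
            writeEntry(out, "terrier", 1, 1, 3, 3);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // The read loop: keep reading until readNextEntry() reports -1.
    static List<String> readAll(byte[] data) {
        List<String> entries = new ArrayList<>();
        ToyLexiconStream lex = new ToyLexiconStream(
                new DataInputStream(new ByteArrayInputStream(data)));
        try {
            while (lex.readNextEntry() != -1) {
                entries.add(lex.getTerm() + ":" + lex.getTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(readAll(sampleLexicon()));
    }
}
```

Testing against -1 rather than 0 matters because the method signals end of file with -1, while any other value is a byte count.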

numberOfEntries

public int numberOfEntries()
Returns the number of entries in the lexicon file.

Overrides:
numberOfEntries in class BlockLexiconInputStream

readNextEntryBytes

public int readNextEntryBytes()
                       throws java.io.IOException
Read the next lexicon entry, where the term is saved as a byte array. No attempt is made to parse the byte array and the padding bytes into a String. Use this method when you want to get the bytes of the term using getTermCharacters(). This method does NOT work with getTerm().

Overrides:
readNextEntryBytes in class LexiconInputStream
Returns:
the number of bytes read, or -1 if the end of the file has been reached
Throws:
java.io.IOException - if an I/O error occurs
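
The idea of keeping the term as raw bytes and deferring String construction can be sketched as follows. This is an illustration under stated assumptions, not Terrier's implementation: RawTermBytesSketch is a hypothetical class, the term is assumed to be stored in DataOutput.writeUTF form (a 2-byte length followed by the encoded bytes), and the toy record carries no padding bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RawTermBytesSketch {

    // Writes one term in writeUTF form (assumed layout for this example).
    static byte[] encode(String term) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(term);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the term as raw bytes; no String is constructed here.
    static byte[] readTermBytes(byte[] record) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(record));
        try {
            int length = in.readUnsignedShort(); // 2-byte length prefix
            byte[] raw = new byte[length];
            in.readFully(raw);
            return raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes on demand. Plain UTF-8 matches the writeUTF encoding for
    // characters in the Basic Multilingual Plane other than U+0000.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = readTermBytes(encode("caf\u00e9"));
        System.out.println(raw.length + " bytes, decoded: " + decode(raw));
    }
}
```

Skipping the decode step avoids allocating a String per entry, which may be worthwhile when entries are only being copied or compared byte by byte.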

print

public void print()
Prints the contents of the lexicon file, for checking.

Overrides:
print in class BlockLexiconInputStream

getTerm

public java.lang.String getTerm()
Returns the string representation of the term.

Overrides:
getTerm in class LexiconInputStream
Returns:
the string representation of the term that has already been read.

getTermCharacters

public byte[] getTermCharacters()
Returns the bytes of the String. Only valid if readNextEntryBytes() was used.

Overrides:
getTermCharacters in class LexiconInputStream
Returns:
the byte array holding the term's byte representation

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow