Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class UTFLexiconInputStream

java.lang.Object
  extended by uk.ac.gla.terrier.structures.LexiconInputStream
      extended by uk.ac.gla.terrier.structures.UTFLexiconInputStream
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, Closeable

public class UTFLexiconInputStream
extends LexiconInputStream

This class implements an input stream for the lexicon structure.

Version:
$Revision: 1.16 $
Author:
Vassilis Plachouras, Craig Macdonald

Constructor Summary
UTFLexiconInputStream()
          A default constructor.
UTFLexiconInputStream(java.io.DataInput in)
          Reads a lexicon from the specified input stream.
UTFLexiconInputStream(java.io.File file)
          A constructor given the lexicon file.
UTFLexiconInputStream(java.lang.String filename)
          A constructor given the filename.
UTFLexiconInputStream(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 java.lang.String getTerm()
          Returns the string representation of the term.
 byte[] getTermCharacters()
          Returns the bytes of the term.
 int numberOfEntries()
          Returns the number of entries in the lexicon file.
 int readNextEntry()
          Read the next lexicon entry, where the term is parsed as a string.
 int readNextEntryBytes()
          Read the next lexicon entry, where the term is saved as a byte array.
 
Methods inherited from class uk.ac.gla.terrier.structures.LexiconInputStream
close, getEndBitOffset, getEndOffset, getEntrySize, getNt, getNumberOfPointersRead, getNumberOfTermsRead, getNumberOfTokensRead, getStartBitOffset, getStartOffset, getTermId, getTF, iterator, print
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UTFLexiconInputStream

public UTFLexiconInputStream()
A default constructor.


UTFLexiconInputStream

public UTFLexiconInputStream(java.lang.String filename)
A constructor given the filename.

Parameters:
filename - java.lang.String the name of the lexicon file.

UTFLexiconInputStream

public UTFLexiconInputStream(java.io.File file)
A constructor given the lexicon file.

Parameters:
file - java.io.File the lexicon file.

UTFLexiconInputStream

public UTFLexiconInputStream(java.lang.String path,
                             java.lang.String prefix)

UTFLexiconInputStream

public UTFLexiconInputStream(java.io.DataInput in)
Reads a lexicon from the specified input stream.
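The DataInput constructor leaves it to the caller to decide where the lexicon bytes come from. The following is a minimal sketch, using only java.io, of opening a lexicon file as a buffered DataInput. The helper name openLexicon is hypothetical; the resulting stream would then be passed to new UTFLexiconInputStream(in).

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class OpenLexicon {
    // Hypothetical helper: wraps a file in a buffered DataInputStream.
    // DataInputStream implements java.io.DataInput, so the result is suitable
    // for a constructor that takes a DataInput.
    static DataInputStream openLexicon(File file) throws IOException {
        return new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
    }
}
```

Since the stream is created by the caller, this sketch assumes the caller also remains responsible for closing it (for example with try-with-resources) once reading is finished.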

Method Detail

readNextEntry

public int readNextEntry()
                  throws java.io.IOException
Read the next lexicon entry, where the term is parsed as a string. This method does NOT work with getTermCharacters(); use readNextEntryBytes() for that.

Overrides:
readNextEntry in class LexiconInputStream
Returns:
the number of bytes read, or -1 if the end of the file has been reached
Throws:
java.io.IOException - if an I/O error occurs
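The return-value contract above (bytes read, or -1 at end of file) suggests the usual consumption loop: call readNextEntry() until it returns -1, reading the current entry through getTerm() after each call. The sketch below illustrates that loop against a hypothetical stand-in reader, not the Terrier class. Its record layout (a term written with DataOutput.writeUTF, followed by two ints) is an assumption made for illustration and is not Terrier's documented on-disk format.

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LexiconLoopSketch {
    // Hypothetical stand-in for a lexicon reader. Assumed entry layout (NOT
    // Terrier's documented format): term via DataOutput.writeUTF, then an
    // int term id, then an int term frequency.
    static final class SketchReader {
        private final DataInput in;
        private String term;
        private int termId;
        private int tf;

        SketchReader(DataInput in) {
            this.in = in;
        }

        // Mirrors the documented contract: returns the number of bytes consumed,
        // or -1 if the input ended before a new entry started.
        int readNextEntry() throws IOException {
            final String t;
            try {
                t = in.readUTF();
            } catch (EOFException eof) {
                return -1;
            }
            termId = in.readInt(); // a truncated entry still throws EOFException
            tf = in.readInt();
            term = t;
            // readUTF consumed a 2-byte length prefix plus the encoded term bytes.
            return 2 + modifiedUtf8Length(t) + 4 + 4;
        }

        String getTerm() { return term; }
        int getTermId() { return termId; }
        int getTF() { return tf; }
    }

    // Byte length of s in the modified UTF-8 encoding used by writeUTF/readUTF.
    static int modifiedUtf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;
            } else if (c <= 0x07FF) {
                n += 2; // also covers U+0000
            } else {
                n += 3;
            }
        }
        return n;
    }

    // The usual loop: keep reading until readNextEntry() signals EOF with -1.
    static List<String> readAllTerms(DataInput in) throws IOException {
        SketchReader reader = new SketchReader(in);
        List<String> terms = new ArrayList<>();
        while (reader.readNextEntry() != -1) {
            terms.add(reader.getTerm());
        }
        return terms;
    }
}
```

Treating -1 as the only loop exit keeps EOF handling in one place, while a genuinely truncated entry still surfaces as an IOException rather than being silently ignored.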

readNextEntryBytes

public int readNextEntryBytes()
                       throws java.io.IOException
Read the next lexicon entry, where the term is saved as a byte array. No attempt is made to parse the byte array and the padding bytes into a String. Use this method when you want to obtain the bytes of the term via getTermCharacters(). This method does NOT work with getTerm().

Overrides:
readNextEntryBytes in class LexiconInputStream
Returns:
the number of bytes read, or -1 if the end of the file has been reached
Throws:
java.io.IOException - if an I/O error occurs
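The byte-array variant can be illustrated with a hypothetical stand-in reader (not the Terrier class) that assumes a term written with DataOutput.writeUTF followed by an int term id; this layout is an assumption, not Terrier's documented format. Here readNextEntryBytes() keeps the raw term bytes without building a String, which is why a getTerm()-style accessor has nothing to return in this mode, and decoding is left to the caller.

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LexiconBytesSketch {
    // Hypothetical entry layout, NOT Terrier's documented format: a term as
    // written by DataOutput.writeUTF (2-byte length + bytes), then an int term id.
    static final class SketchBytesReader {
        private final DataInput in;
        private byte[] termBytes;
        private int termId;

        SketchBytesReader(DataInput in) {
            this.in = in;
        }

        // Returns bytes consumed, or -1 at EOF. The term bytes are stored as-is;
        // no String is built from them.
        int readNextEntryBytes() throws IOException {
            final int len;
            try {
                len = in.readUnsignedShort();
            } catch (EOFException eof) {
                return -1;
            }
            byte[] buf = new byte[len];
            in.readFully(buf);
            termId = in.readInt();
            termBytes = buf;
            return 2 + len + 4;
        }

        byte[] getTermCharacters() { return termBytes; }
        int getTermId() { return termId; }
    }

    // Decoding is left to the caller. Modified UTF-8 (writeUTF) matches standard
    // UTF-8 except for U+0000 and supplementary characters, so this shortcut
    // is only safe for terms that contain neither.
    static String decode(byte[] termBytes) {
        return new String(termBytes, StandardCharsets.UTF_8);
    }
}
```

Skipping the decode presumably avoids creating a String per entry, which can matter when a whole lexicon is scanned and the term text is not needed.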

numberOfEntries

public int numberOfEntries()
Returns the number of entries in the lexicon file.

Overrides:
numberOfEntries in class LexiconInputStream

getTerm

public java.lang.String getTerm()
Returns the string representation of the term. Only valid if readNextEntry() was used.

Overrides:
getTerm in class LexiconInputStream
Returns:
the string representation of the most recently read term.

getTermCharacters

public byte[] getTermCharacters()
Returns the bytes of the term. Only valid if readNextEntryBytes() was used.

Overrides:
getTermCharacters in class LexiconInputStream
Returns:
the byte array holding the term's byte representation


Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow