Terrier IR Platform
2.2.1

uk.ac.gla.terrier.structures
Class LexiconInputStream

java.lang.Object
  extended by uk.ac.gla.terrier.structures.LexiconInputStream
All Implemented Interfaces:
java.lang.Iterable<java.lang.String>, Closeable
Direct Known Subclasses:
BlockLexiconInputStream, UTFLexiconInputStream

public class LexiconInputStream
extends java.lang.Object
implements java.lang.Iterable<java.lang.String>, Closeable

This class implements an input stream for the lexicon structure.

Version:
$Revision: 1.36 $
Author:
Vassilis Plachouras

Constructor Summary
LexiconInputStream()
          A default constructor.
LexiconInputStream(java.io.DataInput in)
          Reads a lexicon from the specified input stream.
LexiconInputStream(java.io.File file)
          A constructor given the lexicon file.
LexiconInputStream(java.lang.String filename)
          A constructor given the filename.
LexiconInputStream(java.lang.String path, java.lang.String prefix)
           
 
Method Summary
 void close()
          Closes the lexicon stream.
 byte getEndBitOffset()
          Returns the bit offset in the last byte of the term's entry in the inverted file.
 long getEndOffset()
          Returns the ending offset of the term's entry in the inverted file.
 int getEntrySize()
           
 int getNt()
          Returns the document frequency (Nt) of the current term.
 long getNumberOfPointersRead()
          Returns the number of pointers there would be in an inverted index built using this lexicon (thus far).
 int getNumberOfTermsRead()
          Returns the number of terms read so far by this LexiconInputStream.
 long getNumberOfTokensRead()
          Returns the number of tokens there are in the entire collection represented by this lexicon (thus far).
 byte getStartBitOffset()
          Returns the bit offset in the first byte of the term's entry in the inverted file.
 long getStartOffset()
          Returns the starting offset of the term's entry in the inverted file.
 java.lang.String getTerm()
          Returns the string representation of the term.
 byte[] getTermCharacters()
          Returns the bytes of the String.
 int getTermId()
          Returns the term's id.
 int getTF()
          Returns the term frequency (TF) of the current term in the collection.
 java.util.Iterator<java.lang.String> iterator()
          Returns an Iterator over the Strings of each term in this lexicon.
 int numberOfEntries()
          Returns the number of entries in the lexicon file.
 void print()
          Prints out the contents of the lexicon file to check.
 int readNextEntry()
          Read the next lexicon entry.
 int readNextEntryBytes()
          This is an alias to readNextEntry(), except for implementations that cannot parse the string from the byte array.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LexiconInputStream

public LexiconInputStream()
A default constructor. Opens the default lexicon.


LexiconInputStream

public LexiconInputStream(java.lang.String filename)
A constructor given the filename.

Parameters:
filename - java.lang.String the name of the lexicon file.

LexiconInputStream

public LexiconInputStream(java.lang.String path,
                          java.lang.String prefix)

LexiconInputStream

public LexiconInputStream(java.io.File file)
A constructor given the lexicon file.

Parameters:
file - java.io.File the lexicon file.

LexiconInputStream

public LexiconInputStream(java.io.DataInput in)
Reads a lexicon from the specified input stream.

Method Detail

close

public void close()
Closes the lexicon stream.

Specified by:
close in interface Closeable
Throws:
java.io.IOException - if an I/O error occurs

getEntrySize

public int getEntrySize()

readNextEntry

public int readNextEntry()
                  throws java.io.IOException
Read the next lexicon entry.

Returns:
the number of bytes read, or -1 if the end of the file (EOF) has been reached
Throws:
java.io.IOException - if an I/O error occurs
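
The return contract above (bytes read, or -1 at EOF) suggests the usual read-until-EOF loop. The following is a minimal sketch of that loop, not Terrier code: EntryReader is a hypothetical stand-in that mimics the contract over a java.io.DataInput, and its fixed 12-byte record (three ints: term id, Nt, TF) is an assumption made purely for illustration and is not the Terrier lexicon format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReadLoopSketch {

    /** Hypothetical stand-in for a lexicon stream; this is NOT the Terrier class. */
    static class EntryReader {
        private final DataInput in;
        private int termId, nt, tf;

        EntryReader(DataInput in) { this.in = in; }

        /** Reads one assumed 12-byte record; returns bytes read, or -1 at EOF. */
        int readNextEntry() throws IOException {
            try {
                termId = in.readInt();
                nt = in.readInt();
                tf = in.readInt();
                return 12;
            } catch (EOFException eof) {
                return -1;
            }
        }

        int getTermId() { return termId; }
        int getNt() { return nt; }
        int getTF() { return tf; }
    }

    /** Builds three sample records in memory (made-up values). */
    static byte[] sample() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int[][] records = { {0, 2, 5}, {1, 1, 1}, {2, 3, 9} };
            for (int[] record : records) {
                for (int value : record) out.writeInt(value);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads entries until readNextEntry() signals EOF with -1. */
    static int countEntries(byte[] data) {
        EntryReader reader =
            new EntryReader(new DataInputStream(new ByteArrayInputStream(data)));
        int entries = 0;
        try {
            while (reader.readNextEntry() != -1) {
                entries++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(countEntries(sample())); // prints 3
    }
}
```

With the real class, the same loop shape would presumably call the getters (getTerm(), getNt(), getTF(), and so on) inside the loop body after each successful readNextEntry().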

readNextEntryBytes

public int readNextEntryBytes()
                       throws java.io.IOException
This is an alias to readNextEntry(), except for implementations that cannot parse the string from the byte array.

Throws:
java.io.IOException

numberOfEntries

public int numberOfEntries()
Returns the number of entries in the lexicon file.


print

public void print()
Prints out the contents of the lexicon file to check.


getNumberOfPointersRead

public long getNumberOfPointersRead()
Returns the number of pointers there would be in an inverted index built using this lexicon (thus far). This is equal to the sum of the Nts read from this lexicon input stream.


getNumberOfTokensRead

public long getNumberOfTokensRead()
Returns the number of tokens there are in the entire collection represented by this lexicon (thus far). This is equal to the sum of the TFs read from this lexicon input stream.


getNumberOfTermsRead

public int getNumberOfTermsRead()
Returns the number of terms read so far by this LexiconInputStream.
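
The three "thus far" counters described above are running aggregates over the entries seen: pointers as the sum of Nt, tokens as the sum of TF, and terms as a plain count. The sketch below illustrates that arithmetic only. The Totals class and the sample values are hypothetical and are not part of Terrier.

```java
public class RunningTotalsSketch {

    /** Hypothetical accumulator mirroring the three "read so far" counters. */
    static class Totals {
        long pointers; // sum of document frequencies (Nt)
        long tokens;   // sum of collection term frequencies (TF)
        int terms;     // number of entries seen

        void add(int nt, int tf) {
            pointers += nt;
            tokens += tf;
            terms++;
        }
    }

    /** Each row of entries is {Nt, TF} for one term. */
    static Totals accumulate(int[][] entries) {
        Totals totals = new Totals();
        for (int[] entry : entries) {
            totals.add(entry[0], entry[1]);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] entries = { {2, 5}, {1, 1}, {3, 9} }; // made-up values
        Totals t = accumulate(entries);
        System.out.println(t.pointers + " " + t.tokens + " " + t.terms); // prints 6 15 3
    }
}
```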


getEndBitOffset

public byte getEndBitOffset()
Returns the bit offset in the last byte of the term's entry in the inverted file.

Returns:
byte the bit offset in the last byte of the term's entry in the inverted file

getEndOffset

public long getEndOffset()
Returns the ending offset of the term's entry in the inverted file.

Returns:
long The ending byte of the term's entry in the inverted file.

getStartBitOffset

public byte getStartBitOffset()
Returns the bit offset in the first byte of the term's entry in the inverted file.

Returns:
byte the bit offset in the first byte of the term's entry in the inverted file

getStartOffset

public long getStartOffset()
Returns the starting offset of the term's entry in the inverted file.

Returns:
long The starting byte of the term's entry in the inverted file.
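
Taken together, the four offset getters delimit a term's entry in the inverted file at bit granularity. As a worked example, the sketch below computes the length of such an entry in bits. It assumes that both the start and the end position are inclusive and that bit offsets run from 0 to 7 within a byte. Neither assumption is stated on this page, and spanInBits is a hypothetical helper, not a Terrier method.

```java
public class PostingSpanSketch {

    /**
     * Length in bits of an entry spanning (startOffset, startBitOffset)
     * to (endOffset, endBitOffset), ASSUMING both ends are inclusive.
     */
    static long spanInBits(long startOffset, byte startBitOffset,
                           long endOffset, byte endBitOffset) {
        return (endOffset - startOffset) * 8L
                + (endBitOffset - startBitOffset) + 1;
    }

    public static void main(String[] args) {
        // Byte 10 from bit 3 (5 bits) + all of byte 11 (8 bits)
        // + byte 12 up to bit 1 (2 bits) = 15 bits.
        System.out.println(spanInBits(10L, (byte) 3, 12L, (byte) 1)); // prints 15
    }
}
```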

getNt

public int getNt()
Returns the document frequency (Nt) of the current term.

Returns:
int The document frequency of the current term

getTerm

public java.lang.String getTerm()
Returns the string representation of the term.

Returns:
the string representation of the current term.

getTermId

public int getTermId()
Returns the term's id.

Returns:
the term's id.

getTF

public int getTF()
Returns the term frequency (TF) of the current term.

Returns:
the term frequency in the collection.

getTermCharacters

public byte[] getTermCharacters()
Returns the bytes of the String.

Returns:
the byte array holding the term's byte representation

iterator

public java.util.Iterator<java.lang.String> iterator()
Returns an Iterator over the Strings of each term in this lexicon.

Specified by:
iterator in interface java.lang.Iterable<java.lang.String>
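
Because the class implements java.lang.Iterable<java.lang.String>, its terms can be consumed with an enhanced for loop. The sketch below shows that pattern on a hypothetical stand-in, TermStream, backed by an in-memory array. It is not the Terrier class. The single-pass behaviour, where iterating consumes the stream position, is an assumption about how a stream-backed iterator would typically behave.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class TermIterableSketch {

    /** Hypothetical stand-in exposing terms through Iterable; NOT the Terrier class. */
    static class TermStream implements Iterable<String> {
        private final String[] terms;
        private int position = 0; // shared stream position, so iteration is single-pass

        TermStream(String... terms) { this.terms = terms; }

        public Iterator<String> iterator() {
            return new Iterator<String>() {
                public boolean hasNext() { return position < terms.length; }

                public String next() {
                    if (position >= terms.length) throw new NoSuchElementException();
                    return terms[position++];
                }

                public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }

    /** Drains any Iterable of terms into a list using the enhanced for loop. */
    static List<String> collect(Iterable<String> source) {
        List<String> out = new ArrayList<String>();
        for (String term : source) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new TermStream("apple", "banana", "cherry")));
    }
}
```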

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow