Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures
Class LexiconOutputStream

java.lang.Object
  extended by uk.ac.gla.terrier.structures.LexiconOutputStream
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
BlockLexiconOutputStream, UTFLexiconOutputStream

public class LexiconOutputStream
extends java.lang.Object
implements Closeable

This class implements an output stream for the lexicon structure.

Version:
$Revision: 1.24 $
Author:
Vassilis Plachouras

Constructor Summary
LexiconOutputStream()
          A default constructor.
LexiconOutputStream(java.io.File file)
          A constructor given the lexicon file.
LexiconOutputStream(java.lang.String filename)
          A constructor given the filename.
LexiconOutputStream(java.lang.String path, java.lang.String prefix)
          A constructor for a LexiconOutputStream given the index path and prefix.
 
Method Summary
 void close()
          Closes the lexicon stream.
 long getNumberOfPointersWritten()
          Returns the number of pointers there would be in an inverted index built using this lexicon.
 int getNumberOfTermsWritten()
          Returns the number of terms written to this lexicon output stream.
 long getNumberOfTokensWritten()
          Returns the total number of tokens in the collection represented by this lexicon.
 void setEndBitOffset(byte _endBitOffset)
          Deprecated.  
 void setEndOffset(long _endOffset)
          Deprecated.  
 void setNt(int _Nt)
          Deprecated.  
 void setTerm(java.lang.String _term)
          Deprecated.  
 void setTermId(int _termId)
          Deprecated.  
 void setTF(int _termFrequency)
          Deprecated.  
 int writeNextEntry(byte[] _term, int _termId, int _documentFrequency, int _termFrequency, long _endOffset, byte _endBitOffset)
          Writes a lexicon entry.
 int writeNextEntry(java.lang.String _term, int _termId, int _documentFrequency, int _termFrequency, long _endOffset, byte _endBitOffset)
          Writes a lexicon entry.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LexiconOutputStream

public LexiconOutputStream()
A default constructor.


LexiconOutputStream

public LexiconOutputStream(java.lang.String filename)
A constructor given the filename.

Parameters:
filename - java.lang.String the name of the lexicon file.

LexiconOutputStream

public LexiconOutputStream(java.io.File file)
A constructor given the lexicon file.

Parameters:
file - java.io.File the lexicon file.

LexiconOutputStream

public LexiconOutputStream(java.lang.String path,
                           java.lang.String prefix)
A constructor for a LexiconOutputStream given the index path and prefix.

Parameters:
path - String the path to the index
prefix - String the prefix of the filenames in the index
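The path/prefix constructor names the lexicon file indirectly, via the index directory and the filename prefix shared by all files of that index. The sketch below shows one plausible way such a pair maps to a single file. The helper `lexiconFilename` is hypothetical, and the `.lex` suffix is an assumption not stated on this page.

```java
import java.io.File;

/** Hypothetical helper; not part of the Terrier API. */
public class LexiconPathSketch {
    // ASSUMPTION: the lexicon file uses the suffix ".lex".
    public static final String LEXICON_SUFFIX = ".lex";

    /** Joins an index directory and filename prefix into a lexicon filename. */
    public static String lexiconFilename(String path, String prefix) {
        // File(parent, child) inserts the platform separator between the parts.
        return new File(path, prefix + LEXICON_SUFFIX).getPath();
    }

    public static void main(String[] args) {
        // On a Unix-like system this prints var/index/data.lex
        System.out.println(lexiconFilename("var/index", "data"));
    }
}
```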

Method Detail

close

public void close()
Closes the lexicon stream.

Specified by:
close in interface Closeable
Throws:
java.io.IOException - if an I/O error occurs while closing the stream.

writeNextEntry

public int writeNextEntry(java.lang.String _term,
                          int _termId,
                          int _documentFrequency,
                          int _termFrequency,
                          long _endOffset,
                          byte _endBitOffset)
                   throws java.io.IOException
Writes a lexicon entry.

Parameters:
_term - the string representation of the term
_termId - the term's integer identifier
_documentFrequency - the term's document frequency in the collection
_termFrequency - the term's frequency in the collection
_endOffset - the term's ending byte offset in the inverted file
_endBitOffset - the bit offset in the last byte of the term's entry in the inverted file
Returns:
the number of bytes written to the file.
Throws:
java.io.IOException - if an I/O error occurs
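To make the parameters and the returned byte count concrete, here is a minimal sketch of a fixed-width entry writer. It is a hypothetical class, not Terrier's implementation: the 20-byte zero-padded term field, the UTF-8 encoding, and the field order are all assumptions. It returns the number of bytes the entry added, mirroring the documented return value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical fixed-width lexicon entry writer; not Terrier's actual class. */
public class FixedWidthLexiconWriter {
    // ASSUMPTION: the term occupies a zero-padded 20-byte field.
    public static final int TERM_BYTES = 20;

    private final DataOutputStream out;

    public FixedWidthLexiconWriter(DataOutputStream out) {
        this.out = out;
    }

    /** Writes one entry and returns the number of bytes it added to the stream. */
    public int writeNextEntry(String term, int termId, int documentFrequency,
                              int termFrequency, long endOffset, byte endBitOffset)
            throws IOException {
        int before = out.size();                       // bytes written so far
        byte[] encoded = term.getBytes(StandardCharsets.UTF_8);
        byte[] field = new byte[TERM_BYTES];           // zero-filled by default
        // Copy the encoded term, truncating terms longer than the field.
        System.arraycopy(encoded, 0, field, 0, Math.min(encoded.length, TERM_BYTES));
        out.write(field);
        out.writeInt(termId);                          // 4 bytes
        out.writeInt(documentFrequency);               // 4 bytes (Nt)
        out.writeInt(termFrequency);                   // 4 bytes (TF)
        out.writeLong(endOffset);                      // 8 bytes
        out.writeByte(endBitOffset);                   // 1 byte
        return out.size() - before;                    // 20 + 4 + 4 + 4 + 8 + 1 = 41
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(buffer)) {
            FixedWidthLexiconWriter writer = new FixedWidthLexiconWriter(data);
            int n = writer.writeNextEntry("retrieval", 0, 12, 30, 1023L, (byte) 5);
            System.out.println(n); // 41
        }
    }
}
```

A fixed width makes every entry the same size, so entry *i* can be located by seeking to `i * 41` in this sketch. The cost is that long terms are truncated, and byte-level truncation can split a multi-byte UTF-8 character.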

writeNextEntry

public int writeNextEntry(byte[] _term,
                          int _termId,
                          int _documentFrequency,
                          int _termFrequency,
                          long _endOffset,
                          byte _endBitOffset)
                   throws java.io.IOException
Writes a lexicon entry.

Parameters:
_term - the byte[] representation of the term. Using this format means that the term does not have to be decoded and re-encoded on every write.
_termId - the term's integer identifier
_documentFrequency - the term's document frequency in the collection
_termFrequency - the term's frequency in the collection
_endOffset - the term's ending byte offset in the inverted file
_endBitOffset - the bit offset in the last byte of the term's entry in the inverted file
Returns:
the number of bytes written.
Throws:
java.io.IOException - if an I/O error occurs
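The byte[] overload exists so that callers already holding encoded terms (for example, when copying entries from another lexicon) can skip the decode/re-encode round trip. One way to structure the two overloads is for the String form to encode once and delegate to the byte[] form. The sketch below shows that design on the term field alone. The class `TermOverloads`, the 20-byte width, and UTF-8 are hypothetical choices, not Terrier's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the String/byte[] overload pair. */
public class TermOverloads {
    // ASSUMPTION: 20-byte zero-padded term field.
    public static final int TERM_BYTES = 20;

    private final DataOutputStream out;

    public TermOverloads(DataOutputStream out) {
        this.out = out;
    }

    /** byte[] path: the term is already encoded, so its bytes are written as-is. */
    public int writeTerm(byte[] term) throws IOException {
        // Arrays.copyOf zero-pads short terms and truncates long ones.
        out.write(Arrays.copyOf(term, TERM_BYTES));
        return TERM_BYTES;
    }

    /** String path: encode once, then reuse the byte[] path. */
    public int writeTerm(String term) throws IOException {
        return writeTerm(term.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream fromString = new ByteArrayOutputStream();
        ByteArrayOutputStream fromBytes = new ByteArrayOutputStream();
        new TermOverloads(new DataOutputStream(fromString)).writeTerm("lexicon");
        byte[] preEncoded = "lexicon".getBytes(StandardCharsets.UTF_8);
        new TermOverloads(new DataOutputStream(fromBytes)).writeTerm(preEncoded);
        // Both paths produce identical bytes.
        System.out.println(Arrays.equals(fromString.toByteArray(), fromBytes.toByteArray())); // true
    }
}
```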

getNumberOfPointersWritten

public long getNumberOfPointersWritten()
Returns the number of pointers there would be in an inverted index built using this lexicon. This is equal to the sum of the Nt (document frequency) values written to this lexicon output stream.


getNumberOfTokensWritten

public long getNumberOfTokensWritten()
Returns the total number of tokens in the collection represented by this lexicon. This is equal to the sum of the TF (collection term frequency) values written to this lexicon output stream.


getNumberOfTermsWritten

public int getNumberOfTermsWritten()
Returns the number of terms written to this lexicon output stream.
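The three statistics above are running totals updated as entries are written: one term per entry, pointers as the sum of Nt, and tokens as the sum of TF. The sketch below tracks them in a hypothetical class whose getter names mirror the documented ones. The `record` hook is an assumption for illustration, not a Terrier method. The sums are kept as `long`, matching the documented return types, because they can exceed the `int` range on large collections.

```java
/** Hypothetical tracker for the lexicon statistics; not Terrier's implementation. */
public class LexiconStatsSketch {
    private int termsWritten;
    private long pointersWritten; // running sum of Nt (document frequencies)
    private long tokensWritten;   // running sum of TF (collection term frequencies)

    /** Hypothetical hook, called once for every entry written. */
    public void record(int documentFrequency, int termFrequency) {
        termsWritten++;
        pointersWritten += documentFrequency;
        tokensWritten += termFrequency;
    }

    public int getNumberOfTermsWritten() { return termsWritten; }
    public long getNumberOfPointersWritten() { return pointersWritten; }
    public long getNumberOfTokensWritten() { return tokensWritten; }

    public static void main(String[] args) {
        LexiconStatsSketch stats = new LexiconStatsSketch();
        stats.record(3, 7);   // term in 3 documents, 7 occurrences in total
        stats.record(1, 1);
        stats.record(5, 12);
        System.out.println(stats.getNumberOfTermsWritten());    // 3
        System.out.println(stats.getNumberOfPointersWritten()); // 9
        System.out.println(stats.getNumberOfTokensWritten());   // 20
    }
}
```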

setEndBitOffset

public void setEndBitOffset(byte _endBitOffset)
Deprecated. 

Sets the bit offset in the last byte of the term's entry in the inverted file.

Parameters:
_endBitOffset - byte the bit offset in the last byte of the term's entry in the inverted file.

setEndOffset

public void setEndOffset(long _endOffset)
Deprecated. 

Sets the ending offset of the term's entry in the inverted file.

Parameters:
_endOffset - long The ending byte of the term's entry in the inverted file.

setNt

public void setNt(int _Nt)
Deprecated. 

Sets the document frequency for the given term.

Parameters:
_Nt - int The document frequency for the given term.

setTerm

public void setTerm(java.lang.String _term)
Deprecated. 

Sets the string representation of the term.

Parameters:
_term - java.lang.String The string representation of the term.

setTermId

public void setTermId(int _termId)
Deprecated. 

Sets the term's id.

Parameters:
_termId - int the term's identifier.

setTF

public void setTF(int _termFrequency)
Deprecated. 

Sets the collection term frequency of the current term.

Parameters:
_termFrequency - int The term frequency in the collection.


Terrier Information Retrieval Platform 1.1.1. Copyright 2004-2007 University of Glasgow