Terrier IR Platform 2.2.1
java.lang.Object
  uk.ac.gla.terrier.structures.LexiconOutputStream
public class LexiconOutputStream
This class implements an output stream for the lexicon structure.
Constructor Summary | |
---|---|
LexiconOutputStream()
A default constructor. |
|
LexiconOutputStream(java.io.DataOutput out)
Creates a lexicon output stream that writes to the specified data output. |
|
LexiconOutputStream(java.io.File file)
A constructor given the lexicon file. |
|
LexiconOutputStream(java.lang.String filename)
A constructor given the filename. |
|
LexiconOutputStream(java.lang.String path,
java.lang.String prefix)
A constructor for a LexiconOutputStream given the index path and prefix |
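The constructors summarized above differ only in how the destination is named. The following is a minimal sketch of that overload pattern, not the Terrier implementation: `ConstructorSketch`, `lexiconFile` and the `.lex` suffix are hypothetical names chosen for illustration, and this page does not state how the path and prefix are combined.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical stand-in, NOT the Terrier class: it only mirrors the constructor shapes.
class ConstructorSketch {
    // Assumed file suffix; the real suffix is not given on this page.
    static final String ASSUMED_SUFFIX = ".lex";

    private final DataOutput out;

    // Write to any DataOutput supplied by the caller.
    ConstructorSketch(DataOutput out) {
        this.out = out;
    }

    // Open the given file and delegate to the DataOutput constructor.
    // In this sketch the caller is responsible for closing the stream.
    ConstructorSketch(File file) throws IOException {
        this(new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file))));
    }

    // A filename is turned into a File.
    ConstructorSketch(String filename) throws IOException {
        this(new File(filename));
    }

    // Index path plus prefix: the file name is derived (assumption: prefix + suffix).
    ConstructorSketch(String path, String prefix) throws IOException {
        this(lexiconFile(path, prefix));
    }

    // Hypothetical helper that builds the lexicon file location.
    static File lexiconFile(String path, String prefix) {
        return new File(path, prefix + ASSUMED_SUFFIX);
    }

    DataOutput output() {
        return out;
    }

    public static void main(String[] args) {
        // In-memory destination, so nothing touches the disk.
        ConstructorSketch s =
            new ConstructorSketch(new DataOutputStream(new ByteArrayOutputStream()));
        System.out.println(s.output() != null);
        System.out.println(lexiconFile("index", "data").getName());
    }
}
```

Passing a `DataOutput` directly is the easiest form to test, because the destination can be an in-memory buffer.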
Method Summary | |
---|---|
void |
close()
Closes the lexicon stream. |
long |
getNumberOfPointersWritten()
Returns the number of pointers there would be in an inverted index built using this lexicon (thus far). |
int |
getNumberOfTermsWritten()
Returns the number of terms written so far by this LexiconOutputStream |
long |
getNumberOfTokensWritten()
Returns the number of tokens there are in the entire collection represented by this lexicon (thus far). |
void |
setEndBitOffset(byte _endBitOffset)
Deprecated. |
void |
setEndOffset(long _endOffset)
Deprecated. |
void |
setNt(int _Nt)
Deprecated. |
void |
setTerm(java.lang.String _term)
Deprecated. |
void |
setTermId(int _termId)
Deprecated. |
void |
setTF(int _termFrequency)
Deprecated. |
int |
writeNextEntry(byte[] _term,
int _termId,
int _documentFrequency,
int _termFrequency,
long _endOffset,
byte _endBitOffset)
Writes a lexicon entry. |
int |
writeNextEntry(java.lang.String _term,
int _termId,
int _documentFrequency,
int _termFrequency,
long _endOffset,
byte _endBitOffset)
Writes a lexicon entry. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
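The three counters in the method summary can be read as running totals updated by each `writeNextEntry` call. The sketch below illustrates that reading under stated assumptions: pointers are taken to be the sum of document frequencies, tokens the sum of term frequencies, and the return value the number of bytes written. `CountingLexiconWriter` and its entry layout are hypothetical and do not reproduce the Terrier on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, NOT the Terrier class. The entry layout is an assumption.
class CountingLexiconWriter {
    private final DataOutput out;
    private int termsWritten = 0;
    private long pointersWritten = 0L; // assumed: sum of document frequencies
    private long tokensWritten = 0L;   // assumed: sum of term frequencies

    CountingLexiconWriter(DataOutput out) {
        this.out = out;
    }

    // String overload: encodes the term once, then delegates to the byte[] overload.
    int writeNextEntry(String term, int termId, int documentFrequency,
                       int termFrequency, long endOffset, byte endBitOffset)
            throws IOException {
        return writeNextEntry(term.getBytes(StandardCharsets.UTF_8), termId,
                documentFrequency, termFrequency, endOffset, endBitOffset);
    }

    // byte[] overload: the caller supplies already-encoded bytes, so no re-encoding occurs.
    int writeNextEntry(byte[] term, int termId, int documentFrequency,
                       int termFrequency, long endOffset, byte endBitOffset)
            throws IOException {
        out.writeInt(term.length);   // 4 bytes (assumed length prefix)
        out.write(term);             // term.length bytes
        out.writeInt(termId);        // 4 bytes
        out.writeInt(documentFrequency); // 4 bytes
        out.writeInt(termFrequency); // 4 bytes
        out.writeLong(endOffset);    // 8 bytes
        out.writeByte(endBitOffset); // 1 byte
        termsWritten++;
        pointersWritten += documentFrequency;
        tokensWritten += termFrequency;
        return 4 + term.length + 4 + 4 + 4 + 8 + 1; // assumed meaning: bytes written
    }

    int getNumberOfTermsWritten() { return termsWritten; }
    long getNumberOfPointersWritten() { return pointersWritten; }
    long getNumberOfTokensWritten() { return tokensWritten; }

    public static void main(String[] args) throws IOException {
        CountingLexiconWriter w =
            new CountingLexiconWriter(new DataOutputStream(new ByteArrayOutputStream()));
        w.writeNextEntry("apple", 0, 2, 5, 10L, (byte) 3);
        w.writeNextEntry("pear".getBytes(StandardCharsets.UTF_8), 1, 1, 4, 12L, (byte) 7);
        // prints "terms=2 pointers=3 tokens=9"
        System.out.println("terms=" + w.getNumberOfTermsWritten()
                + " pointers=" + w.getNumberOfPointersWritten()
                + " tokens=" + w.getNumberOfTokensWritten());
    }
}
```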
Constructor Detail |
---|
public LexiconOutputStream()
public LexiconOutputStream(java.io.DataOutput out)
public LexiconOutputStream(java.lang.String filename)
Parameters:
filename - java.lang.String the name of the lexicon file.
public LexiconOutputStream(java.io.File file)
Parameters:
file - java.io.File the name of the lexicon file.
public LexiconOutputStream(java.lang.String path, java.lang.String prefix)
Parameters:
path - String the path to the index
prefix - String the prefix of the filenames in the index
Method Detail |
---|
public void close()
Specified by:
close in interface Closeable
Throws:
java.io.IOException - if an I/O error occurs while closing the stream.
public int writeNextEntry(java.lang.String _term, int _termId, int _documentFrequency, int _termFrequency, long _endOffset, byte _endBitOffset) throws java.io.IOException
Parameters:
_term - the string representation of the term
_termId - the term's integer identifier
_documentFrequency - the term's document frequency in the collection
_termFrequency - the term's frequency in the collection
_endOffset - the term's ending byte offset in the inverted file
_endBitOffset - the bit offset within the term's ending byte in the inverted file
Throws:
java.io.IOException - if an I/O error occurs
public int writeNextEntry(byte[] _term, int _termId, int _documentFrequency, int _termFrequency, long _endOffset, byte _endBitOffset) throws java.io.IOException
Parameters:
_term - the byte[] representation of the term. Using this format means that the term does not have to be decoded and re-encoded every time.
_termId - the term's integer identifier
_documentFrequency - the term's document frequency in the collection
_termFrequency - the term's frequency in the collection
_endOffset - the term's ending byte offset in the inverted file
_endBitOffset - the bit offset within the term's ending byte in the inverted file
Throws:
java.io.IOException - if an I/O error occurs
public long getNumberOfPointersWritten()
public long getNumberOfTokensWritten()
public int getNumberOfTermsWritten()
public void setEndBitOffset(byte _endBitOffset)
Parameters:
_endBitOffset - byte the bit offset in the last byte of the term's entry in the inverted file.
public void setEndOffset(long _endOffset)
Parameters:
_endOffset - long the ending byte of the term's entry in the inverted file.
public void setNt(int _Nt)
Parameters:
_Nt - int the document frequency for the given term.
public void setTerm(java.lang.String _term)
Parameters:
_term - java.lang.String the string representation of the term.
public void setTermId(int _termId)
Parameters:
_termId - int the term's identifier.
public void setTF(int _termFrequency)
Parameters:
_termFrequency - int the term frequency in the collection.
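Each entry records only where a term's postings end (`_endOffset`, `_endBitOffset`), so a reader has to derive the start from the previous entry. The sketch below shows one way that arithmetic could work. It assumes bits are numbered 0 to 7 within a byte and that the end position is inclusive; neither is stated on this page, and `OffsetSketch` is a hypothetical helper, not part of Terrier.

```java
// Hypothetical helper, NOT part of Terrier. Assumes bit offsets 0..7 and inclusive ends.
final class OffsetSketch {
    private OffsetSketch() { }

    // Returns {startByte, startBit} of the entry that follows one ending at (endByte, endBit).
    static long[] nextStart(long endByte, byte endBit) {
        if (endBit < 0 || endBit > 7) {
            throw new IllegalArgumentException("bit offset must be in 0..7: " + endBit);
        }
        if (endBit == 7) {
            // The previous entry filled its last byte, so the next one starts a new byte.
            return new long[] { endByte + 1, 0 };
        }
        return new long[] { endByte, endBit + 1 };
    }

    // Number of bits spanned from (startByte, startBit) to (endByte, endBit), inclusive.
    static long lengthInBits(long startByte, int startBit, long endByte, int endBit) {
        return (endByte - startByte) * 8 + (endBit - startBit) + 1;
    }

    public static void main(String[] args) {
        long[] start = nextStart(10L, (byte) 3);
        // prints "10:4"
        System.out.println(start[0] + ":" + start[1]);
        // prints "14"
        System.out.println(lengthInBits(start[0], (int) start[1], 12L, 1));
    }
}
```

Storing only end positions avoids writing each boundary twice, at the cost of needing the preceding entry to locate a term's postings.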