Terrier IR Platform 2.2.1 |
java.lang.Object
  extended by uk.ac.gla.terrier.structures.LexiconInputStream
All Implemented Interfaces:
java.io.Closeable, java.lang.Iterable<java.lang.String>
public class LexiconInputStream
This class implements an input stream for the lexicon structure.
Constructor Summary | |
---|---|
LexiconInputStream() | A default constructor. |
LexiconInputStream(java.io.DataInput in) | Reads a lexicon from the specified input stream. |
LexiconInputStream(java.io.File file) | A constructor given the lexicon file. |
LexiconInputStream(java.lang.String filename) | A constructor given the filename. |
LexiconInputStream(java.lang.String path, java.lang.String prefix) | |
Method Summary | |
---|---|
void | close(): Closes the lexicon stream. |
byte | getEndBitOffset(): Returns the bit offset in the last byte of the term's entry in the inverted file. |
long | getEndOffset(): Returns the ending offset of the term's entry in the inverted file. |
int | getEntrySize() |
int | getNt(): Returns the document frequency of the term most recently read. |
long | getNumberOfPointersRead(): Returns the number of pointers that an inverted index built from the entries read so far would contain. |
int | getNumberOfTermsRead(): Returns the number of terms read so far by this LexiconInputStream. |
long | getNumberOfTokensRead(): Returns the number of tokens in the collection represented by the lexicon entries read so far. |
byte | getStartBitOffset(): Returns the bit offset in the first byte of the term's entry in the inverted file. |
long | getStartOffset(): Returns the starting offset of the term's entry in the inverted file. |
java.lang.String | getTerm(): Returns the string representation of the term. |
byte[] | getTermCharacters(): Returns the bytes of the term's string. |
int | getTermId(): Returns the term's id. |
int | getTF(): Returns the term frequency of the term most recently read. |
java.util.Iterator<java.lang.String> | iterator(): Returns an Iterator over the terms in this lexicon, as Strings. |
int | numberOfEntries(): Returns the number of entries in the lexicon file. |
void | print(): Prints the contents of the lexicon file, for checking. |
int | readNextEntry(): Reads the next lexicon entry. |
int | readNextEntryBytes(): An alias for readNextEntry(), except in implementations that cannot parse the term string from the byte array. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public LexiconInputStream()
public LexiconInputStream(java.lang.String filename)
Parameters:
filename - the name of the lexicon file.
public LexiconInputStream(java.lang.String path, java.lang.String prefix)
public LexiconInputStream(java.io.File file)
Parameters:
file - the lexicon file.
public LexiconInputStream(java.io.DataInput in)
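The argument-taking constructors differ only in how the lexicon is located. One plausible arrangement, sketched here under stated assumptions, is for the file-based constructors to open a buffered stream and delegate to the `DataInput` constructor, so the entry-reading logic lives in one place. The class `LexiconReaderSketch`, the helper `lexiconFile` and the `.lex` suffix used for the `(path, prefix)` form are all hypothetical, not Terrier's actual code.

```java
import java.io.BufferedInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/** Hypothetical sketch: every file-based constructor funnels into the DataInput one. */
final class LexiconReaderSketch {
    // Assumption: the lexicon for prefix <prefix> in directory <path> is <path>/<prefix>.lex
    static final String ASSUMED_LEXICON_SUFFIX = ".lex";

    private final DataInput in;

    LexiconReaderSketch(DataInput in) {
        this.in = in;
    }

    LexiconReaderSketch(File file) throws IOException {
        // Buffering matters because entries are read field by field.
        this(new DataInputStream(new BufferedInputStream(new FileInputStream(file))));
    }

    LexiconReaderSketch(String filename) throws IOException {
        this(new File(filename));
    }

    LexiconReaderSketch(String path, String prefix) throws IOException {
        this(lexiconFile(path, prefix));
    }

    /** Hypothetical naming rule for the (path, prefix) constructor. */
    static File lexiconFile(String path, String prefix) {
        return new File(path, prefix + ASSUMED_LEXICON_SUFFIX);
    }

    DataInput input() {
        return in;
    }
}
```

Accepting a `DataInput` also makes the reader testable against an in-memory `ByteArrayInputStream` without touching the file system.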
Method Detail |
---|
public void close()
Specified by:
close in interface Closeable
Throws:
java.io.IOException - if an I/O error occurs
public int getEntrySize()
public int readNextEntry() throws java.io.IOException
Throws:
java.io.IOException - if an I/O error occurs
public int readNextEntryBytes() throws java.io.IOException
Throws:
java.io.IOException
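The usual way to consume a stream like this is a loop that calls `readNextEntry()` until it reports the end of the stream. The sketch below assumes the method returns -1 at end of file and otherwise the number of bytes consumed (this page does not document the return value), and it assumes a hypothetical fixed-width entry layout built from the fields the getters expose: a zero-padded 20-byte term, then term id, Nt, TF, end offset and end bit offset. `EntryLoopSketch`, `termsOf` and `encode` are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for an ASSUMED fixed-width entry layout (not Terrier's documented format). */
final class EntryLoopSketch {
    static final int TERM_BYTES = 20; // assumed width of the zero-padded term field
    // term + termId (int) + Nt (int) + TF (int) + endOffset (long) + endBitOffset (byte)
    static final int ENTRY_SIZE = TERM_BYTES + 4 + 4 + 4 + 8 + 1;

    private final DataInput in;
    private final byte[] termBytes = new byte[TERM_BYTES];
    String term;
    int termId, nt, tf;
    long endOffset;
    byte endBitOffset;

    EntryLoopSketch(DataInput in) {
        this.in = in;
    }

    /** Returns the bytes consumed, or -1 once no further entry starts (assumed convention). */
    int readNextEntry() throws IOException {
        try {
            in.readFully(termBytes);
        } catch (EOFException endOfStream) {
            return -1;
        }
        // trim() strips the zero padding, since it removes chars <= U+0020.
        term = new String(termBytes, StandardCharsets.UTF_8).trim();
        termId = in.readInt();
        nt = in.readInt();
        tf = in.readInt();
        endOffset = in.readLong();
        endBitOffset = in.readByte();
        return ENTRY_SIZE;
    }

    /** The usual loop: read until readNextEntry() signals the end with -1. */
    static List<String> termsOf(byte[] lexicon) {
        List<String> terms = new ArrayList<>();
        EntryLoopSketch reader =
                new EntryLoopSketch(new DataInputStream(new ByteArrayInputStream(lexicon)));
        try {
            while (reader.readNextEntry() != -1) {
                terms.add(reader.term);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    /** Test helper: encodes terms in the assumed layout with placeholder statistics. */
    static byte[] encode(String... terms) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (int id = 0; id < terms.length; id++) {
                out.write(Arrays.copyOf(terms[id].getBytes(StandardCharsets.UTF_8), TERM_BYTES));
                out.writeInt(id);  // termId
                out.writeInt(1);   // Nt
                out.writeInt(1);   // TF
                out.writeLong(0L); // endOffset
                out.writeByte(0);  // endBitOffset
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

After each successful call, the getters would report the fields of the entry just read, which is why the statistics are kept as mutable state rather than returned.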
public int numberOfEntries()
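If every entry occupies the same number of bytes (an assumption; `getEntrySize()` suggests a per-entry size, but this page does not say it is fixed), `numberOfEntries()` can be derived from the file length alone. `EntryCountSketch` is hypothetical.

```java
/** Hypothetical sketch: entry count from file length, ASSUMING fixed-width entries. */
final class EntryCountSketch {
    static int numberOfEntries(long fileLengthBytes, int entrySizeBytes) {
        if (entrySizeBytes <= 0) {
            throw new IllegalArgumentException("entry size must be positive");
        }
        if (fileLengthBytes % entrySizeBytes != 0) {
            // A partial trailing entry suggests a truncated or mismatched file.
            throw new IllegalArgumentException("file length is not a multiple of the entry size");
        }
        return Math.toIntExact(fileLengthBytes / entrySizeBytes);
    }
}
```

For example, 123 bytes of 41-byte entries gives 3 entries. A length that is not an exact multiple is rejected rather than rounded down, since it would indicate a damaged file.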
public void print()
public long getNumberOfPointersRead()
public long getNumberOfTokensRead()
public int getNumberOfTermsRead()
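The three `...Read()` counters grow as entries are read. A plausible reading of their summaries, assumed here rather than taken from Terrier's source, is that each entry adds one term, Nt pointers (one posting per document containing the term) and TF tokens. The `long` return types of the pointer and token counts fit this reading, since those totals can exceed the `int` range on large collections. `RunningTotalsSketch` and `onEntryRead` are hypothetical.

```java
/** Hypothetical running totals, ASSUMING one term, Nt pointers and TF tokens per entry. */
final class RunningTotalsSketch {
    private int termsRead;
    private long pointersRead;
    private long tokensRead;

    /** Called once per entry after it has been read. */
    void onEntryRead(int nt, int tf) {
        termsRead++;
        pointersRead += nt; // long accumulators: totals can exceed Integer.MAX_VALUE
        tokensRead += tf;
    }

    int getNumberOfTermsRead() { return termsRead; }

    long getNumberOfPointersRead() { return pointersRead; }

    long getNumberOfTokensRead() { return tokensRead; }
}
```

After reading entries with (Nt, TF) of (2, 5) and (3, 1), the sketch reports 2 terms, 5 pointers and 6 tokens.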
public byte getEndBitOffset()
public long getEndOffset()
public byte getStartBitOffset()
public long getStartOffset()
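The start and end accessors describe a span of bits in the inverted file: a byte offset plus a bit position (0 to 7) within that byte. Assuming entries are packed back to back and the end position is inclusive (neither is stated on this page), an entry's start follows directly from the previous entry's end. `BitOffsetSketch`, `BitPosition`, `startAfter` and `lengthInBits` are hypothetical.

```java
/** Hypothetical sketch, ASSUMING back-to-back bit-packed entries with inclusive ends. */
final class BitOffsetSketch {
    /** A position in the inverted file: a byte offset plus a bit (0-7) within that byte. */
    static final class BitPosition {
        final long offset;
        final byte bitOffset;

        BitPosition(long offset, byte bitOffset) {
            this.offset = offset;
            this.bitOffset = bitOffset;
        }
    }

    /** The next entry starts at the bit immediately after the previous entry's last bit. */
    static BitPosition startAfter(long prevEndOffset, byte prevEndBitOffset) {
        if (prevEndBitOffset < 0 || prevEndBitOffset > 7) {
            throw new IllegalArgumentException("bit offset must be in 0..7");
        }
        int nextBit = prevEndBitOffset + 1;
        if (nextBit == 8) {
            // The previous entry used the last bit of its byte, so move to the next byte.
            return new BitPosition(prevEndOffset + 1, (byte) 0);
        }
        return new BitPosition(prevEndOffset, (byte) nextBit);
    }

    /** Length of an entry in bits, counting both endpoints. */
    static long lengthInBits(BitPosition start, BitPosition end) {
        return (end.offset - start.offset) * 8 + (end.bitOffset - start.bitOffset) + 1;
    }
}
```

For instance, if the previous entry ends at byte 10, bit 7, the next starts at byte 11, bit 0; if it ends at byte 10, bit 3, the next starts at byte 10, bit 4.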
public int getNt()
public java.lang.String getTerm()
public int getTermId()
public int getTF()
public byte[] getTermCharacters()
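`getTermCharacters()` returns the term's raw bytes, while `getTerm()` returns the decoded `String`; the `readNextEntryBytes()` summary hints that some implementations only produce the former. The sketch below assumes, hypothetically, a zero-padded fixed-width UTF-8 term field and shows the conversion in both directions. `TermBytesSketch` and its methods are not part of Terrier.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical conversion, ASSUMING a zero-padded fixed-width UTF-8 term field. */
final class TermBytesSketch {
    static byte[] toFixedWidth(String term, int width) {
        byte[] raw = term.getBytes(StandardCharsets.UTF_8);
        if (raw.length > width) {
            throw new IllegalArgumentException("term longer than " + width + " bytes: " + term);
        }
        return Arrays.copyOf(raw, width); // pads with zero bytes
    }

    static String fromFixedWidth(byte[] field) {
        int length = 0;
        while (length < field.length && field[length] != 0) {
            length++; // stop at the first padding byte
        }
        return new String(field, 0, length, StandardCharsets.UTF_8);
    }
}
```

Scanning for the first zero byte is safe for UTF-8, because no multi-byte UTF-8 sequence contains a zero byte.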
public java.util.Iterator<java.lang.String> iterator()
Specified by:
iterator in interface java.lang.Iterable<java.lang.String>
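Because the class implements both `Iterable<String>` (for `iterator()`) and `Closeable` (for `close()`), a caller can combine a for-each loop with try-with-resources (Java 7 or later). The self-contained sketch below imitates that shape with an in-memory array standing in for the lexicon file; `TermIterableSketch`, `readAll` and the lookahead iterator are hypothetical, not Terrier's implementation.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stream-backed Iterable; the array stands in for on-disk entries. */
final class TermIterableSketch implements Iterable<String>, Closeable {
    private final String[] terms;
    private int position = 0;
    boolean closed = false;

    TermIterableSketch(String... terms) {
        this.terms = terms;
    }

    /** Mirrors the assumed readNextEntry() contract: index of the entry read, or -1 at the end. */
    private int readNextEntry() {
        return position < terms.length ? position++ : -1;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next = advance(); // one-entry lookahead answers hasNext()

            private String advance() {
                int index = readNextEntry();
                return index == -1 ? null : terms[index];
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = advance();
                return current;
            }
        };
    }

    @Override
    public void close() {
        closed = true;
    }

    /** Typical usage: iterate inside try-with-resources so the stream is always closed. */
    static List<String> readAll(TermIterableSketch lexicon) {
        List<String> out = new ArrayList<>();
        try (TermIterableSketch l = lexicon) {
            for (String term : l) {
                out.add(term);
            }
        }
        return out;
    }
}
```

Since every iterator draws from the same underlying stream, a second call to `iterator()` continues where the first stopped (here, at the end) instead of restarting.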