Terrier IR Platform 2.2.1 |
java.lang.Object uk.ac.gla.terrier.structures.Lexicon
public class Lexicon
Implements the lexicon structure. The lexicon file holds the actual data about the terms and takes its name from ApplicationSetup.LEXICON_FILENAME. A second file, the lexicon index, is also created and used; it maps each term's code to the offset of that term in the lexicon, and its name is given by ApplicationSetup.LEXICON_INDEX_FILENAME.
See Also: ApplicationSetup.LEXICON_FILENAME, ApplicationSetup.LEXICON_INDEX_FILENAME
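As a rough illustration of the two-file arrangement described above, the following sketch keeps fixed-length term records in one byte array and a separate table that maps each term code to the byte offset of its record. The record layout (20 bytes of term text plus a 4-byte id), the class name LexiconLayoutSketch, and all method names are assumptions made for this example; they are not Terrier's on-disk format or API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class LexiconLayoutSketch {
    // Hypothetical record layout: 20 bytes of zero-padded term text + 4-byte term id.
    static final int TERM_BYTES = 20;
    static final int ENTRY_LENGTH = TERM_BYTES + 4;

    /** Writes already-sorted terms as fixed-length records; ids[i] is the id of sortedTerms[i]. */
    static byte[] buildLexicon(String[] sortedTerms, int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(sortedTerms.length * ENTRY_LENGTH);
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            byte[] padded = new byte[TERM_BYTES];
            System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, TERM_BYTES));
            buf.put(padded).putInt(ids[i]);
        }
        return buf.array();
    }

    /** Builds the id-to-offset mapping; assumes ids are dense, i.e. 0..n-1 in some order. */
    static long[] buildIndex(int[] ids) {
        long[] index = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            index[ids[i]] = (long) i * ENTRY_LENGTH;
        }
        return index;
    }

    /** Looks a term up by id: consult the index for the offset, then read the record there. */
    static String termForId(byte[] lexicon, long[] index, int id) {
        int offset = (int) index[id];
        int len = 0;
        while (len < TERM_BYTES && lexicon[offset + len] != 0) {
            len++;
        }
        return new String(lexicon, offset, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "index", "zebra"}; // lexical order
        int[] ids = {2, 0, 1};                        // ids need not follow lexical order
        byte[] lexicon = buildLexicon(terms, ids);
        long[] index = buildIndex(ids);
        System.out.println(termForId(lexicon, index, 0));
    }
}
```

The point of the second structure is that records sorted by term text cannot be searched by id directly, so a lookup by term code needs its own offset table.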
Field Summary | |
---|---|
static int |
lexiconEntryLength
The size in bytes of an entry in the lexicon file. |
Constructor Summary | |
---|---|
Lexicon()
A default constructor. |
|
Lexicon(java.lang.String lexiconName)
Constructs an instance of Lexicon and opens the corresponding file. |
|
Lexicon(java.lang.String path,
java.lang.String prefix)
|
Method Summary | |
---|---|
void |
close()
Closes the lexicon and lexicon index files. |
boolean |
findTerm(int _termId)
Finds the term given its term code. |
boolean |
findTerm(java.lang.String _term)
Performs a binary search in the lexicon in order to locate the given term. |
byte |
getEndBitOffset()
Deprecated. |
long |
getEndOffset()
Deprecated. |
LexiconEntry |
getIthLexiconEntry(int termNumber)
Returns a LexiconEntry describing all the information in the lexicon about the ith term in the lexicon. |
LexiconEntry |
getLexiconEntry(int termid)
Returns a LexiconEntry describing all the information in the lexicon about the term denoted by termid. |
LexiconEntry |
getLexiconEntry(java.lang.String _term)
Returns a LexiconEntry describing all the information in the lexicon about the term denoted by _term. |
int |
getNt()
Deprecated. |
long |
getNumberOfLexiconEntries()
Deprecated. |
byte |
getStartBitOffset()
Deprecated. |
long |
getStartOffset()
Deprecated. |
java.lang.String |
getTerm()
Deprecated. |
int |
getTermId()
Deprecated. |
int |
getTF()
Deprecated. |
java.util.Iterator<java.lang.String> |
iterator()
Returns an iterator that gives every item in the lexicon, in lexical order. |
static int |
numberOfEntries(java.io.File f)
Returns the number of entries in the lexicon file specified by f. |
static int |
numberOfEntries(java.lang.String filename)
Returns the number of entries in the lexicon file specified by filename. |
void |
print()
Prints out the contents of the lexicon file. |
boolean |
seekEntry(int i)
Seeks the i-th entry of the lexicon. |
boolean |
updateEntry(int i,
int frequency,
long endOffset,
byte endBitOffset)
Deprecated. The Lexicon class is only used for reading the lexicon file, and not for writing any information. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int lexiconEntryLength
Constructor Detail |
---|
public Lexicon()
public Lexicon(java.lang.String path, java.lang.String prefix)
public Lexicon(java.lang.String lexiconName)
lexiconName
- the name of the lexicon file.
Method Detail |
---|
public void close()
Specified by: close in interface Closeable
public void print()
public boolean findTerm(int _termId)
_termId
- the term's identifier
public boolean findTerm(java.lang.String _term)
_term
- The term to search for.
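The binary search mentioned for findTerm(java.lang.String) relies on the entries being stored in lexical order with a fixed size, so that any entry can be reached by arithmetic on its position. A minimal sketch of that idea follows, assuming a hypothetical 16-byte zero-padded term field held in memory; the class LexiconBinarySearchSketch and its methods are illustrative only, and unlike findTerm the sketch returns a record number rather than a boolean.

```java
import java.nio.charset.StandardCharsets;

class LexiconBinarySearchSketch {
    // Hypothetical fixed width of the term field; not Terrier's actual layout.
    static final int TERM_BYTES = 16;

    /** Packs already-sorted ASCII terms into zero-padded fixed-width records. */
    static byte[] pack(String... sortedTerms) {
        byte[] out = new byte[sortedTerms.length * TERM_BYTES];
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(raw, 0, out, i * TERM_BYTES, Math.min(raw.length, TERM_BYTES));
        }
        return out;
    }

    /** Decodes the term stored in record i, dropping the zero padding. */
    static String termAt(byte[] records, int i) {
        int offset = i * TERM_BYTES;
        int len = 0;
        while (len < TERM_BYTES && records[offset + len] != 0) {
            len++;
        }
        return new String(records, offset, len, StandardCharsets.US_ASCII);
    }

    /** Returns the record number holding the term, or -1 if it is absent. */
    static int find(byte[] records, String term) {
        int low = 0;
        int high = records.length / TERM_BYTES - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;          // midpoint without int overflow
            int cmp = term.compareTo(termAt(records, mid));
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                high = mid - 1;                    // term sorts before the middle record
            } else {
                low = mid + 1;                     // term sorts after the middle record
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] records = pack("apple", "index", "lexicon", "zebra");
        System.out.println(find(records, "lexicon")); // record number of "lexicon"
        System.out.println(find(records, "missing")); // -1 when the term is absent
    }
}
```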
public byte getEndBitOffset()
public long getEndOffset()
public int getNt()
public long getNumberOfLexiconEntries()
public byte getStartBitOffset()
public long getStartOffset()
public java.lang.String getTerm()
public int getTermId()
public int getTF()
public boolean seekEntry(int i)
i
- The index of the entry we are looking for.
public boolean updateEntry(int i, int frequency, long endOffset, byte endBitOffset)
i
- the i-th entry
frequency
- the term's frequency
endOffset
- the offset of the ending byte in the inverted file
endBitOffset
- the offset in bits in the ending byte in the term's entry in the inverted file
public static int numberOfEntries(java.io.File f)
f
- The file to find the number of entries in
public static int numberOfEntries(java.lang.String filename)
filename
- The name of the file to find the number of entries in
public LexiconEntry getIthLexiconEntry(int termNumber)
termNumber
- The ith term in the lexicon. i is 0-based, and runs to getNumberOfLexiconEntries()-1
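Because every entry occupies lexiconEntryLength bytes, both the entry count and the position of the i-th entry can be derived by simple arithmetic. The sketch below shows one plausible way to do this with java.io.RandomAccessFile, assuming a hypothetical 8-byte entry holding a single long; the class FixedLengthEntriesSketch and its methods are illustrative and are not taken from the Terrier source.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class FixedLengthEntriesSketch {
    // Hypothetical entry size: one 8-byte long per entry.
    static final int ENTRY_LENGTH = 8;

    /** Entry count is the file size divided by the fixed entry size. */
    static int numberOfEntries(File f) {
        return (int) (f.length() / ENTRY_LENGTH);
    }

    /** Reads entry i (0-based, up to numberOfEntries(f) - 1) by seeking to i * ENTRY_LENGTH. */
    static long readEntry(File f, int i) {
        if (i < 0 || i >= numberOfEntries(f)) {
            throw new IndexOutOfBoundsException("entry " + i);
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) i * ENTRY_LENGTH);
            return raf.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the given values to a temporary file, one entry each, for demonstration. */
    static File writeSample(long... values) {
        try {
            File f = File.createTempFile("lexicon-sketch", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                for (long v : values) {
                    raf.writeLong(v);
                }
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = writeSample(10L, 20L, 30L);
        System.out.println(numberOfEntries(f)); // number of 8-byte entries written
        System.out.println(readEntry(f, 2));    // last valid index is count - 1
    }
}
```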
public LexiconEntry getLexiconEntry(int termid)
termid
- the termid of the term of interest
public LexiconEntry getLexiconEntry(java.lang.String _term)
_term
- the String term that is of interest
public java.util.Iterator<java.lang.String> iterator()
Specified by: iterator in interface java.lang.Iterable<java.lang.String>
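Implementing java.lang.Iterable<java.lang.String> means a lexicon can be walked with a for-each loop. The following sketch shows how an iterator over sorted fixed-width records yields terms in lexical order simply by reading them in storage order. The 12-byte record width and the class LexiconIterationSketch are assumptions for the example, not Terrier's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LexiconIterationSketch implements Iterable<String> {
    // Hypothetical fixed width of each term record.
    static final int TERM_BYTES = 12;
    private final byte[] records;

    /** Stores already-sorted ASCII terms as zero-padded fixed-width records. */
    LexiconIterationSketch(String... sortedTerms) {
        records = new byte[sortedTerms.length * TERM_BYTES];
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(raw, 0, records, i * TERM_BYTES, Math.min(raw.length, TERM_BYTES));
        }
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int position = 0; // number of the next record to return

            @Override
            public boolean hasNext() {
                return position * TERM_BYTES < records.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int offset = position++ * TERM_BYTES;
                int len = 0;
                while (len < TERM_BYTES && records[offset + len] != 0) {
                    len++;
                }
                return new String(records, offset, len, StandardCharsets.US_ASCII);
            }
        };
    }

    /** Collects all terms in iteration order. */
    static List<String> collect(Iterable<String> source) {
        List<String> out = new ArrayList<>();
        for (String term : source) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String term : new LexiconIterationSketch("apple", "index", "zebra")) {
            System.out.println(term);
        }
    }
}
```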