Terrier IR Platform 2.2.1 |
java.lang.Object
  uk.ac.gla.terrier.structures.Lexicon
    uk.ac.gla.terrier.structures.UTFLexicon
public class UTFLexicon extends Lexicon
Implements the lexicon structure. The lexicon file, whose name is given by ApplicationSetup.LEXICON_FILENAME, holds the actual data about the terms. A second file, whose name is given by ApplicationSetup.LEXICON_INDEX_FILENAME, is also created and used; it maps each term's code to the offset of that term in the lexicon.
See Also: ApplicationSetup.LEXICON_FILENAME, ApplicationSetup.LEXICON_INDEX_FILENAME
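The two-file arrangement described above can be illustrated with a small, self-contained sketch. It is not Terrier's code: the entry layout (a 20-byte zero-padded UTF-8 term followed by a 4-byte term id), the class name `TermIdIndexSketch`, and the assumption that term ids are dense in the range 0..n-1 are all hypothetical choices made for illustration. The point is only that a side table from term code to byte offset lets a lookup by id jump straight to the entry, without searching the term-sorted lexicon.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TermIdIndexSketch {
    // Hypothetical layout: 20 bytes of zero-padded UTF-8 term text + a 4-byte term id.
    public static final int TERM_BYTES = 20;
    public static final int ENTRY_LENGTH = TERM_BYTES + 4;

    /** Writes one fixed-size entry per term, in the order given; ids[i] is the id of sortedTerms[i]. */
    public static ByteBuffer buildLexicon(String[] sortedTerms, int[] ids) {
        ByteBuffer lexicon = ByteBuffer.allocate(sortedTerms.length * ENTRY_LENGTH);
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            lexicon.put(Arrays.copyOf(raw, TERM_BYTES)); // pad (or truncate) to the fixed width
            lexicon.putInt(ids[i]);
        }
        lexicon.flip();
        return lexicon;
    }

    /** Scans the lexicon once and records, for each term id, the byte offset of its entry. */
    public static long[] buildIdIndex(ByteBuffer lexicon) {
        int n = lexicon.limit() / ENTRY_LENGTH;
        long[] offsets = new long[n]; // assumes ids are exactly 0..n-1
        for (int i = 0; i < n; i++) {
            int offset = i * ENTRY_LENGTH;
            int termId = lexicon.getInt(offset + TERM_BYTES);
            offsets[termId] = offset;
        }
        return offsets;
    }

    /** Looks a term up by id with a single index read; returns null for an unknown id. */
    public static String termForId(ByteBuffer lexicon, long[] idIndex, int termId) {
        if (termId < 0 || termId >= idIndex.length) {
            return null;
        }
        int offset = (int) idIndex[termId];
        byte[] raw = new byte[TERM_BYTES];
        for (int i = 0; i < TERM_BYTES; i++) {
            raw[i] = lexicon.get(offset + i);
        }
        int len = 0;
        while (len < TERM_BYTES && raw[len] != 0) {
            len++; // stop at the zero padding
        }
        return new String(raw, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] terms = {"index", "lexicon", "terrier"}; // sorted by term
        int[] ids = {2, 0, 1};                            // ids assigned in some other order
        ByteBuffer lexicon = buildLexicon(terms, ids);
        long[] idIndex = buildIdIndex(lexicon);
        System.out.println(termForId(lexicon, idIndex, 0)); // prints "lexicon"
    }
}
```

Because the lexicon is ordered by term rather than by id, a lookup by id would otherwise need a linear scan; the side table trades a little extra storage for a direct jump.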
Field Summary | |
---|---|
static int |
lexiconEntryLength
The size in bytes of an entry in the lexicon file. |
Constructor Summary | |
---|---|
UTFLexicon()
A default constructor. |
|
UTFLexicon(java.lang.String lexiconName)
Constructs an instance of Lexicon and opens the corresponding file. |
|
UTFLexicon(java.lang.String path,
java.lang.String prefix)
|
Method Summary | |
---|---|
boolean |
findTerm(int _termId)
Finds the term given its term code. |
boolean |
findTerm(java.lang.String _term)
Performs a binary search in the lexicon in order to locate the given term. |
LexiconEntry |
getLexiconEntry(int termid)
Returns a LexiconEntry describing all the information in the lexicon about the term denoted by termid |
LexiconEntry |
getLexiconEntry(java.lang.String _term)
Returns a LexiconEntry describing all the information in the lexicon about the term denoted by _term |
static int |
numberOfEntries(java.io.File f)
|
static int |
numberOfEntries(java.lang.String filename)
|
boolean |
seekEntry(int i)
Seeks the i-th entry of the lexicon. |
boolean |
updateEntry(int i,
int frequency,
long endOffset,
byte endBitOffset)
Deprecated. The Lexicon class is only used for reading the lexicon file, and not for writing any information. |
Methods inherited from class uk.ac.gla.terrier.structures.Lexicon |
---|
close, getEndBitOffset, getEndOffset, getIthLexiconEntry, getNt, getNumberOfLexiconEntries, getStartBitOffset, getStartOffset, getTerm, getTermId, getTF, iterator, print |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int lexiconEntryLength
Constructor Detail |
---|
public UTFLexicon()
public UTFLexicon(java.lang.String path, java.lang.String prefix)
public UTFLexicon(java.lang.String lexiconName)
Parameters: lexiconName - the name of the lexicon file.
Method Detail |
---|
public boolean findTerm(int _termId)
Overrides: findTerm in class Lexicon
Parameters: _termId - the term's identifier
public boolean findTerm(java.lang.String _term)
Overrides: findTerm in class Lexicon
Parameters: _term - the term to search for.
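The method summary says that findTerm(java.lang.String) performs a binary search in the lexicon. A minimal sketch of that technique over fixed-width records follows. It is an illustration only: the 20-byte term field, the zero padding, the class name `LexiconBinarySearchSketch`, and the use of `String.compareTo` as the sort order are assumptions, not details taken from Terrier. The search is valid only if the records were written in the same order that the comparison uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LexiconBinarySearchSketch {
    // Hypothetical fixed width of the term field in each record.
    public static final int TERM_BYTES = 20;

    /** Packs already-sorted terms into fixed-width, zero-padded UTF-8 records. */
    public static byte[] pack(String[] sortedTerms) {
        byte[] data = new byte[sortedTerms.length * TERM_BYTES];
        for (int i = 0; i < sortedTerms.length; i++) {
            byte[] raw = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(raw, 0, data, i * TERM_BYTES, Math.min(raw.length, TERM_BYTES));
        }
        return data;
    }

    /** Decodes the term stored in record i, dropping the zero padding. */
    public static String termAt(byte[] data, int i) {
        int start = i * TERM_BYTES;
        int len = 0;
        while (len < TERM_BYTES && data[start + len] != 0) {
            len++;
        }
        return new String(data, start, len, StandardCharsets.UTF_8);
    }

    /** Returns the record index of term, or -1 if it is not present. */
    public static int find(byte[] data, String term) {
        int lo = 0;
        int hi = data.length / TERM_BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids overflow on large inputs
            int cmp = term.compareTo(termAt(data, mid));
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"caf\u00e9", "index", "lexicon", "terrier"};
        Arrays.sort(terms); // same ordering as the compareTo call in find
        byte[] data = pack(terms);
        System.out.println(find(data, "lexicon")); // prints 2
        System.out.println(find(data, "missing")); // prints -1
    }
}
```

Fixed-width records are what make the midpoint computable: record `mid` always starts at byte `mid * TERM_BYTES`, so each probe is one seek plus one comparison.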
public boolean seekEntry(int i)
Overrides: seekEntry in class Lexicon
Parameters: i - the index of the entry we are looking for.
public LexiconEntry getLexiconEntry(int termid)
Overrides: getLexiconEntry in class Lexicon
Parameters: termid - the termid of the term of interest
public LexiconEntry getLexiconEntry(java.lang.String _term)
Overrides: getLexiconEntry in class Lexicon
Parameters: _term - the String term that is of interest
public boolean updateEntry(int i, int frequency, long endOffset, byte endBitOffset)
Overrides: updateEntry in class Lexicon
Parameters:
i - the i-th entry
frequency - the term's frequency
endOffset - the offset of the ending byte in the inverted file
endBitOffset - the offset in bits in the ending byte in the term's entry in the inverted file
public static int numberOfEntries(java.io.File f)
public static int numberOfEntries(java.lang.String filename)
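Neither numberOfEntries overload is described on this page. Given that the class exposes a fixed per-entry size (lexiconEntryLength), one plausible reading is that the count is the file length divided by the entry length. The sketch below shows that calculation under that assumption; it is not Terrier's implementation, and the class name `EntryCountSketch`, the explicit `entryLength` parameter, and the `tempFileOfSize` helper are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class EntryCountSketch {
    /** Number of whole fixed-size records in f; a trailing partial record is ignored. */
    public static int numberOfEntries(File f, int entryLength) {
        if (entryLength <= 0) {
            throw new IllegalArgumentException("entryLength must be positive");
        }
        return (int) (f.length() / entryLength);
    }

    /** Convenience overload that takes a file name instead of a File. */
    public static int numberOfEntries(String filename, int entryLength) {
        return numberOfEntries(new File(filename), entryLength);
    }

    /** Demo helper (hypothetical): creates a temporary file holding the given number of zero bytes. */
    public static File tempFileOfSize(int bytes) {
        try {
            File tmp = File.createTempFile("lexicon-sketch", ".lex");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), new byte[bytes]);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File lexicon = tempFileOfSize(5 * 24); // five records of 24 bytes each
        System.out.println(numberOfEntries(lexicon, 24));           // prints 5
        System.out.println(numberOfEntries(lexicon.getPath(), 24)); // prints 5
    }
}
```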