Package org.terrier.structures
Class BasicLexiconEntry
- java.lang.Object
- org.terrier.structures.LexiconEntry
- org.terrier.structures.BasicLexiconEntry
- All Implemented Interfaces:
java.io.Serializable, org.apache.hadoop.io.Writable, BitFilePosition, BitIndexPointer, EntryStatistics, Pointer
- Direct Known Subclasses:
FieldLexiconEntry
public class BasicLexiconEntry extends LexiconEntry implements BitIndexPointer
Contains all the information about one entry in the Lexicon. Created to make thread-safe lookups in the Lexicon easier.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
static class BasicLexiconEntry.Factory - Factory for creating LexiconEntry objects
-
Field Summary
Fields
int maxtf
int n_t - the number of documents that this entry occurs in
byte startBitOffset - the start bit offset of the entry in the inverted index
long startOffset - the start offset of the entry in the inverted index
int termId - the termid of this entry
int TF - the total number of occurrences of the term in the index
-
Fields inherited from interface org.terrier.structures.BitIndexPointer
BIT_MASK, FILE_SHIFT, MAX_FILE_ID
-
-
Constructor Summary
Constructors
BasicLexiconEntry() - Create an empty LexiconEntry
BasicLexiconEntry(int tid, int _n_t, int _TF) - Create a lexicon entry with the following information.
BasicLexiconEntry(int tid, int _n_t, int _TF, byte fileId, long _startOffset, byte _startBitOffset) - Create a lexicon entry with the following information.
BasicLexiconEntry(int tid, int _n_t, int _TF, int _maxtf, byte fileId, BitFilePosition offset) - Create a lexicon entry with the following information.
-
Method Summary
Methods
void add(EntryStatistics le) - Increment this lexicon entry by another.
int getDocumentFrequency() - Return the number of documents that the term occurs in.
byte getFileNumber() - Returns the file number (byte value in the 0-31 range).
int getFrequency() - Return the frequency (total number of occurrences) of the term.
int getMaxFrequencyInDocuments() - Return the maximum in-document term frequency of the term among all documents the term appears in.
int getNumberOfEntries() - Pointer implementation: how many entries in the inverted index.
long getOffset() - Return the byte offset.
byte getOffsetBits() - Return the bit offset.
int getTermId() - Return the id of the term.
java.lang.String pointerToString() - Returns a textual representation of the pointer alone.
void readFields(java.io.DataInput in)
void setBitIndexPointer(BitIndexPointer pointer) - Update this pointer to reflect the same values as the specified pointer.
void setDocumentFrequency(int nt) - Set the number of documents that the term occurs in.
void setFileNumber(byte fileId) - Set the file number.
void setFrequency(int F) - Set the frequency (total number of occurrences) of the term.
void setMaxFrequencyInDocuments(int max) - Set the maximum in-document term frequency of the term among all documents the term appears in.
void setNumberOfEntries(int n) - Update the number of entries in the pointer.
void setOffset(long bytes, byte bits) - Set the offset in number of bytes and number of bits.
void setOffset(BitFilePosition pos) - Sets the bit file position within this object to that represented by the specified bit file position.
void setPointer(Pointer p) - Update the pointer.
void setStatistics(int _n_t, int _TF) - Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
void setTermId(int newTermId) - Sets the ID for this term.
void subtract(EntryStatistics le) - Alter this lexicon entry to subtract another lexicon entry.
void write(java.io.DataOutput out)
-
Methods inherited from class org.terrier.structures.LexiconEntry
equals, getWritableEntryStatistics, hashCode, toString
-
-
-
-
Field Detail
-
maxtf
public int maxtf
-
termId
public int termId
the termid of this entry
-
n_t
public int n_t
the number of documents that this entry occurs in
-
TF
public int TF
the total number of occurrences of the term in the index
-
startOffset
public long startOffset
the start offset of the entry in the inverted index
-
startBitOffset
public byte startBitOffset
the start bit offset of the entry in the inverted index
-
-
Constructor Detail
-
BasicLexiconEntry
public BasicLexiconEntry()
Create an empty LexiconEntry
-
BasicLexiconEntry
public BasicLexiconEntry(int tid, int _n_t, int _TF)
Create a lexicon entry with the following information.
- Parameters:
tid - the term id
_n_t - the number of documents the term occurs in (document frequency)
_TF - the total count of the term in the collection
-
BasicLexiconEntry
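public BasicLexiconEntry(int tid, int _n_t, int _TF, byte fileId, long _startOffset, byte _startBitOffset)
Create a lexicon entry with the following information.
- Parameters:
tid - the term id
_n_t - the number of documents the term occurs in (document frequency)
_TF - the total count of the term in the collection
fileId - the file number
_startOffset - the start offset of the entry in the inverted index
_startBitOffset - the start bit offset of the entry in the inverted index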
-
BasicLexiconEntry
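public BasicLexiconEntry(int tid, int _n_t, int _TF, int _maxtf, byte fileId, BitFilePosition offset)
Create a lexicon entry with the following information.
- Parameters:
tid - the term id
_n_t - the number of documents the term occurs in (document frequency)
_TF - the total count of the term in the collection
_maxtf - the maximum in-document term frequency of the term
fileId - the file number
offset - the bit file position of the entry in the inverted index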
-
-
Method Detail
-
setStatistics
public void setStatistics(int _n_t, int _TF)
Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
- Specified by:
setStatistics in class LexiconEntry
-
add
public void add(EntryStatistics le)
Increment this lexicon entry by another.
- Specified by:
add in interface EntryStatistics
- Parameters:
le - the other object whose statistics are used to increment the statistics of this object.
-
subtract
public void subtract(EntryStatistics le)
Alter this lexicon entry to subtract another lexicon entry.
- Specified by:
subtract in interface EntryStatistics
- Parameters:
le - the other object whose statistics are used to decrement the statistics of this object.
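The effect of add and subtract on the statistics can be illustrated with a minimal, self-contained sketch. TermStatsSketch is a hypothetical stand-in, not the Terrier class, and the exact update rules (summing n_t and TF, keeping the larger maxtf on add, leaving maxtf untouched on subtract) are assumptions made for illustration only.

```java
/** Hypothetical stand-in for an entry's statistics; not the Terrier class. */
final class TermStatsSketch {
    int n_t;   // number of documents the term occurs in
    int TF;    // total number of occurrences of the term
    int maxtf; // maximum in-document term frequency

    TermStatsSketch(int n_t, int TF, int maxtf) {
        this.n_t = n_t;
        this.TF = TF;
        this.maxtf = maxtf;
    }

    /** Assumed rule: sum the counts and keep the larger maxtf. */
    void add(TermStatsSketch other) {
        n_t += other.n_t;
        TF += other.TF;
        maxtf = Math.max(maxtf, other.maxtf);
    }

    /** Assumed rule: remove the other counts; maxtf cannot be recovered, so it is left as-is. */
    void subtract(TermStatsSketch other) {
        n_t -= other.n_t;
        TF -= other.TF;
    }

    public static void main(String[] args) {
        TermStatsSketch a = new TermStatsSketch(3, 10, 5);
        TermStatsSketch b = new TermStatsSketch(2, 4, 3);
        a.add(b);
        System.out.println(a.n_t + " " + a.TF + " " + a.maxtf);
        a.subtract(b);
        System.out.println(a.n_t + " " + a.TF + " " + a.maxtf);
    }
}
```

Merging statistics this way is what one would expect when combining entries for the same term from separate partial indices; consult the Terrier source for the actual behaviour.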
-
getDocumentFrequency
public int getDocumentFrequency()
Return the number of documents that the term occurs in.
- Specified by:
getDocumentFrequency in interface EntryStatistics
- Returns:
the number of documents that the term occurs in.
-
getFrequency
public int getFrequency()
Return the frequency (total number of occurrences) of the term.
- Specified by:
getFrequency in interface EntryStatistics
- Returns:
the frequency (total number of occurrences) of the entry (term).
-
getTermId
public int getTermId()
Return the id of the term.
- Specified by:
getTermId in interface EntryStatistics
- Returns:
the id of the term.
-
getNumberOfEntries
public int getNumberOfEntries()
Pointer implementation: how many entries in the inverted index. Usually the same as getDocumentFrequency().
- Specified by:
getNumberOfEntries in interface Pointer
- Overrides:
getNumberOfEntries in class LexiconEntry
- Returns:
the number of "things" that this pointer refers to.
-
getOffsetBits
public byte getOffsetBits()
Return the bit offset.
- Specified by:
getOffsetBits in interface BitFilePosition
- Returns:
the offset in bits.
-
getOffset
public long getOffset()
Return the byte offset.
- Specified by:
getOffset in interface BitFilePosition
- Returns:
the offset in bytes.
-
getFileNumber
public byte getFileNumber()
Returns the file number (byte value in the 0-31 range).
- Specified by:
getFileNumber in interface BitIndexPointer
- Returns:
the file number (byte value in the 0-31 range)
-
setFileNumber
public void setFileNumber(byte fileId)
Set the file number.
- Specified by:
setFileNumber in interface BitIndexPointer
- Parameters:
fileId - the file number.
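A file number limited to 0-31 alongside a bit offset suggests that both values could share a single byte: 3 bits for the bit offset (0-7) and 5 bits for the file number. The sketch below illustrates that idea only. The class PackedPointerSketch is hypothetical, and the constant values are assumptions that merely borrow the names FILE_SHIFT and BIT_MASK; they are not taken from the Terrier source.

```java
/** Hypothetical illustration of packing a file number and a bit offset into one byte. */
final class PackedPointerSketch {
    // Assumed layout: low 3 bits hold the bit offset, high 5 bits hold the file number.
    static final int FILE_SHIFT = 3;
    static final int BIT_MASK = 0x07;

    /** Combine a file number (0-31) and a bit offset (0-7) into a single byte. */
    static byte pack(int fileNumber, int bitOffset) {
        if (fileNumber < 0 || fileNumber > 31) {
            throw new IllegalArgumentException("file number out of range: " + fileNumber);
        }
        if (bitOffset < 0 || bitOffset > 7) {
            throw new IllegalArgumentException("bit offset out of range: " + bitOffset);
        }
        return (byte) ((fileNumber << FILE_SHIFT) | bitOffset);
    }

    /** Recover the bit offset from the low bits. */
    static byte bitOffset(byte packed) {
        return (byte) (packed & BIT_MASK);
    }

    /** Recover the file number; mask with 0xFF first so the sign bit is not extended. */
    static byte fileNumber(byte packed) {
        return (byte) ((packed & 0xFF) >>> FILE_SHIFT);
    }

    public static void main(String[] args) {
        byte packed = pack(31, 7);
        System.out.println(fileNumber(packed) + " " + bitOffset(packed));
    }
}
```

The `& 0xFF` step matters because Java bytes are signed: without it, a file number of 16 or more would set the top bit and shift in sign bits.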
-
setTermId
public void setTermId(int newTermId)
Sets the ID for this term.
- Specified by:
setTermId in class LexiconEntry
-
getMaxFrequencyInDocuments
public int getMaxFrequencyInDocuments()
Description copied from interface: EntryStatistics
Return the maximum in-document term frequency of the term among all documents the term appears in.
- Specified by:
getMaxFrequencyInDocuments in interface EntryStatistics
- Returns:
the maximum in-document term frequency of the term among all documents the term appears in.
-
setMaxFrequencyInDocuments
public void setMaxFrequencyInDocuments(int max)
Description copied from interface: EntryStatistics
Set the maximum in-document term frequency of the term among all documents the term appears in.
- Specified by:
setMaxFrequencyInDocuments in interface EntryStatistics
- Parameters:
max - the maximum in-document term frequency of the term among all documents the term appears in.
-
setOffset
public void setOffset(long bytes, byte bits)
Set the offset in number of bytes and number of bits.
- Specified by:
setOffset in interface BitFilePosition
- Parameters:
bytes - the number of bytes to set.
bits - the number of bits to set.
-
setBitIndexPointer
public void setBitIndexPointer(BitIndexPointer pointer)
Update this pointer to reflect the same values as the specified pointer.
- Specified by:
setBitIndexPointer in interface BitIndexPointer
- Parameters:
pointer - the pointer to use to set the byte offset, bit offset and file number parameters.
-
setOffset
public void setOffset(BitFilePosition pos)
Sets the bit file position within this object to that represented by the specified bit file position.
- Specified by:
setOffset in interface BitFilePosition
- Parameters:
pos - other bit file position to update the bit file position in this object.
-
readFields
public void readFields(java.io.DataInput in) throws java.io.IOException
- Specified by:
readFields in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
-
write
public void write(java.io.DataOutput out) throws java.io.IOException
- Specified by:
write in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
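The write and readFields pair follows the usual Writable pattern: the fields are written to a DataOutput in a fixed order and read back from a DataInput in the same order. The sketch below shows that pattern using only java.io. EntryRoundTripSketch is a hypothetical stand-in, and the field selection and order are assumptions, not the on-disk layout used by Terrier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in showing a write/readFields round trip; not the Terrier class. */
final class EntryRoundTripSketch {
    int termId;
    int n_t;
    int TF;
    long startOffset;
    byte startBitOffset;

    /** Write the fields in a fixed (assumed) order. */
    void write(DataOutput out) throws IOException {
        out.writeInt(termId);
        out.writeInt(n_t);
        out.writeInt(TF);
        out.writeLong(startOffset);
        out.writeByte(startBitOffset);
    }

    /** Read the fields back in exactly the order they were written. */
    void readFields(DataInput in) throws IOException {
        termId = in.readInt();
        n_t = in.readInt();
        TF = in.readInt();
        startOffset = in.readLong();
        startBitOffset = in.readByte();
    }

    /** Serialize an entry to a byte array and deserialize it into a fresh object. */
    static EntryRoundTripSketch roundTrip(EntryRoundTripSketch e) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            e.write(new DataOutputStream(buf));
            EntryRoundTripSketch copy = new EntryRoundTripSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        EntryRoundTripSketch e = new EntryRoundTripSketch();
        e.termId = 7;
        e.n_t = 3;
        e.TF = 10;
        e.startOffset = 1024L;
        e.startBitOffset = 5;
        EntryRoundTripSketch copy = roundTrip(e);
        System.out.println(copy.termId + " " + copy.n_t + " " + copy.TF);
    }
}
```

Because the format carries no field names or lengths, readFields must mirror write exactly; any change in order or type on one side corrupts every field that follows.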
-
setNumberOfEntries
public void setNumberOfEntries(int n)
Update the number of entries in the pointer.
- Specified by:
setNumberOfEntries in interface Pointer
- Overrides:
setNumberOfEntries in class LexiconEntry
- Parameters:
n - the number of "things" that the pointer refers to.
-
pointerToString
public java.lang.String pointerToString()
Returns a textual representation of the pointer alone.
- Specified by:
pointerToString in interface Pointer
- Overrides:
pointerToString in class LexiconEntry
-
setPointer
public void setPointer(Pointer p)
Update the pointer.
- Specified by:
setPointer in interface Pointer
- Overrides:
setPointer in class LexiconEntry
- Parameters:
p - other pointer to update the pointer in this object.
-
setFrequency
public void setFrequency(int F)
Description copied from interface: EntryStatistics
Set the frequency (total number of occurrences) of the term.
- Specified by:
setFrequency in interface EntryStatistics
- Parameters:
F - the frequency (total number of occurrences) of the entry (term).
-
setDocumentFrequency
public void setDocumentFrequency(int nt)
Description copied from interface: EntryStatistics
Set the number of documents that the term occurs in.
- Specified by:
setDocumentFrequency in interface EntryStatistics
- Parameters:
nt - the number of documents that the term occurs in.
-
-