Package org.terrier.structures
Class BasicLexiconEntry
- java.lang.Object
  - org.terrier.structures.LexiconEntry
    - org.terrier.structures.BasicLexiconEntry
- All Implemented Interfaces:
java.io.Serializable, org.apache.hadoop.io.Writable, BitFilePosition, BitIndexPointer, EntryStatistics, Pointer
- Direct Known Subclasses:
FieldLexiconEntry
public class BasicLexiconEntry extends LexiconEntry implements BitIndexPointer
Contains all the information about one entry in the Lexicon. Created to make thread-safe lookups in the Lexicon easier.
- See Also:
Serialized Form
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BasicLexiconEntry.Factory | Factory for creating LexiconEntry objects
-
Field Summary
Fields
Modifier and Type | Field | Description
int | maxtf |
int | n_t | the number of documents that this entry occurs in
byte | startBitOffset | the start bit offset of the entry in the inverted index
long | startOffset | the start offset of the entry in the inverted index
int | termId | the termid of this entry
int | TF | the total number of occurrences of the term in the index
Fields inherited from interface org.terrier.structures.BitIndexPointer
BIT_MASK, FILE_SHIFT, MAX_FILE_ID
-
Constructor Summary
Constructors
Constructor | Description
BasicLexiconEntry() | Create an empty LexiconEntry
BasicLexiconEntry(int tid, int _n_t, int _TF) | Create a lexicon entry with the following information.
BasicLexiconEntry(int tid, int _n_t, int _TF, byte fileId, long _startOffset, byte _startBitOffset) | Create a lexicon entry with the following information.
BasicLexiconEntry(int tid, int _n_t, int _TF, int _maxtf, byte fileId, BitFilePosition offset) | Create a lexicon entry with the following information.
-
Method Summary
Modifier and Type | Method | Description
void | add(EntryStatistics le) | Increment this lexicon entry by another.
int | getDocumentFrequency() | Return the number of documents that the term occurs in.
byte | getFileNumber() | Returns the file number (byte value in the 0-31 range)
int | getFrequency() | Return the frequency (total number of occurrences) of the term.
int | getMaxFrequencyInDocuments() | Return the maximum in-document term frequency of the term among all documents the term appears in.
int | getNumberOfEntries() | Pointer implementation: how many entries in the inverted index.
long | getOffset() | Return the number of bytes offset.
byte | getOffsetBits() | Return the number of bits offset.
int | getTermId() | Return the id of the term.
java.lang.String | pointerToString() | Returns a textual representation of the pointer alone
void | readFields(java.io.DataInput in) |
void | setBitIndexPointer(BitIndexPointer pointer) | Update this pointer to reflect the same values as the specified pointer
void | setDocumentFrequency(int nt) | Set the number of documents that the term occurs in.
void | setFileNumber(byte fileId) | Set the file number.
void | setFrequency(int F) | Set the frequency (total number of occurrences) of the term.
void | setMaxFrequencyInDocuments(int max) | Set the maximum in-document term frequency of the term among all documents the term appears in.
void | setNumberOfEntries(int n) | Update the number of entries in the pointer
void | setOffset(long bytes, byte bits) | Set the offset in number of bytes and number of bits.
void | setOffset(BitFilePosition pos) | Sets the bit file position within this object to that represented by the specified bit file position.
void | setPointer(Pointer p) | Update the pointer
void | setStatistics(int _n_t, int _TF) | Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
void | setTermId(int newTermId) | Sets the ID for this term
void | subtract(EntryStatistics le) | Alter this lexicon entry to subtract another lexicon entry.
void | write(java.io.DataOutput out) |
Methods inherited from class org.terrier.structures.LexiconEntry
equals, getWritableEntryStatistics, hashCode, toString
-
Field Detail
-
maxtf
public int maxtf
-
termId
public int termId
the termid of this entry
-
n_t
public int n_t
the number of documents that this entry occurs in
-
TF
public int TF
the total number of occurrences of the term in the index
-
startOffset
public long startOffset
the start offset of the entry in the inverted index
-
startBitOffset
public byte startBitOffset
the start bit offset of the entry in the inverted index
-
-
Constructor Detail
-
BasicLexiconEntry
public BasicLexiconEntry()
Create an empty LexiconEntry
-
BasicLexiconEntry
public BasicLexiconEntry(int tid, int _n_t, int _TF)
Create a lexicon entry with the following information.
- Parameters:
tid - the term id
_n_t - the number of documents the term occurs in (document frequency)
_TF - the total count of term t in the collection
-
BasicLexiconEntry
public BasicLexiconEntry(int tid, int _n_t, int _TF, byte fileId, long _startOffset, byte _startBitOffset)
Create a lexicon entry with the following information.
- Parameters:
tid
_n_t
_TF
fileId
_startOffset
_startBitOffset
-
BasicLexiconEntry
public BasicLexiconEntry(int tid, int _n_t, int _TF, int _maxtf, byte fileId, BitFilePosition offset)
Create a lexicon entry with the following information.
- Parameters:
tid
_n_t
_TF
fileId
offset
-
Method Detail
-
setStatistics
public void setStatistics(int _n_t, int _TF)
Set the term statistics, in particular, the number of documents that this term appears in and the total number of occurrences of the term.
- Specified by:
setStatistics in class LexiconEntry
-
add
public void add(EntryStatistics le)
Increment this lexicon entry by another.
- Specified by:
add in interface EntryStatistics
- Parameters:
le - the other object whose statistics are used to increment the statistics of this object.
-
subtract
public void subtract(EntryStatistics le)
Alter this lexicon entry to subtract another lexicon entry.
- Specified by:
subtract in interface EntryStatistics
- Parameters:
le - the other object whose statistics are used to decrement the statistics of this object.
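To make the effect of add and subtract on the two counters concrete, the sketch below models them on a stand-in class. StatsSketch and its fields are hypothetical and not part of Terrier. It assumes both operations apply field by field to the document frequency and the total frequency; how the real class treats maxtf is not stated on this page and is left out.

```java
/** Hypothetical stand-in for the two counters; not a Terrier class. */
final class StatsSketch {
    int nt; // number of documents the term occurs in
    int tf; // total number of occurrences of the term

    StatsSketch(int nt, int tf) {
        this.nt = nt;
        this.tf = tf;
    }

    /** Assumed semantics: increment both counters by the other entry's values. */
    void add(StatsSketch other) {
        nt += other.nt;
        tf += other.tf;
    }

    /** Assumed semantics: decrement both counters by the other entry's values. */
    void subtract(StatsSketch other) {
        nt -= other.nt;
        tf -= other.tf;
    }
}
```

Under these assumptions, adding an entry and then subtracting the same entry leaves the counters unchanged.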
-
getDocumentFrequency
public int getDocumentFrequency()
Return the number of documents that the term occurs in.
- Specified by:
getDocumentFrequency in interface EntryStatistics
- Returns:
the number of documents that the term occurs in.
-
getFrequency
public int getFrequency()
Return the frequency (total number of occurrences) of the term.
- Specified by:
getFrequency in interface EntryStatistics
- Returns:
the frequency (total number of occurrences) of the entry (term).
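The difference between document frequency and frequency can be shown with a small sketch that derives both from per-document counts. TermCountsSketch and the map from document id to in-document count are hypothetical illustrations, not part of Terrier.

```java
import java.util.Map;

/** Hypothetical helper; not a Terrier class. */
final class TermCountsSketch {
    /** Document frequency: number of documents with at least one occurrence. */
    static int documentFrequency(Map<Integer, Integer> tfByDoc) {
        int nt = 0;
        for (int tf : tfByDoc.values()) {
            if (tf > 0) {
                nt++;
            }
        }
        return nt;
    }

    /** Frequency: total number of occurrences across all documents. */
    static int frequency(Map<Integer, Integer> tfByDoc) {
        int total = 0;
        for (int tf : tfByDoc.values()) {
            total += tf;
        }
        return total;
    }
}
```

For a term with counts of 3, 1 and 2 in three documents, this gives a document frequency of 3 and a frequency of 6.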
-
getTermId
public int getTermId()
Return the id of the term.
- Specified by:
getTermId in interface EntryStatistics
- Returns:
the id of the term.
-
getNumberOfEntries
public int getNumberOfEntries()
Pointer implementation: how many entries in the inverted index. Usually the same as getDocumentFrequency().
- Specified by:
getNumberOfEntries in interface Pointer
- Overrides:
getNumberOfEntries in class LexiconEntry
- Returns:
the number of "things" that this pointer refers to.
-
getOffsetBits
public byte getOffsetBits()
Return the number of bits offset.
- Specified by:
getOffsetBits in interface BitFilePosition
- Returns:
the number of bits offset.
-
getOffset
public long getOffset()
Return the number of bytes offset.
- Specified by:
getOffset in interface BitFilePosition
- Returns:
the number of bytes offset.
-
getFileNumber
public byte getFileNumber()
Returns the file number (byte value in the 0-31 range)
- Specified by:
getFileNumber in interface BitIndexPointer
- Returns:
the file number (byte value in the 0-31 range)
-
setFileNumber
public void setFileNumber(byte fileId)
Set the file number.
- Specified by:
setFileNumber in interface BitIndexPointer
- Parameters:
fileId - the file number.
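A file number in the 0-31 range fits in five bits, which leaves three bits of a byte free for a bit offset of 0-7. The sketch below shows one way such a pair could share a single byte. PackedPointerSketch is hypothetical, and the constant values are assumptions chosen to match the 0-31 range: this page lists BIT_MASK, FILE_SHIFT and MAX_FILE_ID but not their values, and does not say that BasicLexiconEntry stores its fields this way.

```java
/** Hypothetical packing scheme; constant values are assumptions, not taken from Terrier. */
final class PackedPointerSketch {
    static final int MAX_FILE_ID = 31; // assumed: 5 bits for the file number
    static final int FILE_SHIFT = 3;   // assumed: file number sits above 3 bit-offset bits
    static final int BIT_MASK = 0x7;   // assumed: low 3 bits hold a bit offset of 0-7

    /** Pack a file number (0-31) and a bit offset (0-7) into one byte. */
    static byte pack(int fileId, int bitOffset) {
        if (fileId < 0 || fileId > MAX_FILE_ID) {
            throw new IllegalArgumentException("file number out of range: " + fileId);
        }
        if (bitOffset < 0 || bitOffset > BIT_MASK) {
            throw new IllegalArgumentException("bit offset out of range: " + bitOffset);
        }
        return (byte) ((fileId << FILE_SHIFT) | bitOffset);
    }

    /** Recover the file number; widen to an unsigned value first so the sign bit does not leak in. */
    static int fileNumber(byte packed) {
        return (packed & 0xFF) >> FILE_SHIFT;
    }

    /** Recover the bit offset from the low bits. */
    static int bitOffset(byte packed) {
        return packed & BIT_MASK;
    }
}
```

The unsigned widening in fileNumber matters because Java bytes are signed: a file number of 16 or more sets the top bit of the packed byte.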
-
setTermId
public void setTermId(int newTermId)
Sets the ID for this term
- Specified by:
setTermId in class LexiconEntry
-
getMaxFrequencyInDocuments
public int getMaxFrequencyInDocuments()
Description copied from interface: EntryStatistics
Return the maximum in-document term frequency of the term among all documents the term appears in.
- Specified by:
getMaxFrequencyInDocuments in interface EntryStatistics
- Returns:
the maximum in-document term frequency of the term among all documents the term appears in.
-
setMaxFrequencyInDocuments
public void setMaxFrequencyInDocuments(int max)
Description copied from interface: EntryStatistics
Set the maximum in-document term frequency of the term among all documents the term appears in.
- Specified by:
setMaxFrequencyInDocuments in interface EntryStatistics
- Parameters:
max - the maximum in-document term frequency of the term among all documents the term appears in.
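Unlike the two counts, a maximum does not combine by addition. The sketch below computes the maximum in-document term frequency from per-document counts and shows that the maximum over two sets of documents is the larger of the two partial maxima. MaxTfSketch is a hypothetical helper and not part of Terrier.

```java
/** Hypothetical helper; not a Terrier class. */
final class MaxTfSketch {
    /** Largest in-document count among the given documents (0 if there are none). */
    static int maxTf(int[] tfPerDocument) {
        int max = 0;
        for (int tf : tfPerDocument) {
            max = Math.max(max, tf);
        }
        return max;
    }

    /** The maximum over two sets of documents is the larger of the two partial maxima. */
    static int combine(int maxA, int maxB) {
        return Math.max(maxA, maxB);
    }
}
```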
-
setOffset
public void setOffset(long bytes, byte bits)
Set the offset in number of bytes and number of bits.
- Specified by:
setOffset in interface BitFilePosition
- Parameters:
bytes - the number of bytes to set.
bits - the number of bits to set.
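A byte offset and a bit offset together identify one bit in a file. The sketch below converts such a pair to an absolute bit position and back, assuming the bit offset counts bits within the byte at the given byte offset (0 to 7). BitPositionSketch is hypothetical and not part of Terrier.

```java
/** Hypothetical helper; assumes a bit offset of 0-7 within the addressed byte. */
final class BitPositionSketch {
    /** Absolute bit position of a (byte offset, bit offset) pair. */
    static long toBits(long bytes, byte bits) {
        return bytes * 8L + bits;
    }

    /** Byte offset that contains the given absolute bit position. */
    static long byteOffset(long bitPosition) {
        return bitPosition / 8L;
    }

    /** Bit offset within that byte. */
    static byte bitOffset(long bitPosition) {
        return (byte) (bitPosition % 8L);
    }
}
```

For example, byte offset 2 with bit offset 5 corresponds to absolute bit position 21.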
-
setBitIndexPointer
public void setBitIndexPointer(BitIndexPointer pointer)
Update this pointer to reflect the same values as the specified pointer
- Specified by:
setBitIndexPointer in interface BitIndexPointer
- Parameters:
pointer - the pointer to use to set the byte offset, bit offset and file number parameters.
-
setOffset
public void setOffset(BitFilePosition pos)
Sets the bit file position within this object to that represented by the specified bit file position.
- Specified by:
setOffset in interface BitFilePosition
- Parameters:
pos - other bit file position to update the bit file position in this object.
-
readFields
public void readFields(java.io.DataInput in) throws java.io.IOException
- Specified by:
readFields in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
-
write
public void write(java.io.DataOutput out) throws java.io.IOException
- Specified by:
write in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
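The write and readFields pair follows the usual Writable pattern: write emits the fields to a DataOutput and readFields reads them back in the same order. The sketch below shows that pattern with only java.io. EntrySketch, its field set and its field order are assumptions for illustration; the actual serialized layout of BasicLexiconEntry is not described on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical entry; the field order is an assumption, not Terrier's wire format. */
final class EntrySketch {
    int termId;
    int nt;
    int tf;
    long startOffset;
    byte startBitOffset;

    /** Write every field in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeInt(termId);
        out.writeInt(nt);
        out.writeInt(tf);
        out.writeLong(startOffset);
        out.writeByte(startBitOffset);
    }

    /** Read the fields back in exactly the order write emitted them. */
    void readFields(DataInput in) throws IOException {
        termId = in.readInt();
        nt = in.readInt();
        tf = in.readInt();
        startOffset = in.readLong();
        startBitOffset = in.readByte();
    }

    /** Serialize to an in-memory buffer and deserialize into a fresh object. */
    static EntrySketch roundTrip(EntrySketch entry) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            entry.write(new DataOutputStream(buffer));
            EntrySketch copy = new EntrySketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Reading fields in a different order from the one used for writing would silently produce wrong values, which is why the two methods are best kept side by side.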
-
setNumberOfEntries
public void setNumberOfEntries(int n)
Update the number of entries in the pointer
- Specified by:
setNumberOfEntries in interface Pointer
- Overrides:
setNumberOfEntries in class LexiconEntry
- Parameters:
n - the number of "things" that the pointer refers to.
-
pointerToString
public java.lang.String pointerToString()
Returns a textual representation of the pointer alone
- Specified by:
pointerToString in interface Pointer
- Overrides:
pointerToString in class LexiconEntry
-
setPointer
public void setPointer(Pointer p)
Update the pointer
- Specified by:
setPointer in interface Pointer
- Overrides:
setPointer in class LexiconEntry
- Parameters:
p - other pointer to update the pointer in this object.
-
setFrequency
public void setFrequency(int F)
Description copied from interface: EntryStatistics
Set the frequency (total number of occurrences) of the term.
- Specified by:
setFrequency in interface EntryStatistics
- Parameters:
F - the frequency (total number of occurrences) of the entry (term).
-
setDocumentFrequency
public void setDocumentFrequency(int nt)
Description copied from interface: EntryStatistics
Set the number of documents that the term occurs in.
- Specified by:
setDocumentFrequency in interface EntryStatistics
- Parameters:
nt - the number of documents that the term occurs in.