Package org.terrier.structures.indexing
Class DocumentPostingList
- java.lang.Object
  - org.terrier.structures.indexing.DocumentPostingList
- All Implemented Interfaces:
  java.io.Serializable, org.apache.hadoop.io.Writable
- Direct Known Subclasses:
  BlockDocumentPostingList, FieldDocumentPostingList
public class DocumentPostingList extends java.lang.Object implements org.apache.hadoop.io.Writable, java.io.Serializable
Represents the postings of one document. Uses HashMaps internally. Properties:
- indexing.avg.unique.terms.per.doc - number of unique terms per doc on average, used to tune the initial size of the hashmaps used in this class.
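The property above only affects the initial capacity of the per-document term map, so that a typical document can be indexed without the map rehashing. The following is a minimal sketch of that idea using java.util.HashMap rather than Trove's TObjectIntHashMap. The class name, reading the key through System.getProperty (Terrier reads it from its own configuration), and the default of 100 are all assumptions for illustration, not Terrier's code.

```java
import java.util.HashMap;
import java.util.Map;

class PostingMapSizing {
    // Hypothetical: read the hint from a JVM system property. Terrier reads this
    // key from its own configuration; the default of 100 is illustrative only.
    static final int AVG_DOCUMENT_UNIQUE_TERMS =
            Integer.parseInt(System.getProperty("indexing.avg.unique.terms.per.doc", "100"));

    /** Returns an empty term -> tf map pre-sized so an average document fits without rehashing. */
    static Map<String, Integer> newOccurrencesMap() {
        // java.util.HashMap resizes once its size exceeds capacity * 0.75 (the default
        // load factor), so request enough capacity to hold the average number of terms.
        int capacity = (int) Math.ceil(AVG_DOCUMENT_UNIQUE_TERMS / 0.75);
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<String, Integer> occurrences = newOccurrencesMap();
        occurrences.merge("retrieval", 1, Integer::sum);
        System.out.println(occurrences); // {retrieval=1}
    }
}
```

Setting the hint too low only costs some rehashing on long documents; setting it too high wastes memory for every document in flight.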
- See Also:
  Serialized Form
Nested Class Summary
- protected class DocumentPostingList.postingIterator
Field Summary
- protected static int AVG_DOCUMENT_UNIQUE_TERMS: average number of unique terms per document, used to tune the initial size of the hashmaps used in this class.
- protected int documentLength: length of the document so far.
- protected gnu.trove.TObjectIntHashMap<java.lang.String> occurrences: mapping from each term to its term frequency (tf).
Constructor Summary
- DocumentPostingList(): Creates a new DocumentPostingList object.
Method Summary
- void clear(): Removes all postings from this document.
- void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc): Executes the specified procedure for each term.
- int getDocumentLength(): Returns the total number of tokens in this document.
- DocumentIndexEntry getDocumentStatistics(): Returns a DocumentIndexEntry for this document.
- int getFrequency(java.lang.String term): Returns the frequency of the specified term in this document.
- int getNumberOfPointers(): Returns the number of unique terms in this document.
- int[][] getPostings(TermCodes termCodes): Returns the postings suitable to be written into the direct index.
- IterablePosting getPostings2(TermCodes termCodes): Returns a posting iterator suitable to be written into the direct index.
- void insert(int tf, java.lang.String term): Inserts a term into the posting list of this document.
- void insert(java.lang.String term): Inserts a term into the posting list of this document.
- protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
- void readFields(java.io.DataInput in)
- java.lang.String[] termSet(): Returns all terms in this posting list.
- void write(java.io.DataOutput out)
Field Detail
-
AVG_DOCUMENT_UNIQUE_TERMS
protected static final int AVG_DOCUMENT_UNIQUE_TERMS
number of unique terms per doc on average, used to tune the initial size of the hashmaps used in this class.
-
documentLength
protected int documentLength
length of the document so far. Sum of the term frequencies inserted so far.
-
occurrences
protected final gnu.trove.TObjectIntHashMap<java.lang.String> occurrences
Mapping from each term to its term frequency (tf) in this document.
-
Method Detail
-
termSet
public java.lang.String[] termSet()
Returns all terms in this posting list
-
getFrequency
public int getFrequency(java.lang.String term)
Return the frequency of the specified term in this document
-
clear
public void clear()
Removes all postings from this document
-
getDocumentLength
public int getDocumentLength()
Returns the total number of tokens in this document
-
getNumberOfPointers
public int getNumberOfPointers()
Returns the number of unique terms in this document.
-
insert
public void insert(java.lang.String term)
Inserts a term into the posting list of this document, increasing its frequency by one.
- Parameters:
term - the term being inserted
-
insert
public void insert(int tf, java.lang.String term)
Inserts a term into the posting list of this document, increasing its frequency by tf.
- Parameters:
tf - the frequency (number of occurrences) to add
term - the term being inserted
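The bookkeeping behind insert, getFrequency, getDocumentLength, getNumberOfPointers, termSet and clear can be sketched with a plain map. This is a minimal illustration, assuming java.util.HashMap in place of Trove's TObjectIntHashMap; the class name SimpleDocumentPostingList is hypothetical. It keeps the invariant stated for the documentLength field: the length is the sum of all term frequencies inserted so far.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-document term-frequency bookkeeping. */
class SimpleDocumentPostingList {
    private final Map<String, Integer> occurrences = new HashMap<>();
    private int documentLength = 0; // sum of all tfs inserted so far

    /** Records one occurrence of term. */
    void insert(String term) {
        insert(1, term);
    }

    /** Records tf occurrences of term; the document grows by tf tokens. */
    void insert(int tf, String term) {
        occurrences.merge(term, tf, Integer::sum);
        documentLength += tf;
    }

    /** Frequency of term in this document; absent terms yield 0 in this sketch. */
    int getFrequency(String term) {
        return occurrences.getOrDefault(term, 0);
    }

    int getDocumentLength() {
        return documentLength;
    }

    /** Number of unique terms, i.e. the number of postings this document contributes. */
    int getNumberOfPointers() {
        return occurrences.size();
    }

    String[] termSet() {
        return occurrences.keySet().toArray(new String[0]);
    }

    /** Empties the map and resets the running length so the object can be reused. */
    void clear() {
        occurrences.clear();
        documentLength = 0;
    }

    public static void main(String[] args) {
        SimpleDocumentPostingList doc = new SimpleDocumentPostingList();
        doc.insert("search");
        doc.insert("engine");
        doc.insert(2, "search");
        System.out.println(doc.getFrequency("search")); // 3
        System.out.println(doc.getDocumentLength());    // 4
        System.out.println(doc.getNumberOfPointers());  // 2
    }
}
```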
-
getDocumentStatistics
public DocumentIndexEntry getDocumentStatistics()
Return a DocumentIndexEntry for this document
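A sketch of the per-document statistics such an entry might summarise, assuming the entry records the document length and the number of unique terms (postings). DocStats is a hypothetical stand-in, not Terrier's DocumentIndexEntry. This version recomputes the length from the map for self-containment, whereas the class described here keeps documentLength as a running total.

```java
import java.util.HashMap;
import java.util.Map;

class DocumentStatisticsSketch {
    /** Hypothetical stand-in for the statistics stored in a document index entry. */
    static final class DocStats {
        final int documentLength;  // total tokens in the document
        final int numberOfEntries; // unique terms, i.e. postings in its direct-index row

        DocStats(int documentLength, int numberOfEntries) {
            this.documentLength = documentLength;
            this.numberOfEntries = numberOfEntries;
        }
    }

    /** Summarises a term -> tf map as (total tokens, unique terms). */
    static DocStats statisticsOf(Map<String, Integer> occurrences) {
        int length = 0;
        for (int tf : occurrences.values()) {
            length += tf;
        }
        return new DocStats(length, occurrences.size());
    }

    public static void main(String[] args) {
        Map<String, Integer> occ = new HashMap<>();
        occ.put("query", 3);
        occ.put("log", 1);
        DocStats s = statisticsOf(occ);
        // prints "4 tokens, 2 unique terms"
        System.out.println(s.documentLength + " tokens, " + s.numberOfEntries + " unique terms");
    }
}
```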
-
forEachTerm
public void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc)
Executes the specified procedure for each term, passing the term and its frequency.
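Trove procedures such as TObjectIntProcedure receive each key and its int value, and returning false stops the iteration early. The sketch below mirrors that convention with a hypothetical TermTfProcedure interface over a java.util.Map, so it runs without Trove; it is an illustration of the callback style, not Terrier's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ForEachTermSketch {
    /** Hypothetical analogue of TObjectIntProcedure<String>: return false to stop. */
    interface TermTfProcedure {
        boolean execute(String term, int tf);
    }

    /** Calls proc once per (term, tf) pair until proc returns false. */
    static void forEachTerm(Map<String, Integer> occurrences, TermTfProcedure proc) {
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            if (!proc.execute(e.getKey(), e.getValue())) {
                break; // the procedure asked to stop early
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> occ = new LinkedHashMap<>();
        occ.put("terrier", 2);
        occ.put("index", 1);
        int[] total = {0}; // array so the lambda can update it
        forEachTerm(occ, (term, tf) -> {
            total[0] += tf;
            return true; // keep going
        });
        System.out.println(total[0]); // 3
    }
}
```

The callback style avoids allocating an entry object or an iterator per term, which is why primitive-keyed collections such as Trove favour it.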
-
getPostings
public int[][] getPostings(TermCodes termCodes)
Returns the postings in a form suitable for writing into the direct index. Term ids are assigned, using termCodes, during this call.
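One plausible shape for an int[][] of postings is two parallel rows, term ids and term frequencies, sorted by ascending term id so they can be compressed into a direct-index row. The sketch below assumes that layout; TermIdAssigner is a hypothetical stand-in for TermCodes that gives each new term the next consecutive id, which shows how ids get assigned as a side effect.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class GetPostingsSketch {
    /** Hypothetical stand-in for TermCodes: gives each new term the next consecutive id. */
    static final class TermIdAssigner {
        private final Map<String, Integer> codes = new HashMap<>();

        int getCode(String term) {
            Integer id = codes.get(term);
            if (id == null) {
                id = codes.size();
                codes.put(term, id);
            }
            return id;
        }
    }

    /** Returns {termIds, tfs} as two parallel rows sorted by ascending term id. */
    static int[][] getPostings(Map<String, Integer> occurrences, TermIdAssigner termCodes) {
        int n = occurrences.size();
        int[][] pairs = new int[n][];
        int i = 0;
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            // Ids are assigned here, as a side effect of building the postings.
            pairs[i++] = new int[] {termCodes.getCode(e.getKey()), e.getValue()};
        }
        Arrays.sort(pairs, (a, b) -> Integer.compare(a[0], b[0]));
        int[][] postings = new int[2][n];
        for (int j = 0; j < n; j++) {
            postings[0][j] = pairs[j][0];
            postings[1][j] = pairs[j][1];
        }
        return postings;
    }

    public static void main(String[] args) {
        TermIdAssigner codes = new TermIdAssigner();
        codes.getCode("zebra"); // seen in an earlier document: id 0
        Map<String, Integer> occ = new HashMap<>();
        occ.put("apple", 2);
        occ.put("zebra", 1);
        int[][] postings = getPostings(occ, codes);
        System.out.println(Arrays.toString(postings[0])); // [0, 1]
        System.out.println(Arrays.toString(postings[1])); // [1, 2]
    }
}
```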
-
getPostings2
public IterablePosting getPostings2(TermCodes termCodes)
Returns a posting iterator suitable for writing into the direct index. Term ids are assigned during this call, using the getTermId() method.
-
makePostingIterator
protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
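The iterator form can be pictured as pairing each term with its assigned id, sorting by id, and then stepping through (id, tf) postings one at a time. In this sketch, PostingCursor and makePostingCursor are hypothetical names loosely modelled on the makePostingIterator(String[] _terms, int[] termIds) signature above; the cursor's next()/getId()/getFrequency() methods are assumptions for illustration and do not reproduce Terrier's IterablePosting contract.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PostingIteratorSketch {
    /** Hypothetical cursor over (termId, tf) postings in ascending termId order. */
    static final class PostingCursor {
        private final int[] ids;
        private final int[] tfs;
        private int pos = -1;

        PostingCursor(int[] ids, int[] tfs) {
            this.ids = ids;
            this.tfs = tfs;
        }

        /** Advances to the next posting; returns false once all postings are consumed. */
        boolean next() {
            return ++pos < ids.length;
        }

        int getId() { return ids[pos]; }

        int getFrequency() { return tfs[pos]; }
    }

    /** Sorts terms by their assigned ids and looks up each term's tf. */
    static PostingCursor makePostingCursor(Map<String, Integer> occurrences,
                                           String[] terms, int[] termIds) {
        Integer[] order = new Integer[terms.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(termIds[a], termIds[b]));
        int[] ids = new int[terms.length];
        int[] tfs = new int[terms.length];
        for (int i = 0; i < order.length; i++) {
            ids[i] = termIds[order[i]];
            tfs[i] = occurrences.get(terms[order[i]]);
        }
        return new PostingCursor(ids, tfs);
    }

    public static void main(String[] args) {
        Map<String, Integer> occ = new HashMap<>();
        occ.put("b", 3);
        occ.put("a", 1);
        PostingCursor p = makePostingCursor(occ, new String[] {"b", "a"}, new int[] {7, 2});
        while (p.next()) {
            System.out.println(p.getId() + " tf=" + p.getFrequency());
        }
        // prints "2 tf=1" then "7 tf=3"
    }
}
```

Compared with the int[][] form, a cursor lets the writer stream postings without exposing the arrays.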
-
readFields
public void readFields(java.io.DataInput in) throws java.io.IOException
- Specified by:
readFields in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
-
write
public void write(java.io.DataOutput out) throws java.io.IOException
- Specified by:
write in interface org.apache.hadoop.io.Writable
- Throws:
java.io.IOException
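Hadoop's Writable contract is that write serialises an object's fields to a DataOutput and readFields repopulates an existing instance from a DataInput in the same order. The sketch below shows a round trip with plain java.io streams. The byte layout (entry count, then a writeUTF term and an int tf per entry) is our own assumption and is not necessarily the format Terrier writes; WritableSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class WritableSketch {
    final Map<String, Integer> occurrences = new HashMap<>();
    int documentLength;

    /** Hypothetical layout: entry count, then (modified UTF-8 term, int tf) pairs. */
    void write(DataOutput out) throws IOException {
        out.writeInt(occurrences.size());
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    /** Replaces this object's contents with what was written, rebuilding documentLength. */
    void readFields(DataInput in) throws IOException {
        occurrences.clear(); // Writable instances are often reused, so reset first
        documentLength = 0;
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String term = in.readUTF();
            int tf = in.readInt();
            occurrences.put(term, tf);
            documentLength += tf;
        }
    }

    /** Serialises src to a byte array and reads it back into a fresh instance. */
    static WritableSketch roundTrip(WritableSketch src) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            src.write(out);
        }
        WritableSketch copy = new WritableSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        WritableSketch doc = new WritableSketch();
        doc.occurrences.put("hadoop", 2);
        doc.occurrences.put("writable", 1);
        doc.documentLength = 3;
        WritableSketch copy = roundTrip(doc);
        System.out.println(copy.occurrences.equals(doc.occurrences) + " " + copy.documentLength); // true 3
    }
}
```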
-