Package org.terrier.structures.indexing
Class DocumentPostingList
- java.lang.Object
-
- org.terrier.structures.indexing.DocumentPostingList
-
- All Implemented Interfaces:
java.io.Serializable, org.apache.hadoop.io.Writable
- Direct Known Subclasses:
BlockDocumentPostingList, FieldDocumentPostingList
public class DocumentPostingList extends java.lang.Object implements org.apache.hadoop.io.Writable, java.io.Serializable
Represents the postings of one document. Uses HashMaps internally.
Properties:
- indexing.avg.unique.terms.per.doc - number of unique terms per doc on average, used to tune the initial size of the hashmaps used in this class.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
protected class DocumentPostingList.postingIterator
-
Field Summary
Fields
protected static int AVG_DOCUMENT_UNIQUE_TERMS
    Number of unique terms per doc on average, used to tune the initial size of the hashmaps used in this class.
protected int documentLength
    Length of the document so far.
protected gnu.trove.TObjectIntHashMap<java.lang.String> occurrences
    Mapping from each term to its term frequency (tf).
-
Constructor Summary
Constructors
DocumentPostingList()
    Create a new DocumentPostingList object.
-
Method Summary
Methods
void clear()
    Removes all postings from this document.
void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc)
    Execute the specified procedure for each term.
int getDocumentLength()
    Returns the total number of tokens in this document.
DocumentIndexEntry getDocumentStatistics()
    Return a DocumentIndexEntry for this document.
int getFrequency(java.lang.String term)
    Return the frequency of the specified term in this document.
int getNumberOfPointers()
    Returns the number of unique terms in this document.
int[][] getPostings(TermCodes termCodes)
    Returns the postings suitable to be written into the direct index.
IterablePosting getPostings2(TermCodes termCodes)
    Returns a posting iterator suitable to be written into the direct index.
void insert(int tf, java.lang.String term)
    Insert a term into the posting list of this document.
void insert(java.lang.String term)
    Insert a term into the posting list of this document.
protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
void readFields(java.io.DataInput in)
java.lang.String[] termSet()
    Returns all terms in this posting list.
void write(java.io.DataOutput out)
-
-
-
Field Detail
-
AVG_DOCUMENT_UNIQUE_TERMS
protected static final int AVG_DOCUMENT_UNIQUE_TERMS
number of unique terms per doc on average, used to tune the initial size of the hashmaps used in this class.
-
documentLength
protected int documentLength
length of the document so far. Sum of the term frequencies inserted so far.
-
occurrences
protected final gnu.trove.TObjectIntHashMap<java.lang.String> occurrences
Mapping from each term to its term frequency (tf) in this document.
-
-
Method Detail
-
termSet
public java.lang.String[] termSet()
Returns all terms in this posting list
-
getFrequency
public int getFrequency(java.lang.String term)
Return the frequency of the specified term in this document
-
clear
public void clear()
Removes all postings from this document
-
getDocumentLength
public int getDocumentLength()
Returns the total number of tokens in this document
-
getNumberOfPointers
public int getNumberOfPointers()
Returns the number of unique terms in this document.
-
insert
public void insert(java.lang.String term)
Insert a term into the posting list of this document.
Parameters:
term - the term being inserted
-
insert
public void insert(int tf, java.lang.String term)
Insert a term into the posting list of this document.
Parameters:
tf - the frequency of the term
term - the term being inserted
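The bookkeeping described by the fields and accessors above (a term-to-tf map plus a running document length) can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Terrier's implementation: the class name TermFrequencySketch is hypothetical, java.util.HashMap stands in for the Trove map, and returning 0 for an unseen term and resetting the length in clear() are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Terrier class.
class TermFrequencySketch {
    // term -> tf; java.util.HashMap stands in for gnu.trove.TObjectIntHashMap
    private final Map<String, Integer> occurrences = new HashMap<>();
    // sum of the term frequencies inserted so far
    private int documentLength = 0;

    // Assumption: inserting a bare term counts as one occurrence.
    void insert(String term) {
        insert(1, term);
    }

    // Adds tf occurrences of term and grows the document length by tf.
    void insert(int tf, String term) {
        occurrences.merge(term, tf, Integer::sum);
        documentLength += tf;
    }

    // Assumption: an unseen term has frequency 0.
    int getFrequency(String term) {
        return occurrences.getOrDefault(term, 0);
    }

    int getDocumentLength() {
        return documentLength;
    }

    // Number of unique terms, i.e. the number of map entries.
    int getNumberOfPointers() {
        return occurrences.size();
    }

    // Assumption: clearing also resets the running length.
    void clear() {
        occurrences.clear();
        documentLength = 0;
    }

    public static void main(String[] args) {
        TermFrequencySketch doc = new TermFrequencySketch();
        doc.insert("index");
        doc.insert("index");
        doc.insert(3, "posting");
        System.out.println(doc.getFrequency("index"));  // prints 2
        System.out.println(doc.getDocumentLength());    // prints 5
        System.out.println(doc.getNumberOfPointers());  // prints 2
    }
}
```

In this model the document length is the sum of all inserted frequencies, while the number of pointers counts distinct terms only.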
-
getDocumentStatistics
public DocumentIndexEntry getDocumentStatistics()
Return a DocumentIndexEntry for this document
-
forEachTerm
public void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc)
Execute the specified procedure for each term.
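A callback-per-term traversal of this kind can be sketched without the Trove dependency. The TermProcedure interface below is hypothetical; it is modeled on the Trove procedure convention in which execute returns a boolean and returning false stops the iteration. Whether forEachTerm honours early termination in exactly this way is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-term callback traversal.
class ForEachTermSketch {
    // Hypothetical callback, modeled on Trove's boolean-returning procedures.
    interface TermProcedure {
        boolean execute(String term, int tf);
    }

    // Calls proc once per (term, tf) entry; stops early if proc returns false.
    static void forEachTerm(Map<String, Integer> occurrences, TermProcedure proc) {
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            if (!proc.execute(e.getKey(), e.getValue())) {
                break;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        occurrences.put("index", 2);
        occurrences.put("posting", 3);

        // Sum all term frequencies through the callback.
        int[] total = {0};
        forEachTerm(occurrences, (term, tf) -> {
            total[0] += tf;
            return true; // keep iterating
        });
        System.out.println(total[0]); // prints 5
    }
}
```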
-
getPostings
public int[][] getPostings(TermCodes termCodes)
Returns the postings suitable to be written into the direct index. During this, TermIds are assigned.
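The idea of assigning term ids while building the postings can be sketched as follows. The layout used here (row 0 holds term ids, row 1 holds the matching frequencies, sorted by ascending term id) is an assumption, since this page does not specify the shape of the returned array. IdAssigner is a hypothetical stand-in for TermCodes that hands out the next free integer the first time it sees a term.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the array layout and id scheme are assumptions.
class PostingsSketch {
    // Hypothetical stand-in for TermCodes: first sight of a term assigns the next id.
    static final class IdAssigner {
        private final Map<String, Integer> ids = new HashMap<>();

        int idFor(String term) {
            Integer id = ids.get(term);
            if (id == null) {
                id = ids.size();
                ids.put(term, id);
            }
            return id;
        }
    }

    // Builds {termIds, tfs}, ordered by ascending term id.
    static int[][] toPostings(Map<String, Integer> occurrences, IdAssigner codes) {
        // TreeMap keeps the entries sorted by term id.
        TreeMap<Integer, Integer> byId = new TreeMap<>();
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            byId.put(codes.idFor(e.getKey()), e.getValue());
        }
        int[] termIds = new int[byId.size()];
        int[] tfs = new int[byId.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : byId.entrySet()) {
            termIds[i] = e.getKey();
            tfs[i] = e.getValue();
            i++;
        }
        return new int[][] { termIds, tfs };
    }

    public static void main(String[] args) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        occurrences.put("zebra", 5);
        occurrences.put("apple", 2);
        int[][] postings = toPostings(occurrences, new IdAssigner());
        System.out.println(Arrays.toString(postings[0])); // prints [0, 1]
        System.out.println(Arrays.toString(postings[1])); // prints [5, 2]
    }
}
```

Sorting by term id rather than by term string keeps the two rows aligned and ordered by the key that would be written out.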
-
getPostings2
public IterablePosting getPostings2(TermCodes termCodes)
Returns a posting iterator suitable to be written into the direct index. During this, TermIds are assigned using the getTermId() method.
-
makePostingIterator
protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
-
readFields
public void readFields(java.io.DataInput in) throws java.io.IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException
-
write
public void write(java.io.DataOutput out) throws java.io.IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException
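The write/readFields pair follows the usual Writable pattern: one method serializes the state to a DataOutput and the other restores it from a DataInput. The sketch below shows that round trip for a term-to-tf map using only java.io. The wire format (an entry count followed by UTF term and int tf pairs) is an assumption for illustration and is not necessarily the format this class uses; the class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a Writable-style round trip; the format is assumed.
class WritableRoundTripSketch {
    // Writes the entry count, then each (term, tf) pair.
    static void write(Map<String, Integer> occurrences, DataOutput out) throws IOException {
        out.writeInt(occurrences.size());
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    // Reads back exactly what write() produced.
    static Map<String, Integer> readFields(DataInput in) throws IOException {
        int count = in.readInt();
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String term = in.readUTF();
            int tf = in.readInt();
            occurrences.put(term, tf);
        }
        return occurrences;
    }

    // Convenience helper: serialize to a byte array and deserialize again.
    static Map<String, Integer> roundTrip(Map<String, Integer> occurrences) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(occurrences, new DataOutputStream(bytes));
            return readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        occurrences.put("index", 2);
        occurrences.put("posting", 3);
        System.out.println(roundTrip(occurrences)); // prints {index=2, posting=3}
    }
}
```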
-
-