org.terrier.structures.indexing
Class DocumentPostingList

java.lang.Object
  extended by org.terrier.structures.indexing.DocumentPostingList
Direct Known Subclasses:
BlockDocumentPostingList, FieldDocumentPostingList

public class DocumentPostingList
extends java.lang.Object

Represents the postings of one document. Uses HashMaps internally.


Nested Class Summary
protected  class DocumentPostingList.postingIterator
           
 
Field Summary
protected static int AVG_DOCUMENT_UNIQUE_TERMS
          Average number of unique terms per document, used to tune the initial size of the hash maps used in this class.
protected  int documentLength
          length of the document so far.
protected  gnu.trove.TObjectIntHashMap<java.lang.String> occurrences
          Mapping from each term to its term frequency (tf).
 
Constructor Summary
DocumentPostingList()
          Create a new DocumentPostingList object
 
Method Summary
 void clear()
          Removes all postings from this document
 void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc)
          Execute the specified procedure for each term.
 int getDocumentLength()
          Returns the total number of tokens in this document
 DocumentIndexEntry getDocumentStatistics()
          Return a DocumentIndexEntry for this document
 int getFrequency(java.lang.String term)
          Return the frequency of the specified term in this document
 int getNumberOfPointers()
          Returns the number of unique terms in this document.
 int[][] getPostings()
          Returns the postings suitable to be written into the direct index.
 IterablePosting getPostings2()
          Returns a posting iterator suitable to be written into the direct index.
protected  int getTermId(java.lang.String term)
          Used by getPostings() and getPostings2() to obtain the term id of the term.
 void insert(int tf, java.lang.String term)
          Insert a term with the given frequency into the posting list of this document
 void insert(java.lang.String term)
          Insert a term into the posting list of this document
protected  IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
           
 java.lang.String[] termSet()
          Returns all terms in this posting list
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

AVG_DOCUMENT_UNIQUE_TERMS

protected static final int AVG_DOCUMENT_UNIQUE_TERMS
Average number of unique terms per document, used to tune the initial size of the hash maps used in this class.


documentLength

protected int documentLength
Length of the document so far, i.e. the sum of the term frequencies inserted so far.


occurrences

protected final gnu.trove.TObjectIntHashMap<java.lang.String> occurrences
Mapping from each term to its term frequency (tf).

Constructor Detail

DocumentPostingList

public DocumentPostingList()
Create a new DocumentPostingList object

Method Detail

termSet

public java.lang.String[] termSet()
Returns all terms in this posting list


getFrequency

public int getFrequency(java.lang.String term)
Return the frequency of the specified term in this document


clear

public void clear()
Removes all postings from this document


getDocumentLength

public int getDocumentLength()
Returns the total number of tokens in this document


getNumberOfPointers

public int getNumberOfPointers()
Returns the number of unique terms in this document.


insert

public void insert(java.lang.String term)
Insert a term into the posting list of this document

Parameters:
term - the Term being inserted

insert

public void insert(int tf,
                   java.lang.String term)
Insert a term with the given frequency into the posting list of this document

Parameters:
tf - the frequency of the term being inserted
term - the Term being inserted
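
The bookkeeping implied by the two insert methods and by documentLength (the sum of the term frequencies inserted so far) can be sketched as below. This is a minimal stand-in, not the Terrier implementation: it uses java.util.HashMap rather than gnu.trove.TObjectIntHashMap, the class name PostingListSketch is hypothetical, and returning 0 for a term that was never inserted is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Terrier.
class PostingListSketch {
    // term -> term frequency (tf) within the current document
    private final Map<String, Integer> occurrences = new HashMap<>();
    // running sum of all term frequencies inserted so far
    private int documentLength = 0;

    // Insert a single occurrence of a term.
    void insert(String term) {
        insert(1, term);
    }

    // Insert tf occurrences of a term in one call.
    void insert(int tf, String term) {
        occurrences.merge(term, tf, Integer::sum);
        documentLength += tf;
    }

    // tf of the term; 0 for an unseen term (assumed behaviour).
    int getFrequency(String term) {
        return occurrences.getOrDefault(term, 0);
    }

    // Total number of tokens inserted.
    int getDocumentLength() {
        return documentLength;
    }

    // Number of distinct terms inserted.
    int getNumberOfPointers() {
        return occurrences.size();
    }
}
```

In this sketch, inserting "the" twice and then "cat" with tf 3 yields a document length of 5 and 2 pointers.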

getDocumentStatistics

public DocumentIndexEntry getDocumentStatistics()
Return a DocumentIndexEntry for this document


forEachTerm

public void forEachTerm(gnu.trove.TObjectIntProcedure<java.lang.String> proc)
Execute the specified procedure for each term.
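
The callback style of forEachTerm can be illustrated with a small stand-in. The sketch assumes the procedure is called once per distinct term with that term's frequency, and that returning false stops the iteration, as with Trove's execute(term, value) procedures. The TermProcedure interface and the ForEachTermSketch class are hypothetical names, and java.util.LinkedHashMap replaces the Trove map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Terrier.
class ForEachTermSketch {
    // Hypothetical callback, modelled on a Trove-style procedure:
    // return true to continue, false to stop.
    interface TermProcedure {
        boolean execute(String term, int tf);
    }

    // term -> term frequency (tf)
    private final Map<String, Integer> occurrences = new LinkedHashMap<>();

    void insert(String term) {
        occurrences.merge(term, 1, Integer::sum);
    }

    // Call proc once per distinct term; stop early if it returns false.
    void forEachTerm(TermProcedure proc) {
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            if (!proc.execute(e.getKey(), e.getValue())) {
                break;
            }
        }
    }
}
```

A caller could, for example, pass a lambda that accumulates tf values to recompute the document length from the per-term frequencies.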


getTermId

protected int getTermId(java.lang.String term)
Used by getPostings() and getPostings2() to obtain the term id of the term. This implementation uses the TermCodes class.


getPostings

public int[][] getPostings()
Returns the postings in a form suitable to be written into the direct index. Term ids are assigned during this call.
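
One plausible shape for the returned int[][] is a pair of parallel arrays, term ids and term frequencies, ordered by ascending term id. The page does not specify the layout, so that layout, the id-assignment policy (next unused integer on first sight) and the class name GetPostingsSketch are all assumptions made for this sketch, which uses java.util collections instead of Trove and TermCodes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Terrier.
class GetPostingsSketch {
    // term -> id, shared across documents (stand-in for a term code table)
    private final Map<String, Integer> termCodes = new HashMap<>();
    // term -> term frequency (tf) for the current document
    private final Map<String, Integer> occurrences = new HashMap<>();

    void insert(String term) {
        occurrences.merge(term, 1, Integer::sum);
    }

    // Assign the next unused id the first time a term is seen (assumed policy).
    int getTermId(String term) {
        Integer id = termCodes.get(term);
        if (id == null) {
            id = termCodes.size();
            termCodes.put(term, id);
        }
        return id;
    }

    // Returns {termIds, tfs}, both ordered by ascending term id (assumed layout).
    int[][] getPostings() {
        int n = occurrences.size();
        int[][] pairs = new int[n][];
        int i = 0;
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            pairs[i++] = new int[] { getTermId(e.getKey()), e.getValue() };
        }
        // Sort (id, tf) pairs by id so the two output arrays stay aligned.
        Arrays.sort(pairs, (a, b) -> Integer.compare(a[0], b[0]));
        int[] termIds = new int[n];
        int[] tfs = new int[n];
        for (int j = 0; j < n; j++) {
            termIds[j] = pairs[j][0];
            tfs[j] = pairs[j][1];
        }
        return new int[][] { termIds, tfs };
    }
}
```

Keeping ids and frequencies in separate, id-sorted arrays is a common choice when postings are written with gap encoding, since consecutive ids can then be stored as small differences.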


getPostings2

public IterablePosting getPostings2()
Returns a posting iterator suitable to be written into the direct index. Term ids are assigned during this call, using the getTermId() method.


makePostingIterator

protected IterablePosting makePostingIterator(java.lang.String[] _terms,
                                              int[] termIds)


Terrier 3.5. Copyright © 2004-2011 University of Glasgow