public class DocumentPostingList extends Object implements org.apache.hadoop.io.Writable
Properties:
Modifier and Type | Class and Description |
---|---|
protected class |
DocumentPostingList.postingIterator |
Modifier and Type | Field and Description |
---|---|
protected static int |
AVG_DOCUMENT_UNIQUE_TERMS
Average number of unique terms per document, used to tune the initial size of the hash maps in this class.
|
protected int |
documentLength
length of the document so far.
|
protected gnu.trove.TObjectIntHashMap<String> |
occurrences
Mapping from each term to its term frequency (tf).
|
Constructor and Description |
---|
DocumentPostingList()
Create a new DocumentPostingList object
|
Modifier and Type | Method and Description |
---|---|
void |
clear()
Removes all postings from this document
|
void |
forEachTerm(gnu.trove.TObjectIntProcedure<String> proc)
Execute the specified method for each term.
|
int |
getDocumentLength()
Returns the total number of tokens in this document
|
DocumentIndexEntry |
getDocumentStatistics()
Return a DocumentIndexEntry for this document
|
int |
getFrequency(String term)
Return the frequency of the specified term in this document
|
int |
getNumberOfPointers()
Returns the number of unique terms in this document.
|
int[][] |
getPostings()
Returns the postings suitable to be written into the direct index.
|
IterablePosting |
getPostings2()
Returns a posting iterator suitable to be written into the direct index.
|
protected int |
getTermId(String term)
Used by getPostings() and getPostings2() to obtain the term id of the term.
|
void |
insert(int tf,
String term)
Insert a term into the posting list of this document
|
void |
insert(String term)
Insert a term into the posting list of this document
|
protected IterablePosting |
makePostingIterator(String[] _terms,
int[] termIds) |
void |
readFields(DataInput in) |
String[] |
termSet()
Returns all terms in this posting list
|
void |
write(DataOutput out) |
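The counting behaviour summarised in the method table can be illustrated with a small stand-in. This is a minimal sketch, not the Terrier implementation: `PostingListSketch` is a hypothetical class that uses `java.util.HashMap` in place of `gnu.trove.TObjectIntHashMap`, and it assumes that `insert(int tf, String term)` adds `tf` to both the term's frequency and the document length, and that an unseen term has frequency 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the Terrier class: it only mirrors the documented counters.
class PostingListSketch {
    // term -> term frequency, playing the role of the 'occurrences' field
    private final Map<String, Integer> occurrences = new HashMap<>();
    // total number of tokens inserted so far
    private int documentLength = 0;

    // Mirrors insert(String term): record one more occurrence of the term.
    void insert(String term) {
        insert(1, term);
    }

    // Mirrors insert(int tf, String term): record tf more occurrences (assumed semantics).
    void insert(int tf, String term) {
        occurrences.merge(term, tf, Integer::sum);
        documentLength += tf;
    }

    // Frequency of the term in this document; 0 for an unseen term (assumption).
    int getFrequency(String term) {
        return occurrences.getOrDefault(term, 0);
    }

    // Total number of tokens in this document.
    int getDocumentLength() {
        return documentLength;
    }

    // Number of unique terms in this document.
    int getNumberOfPointers() {
        return occurrences.size();
    }

    // Removes all postings and resets the length.
    void clear() {
        occurrences.clear();
        documentLength = 0;
    }
}
```

Under these assumptions, inserting "terrier" twice and then calling `insert(3, "index")` would give a document length of 5 and two pointers.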
protected static final int AVG_DOCUMENT_UNIQUE_TERMS
protected int documentLength
protected final gnu.trove.TObjectIntHashMap<String> occurrences
public DocumentPostingList()
public String[] termSet()
public int getFrequency(String term)
public void clear()
public int getDocumentLength()
public int getNumberOfPointers()
public void insert(String term)
Parameters:
term - the term being inserted
public void insert(int tf, String term)
Parameters:
tf - the frequency
term - the term being inserted
public DocumentIndexEntry getDocumentStatistics()
public void forEachTerm(gnu.trove.TObjectIntProcedure<String> proc)
protected int getTermId(String term)
public int[][] getPostings()
public IterablePosting getPostings2()
protected IterablePosting makePostingIterator(String[] _terms, int[] termIds)
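The page does not describe the layout of the `int[][]` returned by `getPostings()`. One plausible layout, assumed here rather than taken from the source, is a pair of parallel arrays (term ids and term frequencies) ordered by ascending term id. The sketch below builds that layout; `PostingsLayoutSketch`, `toPostings`, and the `idByTerm` lookup (standing in for `getTermId(String)`) are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of Terrier: shows one possible int[][] postings layout.
class PostingsLayoutSketch {
    // Returns {termIds, tfs}: two parallel arrays ordered by ascending term id.
    // Assumes every term in tfByTerm has an entry in idByTerm.
    static int[][] toPostings(Map<String, Integer> tfByTerm, Map<String, Integer> idByTerm) {
        // A TreeMap keyed by term id yields the entries in ascending id order.
        TreeMap<Integer, Integer> tfById = new TreeMap<>();
        for (Map.Entry<String, Integer> e : tfByTerm.entrySet()) {
            tfById.put(idByTerm.get(e.getKey()), e.getValue());
        }
        int[] termIds = new int[tfById.size()];
        int[] tfs = new int[tfById.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : tfById.entrySet()) {
            termIds[i] = e.getKey();
            tfs[i] = e.getValue();
            i++;
        }
        return new int[][] { termIds, tfs };
    }
}
```

Keeping ids and frequencies in separate primitive arrays avoids per-posting object allocation, which is a common reason for this kind of layout.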
public void readFields(DataInput in) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Throws: IOException
public void write(DataOutput out) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Throws: IOException
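Because `write(DataOutput)` and `readFields(DataInput)` form a serialisation pair, a round trip through a byte buffer should reproduce the original term frequencies. The following is a minimal sketch of that contract using only `java.io`; `WritableRoundTripSketch` is a hypothetical class, and the byte format (entry count, then a UTF string and an int per term) is an assumption, not the format Terrier actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the Terrier class: illustrates the write/readFields contract.
class WritableRoundTripSketch {
    final Map<String, Integer> occurrences = new HashMap<>();

    // Assumed format: number of entries, then (term, tf) pairs.
    void write(DataOutput out) throws IOException {
        out.writeInt(occurrences.size());
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    // Replaces the current state with the state read from the input.
    void readFields(DataInput in) throws IOException {
        occurrences.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String term = in.readUTF();
            occurrences.put(term, in.readInt());
        }
    }

    // Serialises src into memory and reads it back into a fresh object.
    static WritableRoundTripSketch roundTrip(WritableRoundTripSketch src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            WritableRoundTripSketch copy = new WritableRoundTripSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Clearing the map at the start of `readFields` matters when an instance is reused across records, as Hadoop commonly does with `Writable` objects.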
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow