Package org.terrier.structures.indexing
Class BlockDocumentPostingList
- java.lang.Object
  - org.terrier.structures.indexing.DocumentPostingList
    - org.terrier.structures.indexing.BlockDocumentPostingList
- All Implemented Interfaces:
  java.io.Serializable, org.apache.hadoop.io.Writable
public class BlockDocumentPostingList extends DocumentPostingList
Represents the postings of one document, including block (term position) information. Uses HashMaps internally.
Properties:
- indexing.avg.unique.terms.per.doc - the average number of unique terms per document, used to tune the initial size of the hashmaps used in this class.
- See Also:
  DocumentPostingList, Serialized Form
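The description above can be pictured with a small, self-contained model. This is a hypothetical sketch, not Terrier's implementation: java.util.HashMap and HashSet stand in for the Trove THashMap and TIntHashSet used by the real class, SimpleBlockPostings and its methods are illustrative names, and the avgUniqueTerms constructor argument only mimics the role of indexing.avg.unique.terms.per.doc as an initial-capacity hint. This page does not say whether the real getBlocks sorts its result or how it treats a term that never occurred; the sketch sorts and returns an empty array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of one document's block postings.
 * Not Terrier's code: java.util collections replace the Trove maps/sets.
 */
class SimpleBlockPostings {
    // term -> within-document frequency
    private final Map<String, Integer> occurrences;
    // term -> distinct block ids at which the term occurs (cf. term_blocks)
    private final Map<String, Set<Integer>> termBlocks;
    private int documentLength = 0;

    /** avgUniqueTerms is only used as an initial-capacity hint for both maps. */
    SimpleBlockPostings(int avgUniqueTerms) {
        occurrences = new HashMap<>(avgUniqueTerms);
        termBlocks = new HashMap<>(avgUniqueTerms);
    }

    /** Records one occurrence of term at the given block id. */
    void insert(String term, int blockId) {
        occurrences.merge(term, 1, Integer::sum);
        // a set, so repeated occurrences in the same block store the id once
        termBlocks.computeIfAbsent(term, k -> new HashSet<>()).add(blockId);
        documentLength++;
    }

    /** Sorted block ids for term, or an empty array if the term never occurred. */
    int[] getBlocks(String term) {
        Set<Integer> blocks = termBlocks.get(term);
        if (blocks == null) {
            return new int[0];
        }
        int[] ids = blocks.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(ids);
        return ids;
    }

    int getFrequency(String term) {
        return occurrences.getOrDefault(term, 0);
    }

    int getDocumentLength() {
        return documentLength;
    }

    /** Empties the postings so the object can be reused for the next document. */
    void clear() {
        occurrences.clear();
        termBlocks.clear();
        documentLength = 0;
    }
}
```

Inserting "cat" at blocks 4, 1 and 4 gives a frequency of 3 but only two distinct block ids, which getBlocks reports as {1, 4}.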
Nested Class Summary
- Nested classes/interfaces inherited from class org.terrier.structures.indexing.DocumentPostingList:
  DocumentPostingList.postingIterator
Field Summary
- protected int blockCount: number of blocks in this document.
- protected gnu.trove.THashMap<java.lang.String,gnu.trove.TIntHashSet> term_blocks: mapping from each term to the block ids at which it occurs in this document.
- Fields inherited from class org.terrier.structures.indexing.DocumentPostingList:
  AVG_DOCUMENT_UNIQUE_TERMS, documentLength, occurrences
Constructor Summary
- BlockDocumentPostingList(): Instantiates a new block document posting list.
Method Summary
- void clear(): Removes all postings from this document.
- int[] getBlocks(java.lang.String term): Returns the block ids recorded for the given term in this document.
- int[][] getPostings(TermCodes termCodes): Returns the postings in a form suitable for writing into the block direct index.
- void insert(java.lang.String t, int blockId): Inserts a term into this document, recording that it occurs at the given block id.
- protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
- void readFields(java.io.DataInput in)
- void write(java.io.DataOutput out)
- Methods inherited from class org.terrier.structures.indexing.DocumentPostingList:
  forEachTerm, getDocumentLength, getDocumentStatistics, getFrequency, getNumberOfPointers, getPostings2, insert, insert, termSet
Method Detail
insert
public void insert(java.lang.String t, int blockId)
Inserts a term into this document, recording that it occurs at the given block id.
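Block ids are chosen by the caller of insert. One common scheme, assumed here rather than documented on this page, groups consecutive token positions into fixed-size blocks so that the block id is position / blockSize; with blockSize = 1 every token position is its own block. BlockAssigner, assignBlocks and blockSize are hypothetical names, and a map of sorted sets stands in for the posting list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: computes, per term, the block ids an indexer would pass to insert(term, blockId). */
class BlockAssigner {
    /**
     * Groups token positions into blocks of blockSize tokens and records,
     * for each term, the (sorted, distinct) block ids it appears in.
     */
    static Map<String, TreeSet<Integer>> assignBlocks(List<String> tokens, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        Map<String, TreeSet<Integer>> blocks = new TreeMap<>();
        for (int position = 0; position < tokens.size(); position++) {
            int blockId = position / blockSize; // consecutive positions share a block
            blocks.computeIfAbsent(tokens.get(position), t -> new TreeSet<>()).add(blockId);
        }
        return blocks;
    }
}
```

For the tokens a, b, a, c, a and blockSize = 2, term a occurs at positions 0, 2 and 4 and therefore lands in blocks 0, 1 and 2.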
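getBlocks
public int[] getBlocks(java.lang.String term)
Returns the block ids recorded for the given term in this document.
- Parameters:
  term - the term whose block ids are requested
- Returns:
  int[] - the block ids at which term occurs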
getPostings
public int[][] getPostings(TermCodes termCodes)
Returns the postings in a form suitable for writing into the block direct index.
- Overrides:
  getPostings in class DocumentPostingList
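This page does not document the layout of the returned int[][]. Purely as an illustrative assumption, the sketch below flattens one document's block postings into parallel arrays ordered by ascending term id: term ids, term frequencies, per-term block counts, and all block ids concatenated in the same term order. A Map<String, Integer> stands in for TermCodes, and PostingFlattener and flattenPostings are hypothetical names; the real array order should be checked against Terrier's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical flattening of one document's block postings into parallel arrays. */
class PostingFlattener {
    /**
     * @param termIds    stand-in for TermCodes: term -> term id
     * @param tfs        term -> within-document frequency
     * @param termBlocks term -> block ids of that term
     * @return {termIds, tfs, blockFreqs, blockIds} in ascending term-id order (assumed layout)
     */
    static int[][] flattenPostings(Map<String, Integer> termIds,
                                   Map<String, Integer> tfs,
                                   Map<String, int[]> termBlocks) {
        // this sketch orders terms by ascending term id
        List<String> terms = new ArrayList<>(tfs.keySet());
        terms.sort((a, b) -> Integer.compare(termIds.get(a), termIds.get(b)));

        int n = terms.size();
        int[] ids = new int[n], freqs = new int[n], blockFreqs = new int[n];
        int totalBlocks = 0;
        for (int i = 0; i < n; i++) {
            String t = terms.get(i);
            ids[i] = termIds.get(t);
            freqs[i] = tfs.get(t);
            blockFreqs[i] = termBlocks.get(t).length;
            totalBlocks += blockFreqs[i];
        }

        // concatenate block ids in the same term order; blockFreqs marks each term's run length
        int[] blockIds = new int[totalBlocks];
        int offset = 0;
        for (String t : terms) {
            int[] b = termBlocks.get(t);
            System.arraycopy(b, 0, blockIds, offset, b.length);
            offset += b.length;
        }
        return new int[][] {ids, freqs, blockFreqs, blockIds};
    }
}
```

Keeping per-term block counts alongside a single concatenated block-id array lets a reader recover each term's block ids by walking the counts, without storing one array per term.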
makePostingIterator
protected IterablePosting makePostingIterator(java.lang.String[] _terms, int[] termIds)
- Overrides:
  makePostingIterator in class DocumentPostingList
clear
public void clear()
Description copied from class: DocumentPostingList
Removes all postings from this document.
- Overrides:
  clear in class DocumentPostingList
readFields
public void readFields(java.io.DataInput in) throws java.io.IOException
- Specified by:
  readFields in interface org.apache.hadoop.io.Writable
- Overrides:
  readFields in class DocumentPostingList
- Throws:
  java.io.IOException
write
public void write(java.io.DataOutput out) throws java.io.IOException
- Specified by:
  write in interface org.apache.hadoop.io.Writable
- Overrides:
  write in class DocumentPostingList
- Throws:
  java.io.IOException
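write and readFields implement Hadoop's Writable contract: readFields must consume the fields in exactly the order write produced them. The sketch below shows that contract using java.io only. The byte format (term count, then for each term its UTF string, frequency, block count and block ids) is a hypothetical choice for illustration, not the format Terrier writes, and WritableBlockPostings is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical Writable-style container: term -> tf, term -> block ids. */
class WritableBlockPostings {
    final Map<String, Integer> tfs = new LinkedHashMap<>();
    final Map<String, int[]> blocks = new LinkedHashMap<>();

    /** Serializes every term with its frequency and block ids (hypothetical format). */
    void write(DataOutput out) throws IOException {
        out.writeInt(tfs.size());
        for (Map.Entry<String, Integer> e : tfs.entrySet()) {
            int[] ids = blocks.get(e.getKey());
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
            out.writeInt(ids.length);
            for (int id : ids) {
                out.writeInt(id);
            }
        }
    }

    /** Reads back exactly what write() produced, replacing any existing contents. */
    void readFields(DataInput in) throws IOException {
        tfs.clear();
        blocks.clear();
        int terms = in.readInt();
        for (int i = 0; i < terms; i++) {
            String term = in.readUTF();
            tfs.put(term, in.readInt());
            int[] ids = new int[in.readInt()];
            for (int j = 0; j < ids.length; j++) {
                ids[j] = in.readInt();
            }
            blocks.put(term, ids);
        }
    }

    /** Round-trips this object through an in-memory byte array. */
    WritableBlockPostings copy() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes));
        WritableBlockPostings copy = new WritableBlockPostings();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

Clearing state at the start of readFields matters because Hadoop commonly reuses a single Writable instance across many records.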