Class BlockDocumentPostingList

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.hadoop.io.Writable

    public class BlockDocumentPostingList
    extends DocumentPostingList
    Represents the postings of one document, and saves block (term position) information. Uses HashMaps internally.

    Properties:

    • indexing.avg.unique.terms.per.doc - the average number of unique terms per document, used to tune the initial size of the hash maps used in this class.
    See Also:
    DocumentPostingList, Serialized Form
    • Field Detail

      • term_blocks

        protected final gnu.trove.THashMap<java.lang.String,gnu.trove.TIntHashSet> term_blocks
        Maps each term to the set of block ids at which it occurs in this document.
      • blockCount

        protected int blockCount
        Number of blocks in this document. Usually equal to the document length, but may be less.
    • Constructor Detail

      • BlockDocumentPostingList

        public BlockDocumentPostingList()
        Instantiates a new block document posting list. Saves block information, but no field information.
    • Method Detail

      • insert

        public void insert(java.lang.String t,
                           int blockId)
        Inserts an occurrence of term t into this document at the given block id.
      • getBlocks

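        public int[] getBlocks(java.lang.String term)
        Returns the ids of the blocks in which the given term occurs in this document.
        Parameters:
        term - the term to look up
        Returns:
        int[] - the block ids recorded for the term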
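
The relationship between insert and getBlocks can be illustrated with a minimal stand-in. This is a sketch only, not the Terrier class: BlockPostingSketch and getFrequency are hypothetical names, java.util collections replace the Trove maps, and both the ascending order of the returned ids and the empty array for an unknown term are assumptions of the sketch rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical stand-in for BlockDocumentPostingList; NOT the Terrier class. */
class BlockPostingSketch {
    // term -> distinct block ids, playing the role of term_blocks
    private final Map<String, TreeSet<Integer>> termBlocks = new HashMap<>();
    // term -> number of occurrences in this document
    private final Map<String, Integer> termFreqs = new HashMap<>();

    /** Records one occurrence of term t at the given block id. */
    void insert(String t, int blockId) {
        termFreqs.merge(t, 1, Integer::sum);
        termBlocks.computeIfAbsent(t, k -> new TreeSet<>()).add(blockId);
    }

    /** Returns the block ids of term in ascending order (assumption: empty if absent). */
    int[] getBlocks(String term) {
        TreeSet<Integer> blocks = termBlocks.get(term);
        if (blocks == null) {
            return new int[0];
        }
        return blocks.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Hypothetical helper: number of times term was inserted. */
    int getFrequency(String term) {
        return termFreqs.getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        BlockPostingSketch doc = new BlockPostingSketch();
        String[] tokens = {"to", "be", "or", "not", "to", "be"};
        // Use the token position as the block id, one block per token.
        for (int pos = 0; pos < tokens.length; pos++) {
            doc.insert(tokens[pos], pos);
        }
        System.out.println(Arrays.toString(doc.getBlocks("to"))); // [0, 4]
        System.out.println(Arrays.toString(doc.getBlocks("be"))); // [1, 5]
    }
}
```

Because block ids are kept in a set, inserting the same term twice at the same block id raises its frequency but leaves a single block entry, which is one way the block count can end up below the document length.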
      • readFields

        public void readFields(java.io.DataInput in)
                        throws java.io.IOException
        Specified by:
        readFields in interface org.apache.hadoop.io.Writable
        Overrides:
        readFields in class DocumentPostingList
        Throws:
        java.io.IOException
      • write

        public void write(java.io.DataOutput out)
                   throws java.io.IOException
        Specified by:
        write in interface org.apache.hadoop.io.Writable
        Overrides:
        write in class DocumentPostingList
        Throws:
        java.io.IOException
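
The write and readFields pair follows the usual Writable contract: whatever write emits, readFields must consume in the same order. The sketch below shows that round trip with plain java.io streams. It rests on several assumptions: WritableBlockSketch and roundTrip are hypothetical names, the byte layout (term count, then per term its text, block count and block ids) is invented for illustration and is not the format Terrier uses, and the class mirrors the method signatures without implementing org.apache.hadoop.io.Writable so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical Writable-style posting list; the byte layout is illustrative only. */
class WritableBlockSketch {
    final Map<String, TreeSet<Integer>> termBlocks = new TreeMap<>();

    void insert(String t, int blockId) {
        termBlocks.computeIfAbsent(t, k -> new TreeSet<>()).add(blockId);
    }

    // Assumed layout: termCount, then per term: UTF term, blockCount, block ids.
    public void write(DataOutput out) throws IOException {
        out.writeInt(termBlocks.size());
        for (Map.Entry<String, TreeSet<Integer>> e : termBlocks.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (int blockId : e.getValue()) {
                out.writeInt(blockId);
            }
        }
    }

    // Reads fields in exactly the order write emitted them, replacing any existing state.
    public void readFields(DataInput in) throws IOException {
        termBlocks.clear();
        int termCount = in.readInt();
        for (int i = 0; i < termCount; i++) {
            String term = in.readUTF();
            int blockCount = in.readInt();
            TreeSet<Integer> blocks = new TreeSet<>();
            for (int j = 0; j < blockCount; j++) {
                blocks.add(in.readInt());
            }
            termBlocks.put(term, blocks);
        }
    }

    /** Serializes src to a byte array and reads it back into a fresh instance. */
    static WritableBlockSketch roundTrip(WritableBlockSketch src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                src.write(out);
            }
            WritableBlockSketch copy = new WritableBlockSketch();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WritableBlockSketch doc = new WritableBlockSketch();
        doc.insert("index", 0);
        doc.insert("block", 1);
        doc.insert("index", 2);
        WritableBlockSketch copy = roundTrip(doc);
        System.out.println(copy.termBlocks.equals(doc.termBlocks)); // true
    }
}
```

Writing an explicit count before each variable-length section is what lets readFields know how many values to read without a terminator.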