Class Inverted2DirectIndexBuilder

  • Direct Known Subclasses:
    BlockInverted2DirectIndexBuilder

    public class Inverted2DirectIndexBuilder
    extends java.lang.Object
    Creates a direct index from an existing InvertedIndex. The algorithm mirrors that of InvertedIndexBuilder: where InvertedIndexBuilder builds an InvertedIndex from a DirectIndex, this class does the opposite, building a DirectIndex from an InvertedIndex.

    Algorithm:
    For a selection of document ids:
    - (Scan the inverted index looking for postings with these document ids)
    - For each term in the inverted index:
    -- Select the required postings from all the postings of that term
    -- Add these to the posting objects that represent each document
    - For each posting object:
    -- Write out the postings for that document
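    The steps above can be illustrated with a small in-memory sketch. It is not the Terrier implementation: the class name InvertedToDirectSketch, the method invertBatch and the array layout (inverted[termid] holding {docid, frequency} pairs) are assumptions made for illustration only, and the "write out" step is replaced by returning the per-document lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names belong to Terrier.
class InvertedToDirectSketch {

    /**
     * Builds direct postings for the docids in [firstDocid, firstDocid + countDocuments).
     * inverted[termid] holds {docid, frequency} pairs; termids are assumed to start at 0
     * and increase, so visiting terms in order yields termid-sorted direct postings.
     * Returns one list per document, each holding {termid, frequency} pairs.
     */
    static List<List<int[]>> invertBatch(int[][][] inverted, int firstDocid, int countDocuments) {
        List<List<int[]>> direct = new ArrayList<>();
        for (int i = 0; i < countDocuments; i++) {
            direct.add(new ArrayList<>());
        }
        // For each term in the inverted index...
        for (int termid = 0; termid < inverted.length; termid++) {
            // ...select only the postings whose docid falls in the current range...
            for (int[] posting : inverted[termid]) {
                int docid = posting[0];
                if (docid >= firstDocid && docid < firstDocid + countDocuments) {
                    // ...and add them to the posting list of that document.
                    direct.get(docid - firstDocid).add(new int[] {termid, posting[1]});
                }
            }
        }
        return direct;
    }

    public static void main(String[] args) {
        int[][][] inverted = {
            {{0, 2}, {2, 1}},          // term 0: doc 0 (tf 2), doc 2 (tf 1)
            {{1, 3}},                  // term 1: doc 1 (tf 3)
            {{0, 1}, {1, 1}, {2, 4}}   // term 2: docs 0, 1 and 2
        };
        int totalDocs = 3;
        int batchSize = 2;
        // Each batch of documents requires one full pass over the inverted index.
        for (int first = 0; first < totalDocs; first += batchSize) {
            int count = Math.min(batchSize, totalDocs - first);
            List<List<int[]>> batch = invertBatch(inverted, first, count);
            for (int i = 0; i < batch.size(); i++) {
                StringBuilder sb = new StringBuilder("doc " + (first + i) + ":");
                for (int[] p : batch.get(i)) {
                    sb.append(" (").append(p[0]).append(',').append(p[1]).append(')');
                }
                System.out.println(sb);
            }
        }
    }
}
```

    Every batch costs a complete scan of the inverted index, so larger batches mean fewer scans at the price of holding more postings in memory.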

    Notes:
    This algorithm assumes that termids start at 0 and are strictly increasing. This assumption holds true only for inverted indices generated by the single pass indexing method.

    Properties:

    • inverted2direct.processtokens - total number of tokens to attempt to process in each iteration. Defaults to 100000000. Memory usage is more closely linked to the number of pointers; however, because the document index does not record the number of unique terms in each document, the number of pointers cannot be calculated.
    Since:
    2.0
    Author:
    Craig Macdonald
    • Method Summary

      Modifier and Type Method Description
      void createDirectIndex()
      create the direct index when the collection contains an existing inverted index
      protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader()
      returns the PostingInRun implementation that should be used for reading the postings written earlier
      protected org.terrier.structures.indexing.singlepass.Posting[] getPostings​(int count)
      get an array of posting objects of the specified size.
      static void main​(java.lang.String[] args)  
      protected static int scanDocumentIndexForTokens​(long _processTokens, java.util.Iterator<DocumentIndexEntry> docidStream)
      Iterates through the document index until it has reached the given number of tokens
      protected long traverseInvertedFile​(PostingIndexInputStream iiis, int firstDocid, int countDocuments, org.terrier.structures.indexing.singlepass.Posting[] directPostings)
      traverse the inverted file, looking for all occurrences of documents in the given range
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • logger

        protected static final org.slf4j.Logger logger
        The logger used
      • index

        protected IndexOnDisk index
        index currently being used
      • fieldCount

        protected final int fieldCount
        The number of different fields that are used for indexing field information.
      • saveTagInformation

        protected final boolean saveTagInformation
        Indicates whether field information is used.
      • directIndexClass

        protected java.lang.String directIndexClass
        Class to read the generated direct index
      • directIndexInputStreamClass

        protected java.lang.String directIndexInputStreamClass
        Class to read the generated direct index as an input stream
      • basicDirectIndexPostingIteratorClass

        protected java.lang.String basicDirectIndexPostingIteratorClass
      • fieldDirectIndexPostingIteratorClass

        protected java.lang.String fieldDirectIndexPostingIteratorClass
      • processTokens

        protected long processTokens
        number of tokens limit per iteration
      • sourceStructure

        protected java.lang.String sourceStructure
      • destinationStructure

        protected java.lang.String destinationStructure
    • Constructor Detail

      • Inverted2DirectIndexBuilder

        public Inverted2DirectIndexBuilder​(IndexOnDisk i)
        Construct a new instance of this builder class
    • Method Detail

      • createDirectIndex

        public void createDirectIndex()
        create the direct index when the collection contains an existing inverted index
      • getPostings

        protected org.terrier.structures.indexing.singlepass.Posting[] getPostings​(int count)
        get an array of posting objects of the specified size. These will be used to hold the postings for a range of documents
      • getPostingReader

        protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader()
        returns the PostingInRun implementation that should be used for reading the postings written earlier
      • traverseInvertedFile

        protected long traverseInvertedFile​(PostingIndexInputStream iiis,
                                            int firstDocid,
                                            int countDocuments,
                                            org.terrier.structures.indexing.singlepass.Posting[] directPostings)
                                     throws java.io.IOException
        traverse the inverted file, looking for all occurrences of documents in the given range
        Returns:
        the number of tokens found in all of the documents.
        Throws:
        java.io.IOException
      • scanDocumentIndexForTokens

        protected static int scanDocumentIndexForTokens​(long _processTokens,
                                                        java.util.Iterator<DocumentIndexEntry> docidStream)
                                                 throws java.io.IOException
        Iterates through the document index until it has reached the given number of tokens
        Parameters:
        _processTokens - number of tokens after which to stop reading the document index
        docidStream - the document index stream to read
        Returns:
        the number of documents to process
        Throws:
        java.io.IOException
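        As a rough illustration of how a token budget can be turned into a document count, the sketch below walks a stream of document lengths and stops once the running total would exceed the budget. It is a sketch under stated assumptions, not the Terrier source: the class TokenBudgetScanSketch and the method countDocsWithinBudget are hypothetical, plain Integer lengths stand in for DocumentIndexEntry objects, and the choice to always accept at least one document (so that an oversized document cannot stall progress) is a decision of this sketch, not documented behaviour.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; none of these names belong to Terrier.
class TokenBudgetScanSketch {

    /**
     * Counts how many leading documents fit within tokenBudget tokens.
     * docLengths supplies the length (in tokens) of each document in docid order.
     * Assumption of this sketch: the first document is always accepted, even if
     * it alone exceeds the budget, so that every batch makes progress.
     */
    static int countDocsWithinBudget(long tokenBudget, Iterator<Integer> docLengths) {
        long tokens = 0;
        int docs = 0;
        while (docLengths.hasNext()) {
            tokens += docLengths.next();
            // Stop before the document that pushes the total over the budget.
            if (tokens > tokenBudget && docs > 0) {
                break;
            }
            docs++;
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Integer> lengths = Arrays.asList(40, 30, 50, 10, 200, 5);
        long budget = 100;
        int first = 0;
        // Split the collection into consecutive batches of documents.
        while (first < lengths.size()) {
            // A fresh iterator per batch avoids losing the document that ended the previous scan.
            int n = countDocsWithinBudget(budget, lengths.subList(first, lengths.size()).iterator());
            System.out.println("batch starting at docid " + first + ": " + n + " document(s)");
            first += n;
        }
    }
}
```

        Each resulting batch would then be passed to one traversal of the inverted file, which is why a larger inverted2direct.processtokens value trades memory for fewer traversals.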
      • main

        public static void main​(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception