Class StructureMerger

  • Direct Known Subclasses:
    BlockStructureMerger

    public class StructureMerger
    extends java.lang.Object
    This class merges the structures created by Terrier, so that we use fewer and larger inverted and direct files.

    Properties:

    • lexicon.use.hash - build a lexicon hash file for the new index. Set to true by default.
    • merge.direct - merge the direct indices if both indices have them. Set to true by default.
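    The two properties above can be read with their documented defaults as in the following minimal sketch. It uses java.util.Properties as a stand-in for Terrier's own configuration mechanism, and the class name MergerSettings is hypothetical, not part of Terrier.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Terrier.
class MergerSettings {
    final boolean useLexiconHash;
    final boolean mergeDirect;

    MergerSettings(Properties props) {
        // Both properties fall back to "true" when unset, as documented above.
        this.useLexiconHash =
            Boolean.parseBoolean(props.getProperty("lexicon.use.hash", "true"));
        this.mergeDirect =
            Boolean.parseBoolean(props.getProperty("merge.direct", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("merge.direct", "false"); // skip merging the direct indices
        MergerSettings s = new MergerSettings(props);
        System.out.println("lexicon.use.hash=" + s.useLexiconHash
            + ", merge.direct=" + s.mergeDirect);
    }
}
```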
    Author:
    Vassilis Plachouras and Craig Macdonald
    • Field Detail

      • logger

        protected static final org.slf4j.Logger logger
        the logger used
      • termcodeHashmap

        protected gnu.trove.TIntIntHashMap termcodeHashmap
        A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
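        The idea of such a term code map can be sketched with plain java.util collections. This is an illustration under stated assumptions, not Terrier's implementation: the class and method names are hypothetical, the first lexicon's codes are assumed to be 0..n-1, and new codes are assumed to continue from n in term order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of Terrier.
class TermCodeRemapSketch {
    /**
     * Maps the old code of each term found only in the second lexicon
     * to a fresh code in the merged lexicon.
     * Assumption: codes in lex1 are 0..lex1.size()-1, so fresh codes
     * start at lex1.size().
     */
    static Map<Integer, Integer> buildRemap(Map<String, Integer> lex1,
                                            Map<String, Integer> lex2) {
        Map<Integer, Integer> remap = new HashMap<>();
        int nextCode = lex1.size();
        // Iterate in term order so the assignment is deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(lex2).entrySet()) {
            if (!lex1.containsKey(e.getKey())) {
                remap.put(e.getValue(), nextCode++);
            }
        }
        return remap;
    }

    public static void main(String[] args) {
        Map<String, Integer> lex1 = new HashMap<>();
        lex1.put("apple", 0);
        lex1.put("banana", 1);
        Map<String, Integer> lex2 = new HashMap<>();
        lex2.put("banana", 0);
        lex2.put("cherry", 1);
        lex2.put("date", 2);
        // "banana" is shared, so only "cherry" and "date" receive new codes.
        System.out.println(buildRemap(lex1, lex2));
    }
}
```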
      • keepTermCodeMap

        protected boolean keepTermCodeMap
      • numberOfDocuments

        protected int numberOfDocuments
        The number of documents in the merged structures.
      • numberOfPointers

        protected long numberOfPointers
        The number of pointers in the merged structures.
      • numberOfTerms

        protected int numberOfTerms
        The number of terms in the collection.
      • MetaReverse

        protected boolean MetaReverse
      • srcIndex1

        protected IndexOnDisk srcIndex1
        source index 1
      • srcIndex2

        protected IndexOnDisk srcIndex2
        source index 2
      • destIndex

        protected IndexOnDisk destIndex
        destination index
      • directFileOutputStreamClass

        protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
        class to use to write direct file
      • fieldDirectFileOutputStreamClass

        protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
      • fieldCount

        protected final int fieldCount
      • basicInvertedIndexPostingIteratorClass

        protected java.lang.String basicInvertedIndexPostingIteratorClass
      • fieldInvertedIndexPostingIteratorClass

        protected java.lang.String fieldInvertedIndexPostingIteratorClass
      • basicDirectIndexPostingIteratorClass

        protected java.lang.String basicDirectIndexPostingIteratorClass
      • fieldDirectIndexPostingIteratorClass

        protected java.lang.String fieldDirectIndexPostingIteratorClass
    • Constructor Detail

      • StructureMerger

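        public StructureMerger(IndexOnDisk _srcIndex1,
                               IndexOnDisk _srcIndex2,
                               IndexOnDisk _destIndex)
        Constructs a merger that combines two source indices into a destination index.
        Parameters:
        _srcIndex1 - source index 1
        _srcIndex2 - source index 2
        _destIndex - destination index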
    • Method Detail

      • setOutputIndex

        public void setOutputIndex(IndexOnDisk _outputIndex)
        Sets the output index. This index should contain no documents.
        Parameters:
        _outputIndex - the index to be merged to
      • mergeInvertedFiles

        protected void mergeInvertedFiles()
        Merges the two lexicons into one. After this stage, the offsets in the lexicon are not correct. They are updated only after the inverted file has been created.
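        The two-stage behaviour described above, where lexicon entries exist before their offsets are known, can be sketched as follows. This is a simplified in-memory illustration, not Terrier's code: the class, the Entry type, and both methods are hypothetical, the "inverted file" is a plain list of document ids rather than a compressed file, and shifting the second index's document ids by the first index's document count is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; not part of Terrier.
class LexiconOffsetSketch {
    /** A lexicon entry: a term and the start offset of its postings. */
    static final class Entry {
        final String term;
        long offset = -1; // placeholder until the postings are written
        Entry(String term) { this.term = term; }
    }

    /** Stage 1: merge the vocabularies in term order; offsets stay unset. */
    static List<Entry> mergeLexicons(Map<String, int[]> a, Map<String, int[]> b) {
        TreeSet<String> terms = new TreeSet<>(a.keySet());
        terms.addAll(b.keySet());
        List<Entry> merged = new ArrayList<>();
        for (String t : terms) {
            merged.add(new Entry(t));
        }
        return merged;
    }

    /** Stage 2: write the postings and record each term's real offset. */
    static List<Integer> writePostings(List<Entry> lexicon, Map<String, int[]> a,
                                       Map<String, int[]> b, int docsInFirst) {
        List<Integer> file = new ArrayList<>();
        for (Entry e : lexicon) {
            e.offset = file.size();
            for (int d : a.getOrDefault(e.term, new int[0])) {
                file.add(d);
            }
            // Assumption: documents of the second index follow those of the first.
            for (int d : b.getOrDefault(e.term, new int[0])) {
                file.add(d + docsInFirst);
            }
        }
        return file;
    }

    public static void main(String[] args) {
        Map<String, int[]> a = Map.of("cat", new int[] {0, 1}, "dog", new int[] {1});
        Map<String, int[]> b = Map.of("cat", new int[] {0}, "emu", new int[] {1});
        List<Entry> lexicon = mergeLexicons(a, b);
        List<Integer> file = writePostings(lexicon, a, b, 2);
        for (Entry e : lexicon) {
            System.out.println(e.term + " starts at " + e.offset);
        }
        System.out.println(file);
    }
}
```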
      • mergeDirectFiles

        protected void mergeDirectFiles()
        Merges the two direct files and the corresponding document id files.
      • getInterfaces

        protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
      • mergeDocumentIndexFiles

        protected void mergeDocumentIndexFiles()
        Merges the two document index files, and the meta files.
      • createLexidFile

        protected void createLexidFile()
        Creates the final term code to offset file, and the lexicon hash if enabled.
      • setReverseMeta

        public void setReverseMeta(boolean value)
      • mergeStructures

        public void mergeStructures()
        Merges the structures created by Terrier.
      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception