org.terrier.structures.merging
Class StructureMerger

java.lang.Object
  extended by org.terrier.structures.merging.StructureMerger
Direct Known Subclasses:
BlockStructureMerger

public class StructureMerger
extends java.lang.Object

This class merges the structures created by Terrier, so that we use fewer and larger inverted and direct files.

Properties:

  • lexicon.use.hash - build a lexicon hash file for the new index. Set to true by default.
  • merge.direct - merge the direct indices if both indices have them. Set to true by default.

Author:
    Vassilis Plachouras and Craig Macdonald
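
The "true by default" behaviour of the two properties can be illustrated with a small sketch. It uses the JDK's java.util.Properties instead of Terrier's own configuration classes, and the MergeProperties class and getBoolean helper are hypothetical names introduced only for this example.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: reads a boolean property,
// falling back to a default when the key is not set.
class MergeProperties {
    static boolean getBoolean(Properties props, String key, boolean defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Explicitly disable merging of the direct indices.
        props.setProperty("merge.direct", "false");
        // lexicon.use.hash is left unset, so the documented default (true) applies.
        System.out.println("lexicon.use.hash = " + getBoolean(props, "lexicon.use.hash", true));
        System.out.println("merge.direct = " + getBoolean(props, "merge.direct", true));
    }
}
```

An unset key yields the default, so only properties that deviate from the documented defaults need to be set.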

    Field Summary
    protected  java.lang.String basicDirectIndexPostingIteratorClass
               
    protected  java.lang.String basicInvertedIndexPostingIteratorClass
               
    protected  Index destIndex
              destination index
    protected  java.lang.String directFileInputClass
              class to use to read the direct file
    protected  java.lang.String directFileInputStreamClass
              class to use to read the direct file as a stream
    protected  java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
              class to use to write direct file
    protected  java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
               
    protected  java.lang.String fieldDirectIndexPostingIteratorClass
               
    protected  java.lang.Class<? extends DirectInvertedOutputStream> fieldInvertedFileOutputStreamClass
              class to use to write inverted file
    protected  java.lang.String fieldInvertedIndexPostingIteratorClass
               
    protected  java.lang.String invertedFileInputClass
              class to use to read the inverted file
    protected  java.lang.String invertedFileInputStreamClass
              class to use to read the inverted file as a stream
    protected  java.lang.Class<? extends DirectInvertedOutputStream> invertedFileOutputStreamClass
              class to use to write inverted file
    protected  boolean keepTermCodeMap
               
    protected static org.apache.log4j.Logger logger
              the logger used
    protected  boolean MetaReverse
               
    protected  int numberOfDocuments
              The number of documents in the merged structures.
    protected  long numberOfPointers
              The number of pointers in the merged structures.
    protected  int numberOfTerms
              The number of terms in the collection.
    protected  Index srcIndex1
              source index 1
    protected  Index srcIndex2
              source index 2
    protected  gnu.trove.TIntIntHashMap termcodeHashmap
              A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
     
    Constructor Summary
    StructureMerger(Index _srcIndex1, Index _srcIndex2, Index _destIndex)
              Creates a merger that combines two source indices into a destination index.
     
    Method Summary
    protected  void createLexidFile()
              Creates the final term-code-to-offset file, and the lexicon hash if enabled.
    protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
               
    static void main(java.lang.String[] args)
              Usage: java org.terrier.structures.merging.StructureMerger [binary bits] [inverted file 1] [inverted file 2] [output inverted file]
    protected  void mergeDirectFiles()
              Merges the two direct files and the corresponding document id files.
    protected  void mergeDocumentIndexFiles()
              Merges the two document index files, and the meta files.
    protected  void mergeInvertedFiles()
              Merges the two lexicons into one.
     void mergeStructures()
              Merges the structures created by Terrier.
     void setOutputIndex(Index _outputIndex)
              Sets the output index.
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    logger

    protected static final org.apache.log4j.Logger logger
    the logger used


    termcodeHashmap

    protected gnu.trove.TIntIntHashMap termcodeHashmap
    A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
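
    The remapping idea can be sketched with plain JDK collections. This is an illustration only, not the class's actual implementation: it uses java.util.HashMap instead of gnu.trove.TIntIntHashMap, assumes the first vocabulary's codes are 0..size-1, and assumes that terms present in both vocabularies keep the code from the first one. The TermCodeRemap class and its remap method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Terrier code.
class TermCodeRemap {
    /** Maps each term code of the second vocabulary to a code in the merged vocabulary. */
    static Map<Integer, Integer> remap(Map<String, Integer> vocab1, Map<String, Integer> vocab2) {
        Map<Integer, Integer> oldToNew = new HashMap<>();
        int nextCode = vocab1.size(); // assumption: vocab1 uses codes 0..size-1
        // Iterate in term order so that new codes are assigned deterministically.
        for (Map.Entry<String, Integer> e : new TreeMap<>(vocab2).entrySet()) {
            Integer codeInFirst = vocab1.get(e.getKey());
            if (codeInFirst != null) {
                // Term exists in both vocabularies: reuse the first index's code (assumption).
                oldToNew.put(e.getValue(), codeInFirst);
            } else {
                // Term appears only in the second vocabulary: assign a fresh code.
                oldToNew.put(e.getValue(), nextCode++);
            }
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab1 = new HashMap<>();
        vocab1.put("apple", 0);
        vocab1.put("pear", 1);
        Map<String, Integer> vocab2 = new HashMap<>();
        vocab2.put("apple", 0);
        vocab2.put("zebra", 1);
        vocab2.put("fig", 2);
        System.out.println(remap(vocab1, vocab2));
    }
}
```

    A map like this would let postings that refer to the second index's term codes be rewritten to the merged codes.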


    keepTermCodeMap

    protected boolean keepTermCodeMap

    numberOfDocuments

    protected int numberOfDocuments
    The number of documents in the merged structures.


    numberOfPointers

    protected long numberOfPointers
    The number of pointers in the merged structures.


    numberOfTerms

    protected int numberOfTerms
    The number of terms in the collection.


    MetaReverse

    protected boolean MetaReverse

    srcIndex1

    protected Index srcIndex1
    source index 1


    srcIndex2

    protected Index srcIndex2
    source index 2


    destIndex

    protected Index destIndex
    destination index


    directFileOutputStreamClass

    protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
    class to use to write direct file


    fieldDirectFileOutputStreamClass

    protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass

    invertedFileOutputStreamClass

    protected java.lang.Class<? extends DirectInvertedOutputStream> invertedFileOutputStreamClass
    class to use to write inverted file


    fieldInvertedFileOutputStreamClass

    protected java.lang.Class<? extends DirectInvertedOutputStream> fieldInvertedFileOutputStreamClass
    class to use to write inverted file


    directFileInputClass

    protected java.lang.String directFileInputClass
    class to use to read the direct file


    directFileInputStreamClass

    protected java.lang.String directFileInputStreamClass
    class to use to read the direct file as a stream


    invertedFileInputClass

    protected java.lang.String invertedFileInputClass
    class to use to read the inverted file


    invertedFileInputStreamClass

    protected java.lang.String invertedFileInputStreamClass
    class to use to read the inverted file as a stream


    basicInvertedIndexPostingIteratorClass

    protected java.lang.String basicInvertedIndexPostingIteratorClass

    fieldInvertedIndexPostingIteratorClass

    protected java.lang.String fieldInvertedIndexPostingIteratorClass

    basicDirectIndexPostingIteratorClass

    protected java.lang.String basicDirectIndexPostingIteratorClass

    fieldDirectIndexPostingIteratorClass

    protected java.lang.String fieldDirectIndexPostingIteratorClass
    Constructor Detail

    StructureMerger

    public StructureMerger(Index _srcIndex1,
                           Index _srcIndex2,
                           Index _destIndex)
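    Creates a merger that combines two source indices into a destination index.

    Parameters:
    _srcIndex1 - source index 1
    _srcIndex2 - source index 2
    _destIndex - destination index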
    Method Detail

    setOutputIndex

    public void setOutputIndex(Index _outputIndex)
    Sets the output index. This index should contain no documents.

    Parameters:
    _outputIndex - the index to be merged to

    mergeInvertedFiles

    protected void mergeInvertedFiles()
    Merges the two lexicons into one. After this stage, the offsets in the lexicon are not correct. They will be updated only after creating the inverted file.
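
    Merging two lexicons can be pictured as a two-way merge over term-sorted entries. The sketch below is a simplified illustration and not the method's actual code: the LexiconMergeSketch class and its Entry type are hypothetical, offsets are ignored, and document frequencies of shared terms are added on the assumption that the two indices cover disjoint sets of documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Terrier code.
class LexiconMergeSketch {
    static final class Entry {
        final String term;
        final int docFrequency;

        Entry(String term, int docFrequency) {
            this.term = term;
            this.docFrequency = docFrequency;
        }
    }

    /** Merges two lexicons whose entries are sorted by term. */
    static List<Entry> merge(List<Entry> a, List<Entry> b) {
        List<Entry> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).term.compareTo(b.get(j).term);
            if (cmp < 0) {
                out.add(a.get(i++)); // term only in the first lexicon so far
            } else if (cmp > 0) {
                out.add(b.get(j++)); // term only in the second lexicon so far
            } else {
                // Same term in both: combine the statistics (assumes disjoint documents).
                out.add(new Entry(a.get(i).term, a.get(i).docFrequency + b.get(j).docFrequency));
                i++;
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++)); // remaining entries of the first lexicon
        while (j < b.size()) out.add(b.get(j++)); // remaining entries of the second lexicon
        return out;
    }

    public static void main(String[] args) {
        List<Entry> merged = merge(
                Arrays.asList(new Entry("apple", 2), new Entry("pear", 1)),
                Arrays.asList(new Entry("apple", 3), new Entry("fig", 4)));
        for (Entry e : merged) {
            System.out.println(e.term + " " + e.docFrequency);
        }
    }
}
```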


    mergeDirectFiles

    protected void mergeDirectFiles()
    Merges the two direct files and the corresponding document id files.


    getInterfaces

    protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)

    mergeDocumentIndexFiles

    protected void mergeDocumentIndexFiles()
    Merges the two document index files, and the meta files.
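
    One way to picture merging two document indices is simple concatenation. The sketch below is an assumption about the general approach, not a description of this method's internals: it assumes the second index's documents are appended after the first's, so a document id from the second index shifts by the number of documents in the first. The DocumentIndexConcat class and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not Terrier code.
class DocumentIndexConcat {
    /** Concatenates per-document lengths; documents of the second index follow those of the first. */
    static int[] concat(int[] lengths1, int[] lengths2) {
        int[] merged = Arrays.copyOf(lengths1, lengths1.length + lengths2.length);
        System.arraycopy(lengths2, 0, merged, lengths1.length, lengths2.length);
        return merged;
    }

    /** Document id in the merged index for a document of the second index (assumed offset scheme). */
    static int mergedDocId(int docIdInSecond, int docsInFirst) {
        return docsInFirst + docIdInSecond;
    }

    public static void main(String[] args) {
        int[] merged = concat(new int[] {10, 20}, new int[] {30, 40, 50});
        System.out.println(Arrays.toString(merged));
        System.out.println(mergedDocId(1, 2));
    }
}
```

    Under this scheme the merged document count is the sum of the two source counts.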


    createLexidFile

    protected void createLexidFile()
    Creates the final term-code-to-offset file, and the lexicon hash if enabled.


    mergeStructures

    public void mergeStructures()
    Merges the structures created by Terrier.


    main

    public static void main(java.lang.String[] args)
                     throws java.lang.Exception
    Usage: java org.terrier.structures.merging.StructureMerger [binary bits] [inverted file 1] [inverted file 2] [output inverted file]

    The [binary bits] argument relates to the number of fields in use in the index.

    Throws:
    java.lang.Exception


    Terrier 3.5. Copyright © 2004-2011 University of Glasgow