Package org.terrier.structures.merging
Class StructureMerger
- java.lang.Object
-
- org.terrier.structures.merging.StructureMerger
-
- Direct Known Subclasses:
BlockStructureMerger
public class StructureMerger extends java.lang.Object
This class merges the structures created by Terrier, so that we use fewer and larger inverted and direct files.
Properties:
- lexicon.use.hash - build a lexicon hash file for new index. Set to true by default.
- merge.direct - merge the direct indices if both indices have them. Set to true by default.
- Author:
- Vassilis Plachouras and Craig Macdonald
-
-
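The two properties listed above are boolean switches that default to true. The following is a minimal sketch of that default handling only. It assumes plain java.util.Properties as a stand-in for Terrier's own configuration lookup; the class MergeProperties and the helper flag are hypothetical names, not part of Terrier.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: shows "true unless overridden" semantics.
class MergeProperties {
    // Returns the boolean value of key, or defaultValue when the key is not set.
    static boolean flag(Properties props, String key, boolean defaultValue) {
        return Boolean.parseBoolean(props.getProperty(key, Boolean.toString(defaultValue)));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Override one of the two documented defaults; leave the other unset.
        props.setProperty("merge.direct", "false");
        System.out.println("lexicon.use.hash = " + flag(props, "lexicon.use.hash", true));
        System.out.println("merge.direct = " + flag(props, "merge.direct", true));
    }
}
```

With this reading, leaving a property unset is equivalent to setting it to true, which matches the defaults stated for both properties.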
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class
StructureMerger.Command
static class
StructureMerger.NullDocumentIndex
-
Field Summary
Fields
Modifier and Type, Field, Description
protected java.lang.String
basicDirectIndexPostingIteratorClass
protected java.lang.String
basicInvertedIndexPostingIteratorClass
protected CompressionFactory.CompressionConfiguration
compressionDirectConfig
protected CompressionFactory.CompressionConfiguration
compressionInvertedConfig
protected IndexOnDisk
destIndex
destination index
protected java.lang.Class<? extends DirectInvertedOutputStream>
directFileOutputStreamClass
class to use to write direct file
protected int
fieldCount
protected java.lang.Class<? extends DirectInvertedOutputStream>
fieldDirectFileOutputStreamClass
protected java.lang.String
fieldDirectIndexPostingIteratorClass
protected java.lang.String
fieldInvertedIndexPostingIteratorClass
protected boolean
keepTermCodeMap
protected static org.slf4j.Logger
logger
the logger used
protected boolean
MetaReverse
protected int
numberOfDocuments
The number of documents in the merged structures.
protected long
numberOfPointers
The number of pointers in the merged structures.
protected int
numberOfTerms
The number of terms in the collection.
protected IndexOnDisk
srcIndex1
source index 1
protected IndexOnDisk
srcIndex2
source index 2
protected gnu.trove.TIntIntHashMap
termcodeHashmap
A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
-
Constructor Summary
Constructors
Constructor, Description
StructureMerger(IndexOnDisk _srcIndex1, IndexOnDisk _srcIndex2, IndexOnDisk _destIndex)
constructor
-
Method Summary
All Methods
Modifier and Type, Method, Description
protected void
createLexidFile()
Creates the final term code to offset file, and the lexicon hash if enabled.
protected static java.lang.Class<?>[]
getInterfaces(java.lang.Object o)
static void
main(java.lang.String[] args)
protected void
mergeDirectFiles()
Merges the two direct files and the corresponding document id files.
protected void
mergeDocumentIndexFiles()
Merges the two document index files, and the meta files.
protected void
mergeInvertedFiles()
Merges the two lexicons into one.
void
mergeStructures()
Merges the structures created by Terrier.
void
setOutputIndex(IndexOnDisk _outputIndex)
Sets the output index.
void
setReverseMeta(boolean value)
-
-
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
the logger used
-
termcodeHashmap
protected gnu.trove.TIntIntHashMap termcodeHashmap
A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
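The idea behind such a term code map can be sketched as follows. This is an illustration only, not Terrier's implementation: it uses java.util collections instead of gnu.trove, and it assumes that codes in the first lexicon are dense (0 to n-1), so that new codes can start at the size of the first lexicon. The class TermCodeRemap and the method remapNewTerms are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Terrier code: builds an old-code -> new-code map
// for terms that occur only in the second lexicon.
class TermCodeRemap {
    // lex1 and lex2 map each term to its term code within its own index.
    // Assumption: lex1 uses the dense codes 0..lex1.size()-1.
    static Map<Integer, Integer> remapNewTerms(Map<String, Integer> lex1,
                                               Map<String, Integer> lex2) {
        Map<Integer, Integer> oldToNew = new HashMap<>();
        int nextCode = lex1.size(); // first code not used by the first lexicon
        // Walk the second lexicon in sorted term order so code assignment is deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(lex2).entrySet()) {
            if (!lex1.containsKey(e.getKey())) {
                oldToNew.put(e.getValue(), nextCode++);
            }
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        Map<String, Integer> lex1 = new HashMap<>();
        lex1.put("apple", 0);
        lex1.put("banana", 1);
        Map<String, Integer> lex2 = new HashMap<>();
        lex2.put("banana", 0);
        lex2.put("cherry", 1);
        lex2.put("date", 2);
        System.out.println(remapNewTerms(lex1, lex2));
    }
}
```

Terms shared by both lexicons are deliberately absent from the map, mirroring the field description, which covers only terms that appear in the second vocabulary alone.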
-
keepTermCodeMap
protected boolean keepTermCodeMap
-
numberOfDocuments
protected int numberOfDocuments
The number of documents in the merged structures.
-
numberOfPointers
protected long numberOfPointers
The number of pointers in the merged structures.
-
numberOfTerms
protected int numberOfTerms
The number of terms in the collection.
-
compressionDirectConfig
protected CompressionFactory.CompressionConfiguration compressionDirectConfig
-
compressionInvertedConfig
protected CompressionFactory.CompressionConfiguration compressionInvertedConfig
-
MetaReverse
protected boolean MetaReverse
-
srcIndex1
protected IndexOnDisk srcIndex1
source index 1
-
srcIndex2
protected IndexOnDisk srcIndex2
source index 2
-
destIndex
protected IndexOnDisk destIndex
destination index
-
directFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
class to use to write direct file
-
fieldDirectFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
-
fieldCount
protected final int fieldCount
-
basicInvertedIndexPostingIteratorClass
protected java.lang.String basicInvertedIndexPostingIteratorClass
-
fieldInvertedIndexPostingIteratorClass
protected java.lang.String fieldInvertedIndexPostingIteratorClass
-
basicDirectIndexPostingIteratorClass
protected java.lang.String basicDirectIndexPostingIteratorClass
-
fieldDirectIndexPostingIteratorClass
protected java.lang.String fieldDirectIndexPostingIteratorClass
-
-
Constructor Detail
-
StructureMerger
public StructureMerger(IndexOnDisk _srcIndex1, IndexOnDisk _srcIndex2, IndexOnDisk _destIndex)
Constructor.
Parameters:
_srcIndex1 - source index 1
_srcIndex2 - source index 2
_destIndex - destination index
-
-
-
Method Detail
-
setOutputIndex
public void setOutputIndex(IndexOnDisk _outputIndex)
Sets the output index. This index should have no documents.
Parameters:
_outputIndex - the index to be merged to
-
mergeInvertedFiles
protected void mergeInvertedFiles()
Merges the two lexicons into one. After this stage, the offsets in the lexicon are not correct. They will be updated only after creating the inverted file.
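Why offsets can only be fixed afterwards can be illustrated with a two-pass sketch. It is not Terrier's code: it assumes the two indices cover disjoint document sets (so posting counts for a shared term simply add up) and measures offsets in postings rather than bytes. The class LexiconOffsets and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Terrier code: offsets depend on the final size of
// every preceding posting list, so they are assigned in a second pass.
class LexiconOffsets {
    // Pass 1: merge term -> posting count. No offsets are known yet.
    // Assumption: the two indices hold disjoint documents, so counts add up.
    static TreeMap<String, Integer> mergeCounts(Map<String, Integer> a, Map<String, Integer> b) {
        TreeMap<String, Integer> merged = new TreeMap<>(a);
        b.forEach((term, count) -> merged.merge(term, count, Integer::sum));
        return merged;
    }

    // Pass 2: with all list lengths final, give each term the start offset of its list.
    // Assumption: offsets are counted in postings, in sorted term order.
    static Map<String, Long> assignOffsets(TreeMap<String, Integer> merged) {
        Map<String, Long> offsets = new TreeMap<>();
        long next = 0;
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            offsets.put(e.getKey(), next);
            next += e.getValue();
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new TreeMap<>();
        a.put("cat", 2);
        a.put("dog", 1);
        Map<String, Integer> b = new TreeMap<>();
        b.put("cat", 1);
        b.put("emu", 3);
        System.out.println(assignOffsets(mergeCounts(a, b)));
    }
}
```

In the sketch, the offset of a term cannot be computed during the first pass because a later merge step may still lengthen the posting list of any earlier term.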
-
mergeDirectFiles
protected void mergeDirectFiles()
Merges the two direct files and the corresponding document id files.
-
getInterfaces
protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
-
mergeDocumentIndexFiles
protected void mergeDocumentIndexFiles()
Merges the two document index files, and the meta files.
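One common way to merge two document indices is to append the second after the first, which shifts every document id of the second index by the number of documents in the first. The sketch below assumes that scheme; the source does not state how Terrier orders or renumbers documents, and the class DocIndexConcat and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not Terrier code: concatenation-style merge of
// per-document entries (here, document lengths).
class DocIndexConcat {
    // Appends the entries of the second index after those of the first.
    static int[] concatLengths(int[] first, int[] second) {
        int[] merged = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, merged, first.length, second.length);
        return merged;
    }

    // Maps a document id from the second index to its id in the merged index.
    static int shiftDocid(int docidInSecond, int docsInFirst) {
        return docidInSecond + docsInFirst;
    }

    public static void main(String[] args) {
        int[] merged = concatLengths(new int[] {10, 20}, new int[] {5});
        System.out.println(Arrays.toString(merged));
        System.out.println(shiftDocid(0, 2));
    }
}
```

Under this assumption the merged document count is the sum of the two source counts, which is consistent with the numberOfDocuments field describing the merged structures.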
-
createLexidFile
protected void createLexidFile()
Creates the final term code to offset file, and the lexicon hash if enabled.
-
setReverseMeta
public void setReverseMeta(boolean value)
-
mergeStructures
public void mergeStructures()
Merges the structures created by Terrier.
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception
-
-