Package org.terrier.structures.merging
Class StructureMerger
- java.lang.Object
  - org.terrier.structures.merging.StructureMerger
- Direct Known Subclasses:
BlockStructureMerger
public class StructureMerger extends java.lang.Object
This class merges the structures created by Terrier, so that we use fewer and larger inverted and direct files.
Properties:
- lexicon.use.hash - build a lexicon hash file for new index. Set to true by default.
- merge.direct - merge the direct indices if both indices have them. Set to true by default.
- Author:
- Vassilis Plachouras and Craig Macdonald
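Both properties listed above default to true when they are not set. The following is a minimal, self-contained sketch of that "true unless overridden" behaviour using plain java.util.Properties. The class and method names are hypothetical and are not part of Terrier, which reads its configuration through its own utility classes.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: illustrates a boolean
// property that is treated as true when the key is absent.
final class MergePropertyDefaults {

    /** Reads a boolean property, treating a missing key as "true". */
    static boolean enabledByDefault(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Explicitly disable merging of the direct indices.
        props.setProperty("merge.direct", "false");

        System.out.println("lexicon.use.hash = " + enabledByDefault(props, "lexicon.use.hash"));
        System.out.println("merge.direct = " + enabledByDefault(props, "merge.direct"));
    }
}
```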
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static class StructureMerger.Command
- static class StructureMerger.NullDocumentIndex
-
Field Summary
Fields (Modifier and Type, Field: Description)
- protected java.lang.String basicDirectIndexPostingIteratorClass
- protected java.lang.String basicInvertedIndexPostingIteratorClass
- protected CompressionFactory.CompressionConfiguration compressionDirectConfig
- protected CompressionFactory.CompressionConfiguration compressionInvertedConfig
- protected IndexOnDisk destIndex: destination index
- protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass: class to use to write direct file
- protected int fieldCount
- protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
- protected java.lang.String fieldDirectIndexPostingIteratorClass
- protected java.lang.String fieldInvertedIndexPostingIteratorClass
- protected boolean keepTermCodeMap
- protected static org.slf4j.Logger logger: the logger used
- protected boolean MetaReverse
- protected int numberOfDocuments: The number of documents in the merged structures.
- protected long numberOfPointers: The number of pointers in the merged structures.
- protected int numberOfTerms: The number of terms in the collection.
- protected IndexOnDisk srcIndex1: source index 1
- protected IndexOnDisk srcIndex2: source index 2
- protected gnu.trove.TIntIntHashMap termcodeHashmap: A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
-
Constructor Summary
Constructors (Constructor: Description)
- StructureMerger(IndexOnDisk _srcIndex1, IndexOnDisk _srcIndex2, IndexOnDisk _destIndex): constructor
-
Method Summary
Methods (Modifier and Type, Method: Description)
- protected void createLexidFile(): Creates the final term code to offset file, and the lexicon hash if enabled.
- protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
- static void main(java.lang.String[] args)
- protected void mergeDirectFiles(): Merges the two direct files and the corresponding document id files.
- protected void mergeDocumentIndexFiles(): Merges the two document index files, and the meta files.
- protected void mergeInvertedFiles(): Merges the two lexicons into one.
- void mergeStructures(): Merges the structures created by Terrier.
- void setOutputIndex(IndexOnDisk _outputIndex): Sets the output index.
- void setReverseMeta(boolean value)
-
-
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
the logger used
-
termcodeHashmap
protected gnu.trove.TIntIntHashMap termcodeHashmap
A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures.
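The remapping described above can be illustrated with a small, self-contained sketch. It uses java.util maps instead of gnu.trove.TIntIntHashMap, and it assumes that term codes in the first index are dense (0 to size-1) and that new codes are handed out in lexicographic term order. Both are assumptions made for illustration, and the class name is hypothetical; this is not Terrier's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Terrier code: assigns fresh term codes to
// terms that occur only in the second lexicon.
final class TermCodeRemapSketch {

    /**
     * Returns a map from old term codes in the second index to new codes
     * in the merged index, for terms absent from the first index.
     */
    static Map<Integer, Integer> remap(Map<String, Integer> lexicon1,
                                       Map<String, Integer> lexicon2) {
        Map<Integer, Integer> oldToNew = new HashMap<>();
        // Assumption: codes in the first index are 0..size-1,
        // so the next free code is its size.
        int nextCode = lexicon1.size();
        // TreeMap gives a deterministic (lexicographic) visiting order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(lexicon2).entrySet()) {
            if (!lexicon1.containsKey(e.getKey())) {
                oldToNew.put(e.getValue(), nextCode++);
            }
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = Map.of("apple", 0, "berry", 1);
        Map<String, Integer> second = Map.of("apple", 0, "cherry", 1, "date", 2);
        System.out.println(remap(first, second));
    }
}
```

Terms shared by both indices get no entry in the map, since they keep the code they already have in the first index.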
-
keepTermCodeMap
protected boolean keepTermCodeMap
-
numberOfDocuments
protected int numberOfDocuments
The number of documents in the merged structures.
-
numberOfPointers
protected long numberOfPointers
The number of pointers in the merged structures.
-
numberOfTerms
protected int numberOfTerms
The number of terms in the collection.
-
compressionDirectConfig
protected CompressionFactory.CompressionConfiguration compressionDirectConfig
-
compressionInvertedConfig
protected CompressionFactory.CompressionConfiguration compressionInvertedConfig
-
MetaReverse
protected boolean MetaReverse
-
srcIndex1
protected IndexOnDisk srcIndex1
source index 1
-
srcIndex2
protected IndexOnDisk srcIndex2
source index 2
-
destIndex
protected IndexOnDisk destIndex
destination index
-
directFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
class to use to write direct file
-
fieldDirectFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
-
fieldCount
protected final int fieldCount
-
basicInvertedIndexPostingIteratorClass
protected java.lang.String basicInvertedIndexPostingIteratorClass
-
fieldInvertedIndexPostingIteratorClass
protected java.lang.String fieldInvertedIndexPostingIteratorClass
-
basicDirectIndexPostingIteratorClass
protected java.lang.String basicDirectIndexPostingIteratorClass
-
fieldDirectIndexPostingIteratorClass
protected java.lang.String fieldDirectIndexPostingIteratorClass
-
-
Constructor Detail
-
StructureMerger
public StructureMerger(IndexOnDisk _srcIndex1, IndexOnDisk _srcIndex2, IndexOnDisk _destIndex)
constructor
- Parameters:
_srcIndex1 - source index 1
_srcIndex2 - source index 2
_destIndex - destination index
-
-
Method Detail
-
setOutputIndex
public void setOutputIndex(IndexOnDisk _outputIndex)
Sets the output index. This index should have no documents.
- Parameters:
_outputIndex - the index to be merged to
-
mergeInvertedFiles
protected void mergeInvertedFiles()
Merges the two lexicons into one. After this stage, the offsets in the lexicon are not correct. They will be updated only after creating the inverted file.
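As a rough illustration of what merging two lexicons involves, the sketch below performs a two-pointer merge over two term-sorted lists and sums the document frequency of terms present in both. The Entry type, its fields, and the class name are hypothetical simplifications; real lexicon entries also carry the offsets mentioned above, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Terrier code: merges two lexicons that are
// each sorted by term.
final class LexiconMergeSketch {

    /** Simplified lexicon entry: a term and its document frequency. */
    static final class Entry {
        final String term;
        final int docFrequency;

        Entry(String term, int docFrequency) {
            this.term = term;
            this.docFrequency = docFrequency;
        }
    }

    static List<Entry> merge(List<Entry> a, List<Entry> b) {
        List<Entry> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).term.compareTo(b.get(j).term);
            if (cmp < 0) {
                out.add(a.get(i++));          // term only in the first lexicon
            } else if (cmp > 0) {
                out.add(b.get(j++));          // term only in the second lexicon
            } else {
                // Term in both: combine the statistics.
                out.add(new Entry(a.get(i).term,
                        a.get(i).docFrequency + b.get(j).docFrequency));
                i++;
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Entry> merged = merge(
                List.of(new Entry("apple", 2), new Entry("cherry", 1)),
                List.of(new Entry("apple", 3), new Entry("berry", 4)));
        for (Entry e : merged) {
            System.out.println(e.term + " " + e.docFrequency);
        }
    }
}
```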
-
mergeDirectFiles
protected void mergeDirectFiles()
Merges the two direct files and the corresponding document id files.
-
getInterfaces
protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
-
mergeDocumentIndexFiles
protected void mergeDocumentIndexFiles()
Merges the two document index files, and the meta files.
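One way to picture a document index merge is as a concatenation: entries of the second index follow those of the first, so a document id from the second index is shifted by the number of documents in the first. The sketch below assumes exactly that layout and models a document index as an array of document lengths. The names are hypothetical, and the class does not claim to reproduce how StructureMerger stores or writes its files.

```java
import java.util.Arrays;

// Hypothetical sketch, not Terrier code: concatenates two document
// indices modelled as arrays of document lengths.
final class DocumentIndexMergeSketch {

    /** Appends the second index's document lengths after the first's. */
    static int[] concatLengths(int[] lengths1, int[] lengths2) {
        int[] merged = Arrays.copyOf(lengths1, lengths1.length + lengths2.length);
        System.arraycopy(lengths2, 0, merged, lengths1.length, lengths2.length);
        return merged;
    }

    /** Assumption: ids from the second index are shifted past the first index. */
    static int mergedDocId(int docIdInIndex2, int numDocsInIndex1) {
        return docIdInIndex2 + numDocsInIndex1;
    }

    public static void main(String[] args) {
        int[] merged = concatLengths(new int[] {10, 20}, new int[] {30});
        System.out.println(Arrays.toString(merged));
        System.out.println(mergedDocId(0, 2));
    }
}
```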
-
createLexidFile
protected void createLexidFile()
Creates the final term code to offset file, and the lexicon hash if enabled.
-
setReverseMeta
public void setReverseMeta(boolean value)
-
mergeStructures
public void mergeStructures()
Merges the structures created by Terrier.
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception
-
-