java.lang.Object
  org.terrier.structures.merging.StructureMerger
public class StructureMerger
This class merges the structures created by Terrier, so that we use fewer and larger inverted and direct files.
Field Summary | |
---|---|
protected java.lang.String |
basicDirectIndexPostingIteratorClass
|
protected java.lang.String |
basicInvertedIndexPostingIteratorClass
|
protected Index |
destIndex
destination index |
protected java.lang.String |
directFileInputClass
class to use to read the direct file |
protected java.lang.String |
directFileInputStreamClass
class to use to read the direct file as a stream |
protected java.lang.Class<? extends DirectInvertedOutputStream> |
directFileOutputStreamClass
class to use to write direct file |
protected java.lang.Class<? extends DirectInvertedOutputStream> |
fieldDirectFileOutputStreamClass
|
protected java.lang.String |
fieldDirectIndexPostingIteratorClass
|
protected java.lang.Class<? extends DirectInvertedOutputStream> |
fieldInvertedFileOutputStreamClass
class to use to write inverted file |
protected java.lang.String |
fieldInvertedIndexPostingIteratorClass
|
protected java.lang.String |
invertedFileInputClass
class to use to read the inverted file |
protected java.lang.String |
invertedFileInputStreamClass
class to use to read the inverted file as a stream |
protected java.lang.Class<? extends DirectInvertedOutputStream> |
invertedFileOutputStreamClass
class to use to write inverted file |
protected boolean |
keepTermCodeMap
|
protected static org.apache.log4j.Logger |
logger
the logger used |
protected boolean |
MetaReverse
|
protected int |
numberOfDocuments
The number of documents in the merged structures. |
protected long |
numberOfPointers
The number of pointers in the merged structures. |
protected int |
numberOfTerms
The number of terms in the collection. |
protected Index |
srcIndex1
source index 1 |
protected Index |
srcIndex2
source index 2 |
protected gnu.trove.TIntIntHashMap |
termcodeHashmap
A hashmap for converting the codes of terms appearing only in the vocabulary of the second set of data structures into a new set of term codes for the merged set of data structures. |
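The role of termcodeHashmap can be illustrated with a minimal, self-contained sketch. It is not Terrier's implementation: the `Entry` and `Merged` types, the `merge` method, and the choice to number new codes after the highest code in the first lexicon are all hypothetical, and a plain `java.util.HashMap` stands in for `gnu.trove.TIntIntHashMap`. The sketch assumes both lexicons are sorted by term, and that terms present in both keep the code from the first lexicon.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of term-code remapping during a lexicon merge; not Terrier code. */
class TermCodeRemapSketch {

    /** A lexicon entry: a term and its integer term code (stand-in type). */
    static final class Entry {
        final String term;
        final int code;
        Entry(String term, int code) { this.term = term; this.code = code; }
    }

    /** Merged lexicon plus the old-to-new code map for terms found only in lexicon 2. */
    static final class Merged {
        final List<Entry> lexicon = new ArrayList<>();
        final Map<Integer, Integer> codeMap = new HashMap<>();
    }

    /** Merges two lexicons, each sorted by term, into one sorted lexicon. */
    static Merged merge(List<Entry> lex1, List<Entry> lex2) {
        Merged out = new Merged();

        // Assumption: fresh codes start after the highest code used by lexicon 1.
        int nextCode = 0;
        for (Entry e : lex1) {
            nextCode = Math.max(nextCode, e.code + 1);
        }

        int i = 0;
        int j = 0;
        while (i < lex1.size() || j < lex2.size()) {
            int cmp;
            if (i == lex1.size()) {
                cmp = 1;            // only lexicon 2 has entries left
            } else if (j == lex2.size()) {
                cmp = -1;           // only lexicon 1 has entries left
            } else {
                cmp = lex1.get(i).term.compareTo(lex2.get(j).term);
            }

            if (cmp < 0) {
                // Term only in lexicon 1: its code is unchanged.
                out.lexicon.add(lex1.get(i++));
            } else if (cmp == 0) {
                // Term in both lexicons: keep the entry (and code) from lexicon 1.
                out.lexicon.add(lex1.get(i++));
                j++;
            } else {
                // Term only in lexicon 2: assign a fresh code and record the mapping.
                Entry e2 = lex2.get(j++);
                int fresh = nextCode++;
                out.codeMap.put(e2.code, fresh);
                out.lexicon.add(new Entry(e2.term, fresh));
            }
        }
        return out;
    }
}
```

In this sketch, merging the lexicon {apple=0, cat=1} with {bird=0, cat=1, dog=2} leaves `codeMap` holding entries only for the old codes of "bird" and "dog", since "cat" already has a code in the first lexicon.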
Constructor Summary | |
---|---|
StructureMerger(Index _srcIndex1,
Index _srcIndex2,
Index _destIndex)
constructor |
Method Summary | |
---|---|
protected void |
createLexidFile()
Creates the final term-code-to-offset file and, if enabled, the lexicon hash. |
protected static java.lang.Class<?>[] |
getInterfaces(java.lang.Object o)
|
static void |
main(java.lang.String[] args)
Usage: java org.terrier.structures.merging.StructureMerger [binary bits] [inverted file 1] [inverted file 2] [output inverted file] |
protected void |
mergeDirectFiles()
Merges the two direct files and the corresponding document id files. |
protected void |
mergeDocumentIndexFiles()
Merges the two document index files, and the meta files. |
protected void |
mergeInvertedFiles()
Merges the two lexicons into one. |
void |
mergeStructures()
Merges the structures created by Terrier. |
void |
setOutputIndex(Index _outputIndex)
Sets the output index. |
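As a rough illustration of what merging the posting lists of one term could involve, the sketch below concatenates two lists and renumbers the document ids of the second. It is not Terrier's code. It rests on an assumption this page does not state: that documents from the second source index are numbered after those of the first, so their ids are shifted by the first index's document count. The `Posting` type and the `mergePostings` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of merging one term's postings from two indices; not Terrier code. */
class PostingMergeSketch {

    /** A posting: a document id and the term's frequency in that document (stand-in type). */
    static final class Posting {
        final int docId;
        final int tf;
        Posting(int docId, int tf) { this.docId = docId; this.tf = tf; }
    }

    /**
     * Concatenates the postings of a term from two indices.
     * Assumption: index-2 documents follow index-1 documents in the merged index,
     * so every index-2 document id is shifted by numDocs1.
     */
    static List<Posting> mergePostings(List<Posting> p1, List<Posting> p2, int numDocs1) {
        List<Posting> merged = new ArrayList<>(p1.size() + p2.size());
        merged.addAll(p1);
        for (Posting p : p2) {
            merged.add(new Posting(p.docId + numDocs1, p.tf));
        }
        return merged;
    }
}
```

Under that assumption, if each input list is sorted by document id and every id in the first list is below `numDocs1`, the concatenated list stays sorted without a comparison-based merge, and its length is the sum of the two input lengths.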
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final org.apache.log4j.Logger logger
protected gnu.trove.TIntIntHashMap termcodeHashmap
protected boolean keepTermCodeMap
protected int numberOfDocuments
protected long numberOfPointers
protected int numberOfTerms
protected boolean MetaReverse
protected Index srcIndex1
protected Index srcIndex2
protected Index destIndex
protected java.lang.Class<? extends DirectInvertedOutputStream> directFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> fieldDirectFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> invertedFileOutputStreamClass
protected java.lang.Class<? extends DirectInvertedOutputStream> fieldInvertedFileOutputStreamClass
protected java.lang.String directFileInputClass
protected java.lang.String directFileInputStreamClass
protected java.lang.String invertedFileInputClass
protected java.lang.String invertedFileInputStreamClass
protected java.lang.String basicInvertedIndexPostingIteratorClass
protected java.lang.String fieldInvertedIndexPostingIteratorClass
protected java.lang.String basicDirectIndexPostingIteratorClass
protected java.lang.String fieldDirectIndexPostingIteratorClass
Constructor Detail |
---|
public StructureMerger(Index _srcIndex1, Index _srcIndex2, Index _destIndex)
Parameters:
_srcIndex1 - source index 1
_srcIndex2 - source index 2
_destIndex - destination index
Method Detail |
---|
public void setOutputIndex(Index _outputIndex)
Parameters:
_outputIndex - the index to be merged to
protected void mergeInvertedFiles()
protected void mergeDirectFiles()
protected static java.lang.Class<?>[] getInterfaces(java.lang.Object o)
protected void mergeDocumentIndexFiles()
protected void createLexidFile()
public void mergeStructures()
public static void main(java.lang.String[] args) throws java.lang.Exception
The [binary bits] argument concerns the number of fields in use in the index.
Throws:
java.lang.Exception