java.lang.Object
  extended by org.terrier.indexing.Indexer
public abstract class Indexer
Properties:
Field Summary
Modifier and Type | Field and Description |
---|---|
protected java.lang.String | basicDirectIndexPostingIteratorClass |
protected java.util.HashSet<java.lang.String> | BUILDER_BOUNDARY_DOCUMENTS - The DOCNOs of documents at which builder boundaries are forced. |
protected Index | currentIndex - The index being worked on, denoted by path and prefix. |
protected DirectInvertedOutputStream | directIndexBuilder - The builder that creates the direct index. |
protected DocumentIndexBuilder | docIndexBuilder - The builder that creates the document index. |
protected DocumentIndexEntry | emptyDocIndexEntry |
protected java.lang.String | fieldDirectIndexPostingIteratorClass |
protected gnu.trove.TObjectIntHashMap<java.lang.String> | fieldNames - Mapping of field name to field id; returns 0 when there is no mapping. |
protected java.lang.String | fileNameNoExtension - The common prefix of the data structure filenames. |
protected boolean | IndexEmptyDocuments - Indicates whether an entry for empty documents is stored in the document index, or whether empty documents are ignored. |
protected InvertedIndexBuilder | invertedIndexBuilder - The builder that creates the inverted index. |
protected LexiconBuilder | lexiconBuilder - The builder that creates the lexicon. |
protected static org.apache.log4j.Logger | logger - The logger for this class. |
protected int | MAX_DOCS_PER_BUILDER - The number of documents indexed with a set of builders. |
protected int | MAX_TOKENS_IN_DOCUMENT - The maximum number of tokens in a document. |
protected MetaIndexBuilder | metaBuilder |
protected int | numFields - The number of fields. |
protected java.lang.String | path - The path in which the data structures are stored. |
protected TermPipeline | pipeline_first - The first component of the term pipeline. |
protected java.lang.String | prefix - The prefix of the data structures, i.e. the first part of the filename. |
protected boolean | useFieldInformation - Indicates whether field information should be saved in the created data structures. |
Constructor Summary
Modifier | Constructor and Description |
---|---|
public | Indexer() - Creates an indexer at the location given by ApplicationSetup.TERRIER_INDEX_PATH and ApplicationSetup.TERRIER_INDEX_PREFIX. |
protected | Indexer(long a, long b, long c) - Protected do-nothing constructor for use by child classes. |
public | Indexer(java.lang.String _path, java.lang.String _prefix) - Creates an instance of the class. |
Method Summary
Modifier and Type | Method and Description |
---|---|
abstract void | createDirectIndex(Collection[] collections) - An abstract method for creating the direct index, the document index and the lexicon for the given collections. |
abstract void | createInvertedIndex() - An abstract method for creating the inverted index, given that the direct index, the document index and the lexicon have already been created. |
protected MetaIndexBuilder | createMetaIndexBuilder() |
protected void | finishedDirectIndexBuild() - Event method to be overridden by child classes. |
protected void | finishedInvertedIndexBuild() - Event method to be overridden by child classes. |
protected abstract TermPipeline | getEndOfPipeline() - An abstract method that returns the last component of the term pipeline. |
void | index(Collection[] collections) - Creates the data structures for a set of collections. |
protected void | indexEmpty(java.util.Map<java.lang.String,java.lang.String> docProperties) - Adds an entry to the document index for an empty document, only if IndexEmptyDocuments is set to true. |
protected void | init() - This method must be called by anything which directly extends Indexer. |
protected void | load_builder_boundary_documents() - Loads the builder boundary documents from the comma-delimited property indexing.builder.boundary.docnos. |
protected void | load_field_ids() - Loads a mapping of field name to field id. |
protected void | load_indexer_properties() |
protected void | load_pipeline() - Creates the term pipeline, as specified by the property termpipelines in the properties file. |
static void | main(java.lang.String[] args) - Utility method for merging indices. |
static void | merge(java.lang.String mpath, java.lang.String mprefix, int lowest, int highest) - Merge a series of numbered indices in the same path/prefix area. |
static void | merge(java.lang.String mpath, java.lang.String mprefix, java.util.LinkedList<java.lang.String[]> llist, int counterMerged) - Merge a series of indices in pair-wise fashion. |
protected static void | mergeTwoIndices(java.lang.String[] index1, java.lang.String[] index2, java.lang.String[] outputIndex) - Merge two indices. |
protected static int[] | parseInts(java.lang.String[] in) |
boolean | useFieldInformation() - Returns whether the index will record field information. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail
protected static final org.apache.log4j.Logger logger
protected int MAX_DOCS_PER_BUILDER
protected int MAX_TOKENS_IN_DOCUMENT
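The field only states the cap itself. A minimal sketch of how such a per-document token cap might be applied while tokenising follows; the class and method names are hypothetical (not part of Terrier), and treating a value of 0 or less as "no limit" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a per-document token cap such as MAX_TOKENS_IN_DOCUMENT.
class TokenCapSketch {
    // Returns at most maxTokens tokens; ASSUMPTION: maxTokens <= 0 means "no limit".
    static List<String> capTokens(List<String> tokens, int maxTokens) {
        if (maxTokens <= 0 || tokens.size() <= maxTokens) {
            return new ArrayList<>(tokens);
        }
        // Keep only the first maxTokens tokens; the remainder of the document is ignored.
        return new ArrayList<>(tokens.subList(0, maxTokens));
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "quick", "brown", "fox", "jumps");
        System.out.println(capTokens(doc, 3)); // [the, quick, brown]
        System.out.println(capTokens(doc, 0)); // [the, quick, brown, fox, jumps]
    }
}
```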
protected final java.util.HashSet<java.lang.String> BUILDER_BOUNDARY_DOCUMENTS
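MAX_DOCS_PER_BUILDER and BUILDER_BOUNDARY_DOCUMENTS together decide when a set of builders is closed and a new one started. The sketch below shows one plausible form of that decision: start fresh builders when the current set has reached its document quota, or when the next document's DOCNO is a forced boundary. The class is hypothetical, and both the placement of a forced boundary (just before the listed document) and the meaning of a quota of 0 (unlimited) are assumptions, not documented behaviour.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: decide whether to close the current builders before indexing a document.
class BuilderBoundarySketch {
    private final int maxDocsPerBuilder;      // plays the role of MAX_DOCS_PER_BUILDER
    private final Set<String> boundaryDocnos; // plays the role of BUILDER_BOUNDARY_DOCUMENTS

    BuilderBoundarySketch(int maxDocsPerBuilder, Set<String> boundaryDocnos) {
        this.maxDocsPerBuilder = maxDocsPerBuilder;
        this.boundaryDocnos = new HashSet<>(boundaryDocnos);
    }

    // ASSUMPTIONS: a quota <= 0 means unlimited, and a forced boundary falls
    // immediately before the listed document.
    boolean startNewBuilder(int docsInCurrentBuilder, String nextDocno) {
        if (docsInCurrentBuilder == 0) {
            return false; // never close an empty set of builders
        }
        boolean quotaReached = maxDocsPerBuilder > 0 && docsInCurrentBuilder >= maxDocsPerBuilder;
        return quotaReached || boundaryDocnos.contains(nextDocno);
    }

    public static void main(String[] args) {
        BuilderBoundarySketch b = new BuilderBoundarySketch(2, Set.of("DOC-7"));
        System.out.println(b.startNewBuilder(1, "DOC-2")); // false
        System.out.println(b.startNewBuilder(2, "DOC-3")); // true (quota reached)
        System.out.println(b.startNewBuilder(1, "DOC-7")); // true (forced boundary)
    }
}
```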
protected boolean useFieldInformation
protected TermPipeline pipeline_first
protected boolean IndexEmptyDocuments
protected DirectInvertedOutputStream directIndexBuilder
protected DocumentIndexBuilder docIndexBuilder
protected InvertedIndexBuilder invertedIndexBuilder
protected LexiconBuilder lexiconBuilder
protected MetaIndexBuilder metaBuilder
protected java.lang.String fileNameNoExtension
protected java.lang.String path
protected java.lang.String prefix
protected Index currentIndex
protected java.lang.String basicDirectIndexPostingIteratorClass
protected java.lang.String fieldDirectIndexPostingIteratorClass
protected gnu.trove.TObjectIntHashMap<java.lang.String> fieldNames
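fieldNames maps a field name to a field id and returns 0 when there is no mapping, which is how a Trove TObjectIntHashMap reports a missing key. The sketch below reproduces that contract with java.util.HashMap as a stand-in; the class is hypothetical, and numbering ids from 1 in listing order (leaving 0 free to mean "not a field") is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the fieldNames map (field name -> field id).
// java.util.HashMap is used here instead of gnu.trove.TObjectIntHashMap.
class FieldIdSketch {
    private final Map<String, Integer> fieldNames = new HashMap<>();

    // ASSUMPTION: ids start at 1, in listing order, so that 0 can mean "not a field".
    FieldIdSketch(List<String> names) {
        int nextId = 1;
        for (String name : names) {
            if (!fieldNames.containsKey(name)) {
                fieldNames.put(name, nextId++);
            }
        }
    }

    // Mirrors the documented contract: 0 is returned when there is no mapping.
    int fieldId(String name) {
        return fieldNames.getOrDefault(name, 0);
    }

    // Corresponds loosely to numFields: the number of distinct fields.
    int numFields() {
        return fieldNames.size();
    }

    public static void main(String[] args) {
        FieldIdSketch f = new FieldIdSketch(List.of("TITLE", "BODY"));
        System.out.println(f.fieldId("BODY"));   // 2
        System.out.println(f.fieldId("ANCHOR")); // 0
    }
}
```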
protected int numFields
protected DocumentIndexEntry emptyDocIndexEntry
Constructor Detail
public Indexer()
public Indexer(java.lang.String _path, java.lang.String _prefix)
Parameters:
_path - String the path where the generated data structures will be saved.
_prefix - String the filename prefix that the data structures will have.
protected Indexer(long a, long b, long c)
Method Detail
protected void init()
public abstract void createDirectIndex(Collection[] collections)
Parameters:
collections - Collection[] An array of collections to index.
public abstract void createInvertedIndex()
protected abstract TermPipeline getEndOfPipeline()
protected MetaIndexBuilder createMetaIndexBuilder()
protected static final int[] parseInts(java.lang.String[] in)
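This page gives no description for parseInts. Judging only from its name and signature, it presumably converts an array of numeric strings into an int[]; the sketch below implements that assumed behaviour and is not taken from Terrier. Non-numeric input is left to Integer.parseInt, which throws NumberFormatException.

```java
import java.util.Arrays;

// Sketch of the ASSUMED behaviour of parseInts(String[]): parse each element as an int.
class ParseIntsSketch {
    static int[] parseInts(String[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            // Integer.parseInt throws NumberFormatException for non-numeric input.
            out[i] = Integer.parseInt(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts(new String[] {"3", "10", "-2"}))); // [3, 10, -2]
    }
}
```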
protected void load_indexer_properties()
protected void load_field_ids()
protected void load_pipeline()
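load_pipeline creates the term pipeline named by the termpipelines property, and getEndOfPipeline() supplies its last component. The sketch below shows one way such a chain can be wired back to front, so that the first stage (the role played by pipeline_first) forwards each term to the next and the chain ends at the end-of-pipeline component. The Stage interface, the stage names Lowercase and DropShort, the registry, and the comma-separated property format are all hypothetical; they are not Terrier's TermPipeline API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a term pipeline stage.
interface Stage {
    void processTerm(String term);
}

class PipelineSketch {
    // Hypothetical registry: stage name -> factory that wraps the next stage in the chain.
    static final Map<String, Function<Stage, Stage>> REGISTRY = Map.of(
        "Lowercase", next -> term -> next.processTerm(term.toLowerCase()),
        "DropShort", next -> term -> { if (term.length() > 2) next.processTerm(term); }
    );

    // Builds the chain from a comma-separated spec (ASSUMED format), back to front,
    // so the returned stage is the first component and 'end' is the last.
    static Stage build(String spec, Stage end) {
        String[] names = spec.trim().isEmpty() ? new String[0] : spec.split(",");
        Stage next = end;
        for (int i = names.length - 1; i >= 0; i--) {
            Function<Stage, Stage> factory = REGISTRY.get(names[i].trim());
            if (factory == null) {
                throw new IllegalArgumentException("Unknown stage: " + names[i]);
            }
            next = factory.apply(next);
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        // The end of the pipeline simply collects the surviving terms.
        Stage first = build("Lowercase,DropShort", indexed::add);
        for (String t : new String[] {"The", "Quick", "FOX", "is"}) {
            first.processTerm(t);
        }
        System.out.println(indexed); // [the, quick, fox]
    }
}
```

Building back to front means each stage only needs a reference to its successor, which is why a single "first component" handle is enough to drive the whole chain.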
protected void load_builder_boundary_documents()
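load_builder_boundary_documents reads the comma-delimited property indexing.builder.boundary.docnos. The sketch below parses that property into a HashSet of DOCNOs. It reads from a plain java.util.Properties object rather than Terrier's own configuration mechanism, and trimming whitespace and skipping empty entries are choices of the sketch, not documented behaviour.

```java
import java.util.HashSet;
import java.util.Properties;

// Sketch: parse the comma-delimited indexing.builder.boundary.docnos property into a set of DOCNOs.
class BoundaryDocnosSketch {
    static HashSet<String> loadBoundaryDocnos(Properties props) {
        HashSet<String> docnos = new HashSet<>();
        String value = props.getProperty("indexing.builder.boundary.docnos", "");
        for (String docno : value.split(",")) {
            String trimmed = docno.trim(); // sketch choice: ignore surrounding whitespace
            if (!trimmed.isEmpty()) {      // sketch choice: skip empty entries such as ",,"
                docnos.add(trimmed);
            }
        }
        return docnos;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("indexing.builder.boundary.docnos", "DOC-100, DOC-200,,DOC-300");
        System.out.println(loadBoundaryDocnos(p).size()); // 3
    }
}
```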
public void index(Collection[] collections)
Parameters:
collections - The document collection objects to index.
public static void merge(java.lang.String mpath, java.lang.String mprefix, int lowest, int highest)
Parameters:
mpath - Path of all indices.
mprefix - Common prefix of all indices.
lowest - Lowest suffix of the prefix.
highest - Highest suffix of the prefix.
protected static void mergeTwoIndices(java.lang.String[] index1, java.lang.String[] index2, java.lang.String[] outputIndex)
Parameters:
index1 - Path/prefix of source index 1.
index2 - Path/prefix of source index 2.
outputIndex - Path/prefix of the destination index.
public static void merge(java.lang.String mpath, java.lang.String mprefix, java.util.LinkedList<java.lang.String[]> llist, int counterMerged)
Parameters:
mpath - Common path of all indices.
mprefix - Prefix of the target index.
counterMerged - Number of indices to merge.
protected void finishedDirectIndexBuild()
protected void finishedInvertedIndexBuild()
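The summaries describe a two-phase design: createDirectIndex builds the direct index, document index and lexicon, createInvertedIndex then builds the inverted index from them, and finishedDirectIndexBuild/finishedInvertedIndexBuild are event methods for child classes to override. The miniature below illustrates that template-method shape with hypothetical class names. Firing each hook from index() right after its phase is an assumption of the sketch; the real class may invoke the hooks elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the two-phase design described for Indexer:
// createDirectIndex(...) first, then createInvertedIndex(), with overridable event hooks.
abstract class TwoPhaseIndexerSketch {
    final List<String> log = new ArrayList<>();

    abstract void createDirectIndex(List<String> collections);
    abstract void createInvertedIndex();

    // Event hooks: do nothing by default; subclasses override them if needed.
    protected void finishedDirectIndexBuild() {}
    protected void finishedInvertedIndexBuild() {}

    // ASSUMPTION: this sketch fires each hook right after its phase.
    void index(List<String> collections) {
        createDirectIndex(collections);
        finishedDirectIndexBuild();
        createInvertedIndex();
        finishedInvertedIndexBuild();
    }
}

class LoggingIndexerSketch extends TwoPhaseIndexerSketch {
    @Override
    void createDirectIndex(List<String> collections) { log.add("direct:" + collections.size()); }

    @Override
    void createInvertedIndex() { log.add("inverted"); }

    // Only one hook is overridden; the other keeps its do-nothing default.
    @Override
    protected void finishedDirectIndexBuild() { log.add("direct-done"); }

    public static void main(String[] args) {
        LoggingIndexerSketch ix = new LoggingIndexerSketch();
        ix.index(List.of("collectionA", "collectionB"));
        System.out.println(ix.log); // [direct:2, direct-done, inverted]
    }
}
```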
public boolean useFieldInformation()
protected void indexEmpty(java.util.Map<java.lang.String,java.lang.String> docProperties) throws java.io.IOException
Throws:
java.io.IOException
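indexEmpty adds an entry for an empty document only when IndexEmptyDocuments is true. The sketch below shows that guard against a hypothetical in-memory document index of (DOCNO, length) entries. The Entry class, the zero-length entry, and the "docno" key in the properties map are assumptions for illustration; none of them is Terrier's DocumentIndexBuilder API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the IndexEmptyDocuments guard used by indexEmpty(...).
class EmptyDocSketch {
    // Hypothetical document index entry: DOCNO plus document length in tokens.
    static final class Entry {
        final String docno;
        final int length;

        Entry(String docno, int length) {
            this.docno = docno;
            this.length = length;
        }
    }

    private final boolean indexEmptyDocuments; // plays the role of IndexEmptyDocuments
    final List<Entry> documentIndex = new ArrayList<>();

    EmptyDocSketch(boolean indexEmptyDocuments) {
        this.indexEmptyDocuments = indexEmptyDocuments;
    }

    // ASSUMPTION: the DOCNO is stored under the "docno" key of the document properties.
    void indexEmpty(Map<String, String> docProperties) {
        if (!indexEmptyDocuments) {
            return; // empty documents are ignored entirely
        }
        documentIndex.add(new Entry(docProperties.get("docno"), 0)); // zero-length entry
    }

    public static void main(String[] args) {
        EmptyDocSketch keep = new EmptyDocSketch(true);
        EmptyDocSketch skip = new EmptyDocSketch(false);
        keep.indexEmpty(Map.of("docno", "DOC-1"));
        skip.indexEmpty(Map.of("docno", "DOC-1"));
        System.out.println(keep.documentIndex.size() + " " + skip.documentIndex.size()); // 1 0
    }
}
```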
public static void main(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception
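merge(mpath, mprefix, lowest, highest) merges a series of numbered indices in the same path/prefix area, and the LinkedList overload merges a series of indices in pair-wise fashion, with mergeTwoIndices combining two path/prefix pairs into a destination. The sketch below shows pair-wise merging with a FIFO queue: take the two indices at the head, merge them into a new index appended at the tail, and repeat until one remains, which always takes n - 1 merges. The numbered naming scheme (prefix_n), the merged_k output names, and recording a plan instead of touching files are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of pair-wise merging; it records the merge plan instead of merging files.
class PairwiseMergeSketch {
    // Each queue element is {path, prefix}, like the String[] arguments of mergeTwoIndices.
    // Returns the merge steps; on return the queue holds the single final index.
    static List<String> mergePairwise(String path, LinkedList<String[]> queue) {
        List<String> steps = new ArrayList<>();
        int counter = 0;
        while (queue.size() > 1) {
            String[] first = queue.removeFirst();
            String[] second = queue.removeFirst();
            // HYPOTHETICAL naming for intermediate outputs.
            String[] out = {path, "merged_" + (counter++)};
            // A real implementation would call mergeTwoIndices(first, second, out) here.
            steps.add(first[1] + " + " + second[1] + " -> " + out[1]);
            queue.addLast(out);
        }
        return steps;
    }

    // HYPOTHETICAL naming scheme for numbered indices: prefix_lowest ... prefix_highest.
    static LinkedList<String[]> numbered(String path, String prefix, int lowest, int highest) {
        LinkedList<String[]> list = new LinkedList<>();
        for (int n = lowest; n <= highest; n++) {
            list.add(new String[] {path, prefix + "_" + n});
        }
        return list;
    }

    public static void main(String[] args) {
        LinkedList<String[]> queue = numbered("/idx", "data", 1, 4);
        mergePairwise("/idx", queue).forEach(System.out::println);
        // data_1 + data_2 -> merged_0
        // data_3 + data_4 -> merged_1
        // merged_0 + merged_1 -> merged_2
        System.out.println(queue.getFirst()[1]); // merged_2
    }
}
```

Appending each output to the tail keeps the merges balanced: every original index takes part in roughly log2(n) merges rather than being re-read at every step.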