Class ExtensibleSinglePassIndexer


  • public abstract class ExtensibleSinglePassIndexer
    extends BasicSinglePassIndexer
    Directly based on BasicSinglePassIndexer, with just a few modifications to enable some extra hooks.
    Author:
    Roi Blanco, Jonathon Hare [jsh2{a.}ecs.soton.ac.uk]
    • Constructor Detail

      • ExtensibleSinglePassIndexer

        public ExtensibleSinglePassIndexer(java.lang.String pathname,
                                           java.lang.String prefix)
        Constructs an indexer that creates its data structures at the given path, using the given index prefix.
        Parameters:
        pathname - String the path where the data structures will be created. This is assumed to be absolute.
        prefix - String the prefix of the index, usually "data".
    • Method Detail

      • getEndOfPipeline

        protected abstract TermPipeline getEndOfPipeline()
        Returns the end of the term pipeline, which corresponds to an instance of either BasicIndexer.BasicTermProcessor, or BasicIndexer.FieldTermProcessor, depending on whether field information is stored.
        Overrides:
        getEndOfPipeline in class BasicIndexer
        Returns:
        TermPipeline the end of the term pipeline.
      • getPostingInRunClass

        protected abstract java.lang.Class<? extends org.terrier.structures.indexing.singlepass.PostingInRun> getPostingInRunClass()
        Get the class for storing postings in runs.
        Returns:
        PostingInRun Subclass of PostingInRun for this indexer
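        As a rough illustration of why a hook might return a Class object rather than an instance, the sketch below creates postings reflectively from the class that an indexer subclass reports. All types here (PostingSketch, FieldPostingSketch, ClassTokenIndexerSketch, FieldIndexerSketch) are hypothetical stand-ins, not the Terrier classes, and how the real indexer consumes the returned class is an assumption.

```java
// Hypothetical stand-ins for PostingInRun and a subclass of it.
class PostingSketch {
    String describe() { return "basic"; }
}

class FieldPostingSketch extends PostingSketch {
    @Override
    String describe() { return "field"; }
}

// Hypothetical indexer base: subclasses name the posting class to use.
abstract class ClassTokenIndexerSketch {
    /** Mirrors the shape of getPostingInRunClass(): returns a class token. */
    protected abstract Class<? extends PostingSketch> getPostingClass();

    /** Assumed usage: create a fresh posting via the no-argument constructor. */
    PostingSketch newPosting() throws ReflectiveOperationException {
        return getPostingClass().getDeclaredConstructor().newInstance();
    }
}

// Example subclass that selects the field-aware posting type.
class FieldIndexerSketch extends ClassTokenIndexerSketch {
    @Override
    protected Class<? extends PostingSketch> getPostingClass() {
        return FieldPostingSketch.class;
    }
}
```

        Returning a class token lets shared code create as many posting objects as it needs without knowing the concrete type in advance.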
      • createRunMerger

        protected void createRunMerger(java.lang.String[][] files)
                                throws java.lang.Exception
        Hook method that creates a RunsMerger instance.
        Overrides:
        createRunMerger in class BasicSinglePassIndexer
        Throws:
        java.io.IOException - if an I/O error occurs.
        java.lang.Exception
      • createDocumentPostings

        protected abstract void createDocumentPostings()
        Hook method that creates the right type of DocumentTree class.
        Overrides:
        createDocumentPostings in class BasicIndexer
      • createInvertedIndex

        public void createInvertedIndex(Collection[] collections)
        Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing them through the term pipeline (e.g. stemming, stop-word removal, lowercasing). The only change from BasicSinglePassIndexer is a pre-processing operation that is applied to each term before it is passed to the pipeline.
        Overrides:
        createInvertedIndex in class BasicSinglePassIndexer
        Parameters:
        collections - Collection[] the collections to be indexed.
      • preProcess

        protected abstract void preProcess(Document doc,
                                           java.lang.String term)
        Performs an operation on the current term before it is passed to the term pipeline. For example, it could extract data and store it in a field that the pipeline can access.
        Parameters:
        doc - Current document
        term - Current term
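        The ordering described above, with the hook called for each term before that term enters the pipeline, can be sketched with stand-in types. SketchIndexer, RecordingIndexer, the String document identifier and the lowercasing pipeline are all hypothetical; the real method receives a Terrier Document and feeds a TermPipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the indexer; not the Terrier or OpenIMAJ class.
abstract class SketchIndexer {
    /** Hook called for every term before it enters the pipeline. */
    protected abstract void preProcess(String docId, String term);

    /** Stand-in for the term pipeline: here it only lowercases the term. */
    protected String pipeline(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    /** Walks the terms of one document, calling the hook before the pipeline. */
    public List<String> indexDocument(String docId, List<String> terms) {
        List<String> processed = new ArrayList<>();
        for (String term : terms) {
            preProcess(docId, term);       // sees the raw, unprocessed term
            processed.add(pipeline(term)); // then the pipeline transforms it
        }
        return processed;
    }
}

// Example subclass that records the raw terms seen by the hook.
class RecordingIndexer extends SketchIndexer {
    final List<String> seen = new ArrayList<>();

    @Override
    protected void preProcess(String docId, String term) {
        seen.add(docId + ":" + term);
    }
}
```

        Because the hook runs first, it observes each term in its original form, before any pipeline stage has altered or discarded it.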
      • getCurrentIndex

        public Index getCurrentIndex()
        Gets the index currently being constructed by this indexer. This may be null if indexing has not yet commenced. It is useful for adding extra properties and similar metadata to the index after indexing has finished.
        Returns:
        the current index
      • setFlushDelegate

        protected void setFlushDelegate(SinglePassIndexerFlushDelegate _flushDelegate)
        Sets the flush delegate.
        Parameters:
        _flushDelegate - the delegate that forceFlush() passes to when it is set
      • forceFlush

        protected void forceFlush()
                           throws java.io.IOException
        Forces the indexer to flush everything and free memory. If a flush delegate is set, the call is passed to the delegate; otherwise the superclass method is called.
        Overrides:
        forceFlush in class BasicSinglePassIndexer
        Throws:
        java.io.IOException
        See Also:
        BasicSinglePassIndexer.forceFlush()
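        The delegate behaviour of forceFlush() can be sketched as follows. FlushDelegateSketch and FlushingIndexerSketch are hypothetical stand-ins: the actual methods of SinglePassIndexerFlushDelegate are not shown on this page, so the delegate's signature here is an assumption. The methods are public only so that the sketch can be called directly.

```java
import java.io.IOException;

// Hypothetical stand-in for SinglePassIndexerFlushDelegate; the real interface may differ.
interface FlushDelegateSketch {
    void forceFlush(FlushingIndexerSketch indexer) throws IOException;
}

// Hypothetical stand-in for the indexer.
class FlushingIndexerSketch {
    private FlushDelegateSketch flushDelegate; // null until a delegate is set
    int defaultFlushes = 0;                    // counts calls to the default flush

    public void setFlushDelegate(FlushDelegateSketch delegate) {
        this.flushDelegate = delegate;
    }

    /** Stand-in for the superclass forceFlush(). */
    protected void defaultFlush() throws IOException {
        defaultFlushes++;
    }

    /** Passes to the delegate when one is set, otherwise uses the default flush. */
    public void forceFlush() throws IOException {
        if (flushDelegate != null) {
            flushDelegate.forceFlush(this);
        } else {
            defaultFlush();
        }
    }
}
```

        With this arrangement, flushing behaviour can be replaced at run time without subclassing the indexer again.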