Uploaded image for project: 'Terrier Core'
  1. Terrier Core
  2. TR-108

Some indexers do not set the IterablePosting class for the DirectIndex

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.5
    • Component/s: .structures
    • Labels:
      None

      Description

      org.terrier.structures.BitPostingIndex.getPostings(BitIndexPointer) is not able to instantiate the postingImplementation when fields are used. The error happens in line 105ff:

      {code}
      if (fieldCount > 0)
      rtr = postingImplementation
      .getConstructor(BitIn.class, Integer.TYPE, DocumentIndex.class, Integer.TYPE)
      .newInstance(file, pointer.getNumberOfEntries(), null, fieldCount);
      else
      rtr = postingImplementation
      .getConstructor(BitIn.class, Integer.TYPE, DocumentIndex.class)
      .newInstance(file, pointer.getNumberOfEntries(), null);
      {code}


      The reason is that per default this constructor is called and thus a BasicIterablePosting is used:

      {code}
      public DirectIndex(Index index, String structureName) throws IOException {
      super(index, structureName, BasicIterablePosting.class);
      docIndex = index.getDocumentIndex();
      }
      {code}

      The solution is to adjust the code as follows:

      {code}
      public DirectIndex(Index index, String structureName) throws IOException {
      super(index, structureName, index.getIntIndexProperty("index.direct.fields.count", 0) > 0 ?
      FieldIterablePosting.class : BasicIterablePosting.class);
      docIndex = index.getDocumentIndex();
      }
      {code}

      The same goes for org.terrier.structures.DirectIndexInputStream.DirectIndexInputStream(Index, String):

      {code}
      public DirectIndexInputStream(Index index, String structureName) throws IOException
      {
      super(index, structureName, (Iterator<DocumentIndexEntry>)index.getIndexStructureInputStream("document"),
      index.getIntIndexProperty("index.direct.fields.count", 0) > 0 ?
      FieldIterablePosting.class : BasicIterablePosting.class);
      }
      {code}

      I didn't check if this problem is also present in additional parts of the code.

        Attachments

          Activity

          rec Richard Eckart de Castilho created issue -
          Hide
          craigm Craig Macdonald added a comment -

          Can you tell me two things:

          (a) Which indexing method did you use? (classical, singlepass, MapReduce)

          (b) Can you paste me your index data.properties file (grep "index.direct" data.properties)

          Thanks for your detailed investigations.

          Show
          craigm Craig Macdonald added a comment - Can you tell me two things: (a) Which indexing method did you use? (classical, singlepass, MapReduce) (b) Can you paste me your index data.properties file (grep "index.direct" data.properties) Thanks for your detailed investigations.
          Hide
          rec Richard Eckart de Castilho added a comment -

          I used a custom indexer (IncrementalIndexer). It basically allows me to index a document term-by-term. Basically it is a a reordered, stateful BasicIndexer. I upgraded this to Terrier 3.0 now and think that I did not miss anything important. If you are interested, I would consider contributing it.

          Here are the properties:

          index.direct.fields.names=TOKEN,UTF8,NO_NAME_FIELD_2,NO_NAME_FIELD_3,NO_NAME_FIELD_4,NO_NAME_FIELD_5,NO_NAME_FIELD_6,NO_NAME_FIELD_7,NO_NAME_FIELD_8,NO_NAME_FIELD_9,NO_NAME_FIELD_10,NO_NAME_FIELD_11,NO_NAME_FIELD_12,NO_NAME_FIELD_13,NO_NAME_FIELD_14,NO_NAME_FIELD_15,NO_NAME_FIELD_16,NO_NAME_FIELD_17,NO_NAME_FIELD_18,NO_NAME_FIELD_19,NO_NAME_FIELD_20,NO_NAME_FIELD_21,NO_NAME_FIELD_22,NO_NAME_FIELD_23,NO_NAME_FIELD_24,NO_NAME_FIELD_25,NO_NAME_FIELD_26,NO_NAME_FIELD_27,NO_NAME_FIELD_28,NO_NAME_FIELD_29,NO_NAME_FIELD_30
          index.direct.class=org.terrier.structures.DirectIndex
          index.direct-inputstream.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting
          index.document-factory.parameter_values=${index.direct.fields.count}
          index.direct.parameter_types=org.terrier.structures.Index,java.lang.String
          index.direct-inputstream.parameter_types=org.terrier.structures.Index,java.lang.String,java.lang.Class
          index.direct.fields.count=31
          index.lexicon-valuefactory.parameter_values=${index.direct.fields.count}
          index.direct-inputstream.class=org.terrier.structures.DirectIndexInputStream
          
          Show
          rec Richard Eckart de Castilho added a comment - I used a custom indexer (IncrementalIndexer). It basically allows me to index a document term-by-term. Basically it is a a reordered, stateful BasicIndexer. I upgraded this to Terrier 3.0 now and think that I did not miss anything important. If you are interested, I would consider contributing it. Here are the properties: index.direct.fields.names=TOKEN,UTF8,NO_NAME_FIELD_2,NO_NAME_FIELD_3,NO_NAME_FIELD_4,NO_NAME_FIELD_5,NO_NAME_FIELD_6,NO_NAME_FIELD_7,NO_NAME_FIELD_8,NO_NAME_FIELD_9,NO_NAME_FIELD_10,NO_NAME_FIELD_11,NO_NAME_FIELD_12,NO_NAME_FIELD_13,NO_NAME_FIELD_14,NO_NAME_FIELD_15,NO_NAME_FIELD_16,NO_NAME_FIELD_17,NO_NAME_FIELD_18,NO_NAME_FIELD_19,NO_NAME_FIELD_20,NO_NAME_FIELD_21,NO_NAME_FIELD_22,NO_NAME_FIELD_23,NO_NAME_FIELD_24,NO_NAME_FIELD_25,NO_NAME_FIELD_26,NO_NAME_FIELD_27,NO_NAME_FIELD_28,NO_NAME_FIELD_29,NO_NAME_FIELD_30 index.direct.class=org.terrier.structures.DirectIndex index.direct-inputstream.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting index.document-factory.parameter_values=${index.direct.fields.count} index.direct.parameter_types=org.terrier.structures.Index,java.lang. String index.direct-inputstream.parameter_types=org.terrier.structures.Index,java.lang. String ,java.lang. Class index.direct.fields.count=31 index.lexicon-valuefactory.parameter_values=${index.direct.fields.count} index.direct-inputstream.class=org.terrier.structures.DirectIndexInputStream
          Hide
          craigm Craig Macdonald added a comment -

          Thanks.

          Can you update the following index properties?
          {{{
          index.direct.parameter_types=org.terrier.structures.Index,java.lang.String,java.lang.Class
          index.direct.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting
          }}}

          I would rather change BasicIndexer to write the correct index properties than add more intelligence in the constructors.

          Your IncrementalIndexer sounds interesting. I'm currently refactoring the Indexer classes - document analysing is a distinct notion from structure writing, so I'm trying to suitably move things around.

          What I can't work out is why the unit tests don't identify this problem, as the correctness of the direct index is checked in ShakespeareEndToEndTest.java

          Show
          craigm Craig Macdonald added a comment - Thanks. Can you update the following index properties? {{{ index.direct.parameter_types=org.terrier.structures.Index,java.lang.String,java.lang.Class index.direct.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting }}} I would rather change BasicIndexer to write the correct index properties than add more intelligence in the constructors. Your IncrementalIndexer sounds interesting. I'm currently refactoring the Indexer classes - document analysing is a distinct notion from structure writing, so I'm trying to suitably move things around. What I can't work out is why the unit tests don't identify this problem, as the correctness of the direct index is checked in ShakespeareEndToEndTest.java
          Hide
          craigm Craig Macdonald added a comment -

          Ok, I fixed BasicIndexer, BlockIndexer, StrutureMerger and Inverted2DirectIndexBuilder. BasicShakespeareEndToEndTest now catchs this.\

          Here is a patch. Can you make the same change to IncrementalIndexer and re-test? Many thanks

          Index: src/core/org/terrier/structures/merging/StructureMerger.java
          ===================================================================
          --- src/core/org/terrier/structures/merging/StructureMerger.java	(revision 2991)
          +++ src/core/org/terrier/structures/merging/StructureMerger.java	(working copy)
          @@ -487,8 +487,9 @@
           			destIndex.addIndexStructure(
           					"direct", 
           					"org.terrier.structures.DirectIndex", 
          -					"org.terrier.structures.Index,java.lang.String", 
          -					"index,structureName");
          +					"org.terrier.structures.Index,java.lang.String,java.lang.Class", 
          +					"index,structureName,"+ 
          +						(fieldCount > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass));
           			destIndex.addIndexStructureInputStream(
           					"direct",
           					"org.terrier.structures.DirectIndexInputStream", 
          Index: src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java
          ===================================================================
          --- src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java	(revision 2991)
          +++ src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java	(working copy)
          @@ -246,8 +246,9 @@
           			index.addIndexStructure(
           				destinationStructure, 
           				directIndexClass, 
          -				"org.terrier.structures.Index,java.lang.String", 
          -				"index,structureName");
          +				"org.terrier.structures.Index,java.lang.String,java.lang.Class", 
          +				"index,structureName,"+ 
          +					(fieldCount > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass));
           			index.addIndexStructureInputStream(
           				destinationStructure, 
           				directIndexInputStreamClass,
          Index: src/core/org/terrier/indexing/BasicIndexer.java
          ===================================================================
          --- src/core/org/terrier/indexing/BasicIndexer.java	(revision 2991)
          +++ src/core/org/terrier/indexing/BasicIndexer.java	(working copy)
          @@ -303,8 +303,9 @@
           		currentIndex.addIndexStructure(
           				"direct", 
           				"org.terrier.structures.DirectIndex", 
          -				"org.terrier.structures.Index,java.lang.String", 
          -				"index,structureName");
          +				"org.terrier.structures.Index,java.lang.String,java.lang.Class", 
          +				"index,structureName,"+ 
          +					(FieldScore.FIELDS_COUNT > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass));
           		currentIndex.addIndexStructureInputStream(
           				"direct",
           				"org.terrier.structures.DirectIndexInputStream", 
          Index: src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java
          ===================================================================
          --- src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java	(revision 2991)
          +++ src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java	(working copy)
          @@ -24,9 +24,16 @@
            *   Craig Macdonald <craigm{a.}dcs.gla.ac.uk> (original contributor)
            */
           package org.terrier.tests;
          +import static org.junit.Assert.assertEquals;
          +import static org.junit.Assert.assertTrue;
          +
           import org.junit.Test;
          -import static org.junit.Assert.*;
          +import org.terrier.structures.BitIndexPointer;
          +import org.terrier.structures.BitPostingIndex;
          +import org.terrier.structures.BitPostingIndexInputStream;
           import org.terrier.structures.Index;
          +import org.terrier.structures.postings.FieldPosting;
          +import org.terrier.structures.postings.IterablePosting;
           
           public class BasicShakespeareEndToEndTest extends ShakespeareEndToEndTest {
           
          @@ -73,6 +80,28 @@
           			assertEquals(2, index.getCollectionStatistics().getNumberOfFields());
           			assertEquals(123, index.getCollectionStatistics().getFieldTokens()[0]);
           			assertEquals(611, index.getCollectionStatistics().getFieldTokens()[1]);
          +			
          +			
          +			
          +			//no check correct type of all structures
          +			BitPostingIndexInputStream bpiis;
          +			IterablePosting ip;
          +			BitPostingIndex bpi;
          +			
          +			//check stream structures
          +			bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("direct");
          +			ip = bpiis.next();
          +			assertTrue(ip instanceof FieldPosting);
          +			bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream("inverted");
          +			ip = bpiis.next();
          +			assertTrue(ip instanceof FieldPosting);
          +			
          +			//check random structures
          +			bpi = (BitPostingIndex) index.getInvertedIndex();
          +			ip = bpi.getPostings((BitIndexPointer) index.getLexicon().getLexiconEntry(0).getValue());
          +			assertTrue(ip instanceof FieldPosting);
          +			bpi = (BitPostingIndex) index.getDirectIndex();
          +			ip = bpi.getPostings((BitIndexPointer) index.getDocumentIndex().getDocumentEntry(0));
           		}
           	}
           	
          Index: src/core/org/terrier/indexing/BlockIndexer.java
          ===================================================================
          --- src/core/org/terrier/indexing/BlockIndexer.java	(revision 2991)
          +++ src/core/org/terrier/indexing/BlockIndexer.java	(working copy)
          @@ -421,8 +421,9 @@
           		currentIndex.addIndexStructure(
           				"direct", 
           				"org.terrier.structures.BlockDirectIndex", 
          -				"org.terrier.structures.Index,java.lang.String", 
          -				"index,structureName");
          +				"org.terrier.structures.Index,java.lang.String,java.lang.Class", 
          +				"index,structureName,"+ 
          +				(FieldScore.FIELDS_COUNT > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass));
           		currentIndex.addIndexStructureInputStream(
           				"direct",
           				"org.terrier.structures.BlockDirectIndexInputStream", 
          
          
          Show
          craigm Craig Macdonald added a comment - Ok, I fixed BasicIndexer, BlockIndexer, StrutureMerger and Inverted2DirectIndexBuilder. BasicShakespeareEndToEndTest now catchs this.\ Here is a patch. Can you make the same change to IncrementalIndexer and re-test? Many thanks Index: src/core/org/terrier/structures/merging/StructureMerger.java =================================================================== --- src/core/org/terrier/structures/merging/StructureMerger.java (revision 2991) +++ src/core/org/terrier/structures/merging/StructureMerger.java (working copy) @@ -487,8 +487,9 @@ destIndex.addIndexStructure( "direct" , "org.terrier.structures.DirectIndex" , - "org.terrier.structures.Index,java.lang. String " , - "index,structureName" ); + "org.terrier.structures.Index,java.lang. String ,java.lang. Class " , + "index,structureName," + + (fieldCount > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass)); destIndex.addIndexStructureInputStream( "direct" , "org.terrier.structures.DirectIndexInputStream" , Index: src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java =================================================================== --- src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java (revision 2991) +++ src/core/org/terrier/structures/indexing/singlepass/Inverted2DirectIndexBuilder.java (working copy) @@ -246,8 +246,9 @@ index.addIndexStructure( destinationStructure, directIndexClass, - "org.terrier.structures.Index,java.lang. String " , - "index,structureName" ); + "org.terrier.structures.Index,java.lang. String ,java.lang. Class " , + "index,structureName," + + (fieldCount > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass)); index.addIndexStructureInputStream( destinationStructure, directIndexInputStreamClass, Index: src/core/org/terrier/indexing/BasicIndexer.java =================================================================== --- src/core/org/terrier/indexing/BasicIndexer.java (revision 2991) +++ src/core/org/terrier/indexing/BasicIndexer.java (working copy) @@ -303,8 +303,9 @@ currentIndex.addIndexStructure( "direct" , "org.terrier.structures.DirectIndex" , - "org.terrier.structures.Index,java.lang. String " , - "index,structureName" ); + "org.terrier.structures.Index,java.lang. String ,java.lang. Class " , + "index,structureName," + + (FieldScore.FIELDS_COUNT > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass)); currentIndex.addIndexStructureInputStream( "direct" , "org.terrier.structures.DirectIndexInputStream" , Index: src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java =================================================================== --- src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java (revision 2991) +++ src/test/org/terrier/tests/BasicShakespeareEndToEndTest.java (working copy) @@ -24,9 +24,16 @@ * Craig Macdonald <craigm{a.}dcs.gla.ac.uk> (original contributor) */ package org.terrier.tests; + import static org.junit.Assert.assertEquals; + import static org.junit.Assert.assertTrue; + import org.junit.Test; - import static org.junit.Assert.*; + import org.terrier.structures.BitIndexPointer; + import org.terrier.structures.BitPostingIndex; + import org.terrier.structures.BitPostingIndexInputStream; import org.terrier.structures.Index; + import org.terrier.structures.postings.FieldPosting; + import org.terrier.structures.postings.IterablePosting; public class BasicShakespeareEndToEndTest extends ShakespeareEndToEndTest { @@ -73,6 +80,28 @@ assertEquals(2, index.getCollectionStatistics().getNumberOfFields()); assertEquals(123, index.getCollectionStatistics().getFieldTokens()[0]); assertEquals(611, index.getCollectionStatistics().getFieldTokens()[1]); + + + + //no check correct type of all structures + BitPostingIndexInputStream bpiis; + IterablePosting ip; + BitPostingIndex bpi; + + //check stream structures + bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream( "direct" ); + ip = bpiis.next(); + assertTrue(ip instanceof FieldPosting); + bpiis = (BitPostingIndexInputStream) index.getIndexStructureInputStream( "inverted" ); + ip = bpiis.next(); + assertTrue(ip instanceof FieldPosting); + + //check random structures + bpi = (BitPostingIndex) index.getInvertedIndex(); + ip = bpi.getPostings((BitIndexPointer) index.getLexicon().getLexiconEntry(0).getValue()); + assertTrue(ip instanceof FieldPosting); + bpi = (BitPostingIndex) index.getDirectIndex(); + ip = bpi.getPostings((BitIndexPointer) index.getDocumentIndex().getDocumentEntry(0)); } } Index: src/core/org/terrier/indexing/BlockIndexer.java =================================================================== --- src/core/org/terrier/indexing/BlockIndexer.java (revision 2991) +++ src/core/org/terrier/indexing/BlockIndexer.java (working copy) @@ -421,8 +421,9 @@ currentIndex.addIndexStructure( "direct" , "org.terrier.structures.BlockDirectIndex" , - "org.terrier.structures.Index,java.lang. String " , - "index,structureName" ); + "org.terrier.structures.Index,java.lang. String ,java.lang. Class " , + "index,structureName," + + (FieldScore.FIELDS_COUNT > 0 ? fieldDirectIndexPostingIteratorClass : basicDirectIndexPostingIteratorClass)); currentIndex.addIndexStructureInputStream( "direct" , "org.terrier.structures.BlockDirectIndexInputStream" ,
          Hide
          craigm Craig Macdonald added a comment -

          updated title.

          Show
          craigm Craig Macdonald added a comment - updated title.
          craigm Craig Macdonald made changes -
          Field Original Value New Value
          Summary org.terrier.structures.BitPostingIndex.getPostings(BitIndexPointer) fails when fields are used Some indexers do not set the IterablePosting class for the DirectIndex
          Hide
          rec Richard Eckart de Castilho added a comment -

          I upgraded my IncrementalIndexer accordingly and that works fine. Thanks very much for the fast fix

          Show
          rec Richard Eckart de Castilho added a comment - I upgraded my IncrementalIndexer accordingly and that works fine. Thanks very much for the fast fix
          Hide
          rec Richard Eckart de Castilho added a comment -

          The IncrementalIndexer basically allows you to do this:

                          // Write a simple index
          		indexer.startDirectIndex();
          		indexer.startDocument("doc1");
          		indexer.indexTerm("term1", singleton("token")); // The second argument are the field names to add to
          		indexer.indexTerm("term2", singleton("token"));
          		indexer.indexTerm("tag1", singleton("tag"));
          		indexer.endDocument();
          		indexer.syncFieldNames(); // This is where the fields names discovered during term adding are synched with FieldScore
          		indexer.endDirectIndex();
          		indexer.createInvertedIndex();
          

          Btw. it would be great if not everything in Terrier was done using (public) static variables for almost anything. I had to do some black
          classpath voodoo to be able to deploy multiple Terrier instances in different configurations in the same JVM.

          Show
          rec Richard Eckart de Castilho added a comment - The IncrementalIndexer basically allows you to do this: // Write a simple index indexer.startDirectIndex(); indexer.startDocument( "doc1" ); indexer.indexTerm( "term1" , singleton( "token" )); // The second argument are the field names to add to indexer.indexTerm( "term2" , singleton( "token" )); indexer.indexTerm( "tag1" , singleton( "tag" )); indexer.endDocument(); indexer.syncFieldNames(); // This is where the fields names discovered during term adding are synched with FieldScore indexer.endDirectIndex(); indexer.createInvertedIndex(); Btw. it would be great if not everything in Terrier was done using (public) static variables for almost anything. I had to do some black classpath voodoo to be able to deploy multiple Terrier instances in different configurations in the same JVM.
          Hide
          craigm Craig Macdonald added a comment -

          Closing as resolved. Fix has been commited to trunk. Anyone experiencing this issue with an index already created by Terrier v3.0 should amend the data.properties file of their index:

          index.direct.parameter_types=org.terrier.structures.Index,java.lang.String,java.lang.Class
          index.direct.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting
          #or BlockFieldIterablePosting for a block & field index
          
          Show
          craigm Craig Macdonald added a comment - Closing as resolved. Fix has been commited to trunk. Anyone experiencing this issue with an index already created by Terrier v3.0 should amend the data.properties file of their index: index.direct.parameter_types=org.terrier.structures.Index,java.lang.String,java.lang.Class index.direct.parameter_values=index,structureName,org.terrier.structures.postings.FieldIterablePosting #or BlockFieldIterablePosting for a block & field index
          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.1 [ 10040 ]
          Resolution Fixed [ 1 ]
          Hide
          craigm Craig Macdonald added a comment -

          My proposed refactor would let you do similar code as the following:

          IndexWriter iw = //choose some appropriate implementation
          
          //create a document posting list for a document
          DocumentPostingList d = new DocumentPostingList();
          d.indexTerm("term", fieldid);
          
          //add document to index
          iw.addDocument(d);
          
          //finish all structures
          iw.close();
          

          Would this be OK by you?

          I know all about ApplicationSetup. I have plans round this, it takes time though! We thought best to release what we had at present.

          Show
          craigm Craig Macdonald added a comment - My proposed refactor would let you do similar code as the following: IndexWriter iw = //choose some appropriate implementation //create a document posting list for a document DocumentPostingList d = new DocumentPostingList(); d.indexTerm( "term" , fieldid); //add document to index iw.addDocument(d); //finish all structures iw.close(); Would this be OK by you? I know all about ApplicationSetup. I have plans round this, it takes time though! We thought best to release what we had at present.
          Hide
          rec Richard Eckart de Castilho added a comment -

          That should be fine I think. And if that enables us to incrementally build block indexes, that would be great.

          Show
          rec Richard Eckart de Castilho added a comment - That should be fine I think. And if that enables us to incrementally build block indexes, that would be great.
          Hide
          rec Richard Eckart de Castilho added a comment -

          Just one more comment. It may become an issue for us that Hadoop is so deeply integrated into Terrier. I set up Terrier as a Maven project here and hoped to
          be able to make Hadoop an optional dependency. Mainly for two reasons: a) reduce dependencies for non-Hadoop projects and b) to be able to run Terrier
          on another version of Hadoop - that is not using Terriers Hadoop capabilities, but embedding it as part of another Hadoop program. Maybe it would be possible
          to use adapters to use adapter classes to adapt Terrier classes to Hadoop instead of having Terrier classes directly inherit from Hadoop. That is, making Hadoop
          really an optional feature on top of Terrier.

          Show
          rec Richard Eckart de Castilho added a comment - Just one more comment. It may become an issue for us that Hadoop is so deeply integrated into Terrier. I set up Terrier as a Maven project here and hoped to be able to make Hadoop an optional dependency. Mainly for two reasons: a) reduce dependencies for non-Hadoop projects and b) to be able to run Terrier on another version of Hadoop - that is not using Terriers Hadoop capabilities, but embedding it as part of another Hadoop program. Maybe it would be possible to use adapters to use adapter classes to adapt Terrier classes to Hadoop instead of having Terrier classes directly inherit from Hadoop. That is, making Hadoop really an optional feature on top of Terrier.
          Hide
          craigm Craig Macdonald added a comment -

          Its just an interface on the existing code, not incremental indexing directly. However, if you have the relevant code, then we wrap it up in the same interface.

          Can you file a new issue for the last comment re Hadoop please?

          Show
          craigm Craig Macdonald added a comment - Its just an interface on the existing code, not incremental indexing directly. However, if you have the relevant code, then we wrap it up in the same interface. Can you file a new issue for the last comment re Hadoop please?
          Hide
          rec Richard Eckart de Castilho added a comment -

          Variant of BasicIndexer which allows to index one term a a time. Feel free to integrate if you like it.

          Show
          rec Richard Eckart de Castilho added a comment - Variant of BasicIndexer which allows to index one term a a time. Feel free to integrate if you like it.
          rec Richard Eckart de Castilho made changes -
          Attachment IncrementalIndexerImpl.java [ 10191 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              rec Richard Eckart de Castilho
            • Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: