[TR-136] Hadoop indexing misbehaves when terrier.index.prefix is not "data"
Created: 24/Jun/10  Updated: 05/Apr/11  Resolved: 18/Feb/11

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: 3.0
Fix Version/s: 3.5

Type: Bug
Priority: Trivial
Reporter: Craig Macdonald
Assignee: Richard McCreadie
Resolution: Fixed  
Labels: None


 Description   
Hadoop MapReduce indexing in Terrier misbehaves slightly when terrier.index.prefix is set to something other than "data". Indexing itself completes normally, but it writes the index using the default prefix "data" instead of the configured one, and the MetaIndex reversal step fails. As the priority says, trivial.

 Comments   
Comment by Craig Macdonald [ 24/Jun/10 ]

Patch:

Index: src/core/org/terrier/applications/HadoopIndexing.java
===================================================================
--- src/core/org/terrier/applications/HadoopIndexing.java	(revision 3010)
+++ src/core/org/terrier/applications/HadoopIndexing.java	(working copy)
@@ -166,6 +166,7 @@
 			conf.setReducerClass(Hadoop_BasicSinglePassIndexer.class);
 		}
 		FileOutputFormat.setOutputPath(conf, new Path(ApplicationSetup.TERRIER_INDEX_PATH));
+		conf.set("indexing.hadoop.prefix", ApplicationSetup.TERRIER_INDEX_PREFIX);
 		conf.setMapOutputKeyClass(SplitEmittedTerm.class);
 		conf.setMapOutputValueClass(MapEmittedPostingList.class);
 		conf.setBoolean("indexing.hadoop.multiple.indices", docPartitioned);
Index: src/core/org/terrier/indexing/hadoop/Hadoop_BasicSinglePassIndexer.java
===================================================================
--- src/core/org/terrier/indexing/hadoop/Hadoop_BasicSinglePassIndexer.java	(revision 2991)
+++ src/core/org/terrier/indexing/hadoop/Hadoop_BasicSinglePassIndexer.java	(working copy)
@@ -114,7 +114,7 @@
 	
 	public static void main(String[] args) throws Exception
     {
-        if (args.length > 0 && args[0].equals("--finish"))
+        if (args.length == 2 && args[0].equals("--finish"))
         {
             final JobFactory jf = HadoopPlugin.getJobFactory("HOD-TerrierIndexing");
             if (jf == null)
@@ -157,7 +157,7 @@
 					@Override
 					public void run() {
 						try{
-							Index index = Index.createIndex(destinationIndexPath, "data-"+id);
+							Index index = Index.createIndex(destinationIndexPath, ApplicationSetup.TERRIER_INDEX_PREFIX+"-"+id);
 							CompressingMetaIndexBuilder.reverseAsMapReduceJob(index, "meta", reverseMetaKeys, jf);
 							index.close();
 						} catch (Exception e) {
@@ -460,17 +460,18 @@
 		start = true;
 		//load in the current index
 		final Path indexDestination = FileOutputFormat.getWorkOutputPath(jc);
+		final String indexDestinationPrefix = jc.get("indexing.hadoop.prefix", "data");
 		reduceId = TaskAttemptID.forName(jc.get("mapred.task.id")).getTaskID().getId();
 		path = indexDestination.toString();
 		mutipleIndices = jc.getBoolean("indexing.hadoop.multiple.indices", true);
 		if (jc.getNumReduceTasks() > 1)
 		{
-			//gets the reduce number and suffices this to data
-			prefix = "data-"+reduceId;
+			//gets the reduce number and suffices this to the index prefix
+			prefix = indexDestinationPrefix + "-"+reduceId;
 		}
 		else
 		{
-			prefix = "data";
+			prefix = indexDestinationPrefix;
 		}
 		
 		currentIndex = Index.createNewIndex(path, prefix);
@@ -671,9 +672,10 @@
 		currentIndex.setIndexProperty("num.Terms",""+ lexstream.getNumberOfTermsWritten() );
 		currentIndex.setIndexProperty("num.Tokens",""+lexstream.getNumberOfTokensWritten() );
 		currentIndex.setIndexProperty("num.Pointers",""+lexstream.getNumberOfPointersWritten() );
-		this.finishedInvertedIndexBuild();
 		if (FieldScore.FIELDS_COUNT > 0)
 			currentIndex.addIndexStructure("lexicon-valuefactory", FieldLexiconEntry.Factory.class.getName(), "java.lang.String", "${index.inverted.fields.count}");
+		this.finishedInvertedIndexBuild();
+			
 		
 		//the document indices are only merged if we are creating multiple indices
 		//OR if this is the first reducer for a job creating a single index
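
For readers skimming the patch, the prefix selection it introduces in the reducer can be sketched in isolation. This is only an illustration under stated assumptions, not the Terrier code: the class PrefixSketch and the method reducerPrefix are hypothetical names, and a plain Map stands in for the Hadoop job configuration. Only the key "indexing.hadoop.prefix" and the fallback "data" come from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Terrier.
public class PrefixSketch {
    // Key the patch uses to pass the prefix from the job driver to the reducers.
    static final String PREFIX_KEY = "indexing.hadoop.prefix";

    // Mirrors the patched reducer logic: read the prefix from the job
    // configuration (falling back to "data") and append "-<reduceId>"
    // only when more than one reduce task is running.
    static String reducerPrefix(Map<String, String> jobConf, int numReduceTasks, int reduceId) {
        String base = jobConf.getOrDefault(PREFIX_KEY, "data");
        return numReduceTasks > 1 ? base + "-" + reduceId : base;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX_KEY, "myindex");
        System.out.println(reducerPrefix(conf, 4, 2));            // several reducers
        System.out.println(reducerPrefix(conf, 1, 0));            // single reducer
        System.out.println(reducerPrefix(new HashMap<>(), 4, 2)); // prefix not set
    }
}
```

The fallback to "data" keeps the old behaviour for any job whose configuration does not carry the key.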

Comment by Craig Macdonald [ 17/Feb/11 ]

Tagging for 3.1

Comment by Craig Macdonald [ 18/Feb/11 ]

Richard tested this manually. No test case.
