Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Terrier on Hadoop 1.0.3
Posted by: vklj ()
Date: October 18, 2012 11:05AM

I am trying to run Terrier on Hadoop 1.0.3. The setup has two map-reduce nodes, and the shared filesystem is an NFS-mounted partition. I've tested the setup with the standard WordCount example, and it seems to work fine.

When I try to index with Terrier, I get these results:

======================================
INFO - Term-partitioned Mode, 26 reducers creating one inverted index.
INFO - Copying terrier share/ directory (/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_index/_formats/terrier-3.5-hadoop/_algo/terrier-3.5/share) to shared storage area (file:/collections-raptor/hadoop/fs/tmp/1319127975-terrier.share)
INFO - Loaded the native-hadoop library
INFO - Copying classpath to job
WARN - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
WARN - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
INFO - Allocating 1492 files across 2 map tasks
INFO - Running job: job_201210181149_0002
INFO - map 0% reduce 0%
INFO - Task Id : attempt_201210181149_0002_m_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000000_0: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000000_0: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000000_0: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000000_0: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000000_0: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000000_0: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000000_0: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000000_0: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0000/00.warc.gz
attempt_201210181149_0002_m_000000_0: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000000_0: INFO - Map task_201210181149_0002_m_000000, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000000_0: INFO - Map task_201210181149_0002_m_000000 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000000_0: WARN - Error running child
attempt_201210181149_0002_m_000000_0: java.lang.NullPointerException
attempt_201210181149_0002_m_000000_0: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000000_0: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000000_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000000_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000000_0: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000000_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000000_0: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000000_0: INFO - Runnning cleanup for the task
INFO - Task Id : attempt_201210181149_0002_m_000001_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000001_0: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000001_0: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000001_0: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000001_0: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000001_0: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000001_0: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000001_0: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000001_0: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0007/46.warc.gz
attempt_201210181149_0002_m_000001_0: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000001_0: INFO - Map task_201210181149_0002_m_000001, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000001_0: INFO - Map task_201210181149_0002_m_000001 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000001_0: WARN - Error running child
attempt_201210181149_0002_m_000001_0: java.lang.NullPointerException
attempt_201210181149_0002_m_000001_0: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000001_0: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000001_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000001_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000001_0: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000001_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000001_0: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000001_0: INFO - Runnning cleanup for the task
INFO - Task Id : attempt_201210181149_0002_m_000001_1, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000001_1: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000001_1: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000001_1: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000001_1: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000001_1: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000001_1: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000001_1: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000001_1: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0007/46.warc.gz
attempt_201210181149_0002_m_000001_1: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000001_1: INFO - Map task_201210181149_0002_m_000001, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000001_1: INFO - Map task_201210181149_0002_m_000001 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000001_1: WARN - Error running child
attempt_201210181149_0002_m_000001_1: java.lang.NullPointerException
attempt_201210181149_0002_m_000001_1: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000001_1: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000001_1: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000001_1: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000001_1: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000001_1: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000001_1: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000001_1: INFO - Runnning cleanup for the task
INFO - Task Id : attempt_201210181149_0002_m_000000_1, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000000_1: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000000_1: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000000_1: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000000_1: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000000_1: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000000_1: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000000_1: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000000_1: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0000/00.warc.gz
attempt_201210181149_0002_m_000000_1: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000000_1: INFO - Map task_201210181149_0002_m_000000, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000000_1: INFO - Map task_201210181149_0002_m_000000 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000000_1: WARN - Error running child
attempt_201210181149_0002_m_000000_1: java.lang.NullPointerException
attempt_201210181149_0002_m_000000_1: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000000_1: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000000_1: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000000_1: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000000_1: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000000_1: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000000_1: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000000_1: INFO - Runnning cleanup for the task
INFO - Task Id : attempt_201210181149_0002_m_000001_2, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000001_2: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000001_2: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000001_2: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000001_2: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000001_2: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000001_2: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000001_2: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000001_2: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0007/46.warc.gz
attempt_201210181149_0002_m_000001_2: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000001_2: INFO - Map task_201210181149_0002_m_000001, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000001_2: INFO - Map task_201210181149_0002_m_000001 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000001_2: WARN - Error running child
attempt_201210181149_0002_m_000001_2: java.lang.NullPointerException
attempt_201210181149_0002_m_000001_2: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000001_2: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000001_2: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000001_2: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000001_2: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000001_2: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000001_2: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000001_2: INFO - Runnning cleanup for the task
INFO - Task Id : attempt_201210181149_0002_m_000000_2, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201210181149_0002_m_000000_2: WARN - Snappy native library not loaded
attempt_201210181149_0002_m_000000_2: INFO - numReduceTasks: 26
attempt_201210181149_0002_m_000000_2: INFO - io.sort.mb = 100
attempt_201210181149_0002_m_000000_2: INFO - data buffer = 79691776/99614720
attempt_201210181149_0002_m_000000_2: INFO - record buffer = 262144/327680
attempt_201210181149_0002_m_000000_2: INFO - Reloading Application Setup
attempt_201210181149_0002_m_000000_2: INFO - Checking memory usage every 20 maxDocPerFlush=0
attempt_201210181149_0002_m_000000_2: INFO - Opening file:/collections-raptor/datasets/ClueWeb/2009/_subset/CatB/_data/en0000/00.warc.gz
attempt_201210181149_0002_m_000000_2: INFO - Successfully loaded & initialized native-zlib library
attempt_201210181149_0002_m_000000_2: INFO - Map task_201210181149_0002_m_000000, flush requested, containing 1 documents, flush 0
attempt_201210181149_0002_m_000000_2: INFO - Map task_201210181149_0002_m_000000 finishing, indexed 0 in 0 flushes
attempt_201210181149_0002_m_000000_2: WARN - Error running child
attempt_201210181149_0002_m_000000_2: java.lang.NullPointerException
attempt_201210181149_0002_m_000000_2: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:67)
attempt_201210181149_0002_m_000000_2: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:357)
attempt_201210181149_0002_m_000000_2: at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
attempt_201210181149_0002_m_000000_2: at java.security.AccessController.doPrivileged(Native Method)
attempt_201210181149_0002_m_000000_2: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201210181149_0002_m_000000_2: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
attempt_201210181149_0002_m_000000_2: at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201210181149_0002_m_000000_2: INFO - Runnning cleanup for the task
INFO - Job complete: job_201210181149_0002
INFO - Counters: 7
INFO - Job Counters
INFO - SLOTS_MILLIS_MAPS=54552
INFO - Total time spent by all reduces waiting after reserving slots (ms)=0
INFO - Total time spent by all maps waiting after reserving slots (ms)=0
INFO - Rack-local map tasks=8
INFO - Launched map tasks=8
INFO - SLOTS_MILLIS_REDUCES=0
INFO - Failed map tasks=1
INFO - Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201210181149_0002_m_000001
ERROR - Problem running job
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.terrier.applications.HadoopIndexing.main(HadoopIndexing.java:230)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:371)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:564)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:235)
Time elapsed: 50.85 seconds.
====================================================

I've already followed the instructions at [terrier.org], and also looked at these threads:

[terrier.org]
[terrier.org]

However, none of those solutions seems to work. Do you have any idea what might be going on?

-- Victor

Re: Terrier on Hadoop 1.0.3
Posted by: craigm ()
Date: October 18, 2012 06:15PM

Hi Victor,

Looking at [comments.gmane.org], I think it's log4j badness on Terrier's part.

Can you try commenting out the log4j configuration in ApplicationSetup?
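For context, the log4j configuration logic in question decides whether log4j is already set up by looking for attached appenders. A minimal standalone sketch of that kind of check, assuming log4j 1.2 is on the classpath (the class name Log4jProbe is hypothetical, just for illustration):

```java
import org.apache.log4j.Logger;
import org.apache.log4j.helpers.NullEnumeration;

// Sketch of an "is log4j configured?" probe: log4j 1.2 counts as configured
// once at least one appender is attached to the root logger; an unconfigured
// root logger returns a NullEnumeration from getAllAppenders().
public class Log4jProbe {
    public static void main(String[] args) {
        boolean configured =
            !(Logger.getRootLogger().getAllAppenders() instanceof NullEnumeration);
        System.out.println("log4j already configured: " + configured);
    }
}
```

Under Hadoop, the TaskTracker child JVM configures log4j itself (TaskLogAppender), so a probe like this should report it as already configured; if Terrier then reconfigures log4j on top of that, the appenders can end up in a broken state.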

Cheers,

Craig

Re: Terrier on Hadoop 1.0.3
Posted by: craigm ()
Date: October 19, 2012 11:26AM

See also [terrier.org]

Craig

Re: Terrier on Hadoop 1.0.3
Posted by: vklj ()
Date: October 23, 2012 10:08AM

I uncommented the body of isLog4JConfigured as shown below, and it worked. Thanks for the heads up!

private static boolean isLog4JConfigured()
{
    boolean log4jConfigured = false;
    // log4j counts as configured if the root logger has at least one appender
    java.util.Enumeration en = Logger.getRootLogger().getAllAppenders();
    if (!(en instanceof org.apache.log4j.helpers.NullEnumeration))
    {
        log4jConfigured = true;
    }
    else
    {
        // otherwise, check whether any other logger has an appender attached
        java.util.Enumeration cats = org.apache.log4j.LogManager.getCurrentLoggers();
        while (cats.hasMoreElements())
        {
            Logger c = (org.apache.log4j.Logger) cats.nextElement();
            if (!(c.getAllAppenders() instanceof org.apache.log4j.helpers.NullEnumeration))
            {
                log4jConfigured = true;
            }
        }
    }
    return log4jConfigured;
    //return false;
}
