[TR-201] log4j conflicts can occur for hadoop indexing Created: 05/Jun/12  Updated: 19/Jun/14  Resolved: 18/Jul/13

Status: Resolved
Project: Terrier Core
Component/s: None
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug
Priority: Major
Reporter: Benjamin Piwowarski
Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: Text File TR-201.patch    
Issue Links:
Duplicate
duplicates TR-111 Terrier breaks existing log4j configu... Resolved
is duplicated by TR-232 HadoopImageTerrier project can't work... Resolved

 Description   
I am trying to index ClueWeb09B with Terrier, but it fails, apparently due to a conflict in the log4j configuration:

Setting JAVA_HOME to /usr
INFO - JAAS Configuration already set up for Hadoop, not re-installing.
INFO - Term-partitioned Mode, 26 reducers creating one inverted index.
INFO - Copying terrier share/ directory (/home/bpiwowar/terrier-3.5/share) to shared storage area (hdfs://oops1/tmp/-1138700486-terrier.share)
INFO - Copying classpath to job
WARN - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
WARN - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
INFO - Allocating 100 files across 2 map tasks
INFO - Running job: job_201204061503_0009
INFO - map 0% reduce 0%
INFO - Task Id : attempt_201204061503_0009_m_000001_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:94)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:337)
at org.apache.hadoop.mapred.Child$4.run(Child.java:272)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)

...


Here is a discussion of the topic:
https://groups.google.com/forum/#!msg/brisk-users/Ohj0Z5Zzvqg/2hEZk5WpLIgJ

A first workaround is to uncomment the code in isLog4JConfigured (ApplicationSetup), but a better solution would be to rely on a Terrier-specific logger repository, e.g. following
http://articles.qos.ch/sc.html


 Comments   
Comment by Craig Macdonald [ 05/Jun/12 ]

updated title

Comment by Craig Macdonald [ 05/Jun/12 ]

Thanks Benjamin.

isLog4JConfigured() never worked as intended. It's interesting that it does something for you.

So that I can try to reproduce your issue, which Hadoop version/distribution are you using?

Craig

Comment by Benjamin Piwowarski [ 05/Jun/12 ]

I am using Hadoop from Cloudera (Hadoop 0.20.2-cdh3u3).

Benjamin

Comment by Craig Macdonald [ 19/Oct/12 ]

Tagging for 3.6. A forum post at http://terrier.org/forum//read.php?3,2608 also reported this, and we saw another instance locally in Glasgow.

Comment by Craig Macdonald [ 18/Jul/13 ]

I looked into this. The key point is that log4j will (wrongly) configure itself from the Hadoop jar files. The solution is to ensure that bin/anyclass.sh points log4j at the terrier-log.xml file whenever one exists in the TERRIER_ETC directory. The isLog4jConfigured() method in ApplicationSetup works as expected. Patch attached.
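
For illustration only, the idea behind the fix can be sketched as below. This is not the content of TR-201.patch: the function name log4j_opts is hypothetical, and the sketch assumes log4j 1.x, which reads its configuration location from the log4j.configuration system property.

```shell
# Hypothetical helper (not taken from bin/anyclass.sh or TR-201.patch).
# Prints a JVM option pointing log4j 1.x at terrier-log.xml, but only
# if that file exists in the given etc directory; prints nothing otherwise.
log4j_opts() {
    etc_dir="$1"
    if [ -f "$etc_dir/terrier-log.xml" ]; then
        printf '%s' "-Dlog4j.configuration=file:$etc_dir/terrier-log.xml"
    fi
}
```

A launcher script could then add the output of log4j_opts "$TERRIER_ETC" to the java command line, so that an explicit Terrier configuration takes precedence over any log4j configuration file found inside the Hadoop jars.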

Comment by Craig Macdonald [ 18/Jul/13 ]

Committed r3700.
