  1. Terrier Core
  2. TR-181

Hadoop indexing should not copy hadoop libraries to a job classpath

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .indexing
    • Labels:
      None

      Description

      Dear Terrier Team,
      I've noticed that every time I run an indexing job using Terrier (-H option on the command line), HadoopUtility copies the entire classpath to the job's classpath in order to make the libraries available to all nodes of the cluster. Some of them (the ones in the hadoop0.20 directory) are already present on each node, since they are part of any Hadoop installation.
      I thus modified anyclass.sh and HadoopUtility to upload only the libraries that are necessary for the indexing job: that is, all libraries in the lib folder except those in the lib/hadoop0.20 subfolder. The jars in the latter folder will still be included in the classpath. I added a new property, terrier.hadoopLibDir, which names the directory containing the libraries that do not need to be uploaded to a Hadoop cluster.

      Just my 2 cents :)

        Attachments

        1. anyclass.sh
          3 kB
          Marco Didonna
        2. HadoopUtility.java
          16 kB
          Marco Didonna

            Activity

            craigm Craig Macdonald added a comment -

            Thanks Marco, good catch!

            craigm Craig Macdonald added a comment -

            Marco,

            Do you have a use case where it's not the jar files in lib/hadoop0.20 that you are using?

            Cheers,

            Craig

            craigm Craig Macdonald added a comment -

            This and TR-205 should be resolved concurrently.

            craigm Craig Macdonald added a comment -

            Tagging for 3.6

            richardm Richard McCreadie added a comment -

            Committed a fix for this issue. It uses the new lib folder structure (/lib/hadoop/, as per TR-205) to find the Hadoop jar files, as below, rather than altering the start scripts.

            List<String> hadoopJarList = new ArrayList<String>();

            // find all hadoop jar files. We use the structure of the lib folder to determine these
            String separator = ApplicationSetup.FILE_SEPARATOR;
            for (String candidateHadoopJar : jarList) {
                if (candidateHadoopJar.contains("lib" + separator + "hadoop" + separator)) {
                    //System.err.println("Removing "+candidateHadoopJar+" from classpath");
                    hadoopJarList.add(candidateHadoopJar);
                }
            }

            jarList.removeAll(hadoopJarList);
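
            For anyone who wants to try the filtering step outside Terrier, here is a minimal standalone sketch of the same idea. The class name HadoopJarFilter, the method withoutHadoopJars and the jar paths in main are hypothetical and only for illustration; the separator is passed in as a parameter instead of being read from ApplicationSetup.FILE_SEPARATOR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Terrier.
class HadoopJarFilter {

    // Returns a new list holding every entry of jarList whose path does
    // not contain "lib<separator>hadoop<separator>".
    static List<String> withoutHadoopJars(List<String> jarList, String separator) {
        String marker = "lib" + separator + "hadoop" + separator;
        List<String> kept = new ArrayList<String>();
        for (String jar : jarList) {
            if (!jar.contains(marker)) {
                kept.add(jar);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative paths only; real jar names will differ.
        List<String> classpath = Arrays.asList(
            "/opt/terrier/lib/hadoop/hadoop-core.jar",
            "/opt/terrier/lib/trove.jar");
        System.out.println(withoutHadoopJars(classpath, "/"));
    }
}
```

            Note that a plain substring check also matches any other directory whose name merely ends in "lib" and has a hadoop subfolder, so a stricter check may be preferable if such paths can appear on the classpath.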


              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                noiano Marco Didonna