Terrier Core / TR-216

Changing Hadoop temporary folder without recompiling

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .utility
    • Labels:
      None

      Description

      When indexing with Hadoop, Terrier needs a temporary folder on the shared filesystem to store intermediate files. Currently, that folder is /tmp/, and the path is hard-coded in org.terrier.utility.io.HadoopUtility.makeTemporaryFile:

      final Path tempFile = new Path("/tmp/"+(randomKey)+"-"+filename);

      There are at least two independent use cases where one might want to change the path:

      - Several applications need a temporary folder on the Hadoop FS, and one wants to isolate Terrier's files from those of the other applications. This makes clean-up easier when a job fails.
      - The shared filesystem is actually a network share mounted on every node, and its mount point is not under /tmp.

      The current workaround is to replace the "/tmp/" literal in the source with the desired path and recompile.
      I propose adding a new property (e.g. terrier.hadoop.temporary.folder) that defaults to "/tmp/" and whose value is used in place of the literal "/tmp/".
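
A minimal sketch of the proposed lookup, under stated assumptions: it uses java.util.Properties as a stand-in for Terrier's own configuration mechanism and builds a plain String rather than a Hadoop Path, so that it runs without Terrier or Hadoop on the classpath. The class TempFolderSketch and its methods are hypothetical names for illustration only, and the property name is the one suggested in this issue, which may not match what was finally released.

```java
import java.util.Properties;

// Hypothetical illustration; not part of Terrier.
final class TempFolderSketch {
    // Property name as proposed in this issue (an assumption, not a confirmed API).
    static final String PROPERTY = "terrier.hadoop.temporary.folder";
    // Default that preserves the current hard-coded behaviour.
    static final String DEFAULT_FOLDER = "/tmp/";

    // Returns the configured folder, falling back to the default and
    // appending a trailing slash when the configured value lacks one.
    static String temporaryFolder(Properties props) {
        String folder = props.getProperty(PROPERTY, DEFAULT_FOLDER).trim();
        if (folder.isEmpty()) {
            return DEFAULT_FOLDER;
        }
        return folder.endsWith("/") ? folder : folder + "/";
    }

    // Mirrors the naming scheme of the hard-coded line: <folder><randomKey>-<filename>.
    static String temporaryFileName(Properties props, String randomKey, String filename) {
        return temporaryFolder(props) + randomKey + "-" + filename;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Property unset: the path stays under /tmp/ as before.
        System.out.println(temporaryFileName(props, "k1", "part.dat"));

        // Property set: the same file now lands under the configured folder.
        props.setProperty(PROPERTY, "/mnt/shared/terrier-tmp");
        System.out.println(temporaryFileName(props, "k1", "part.dat"));
    }
}
```

Normalising the trailing slash in one place means the value can be written with or without it in the properties file, and leaving the property unset keeps existing installations unchanged.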

            People

            • Assignee:
              richardm Richard McCreadie
            • Reporter:
              vklj Víctor López Juan