Terrier Core / TR-216

Changing Hadoop temporary folder without recompiling


    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .utility


      When indexing with Hadoop, Terrier needs a temporary folder on the shared filesystem to store intermediate files. Currently, that folder is /tmp/, and the path is hard-coded in org.terrier.utility.io.HadoopUtility.makeTemporaryFile():

      final Path tempFile = new Path("/tmp/"+(randomKey)+"-"+filename);

      There are at least two orthogonal use cases where one might be interested in changing the path:

      - Several applications need a temporary folder on the Hadoop FS, and one wants to isolate Terrier's files from those of the other applications. This eases clean-up when a job fails.
      - The shared filesystem is actually a network share mounted on every node at a path other than /tmp.

      The current workaround is to replace the literal "/tmp/" in the source with the desired path and recompile.
      I propose adding a new property (e.g. terrier.hadoop.temporary.folder) which defaults to "/tmp/", and whose value is used instead of the literal "/tmp/".
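      A minimal, self-contained sketch of the proposal, assuming the property name suggested above (terrier.hadoop.temporary.folder is only the example name from this issue, not necessarily what was committed for 3.6) and a numeric randomKey. To run without either library, it uses java.util.Properties in place of Terrier's configuration and plain strings in place of Hadoop's Path; inside Terrier the lookup would presumably go through ApplicationSetup.getProperty(key, default). The class and method names are hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: resolve the temporary folder from a configurable
 * property instead of the hard-coded "/tmp/". Property name is the example
 * proposed in TR-216 and is an assumption, not a confirmed Terrier property.
 */
public class TempFolderSketch {
    /** Proposed (hypothetical) property name. */
    public static final String TEMP_FOLDER_PROPERTY = "terrier.hadoop.temporary.folder";
    /** Default that preserves the current behaviour. */
    public static final String DEFAULT_TEMP_FOLDER = "/tmp/";

    /** Returns the configured folder (whitespace-trimmed), guaranteed to end with '/'. */
    public static String temporaryFolder(Properties props) {
        String folder = props.getProperty(TEMP_FOLDER_PROPERTY, DEFAULT_TEMP_FOLDER).trim();
        return folder.endsWith("/") ? folder : folder + "/";
    }

    /**
     * Mirrors the string built in makeTemporaryFile ("/tmp/" + randomKey + "-" + filename),
     * with the prefix made configurable. randomKey is assumed to be a long here.
     */
    public static String temporaryPath(Properties props, long randomKey, String filename) {
        return temporaryFolder(props) + randomKey + "-" + filename;
    }

    public static void main(String[] args) {
        // No property set: falls back to the existing /tmp/ behaviour.
        Properties defaults = new Properties();
        System.out.println(temporaryPath(defaults, 42L, "terms.tmp"));

        // Property set to a network share mounted outside /tmp (no trailing slash).
        Properties custom = new Properties();
        custom.setProperty(TEMP_FOLDER_PROPERTY, "/mnt/shared/terrier-tmp");
        System.out.println(temporaryPath(custom, 42L, "terms.tmp"));
    }
}
```

      Appending a trailing "/" when it is missing is a design choice: the existing concatenation assumes the folder ends with a separator, so a value such as /mnt/shared/terrier-tmp would otherwise produce /mnt/shared/terrier-tmp42-terms.tmp.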



          Víctor López Juan created the issue.
          Craig Macdonald made changes:
            Field            Original Value                  New Value
            Priority         Minor [ 4 ]                     Trivial [ 5 ]
            Assignee         Craig Macdonald [ craigm ]      Richard McCreadie [ richardm ]
            Fix Version/s                                    3.6 [ 10060 ]
          Richard McCreadie made changes:
            Field            Original Value                  New Value
            Status           Open [ 1 ]                      Resolved [ 5 ]
            Resolution                                       Fixed [ 1 ]


            • Assignee:
              Richard McCreadie
            • Reporter:
              Víctor López Juan
            • Watchers:
              0


              • Created: