[TR-216] Changing Hadoop temporary folder without recompiling Created: 25/Oct/12  Updated: 05/Mar/14  Resolved: 05/Mar/14

Status: Resolved
Project: Terrier Core
Component/s: .utility
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Improvement Priority: Trivial
Reporter: Víctor López Juan Assignee: Richard McCreadie
Resolution: Fixed  
Labels: None

When indexing with Hadoop, Terrier needs a temporary folder on the shared filesystem to store intermediate files. Currently, that folder is /tmp/, and the path is hard-coded in org.terrier.utility.io.HadoopUtility.makeTemporaryFile:

final Path tempFile = new Path("/tmp/"+(randomKey)+"-"+filename);

There are at least two orthogonal use cases where one might be interested in changing the path:

- Several applications need a temporary folder on the Hadoop FS, and one wants to isolate Terrier's files from those of the other applications. This eases clean-up when a job fails.
- The shared filesystem is actually a network share mounted on every node at a path outside /tmp.

The current workaround is to replace the literal "/tmp/" in the source with the desired path and recompile.
I propose adding a new property (e.g. terrier.hadoop.temporary.folder) which defaults to "/tmp/", and whose value is used instead of the literal "/tmp/".
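A minimal sketch of the proposed change follows, assuming the property name terrier.hadoop.io.tmpdir that the thread later agrees on. To keep it self-contained, a java.util.Properties object stands in for Terrier's own configuration, and the method returns a String rather than a Hadoop Path. The class and method names (TempPathSketch, temporaryFolder, makeTemporaryPath) are hypothetical and do not come from the actual fix in commit 3698.

```java
import java.util.Properties;

// Hypothetical sketch: build the temporary-file path from a configurable
// folder instead of the hard-coded "/tmp/". Properties stands in for
// Terrier's configuration; the real fix may differ.
class TempPathSketch {

    // Property name agreed in this issue's comments (assumed, not taken from the commit).
    static final String TMPDIR_PROPERTY = "terrier.hadoop.io.tmpdir";
    // Default preserves the current behaviour.
    static final String DEFAULT_TMPDIR = "/tmp/";

    // Returns the configured folder, falling back to "/tmp/",
    // and appends a trailing slash if the user omitted it.
    static String temporaryFolder(Properties props) {
        String dir = props.getProperty(TMPDIR_PROPERTY, DEFAULT_TMPDIR);
        return dir.endsWith("/") ? dir : dir + "/";
    }

    // Same shape as the original expression: folder + randomKey + "-" + filename.
    static String makeTemporaryPath(Properties props, long randomKey, String filename) {
        return temporaryFolder(props) + randomKey + "-" + filename;
    }

    public static void main(String[] args) {
        // No property set: behaves exactly like the hard-coded version.
        Properties defaults = new Properties();
        System.out.println(makeTemporaryPath(defaults, 42L, "data.bin"));
        // prints /tmp/42-data.bin

        // Property set to a shared mount without a trailing slash.
        Properties custom = new Properties();
        custom.setProperty(TMPDIR_PROPERTY, "/shared/terrier-tmp");
        System.out.println(makeTemporaryPath(custom, 7L, "data.bin"));
        // prints /shared/terrier-tmp/7-data.bin
    }
}
```

Normalising the trailing slash matters because the original code concatenates the folder directly with the file name. Without it, a value such as "/shared/terrier-tmp" would silently produce "/shared/terrier-tmp7-data.bin".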

Comment by Craig Macdonald [ 25/Oct/12 ]

+1. I would call the property terrier.hadoop.io.tmpdir, which is similar to the Java equivalent.

Comment by Víctor López Juan [ 29/Oct/12 ]

Agreed; terrier.hadoop.io.tmpdir is quite self-descriptive.

Comment by Richard McCreadie [ 05/Mar/14 ]

This issue appears to have already been fixed by Craig in commit 3698.

Resolving issue.

Generated at Mon Dec 17 00:39:03 GMT 2018 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.