[TR-216] Changing Hadoop temporary folder without recompiling Created: 25/Oct/12 Updated: 05/Mar/14 Resolved: 05/Mar/14
|Reporter:||Víctor López Juan||Assignee:||Richard McCreadie|
When indexing with Hadoop, Terrier needs a temporary folder on the shared filesystem to store intermediate files. Currently, that folder is /tmp/, and the path is hard-coded in org.terrier.utility.io.HadoopUtility.makeTemporaryFile:
final Path tempFile = new Path("/tmp/"+(randomKey)+"-"+filename);
There are at least two orthogonal use cases where one might be interested in changing the path:
- Several applications need a temporary folder on the Hadoop FS, and one wants to isolate Terrier's files from those of the other applications. This eases clean-up when a job fails.
- The shared filesystem is actually a network share mounted on every node, which is not under /tmp.
The current workaround is to replace the literal "/tmp/" in the source with the desired path and recompile.
I propose adding a new property (e.g. terrier.hadoop.temporary.folder) which defaults to "/tmp/", and whose value is used instead of the literal "/tmp/".
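A minimal sketch of the proposed lookup, for illustration only. It uses java.util.Properties and plain strings in place of Terrier's own configuration mechanism and Hadoop's Path, so the class and method names below (TempPathSketch, temporaryFolder, temporaryFileName) are hypothetical and not part of Terrier; only the property name and the "/tmp/" default come from the proposal above.

```java
import java.util.Properties;

// Hypothetical helper; not part of Terrier. Shows the lookup-with-default idea only.
class TempPathSketch {
    // Property name as proposed in this issue; the name finally adopted may differ.
    static final String TMP_PROPERTY = "terrier.hadoop.temporary.folder";
    // Default keeps the current behaviour when the property is not set.
    static final String DEFAULT_TMP = "/tmp/";

    // Returns the configured temporary folder, always ending in "/".
    static String temporaryFolder(Properties props) {
        String dir = props.getProperty(TMP_PROPERTY, DEFAULT_TMP).trim();
        if (dir.isEmpty()) {
            // Treat an empty value as "not set" rather than writing to the root.
            dir = DEFAULT_TMP;
        }
        return dir.endsWith("/") ? dir : dir + "/";
    }

    // Mirrors the existing expression "/tmp/" + randomKey + "-" + filename,
    // with the literal replaced by the configured folder.
    static String temporaryFileName(Properties props, String randomKey, String filename) {
        return temporaryFolder(props) + randomKey + "-" + filename;
    }
}
```

With no property set, temporaryFileName(props, "12345", "index.tmp") yields "/tmp/12345-index.tmp", matching the current behaviour; with the property set to "/shared/terrier-tmp" (an example path), it yields "/shared/terrier-tmp/12345-index.tmp". In HadoopUtility the resulting string would presumably be passed to new Path(...) as before.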
|Comment by Craig Macdonald [ 25/Oct/12 ]|
+1. I would call the property terrier.hadoop.io.tmpdir, which is similar to the Java equivalent.
|Comment by Víctor López Juan [ 29/Oct/12 ]|
Agreed; terrier.hadoop.io.tmpdir is quite self-descriptive.
|Comment by Richard McCreadie [ 05/Mar/14 ]|
This issue appears to have already been fixed by Craig in commit 3698.