Class Files


  • public class Files
    extends java.lang.Object
    Utility class for opening readers/writers and input/output streams to files. Handles gzipped and bzipped files on the fly, i.e. if a file ends in ".gz" or ".GZ", then it will be opened using a GZIPInputStream/GZIPOutputStream. ".bz2" files are handled in a similar fashion. All returned Streams, Readers, Writers etc. are Buffered. If a charset encoding is not specified, then the system default is used. New interfaces are used to describe random data access.

    FileSystem plugins

    Additional file systems can be plugged into this module by calling the addFileSystemCapability() method. FileSystems have read and/or write capabilities, as specified using the FSCapability constants. Files using these external file systems should be denoted by scheme prefixes, e.g. ftp://, http:// etc. NB: file:// is the default scheme.
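
    The scheme-prefix lookup described above can be pictured with the following minimal sketch. It is not Terrier's implementation: SchemeRegistry, register, schemeOf and resolve are hypothetical names, and a plain String stands in for a FileSystem object. It assumes that a path without a "://" prefix falls back to the default "file" scheme.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: pick a handler from a path's scheme prefix. */
final class SchemeRegistry {
    // Assumed default, mirroring the note that file:// is the default scheme.
    static final String DEFAULT_SCHEME = "file";

    // Scheme -> handler name (a stand-in for a FileSystem implementation).
    private final Map<String, String> handlers = new HashMap<>();

    /** Register a handler name for a scheme such as "http" or "ftp". */
    void register(String scheme, String handlerName) {
        handlers.put(scheme.toLowerCase(Locale.ROOT), handlerName);
    }

    /** Extract the scheme before "://", falling back to the default scheme. */
    static String schemeOf(String path) {
        int idx = path.indexOf("://");
        return idx > 0 ? path.substring(0, idx).toLowerCase(Locale.ROOT) : DEFAULT_SCHEME;
    }

    /** Return the handler for the path's scheme, or null if none is registered. */
    String resolve(String path) {
        return handlers.get(schemeOf(path));
    }
}
```

    Returning null for an unregistered scheme mirrors the documented behaviour of getFileSystemName(), which returns null if no file system is found.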



    Additional Compression Support

    Support for additional stream compression & decompression can be plugged in by calling addFilterInputStreamMapping().



    File Caching

    Terrier can cache files which will see heavy IO activity. In particular, files mentioned in the files.to.cache property will be cached to the default temporary folder. There are also API methods to populate the cache with files. For all methods, java.io.tmpdir is the default temporary directory. An IOException will occur if caching fails for some reason.
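
    As a rough illustration of the caching idea, the sketch below copies a file into a temporary folder and remembers where the copy lives. It is an assumption about how such a cache might work, not Terrier's code: TempFileCache and resolve are hypothetical names, and the Files class imported here is the JDK's java.nio.file.Files, not the Terrier class documented on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a copy-to-temp file cache. */
final class TempFileCache {
    // Maps an original filename to the location of its cached copy.
    private final Map<String, Path> cached = new ConcurrentHashMap<>();

    /** Cache into the directory named by the java.io.tmpdir system property. */
    Path cacheFile(String filename) throws IOException {
        return cacheFile(filename, System.getProperty("java.io.tmpdir"));
    }

    /** Copy the file into temporaryFolder and record the copy's location. */
    Path cacheFile(String filename, String temporaryFolder) throws IOException {
        Path source = Paths.get(filename);
        Path copy = Files.createTempFile(Paths.get(temporaryFolder), "cache", ".tmp");
        Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);
        copy.toFile().deleteOnExit(); // do not leave cached copies behind
        cached.put(filename, copy);
        return copy;
    }

    /** Return the cached copy if there is one, otherwise the original path. */
    Path resolve(String filename) {
        return cached.getOrDefault(filename, Paths.get(filename));
    }
}
```

    Any failure to create or copy the file surfaces as an IOException, matching the behaviour described for the cacheFile() methods.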

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static interface  Files.FSCapability
      constants declaring which capabilities a file system has
      protected static class  Files.PathTransformation
      a search regex and a replacement for path transformations
    • Constructor Summary

      Constructors 
      Constructor Description
      Files()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static void addFileSystemCapability​(FileSystem fs)
      Add a file system to Terrier.
      static void addFilterInputStreamMapping​(java.lang.String regex, java.lang.Class<? extends java.io.InputStream> inputStreamClass, java.lang.Class<? extends java.io.OutputStream> outputStreamClass)
      Add a filter mapping to the Files layer.
      static void addPathTransormation​(java.lang.String find, java.lang.String replace)
      add a static transformation to apply to a path.
      static void cacheFile​(java.lang.String filename)
      Cache to the temporary directory specified by java.io.tmpdir System property.
      static void cacheFile​(java.lang.String filename, java.lang.String temporaryFolder)
      Cache file to specified temporary folder
      static boolean canRead​(java.lang.String filename)
      returns true iff path can be read
      static boolean canWrite​(java.lang.String filename)
      returns true iff path can be written
      static java.lang.Long copyFile​(java.io.File srcFile, java.io.File destFile)
      Copy a file from srcFile to destFile.
      static java.lang.Long copyFile​(java.io.InputStream in, java.io.OutputStream out)
      Copy all bytes from in to out
      static java.lang.Long copyFile​(java.lang.String srcFilename, java.lang.String destFilename)
      Copy a file from srcFile to destFile.
      static java.lang.Long createChecksum​(java.io.File file)
      Returns the CRC checksum of denoted file
      static boolean delete​(java.lang.String filename)
      Delete the named file.
      static boolean deleteOnExit​(java.lang.String path)
      Mark the named path as to be deleted on exit.
      static boolean exists​(java.lang.String path)
      returns true iff the path exists
      protected static FileSystem getFileSystem​(java.lang.String filename)
      derive the file system to use that is associated with the scheme in the specified filename.
      static java.lang.String getFileSystemName​(java.lang.String path)
      Get the name of the file system that would be used to access a given file or directory.
      static java.lang.String getParent​(java.lang.String path)
      What is the parent path to the specified path?
      protected static void initialise_mappings()
      initialise the default compression mappings
      protected static void initialise_static_cache()
      we may have been specified some files to cache immediately
      protected static void intialise_transformations()
      initialise the transformations from Application property
      static boolean isDirectory​(java.lang.String path)
      return true if path is a directory
      static long length​(java.io.File f)
      returns the length of file f
      static long length​(java.lang.String filename)
      returns the length of the file, or 0L if cannot be found etc
      static java.lang.String[] list​(java.lang.String path)
      List the contents of a directory
      static void main​(java.lang.String[] args)
      Check that a specified file exists as per Terrier's file system abstraction layer
      static boolean mkdir​(java.lang.String path)
      returns true if the specified path can be made as a directory
      protected static java.io.InputStream openFile​(java.lang.String filename)
      Opens an InputStream to a file called filename, using the file system associated with the scheme in the filename.
      static RandomDataInput openFileRandom​(java.io.File file)
      Open a file for random access reading
      static RandomDataInput openFileRandom​(java.lang.String filename)
      Returns a RandomAccessFile implementation accessing the specified file
      static java.io.BufferedReader openFileReader​(java.io.File file)
      Opens a reader to the file called file.
      static java.io.BufferedReader openFileReader​(java.io.File file, java.lang.String charset)
      Opens a reader to the file called file.
      static java.io.BufferedReader openFileReader​(java.lang.String filename)
      Opens a reader to the file called filename.
      static java.io.BufferedReader openFileReader​(java.lang.String filename, java.lang.String charset)
      Opens a reader to the file called filename.
      static java.io.InputStream openFileStream​(java.io.File file)
      Opens an InputStream to a file called file.
      static java.io.InputStream openFileStream​(java.lang.String filename)
      Opens an InputStream to a file called filename.
      static boolean rename​(java.lang.String sourceFilename, java.lang.String destFilename)
      rename a file or directory.
      protected static java.lang.String transform​(java.lang.String filename)
      apply any transformations to the specified filename
      protected static java.io.OutputStream writeFile​(java.lang.String filename)
      Opens an OutputStream to a file called filename, using the filesystem named in the scheme component of the filename.
      static RandomDataOutput writeFileRandom​(java.io.File file)
      Open a file for random access writing and reading
      static RandomDataOutput writeFileRandom​(java.lang.String filename)
      Returns a RandomAccessFile implementation accessing the specified file
      static java.io.OutputStream writeFileStream​(java.io.File file)
      Opens an OutputStream to a file called file.
      static java.io.OutputStream writeFileStream​(java.lang.String filename)
      Opens an OutputStream to a file called filename.
      static java.io.Writer writeFileWriter​(java.io.File file)
      Opens a Writer to a file called file.
      static java.io.Writer writeFileWriter​(java.io.File file, java.lang.String charset)
      Opens a Writer to a file called file.
      static java.io.Writer writeFileWriter​(java.lang.String filename)
      Opens a Writer to a file called filename.
      static java.io.Writer writeFileWriter​(java.lang.String filename, java.lang.String charset)
      Opens a Writer to a file called filename.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • fileSystems

        protected static final java.util.Map<java.lang.String,​FileSystem> fileSystems
        map of scheme to FileSystem implementation
      • pathTransformations

        protected static final java.util.List<Files.PathTransformation> pathTransformations
        transformations to apply to a path
      • DEFAULT_SCHEME

        protected static final java.lang.String DEFAULT_SCHEME
        default scheme
    • Constructor Detail

      • Files

        public Files()
    • Method Detail

      • addFilterInputStreamMapping

        public static void addFilterInputStreamMapping​(java.lang.String regex,
                                                       java.lang.Class<? extends java.io.InputStream> inputStreamClass,
                                                       java.lang.Class<? extends java.io.OutputStream> outputStreamClass)
        Add a filter mapping to the Files layer. This is the method used to implement stream compression and decompression. For example:
         addFilterInputStreamMapping(".+\\.gz$", GZIPInputStream.class, GZIPOutputStream.class);
         addFilterInputStreamMapping(".+\\.GZ$", GZIPInputStream.class, GZIPOutputStream.class);
         
        Parameters:
        regex - Regular expression that the filename must match to require the filter stream
        inputStreamClass - Class extending InputStream that decompresses the file
        outputStreamClass - Class extending OutputStream that compresses the file
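        One way to picture such a mapping is the sketch below, which keeps a list of filename patterns and wraps a raw stream in the first matching class via reflection. This is a sketch under stated assumptions rather than Terrier's implementation: FilterMappings, add, wrapInput and wrapOutput are hypothetical names, and each registered class is assumed to have a public constructor taking a single InputStream or OutputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-driven stream wrapping. */
final class FilterMappings {
    /** One mapping: a filename pattern plus the stream classes to wrap with. */
    private static final class Mapping {
        final Pattern pattern;
        final Class<? extends InputStream> in;
        final Class<? extends OutputStream> out;

        Mapping(String regex, Class<? extends InputStream> in, Class<? extends OutputStream> out) {
            this.pattern = Pattern.compile(regex);
            this.in = in;
            this.out = out;
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    void add(String regex, Class<? extends InputStream> in, Class<? extends OutputStream> out) {
        mappings.add(new Mapping(regex, in, out));
    }

    /** Wrap raw in the first matching input class, or return it unchanged. */
    InputStream wrapInput(String filename, InputStream raw) throws IOException {
        for (Mapping m : mappings) {
            if (m.pattern.matcher(filename).matches()) {
                try {
                    // Assumes a public (InputStream) constructor exists.
                    return m.in.getConstructor(InputStream.class).newInstance(raw);
                } catch (ReflectiveOperationException e) {
                    throw new IOException("Cannot construct " + m.in.getName(), e);
                }
            }
        }
        return raw;
    }

    /** Wrap raw in the first matching output class, or return it unchanged. */
    OutputStream wrapOutput(String filename, OutputStream raw) throws IOException {
        for (Mapping m : mappings) {
            if (m.pattern.matcher(filename).matches()) {
                try {
                    // Assumes a public (OutputStream) constructor exists.
                    return m.out.getConstructor(OutputStream.class).newInstance(raw);
                } catch (ReflectiveOperationException e) {
                    throw new IOException("Cannot construct " + m.out.getName(), e);
                }
            }
        }
        return raw;
    }
}
```

        With a mapping such as add(".+\\.gz$", GZIPInputStream.class, GZIPOutputStream.class), a filename ending in ".gz" is transparently compressed on write and decompressed on read, while other filenames pass through untouched.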
      • initialise_static_cache

        protected static void initialise_static_cache()
        we may have been specified some files to cache immediately
      • intialise_transformations

        protected static void intialise_transformations()
        initialise the transformations from Application property
      • initialise_mappings

        protected static void initialise_mappings()
        initialise the default compression mappings
      • cacheFile

        public static void cacheFile​(java.lang.String filename)
                              throws java.io.IOException
        Cache to the temporary directory specified by java.io.tmpdir System property.
        Throws:
        java.io.IOException
      • cacheFile

        public static void cacheFile​(java.lang.String filename,
                                     java.lang.String temporaryFolder)
                              throws java.io.IOException
        Cache file to specified temporary folder
        Throws:
        java.io.IOException
      • addPathTransormation

        public static void addPathTransormation​(java.lang.String find,
                                                java.lang.String replace)
        add a static transformation to apply to a path. Find and replace are both regular expressions
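        The find/replace behaviour can be sketched as follows. This is an illustration only: PathTransformer and Rule are hypothetical names, and the sketch assumes that rules are applied in registration order using String.replaceAll, which may differ from what Terrier does internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of ordered regex path transformations. */
final class PathTransformer {
    /** One transformation: a search regex and its replacement. */
    private static final class Rule {
        final String find;
        final String replace;

        Rule(String find, String replace) {
            this.find = find;
            this.replace = replace;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Register a transformation; later rules see the output of earlier ones. */
    void add(String find, String replace) {
        rules.add(new Rule(find, replace));
    }

    /** Apply every registered rule, in order, to the given path. */
    String transform(String path) {
        String result = path;
        for (Rule rule : rules) {
            // The replacement may use group references such as $1.
            result = result.replaceAll(rule.find, rule.replace);
        }
        return result;
    }
}
```

        For instance, a rule mapping "^/old/data/" to "/new/data/" rewrites "/old/data/x.txt" to "/new/data/x.txt" and leaves paths outside /old/data/ unchanged.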
      • addFileSystemCapability

        public static void addFileSystemCapability​(FileSystem fs)
        Add a file system to Terrier. File systems are denoted by URI scheme prefixes (e.g. http). The underlying file system is represented by a FileSystem.
      • transform

        protected static java.lang.String transform​(java.lang.String filename)
        apply any transformations to the specified filename
      • getFileSystem

        protected static FileSystem getFileSystem​(java.lang.String filename)
        derive the file system to use that is associated with the scheme in the specified filename.
        Parameters:
        filename - filename whose scheme determines the file system
      • getFileSystemName

        public static java.lang.String getFileSystemName​(java.lang.String path)
        Get the name of the file system that would be used to access a given file or directory.
        Parameters:
        path - path to the file or directory
        Returns:
        name Name of the file system, or null if no filesystem found
      • openFile

        protected static java.io.InputStream openFile​(java.lang.String filename)
                                               throws java.io.IOException
        Opens an InputStream to a file called filename, using the file system associated with the scheme in the filename.
        Parameters:
        filename - Filename of file to open
        Throws:
        java.io.IOException
      • writeFile

        protected static java.io.OutputStream writeFile​(java.lang.String filename)
                                                 throws java.io.IOException
        Opens an OutputStream to a file called filename, using the filesystem named in the scheme component of the filename.
        Parameters:
        filename - Filename of file to open, optionally including scheme
        Throws:
        java.io.IOException
      • openFileRandom

        public static RandomDataInput openFileRandom​(java.lang.String filename)
                                              throws java.io.IOException
        Returns a RandomAccessFile implementation accessing the specified file
        Throws:
        java.io.IOException
      • writeFileRandom

        public static RandomDataOutput writeFileRandom​(java.lang.String filename)
                                                throws java.io.IOException
        Returns a RandomAccessFile implementation accessing the specified file
        Throws:
        java.io.IOException
      • delete

        public static boolean delete​(java.lang.String filename)
        Delete the named file. Returns false if the scheme of filename cannot be recognised, the filesystem does not have write capability, or the underlying filesystem could not delete the file.
        Parameters:
        filename - path to file to delete
      • deleteOnExit

        public static boolean deleteOnExit​(java.lang.String path)
        Mark the named path as to be deleted on exit. Returns false if the scheme of the filename cannot be recognised, the filesystem does not have write capability, or the file system does not have deleteOnExit capability
      • exists

        public static boolean exists​(java.lang.String path)
        returns true iff the path exists
      • canRead

        public static boolean canRead​(java.lang.String filename)
        returns true iff path can be read
      • canWrite

        public static boolean canWrite​(java.lang.String filename)
        returns true iff path can be written
      • mkdir

        public static boolean mkdir​(java.lang.String path)
        returns true if the specified path can be made as a directory
      • length

        public static long length​(java.lang.String filename)
        returns the length of the file, or 0L if cannot be found etc
      • isDirectory

        public static boolean isDirectory​(java.lang.String path)
        return true if path is a directory
      • rename

        public static boolean rename​(java.lang.String sourceFilename,
                                     java.lang.String destFilename)
        rename a file or directory. If the two are on different file systems, it is assumed to be a file
      • getParent

        public static java.lang.String getParent​(java.lang.String path)
        What is the parent path to the specified path?
      • list

        public static java.lang.String[] list​(java.lang.String path)
        List the contents of a directory
      • openFileReader

        public static java.io.BufferedReader openFileReader​(java.io.File file)
                                                     throws java.io.IOException
        Opens a reader to the file called file. Provided for easy overriding for encoding support etc in child classes. Called from openNextFile().
        Parameters:
        file - File to open.
        Returns:
        BufferedReader of the file
        Throws:
        java.io.IOException
      • openFileReader

        public static java.io.BufferedReader openFileReader​(java.io.File file,
                                                            java.lang.String charset)
                                                     throws java.io.IOException
        Opens a reader to the file called file. Provided for easy overriding for encoding support etc in child classes. Called from openNextFile().
        Parameters:
        file - File to open.
        charset - Character set encoding of file. null for system default.
        Returns:
        BufferedReader of the file
        Throws:
        java.io.IOException
      • openFileReader

        public static java.io.BufferedReader openFileReader​(java.lang.String filename)
                                                     throws java.io.IOException
        Opens a reader to the file called filename. Provided for easy overriding for encoding support etc in child classes. Called from openNextFile().
        Parameters:
        filename - File to open.
        Returns:
        BufferedReader of the file
        Throws:
        java.io.IOException
      • openFileReader

        public static java.io.BufferedReader openFileReader​(java.lang.String filename,
                                                            java.lang.String charset)
                                                     throws java.io.IOException
        Opens a reader to the file called filename. Provided for easy overriding for encoding support etc in child classes. Called from openNextFile().
        Parameters:
        filename - File to open.
        charset - Character set encoding of file. null for system default.
        Returns:
        BufferedReader of the file
        Throws:
        java.io.IOException
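        The "null for system default" convention for the charset parameter can be illustrated with plain JDK classes. The sketch below is not Terrier's code: ReaderSketch and open are hypothetical names, and it wraps an already opened InputStream rather than resolving a filename.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

/** Hypothetical sketch: buffered reader with an optional charset name. */
final class ReaderSketch {
    /**
     * Wrap the stream in a BufferedReader. A null charset selects the
     * platform default; otherwise the named charset is used.
     */
    static BufferedReader open(InputStream in, String charset)
            throws UnsupportedEncodingException {
        InputStreamReader reader = (charset == null)
                ? new InputStreamReader(in, Charset.defaultCharset())
                : new InputStreamReader(in, charset);
        return new BufferedReader(reader);
    }
}
```

        An unknown charset name surfaces as UnsupportedEncodingException, which is a subclass of IOException and so fits the throws clause documented for openFileReader().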
      • openFileStream

        public static java.io.InputStream openFileStream​(java.io.File file)
                                                  throws java.io.IOException
        Opens an InputStream to a file called file.
        Parameters:
        file - File to open.
        Returns:
        InputStream of the file
        Throws:
        java.io.IOException
      • openFileRandom

        public static RandomDataInput openFileRandom​(java.io.File file)
                                              throws java.io.IOException
        Open a file for random access reading
        Throws:
        java.io.IOException
      • openFileStream

        public static java.io.InputStream openFileStream​(java.lang.String filename)
                                                  throws java.io.IOException
        Opens an InputStream to a file called filename.
        Parameters:
        filename - File to open.
        Returns:
        InputStream of the file
        Throws:
        java.io.IOException
      • writeFileStream

        public static java.io.OutputStream writeFileStream​(java.io.File file)
                                                    throws java.io.IOException
        Opens an OutputStream to a file called file.
        Parameters:
        file - File to open.
        Returns:
        OutputStream of the file
        Throws:
        java.io.IOException
      • writeFileRandom

        public static RandomDataOutput writeFileRandom​(java.io.File file)
                                                throws java.io.IOException
        Open a file for random access writing and reading
        Throws:
        java.io.IOException
      • writeFileStream

        public static java.io.OutputStream writeFileStream​(java.lang.String filename)
                                                    throws java.io.IOException
        Opens an OutputStream to a file called filename.
        Parameters:
        filename - File to open.
        Returns:
        OutputStream of the file
        Throws:
        java.io.IOException
      • writeFileWriter

        public static java.io.Writer writeFileWriter​(java.io.File file)
                                              throws java.io.IOException
        Opens a Writer to a file called file. System default encoding will be used.
        Parameters:
        file - File to open.
        Returns:
        Writer of the file
        Throws:
        java.io.IOException
      • writeFileWriter

        public static java.io.Writer writeFileWriter​(java.io.File file,
                                                     java.lang.String charset)
                                              throws java.io.IOException
        Opens a Writer to a file called file.
        Parameters:
        file - File to open.
        charset - Character set encoding of file. null for system default.
        Returns:
        Writer of the file
        Throws:
        java.io.IOException
      • writeFileWriter

        public static java.io.Writer writeFileWriter​(java.lang.String filename)
                                              throws java.io.IOException
        Opens a Writer to a file called filename. System default encoding will be used.
        Parameters:
        filename - File to open.
        Returns:
        Writer of the file
        Throws:
        java.io.IOException
      • writeFileWriter

        public static java.io.Writer writeFileWriter​(java.lang.String filename,
                                                     java.lang.String charset)
                                              throws java.io.IOException
        Opens a Writer to a file called filename.
        Parameters:
        filename - File to open.
        charset - Character set encoding of file. null for system default.
        Returns:
        Writer of the file
        Throws:
        java.io.IOException
      • copyFile

        public static java.lang.Long copyFile​(java.lang.String srcFilename,
                                              java.lang.String destFilename)
                                       throws java.io.IOException
        Copy a file from srcFile to destFile.
        Returns:
        null if OK
        Throws:
        java.io.IOException - if there was a problem copying
      • copyFile

        public static java.lang.Long copyFile​(java.io.File srcFile,
                                              java.io.File destFile)
                                       throws java.io.IOException
        Copy a file from srcFile to destFile.
        Returns:
        null if OK
        Throws:
        java.io.IOException - if there was a problem copying
      • copyFile

        public static java.lang.Long copyFile​(java.io.InputStream in,
                                              java.io.OutputStream out)
                                       throws java.io.IOException
        Copy all bytes from in to out
        Returns:
        null if OK
        Throws:
        java.io.IOException - if there was a problem copying
      • createChecksum

        public static java.lang.Long createChecksum​(java.io.File file)
                                             throws java.io.IOException
        Returns the CRC checksum of denoted file
        Throws:
        java.io.IOException
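        A CRC checksum of a stream can be computed with the JDK alone, as in the sketch below. It assumes the checksum in question is CRC-32, which this page does not state explicitly, and it works on an InputStream rather than a File; ChecksumSketch and crc32 are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

/** Hypothetical sketch: CRC-32 of everything readable from a stream. */
final class ChecksumSketch {
    /** Read the stream to the end and return its CRC-32 value. */
    static long crc32(InputStream in) throws IOException {
        try (CheckedInputStream checked = new CheckedInputStream(in, new CRC32())) {
            byte[] buffer = new byte[8192];
            // Each read updates the running checksum as a side effect.
            while (checked.read(buffer) != -1) {
                // nothing to do: the data itself is not needed
            }
            return checked.getChecksum().getValue();
        }
    }
}
```

        The try-with-resources block closes the underlying stream once the checksum has been read.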
      • length

        public static long length​(java.io.File f)
        returns the length of file f
      • main

        public static void main​(java.lang.String[] args)
        Check that a specified file exists as per Terrier's file system abstraction layer