Class Distance


  • public class Distance
    extends java.lang.Object
    Utility methods for counting how often two or more terms occur together within windows of a document, along with related helper methods.
    Since:
    3.0
    Author:
    David Hannah and Craig Macdonald
    • Constructor Summary

      Constructors 
      Constructor Description
      Distance()  
    • Method Summary

      Modifier and Type Method Description
      protected static int countTrue​(boolean[] in)  
      static int findSmallest​(int[] x, int[] y)
      Finds the smallest difference between an element of one array and an element of the other
      static int noTimes​(int[][] blocksForEachTerm, int windowSize, int documentLengthInTokens)
      Counts the number of blocks of length windowSize in which all the given terms occur, for a document of length documentLengthInTokens, given the blocks of each term
      static int noTimes​(int[] blocksOfTerm1, int[] blocksOfTerm2, int windowSize, int documentLengthInTokens)
      Counts the number of blocks of length windowSize in which both terms occur, for a document of length documentLengthInTokens, given the blocks of each term
      static int noTimes​(int[] blocksOfTerm1, int start1, int end1, int[] blocksOfTerm2, int start2, int end2, int windowSize, int documentLengthInTokens)
      Counts the number of blocks of length windowSize in which both terms occur, for a document of length documentLengthInTokens, given the blocks of each term
      static int noTimesNEW​(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
      Returns the number of windows in which both terms occur, in the order specified.
      static int noTimesSameOrder​(int[][] blocksOfAllTerms1, int documentLengthInTokens)
      Deprecated.
      static int noTimesSameOrder​(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)  
      static int noTimesSameOrder​(int[] posTerm1, int start1, int end1, int[] posTerm2, int start2, int end2, int windowSize, int documentLength)  
      static int noTimesSameOrderOLD​(int[] blocksOfTerm1, int[] blocksofTerm2, int windowSize, int documentLengthInTokens)
      number of blocks where
      static void windowsForTerms​(int[] blocksOfTerm, int windowSize, int numberOfNGrams, int[] windows_for_term)
      Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
      static void windowsForTerms​(int[] blocksOfTerm, int start, int end, int windowSize, int numberOfNGrams, int[] windows_for_term)
      Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Distance

        public Distance()
    • Method Detail

      • noTimes

        public static final int noTimes​(int[] blocksOfTerm1,
                                        int start1,
                                        int end1,
                                        int[] blocksOfTerm2,
                                        int start2,
                                        int end2,
                                        int windowSize,
                                        int documentLengthInTokens)
        Counts the number of blocks of length windowSize in which both terms occur, for a document of length documentLengthInTokens, given the blocks of each term. Only the block ids between the given start and end indices of each array are considered.
        Parameters:
        blocksOfTerm1 - the blocks in which the first term occurs
        start1 - the start index of the valid block ids in blocksOfTerm1
        end1 - the end index of the valid block ids in blocksOfTerm1
        blocksOfTerm2 - the blocks in which the second term occurs
        start2 - the start index of the valid block ids in blocksOfTerm2
        end2 - the end index of the valid block ids in blocksOfTerm2
        windowSize - the size of each window
        documentLengthInTokens - the length of the document, in tokens
      • noTimes

        public static final int noTimes​(int[] blocksOfTerm1,
                                        int[] blocksOfTerm2,
                                        int windowSize,
                                        int documentLengthInTokens)
        Counts the number of blocks of length windowSize in which both terms occur, for a document of length documentLengthInTokens, given the blocks of each term.
        Parameters:
        blocksOfTerm1 - the blocks in which the first term occurs
        blocksOfTerm2 - the blocks in which the second term occurs
        windowSize - the size of each window
        documentLengthInTokens - the length of the document, in tokens
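The counting described above can be sketched as follows. This is a minimal illustration, not Terrier's implementation. It assumes that the block arrays hold 0-based token positions, that windows slide one token at a time (so a document has documentLengthInTokens - windowSize + 1 windows, or a single window if the document is shorter than the window), and that a window counts once however many times each term occurs in it. The class and method names are hypothetical.

```java
final class WindowCooccurrenceSketch {

    /** Counts the sliding windows that contain at least one position of each term. */
    static int countWindowsWithBoth(int[] positions1, int[] positions2,
                                    int windowSize, int documentLength) {
        if (positions1 == null || positions2 == null) {
            return 0;
        }
        // Assumption: one window per start offset, at least one window overall.
        int numberOfWindows = documentLength < windowSize ? 1 : documentLength - windowSize + 1;
        boolean[] has1 = coveredWindows(positions1, windowSize, numberOfWindows);
        boolean[] has2 = coveredWindows(positions2, windowSize, numberOfWindows);
        int count = 0;
        for (int w = 0; w < numberOfWindows; w++) {
            if (has1[w] && has2[w]) {
                count++;
            }
        }
        return count;
    }

    /** Marks every window [w, w + windowSize - 1] that contains one of the positions. */
    private static boolean[] coveredWindows(int[] positions, int windowSize, int numberOfWindows) {
        boolean[] covered = new boolean[numberOfWindows];
        for (int p : positions) {
            int first = Math.max(0, p - windowSize + 1);
            int last = Math.min(p, numberOfWindows - 1);
            for (int w = first; w <= last; w++) {
                covered[w] = true;
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // Term 1 at tokens 0 and 5, term 2 at token 2, windows of 3 tokens, 8 tokens in total.
        // Only the window covering tokens 0..2 holds both terms.
        System.out.println(countWindowsWithBoth(new int[] {0, 5}, new int[] {2}, 3, 8)); // prints 1
    }
}
```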
      • noTimes

        public static final int noTimes​(int[][] blocksForEachTerm,
                                        int windowSize,
                                        int documentLengthInTokens)
        Counts the number of blocks of length windowSize in which all the given terms occur, for a document of length documentLengthInTokens, given the blocks of each term.
        Parameters:
        blocksForEachTerm - array of int[] of blocks for each term
        windowSize - the size of each window
        documentLengthInTokens - the length of the document, in tokens
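For more than two terms, one way to picture the count is to intersect, term by term, the set of windows each term appears in. The sketch below is an illustration under assumptions, not Terrier's code: positions are taken to be 0-based token offsets, windows slide by one token, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class AllTermsWindowSketch {

    /** Counts the sliding windows that contain at least one position of every term. */
    static int countWindowsWithAll(int[][] positionsPerTerm, int windowSize, int documentLength) {
        if (positionsPerTerm == null || positionsPerTerm.length == 0) {
            return 0;
        }
        // Assumption: one window per start offset, at least one window overall.
        int numberOfWindows = documentLength < windowSize ? 1 : documentLength - windowSize + 1;
        boolean[] hasAll = new boolean[numberOfWindows];
        Arrays.fill(hasAll, true);
        for (int[] positions : positionsPerTerm) {
            boolean[] covered = new boolean[numberOfWindows];
            for (int p : positions) {
                // Position p lies in the windows starting at p - windowSize + 1 .. p.
                int first = Math.max(0, p - windowSize + 1);
                int last = Math.min(p, numberOfWindows - 1);
                for (int w = first; w <= last; w++) {
                    covered[w] = true;
                }
            }
            // Keep only the windows that this term also appears in.
            for (int w = 0; w < numberOfWindows; w++) {
                hasAll[w] &= covered[w];
            }
        }
        int count = 0;
        for (boolean b : hasAll) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] terms = { {0, 5}, {2}, {1, 6} };
        System.out.println(countWindowsWithAll(terms, 3, 8)); // prints 1
    }
}
```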
      • windowsForTerms

        public static final void windowsForTerms​(int[] blocksOfTerm,
                                                 int start,
                                                 int end,
                                                 int windowSize,
                                                 int numberOfNGrams,
                                                 int[] windows_for_term)
        Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. Only the entries of blocksOfTerm between start and end are examined.
        Parameters:
        blocksOfTerm - block occurrences for term
        start - start index to consider in blocksOfTerm
        end - end index to consider in blocksOfTerm
        windowSize - size of each window
        numberOfNGrams - number of windows in document
        windows_for_term - array of length numberOfNGrams
      • windowsForTerms

        public static final void windowsForTerms​(int[] blocksOfTerm,
                                                 int windowSize,
                                                 int numberOfNGrams,
                                                 int[] windows_for_term)
        Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. To control how much of the array is examined, see windowsForTerms(int[], int, int, int, int, int[]).
        Parameters:
        blocksOfTerm - block occurrences for term
        windowSize - size of each window
        numberOfNGrams - number of windows in document
        windows_for_term - array of length numberOfNGrams
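The per-window counts described above could be filled in along the following lines. This is a hedged sketch rather than the library's code: it assumes 0-based token positions, windows that slide by one token, and a caller-supplied array that starts out zeroed. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class WindowOccurrenceSketch {

    /** Adds one to windowCounts[w] for every position that falls inside window w. */
    static void fillWindowCounts(int[] positions, int windowSize,
                                 int numberOfWindows, int[] windowCounts) {
        for (int p : positions) {
            // Window w spans tokens w .. w + windowSize - 1, so p lies in
            // the windows starting at p - windowSize + 1 .. p (clamped to valid windows).
            int first = Math.max(0, p - windowSize + 1);
            int last = Math.min(p, numberOfWindows - 1);
            for (int w = first; w <= last; w++) {
                windowCounts[w]++;
            }
        }
    }

    public static void main(String[] args) {
        // A 6-token document with windows of 3 tokens has 4 windows.
        int[] counts = new int[4];
        fillWindowCounts(new int[] {0, 1, 4}, 3, counts.length, counts);
        System.out.println(Arrays.toString(counts)); // prints [2, 1, 1, 1]
    }
}
```

Passing the output array in, rather than returning a new one, mirrors the shape of the documented signature and lets a caller reuse one buffer across terms.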
      • noTimesSameOrder

        public static final int noTimesSameOrder​(int[] term0Positions,
                                                 int[] term1Positions,
                                                 int windowSize,
                                                 int documentLengthInTokens)
      • noTimesSameOrder

        public static final int noTimesSameOrder​(int[] posTerm1,
                                                 int start1,
                                                 int end1,
                                                 int[] posTerm2,
                                                 int start2,
                                                 int end2,
                                                 int windowSize,
                                                 int documentLength)
      • noTimesNEW

        public static final int noTimesNEW​(int[] term0Positions,
                                           int[] term1Positions,
                                           int windowSize,
                                           int documentLengthInTokens)
        Returns the number of windows in which both terms occur, in the order specified. New version, implemented 10/6/2010 by craigm.
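One plausible reading of "in the order specified" is that a window counts when some occurrence of the first term precedes some occurrence of the second term inside that window. The sketch below illustrates that reading only. It is an assumption, not the documented behaviour or Terrier's implementation, and it also assumes 0-based token positions and windows that slide by one token. The class and method names are hypothetical.

```java
final class OrderedWindowSketch {

    /** Counts windows in which an occurrence of term 0 comes before an occurrence of term 1. */
    static int countOrderedWindows(int[] term0Positions, int[] term1Positions,
                                   int windowSize, int documentLength) {
        if (term0Positions == null || term1Positions == null) {
            return 0;
        }
        int numberOfWindows = documentLength < windowSize ? 1 : documentLength - windowSize + 1;
        int count = 0;
        for (int w = 0; w < numberOfWindows; w++) {
            int end = w + windowSize - 1;
            // Earliest occurrence of term 0 inside the window, if any.
            int earliest0 = Integer.MAX_VALUE;
            for (int p : term0Positions) {
                if (p >= w && p <= end) {
                    earliest0 = Math.min(earliest0, p);
                }
            }
            // Latest occurrence of term 1 inside the window, if any.
            int latest1 = Integer.MIN_VALUE;
            for (int p : term1Positions) {
                if (p >= w && p <= end) {
                    latest1 = Math.max(latest1, p);
                }
            }
            // An ordered pair exists exactly when the earliest term 0 precedes the latest term 1.
            if (earliest0 < latest1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOrderedWindows(new int[] {1}, new int[] {2}, 2, 4)); // prints 1
        System.out.println(countOrderedWindows(new int[] {2}, new int[] {1}, 2, 4)); // prints 0
    }
}
```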
      • countTrue

        protected static final int countTrue​(boolean[] in)
      • noTimesSameOrder

        @Deprecated
        public static final int noTimesSameOrder​(int[][] blocksOfAllTerms1,
                                                 int documentLengthInTokens)
        Deprecated.
        Returns the number of blocks where the terms occur in an adjacent manner. Do not use this method; it has no concept of windows.
      • noTimesSameOrderOLD

        public static final int noTimesSameOrderOLD​(int[] blocksOfTerm1,
                                                    int[] blocksofTerm2,
                                                    int windowSize,
                                                    int documentLengthInTokens)
        number of blocks where
      • findSmallest

        public static final int findSmallest​(int[] x,
                                             int[] y)
        Finds the smallest difference between an element of one array and an element of the other.
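As an illustration of the idea, the smallest gap between two sets of positions can be found with a two-pointer sweep over sorted copies of the arrays. This is a sketch of one standard technique, not necessarily how Terrier implements it. Returning Integer.MAX_VALUE for an empty input is a choice made here, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class SmallestGapSketch {

    /** Returns the smallest absolute difference between an element of x and an element of y. */
    static int smallestDifference(int[] x, int[] y) {
        // Work on sorted copies so the caller's arrays are left untouched.
        int[] a = x.clone();
        int[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        int best = Integer.MAX_VALUE; // sketch's convention when either array is empty
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance the pointer at the smaller value; only that move can shrink the gap.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(smallestDifference(new int[] {1, 10, 20}, new int[] {4, 13, 25})); // prints 3
    }
}
```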