org.terrier.utility
Class Distance

java.lang.Object
  extended by org.terrier.utility.Distance

public class Distance
extends java.lang.Object

Class containing useful utility methods for counting the number of occurrences of two terms within windows, etc.

Since:
3.0
Author:
David Hannah and Craig Macdonald

Constructor Summary
Distance()
           
 
Method Summary
protected static int countTrue(boolean[] in)
           
static int findSmallest(int[] x, int[] y)
          Finds the smallest difference between an element of one array and an element of the other.
static int noTimes(int[][] blocksForEachTerm, int windowSize, int documentLengthInTokens)
          Counts the number of blocks where all given terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.
static int noTimes(int[] blocksOfTerm1, int[] blocksOfTerm2, int windowSize, int documentLengthInTokens)
          Counts the number of blocks where two terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.
static int noTimes(int[] blocksOfTerm1, int start1, int end1, int[] blocksOfTerm2, int start2, int end2, int windowSize, int documentLengthInTokens)
          Counts the number of blocks where two terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.
static int noTimesNEW(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
          Returns the number of windows in which both terms occur, in the order specified.
static int noTimesSameOrder(int[][] blocksOfAllTerms1, int documentLengthInTokens)
          Deprecated. 
static int noTimesSameOrder(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
          Returns the number of windows in which both terms occur, in the order specified.
static int noTimesSameOrder(int[] term0Positions, int pos0, int length0, int[] term1Positions, int pos1, int length1, int windowSize, int documentLengthInTokens)
          Returns the number of windows in which both terms occur, in the order specified.
static int noTimesSameOrderOLD(int[] blocksOfTerm1, int[] blocksofTerm2, int windowSize, int documentLengthInTokens)
          number of blocks where
static void windowsForTerms(int[] blocksOfTerm, int windowSize, int numberOfNGrams, int[] windows_for_term)
          Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
static void windowsForTerms(int[] blocksOfTerm, int start, int end, int windowSize, int numberOfNGrams, int[] windows_for_term)
          Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Distance

public Distance()
Method Detail

noTimes

public static final int noTimes(int[] blocksOfTerm1,
                                int start1,
                                int end1,
                                int[] blocksOfTerm2,
                                int start2,
                                int end2,
                                int windowSize,
                                int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.

Parameters:
blocksOfTerm1 - block occurrences for the first term
start1 - The start index for the correct blockIds in blocksOfTerm1
end1 - The end index for the correct blockIds in blocksOfTerm1
blocksOfTerm2 - block occurrences for the second term
start2 - The start index for the correct blockIds in blocksOfTerm2
end2 - The end index for the correct blockIds in blocksOfTerm2
windowSize - size of each window
documentLengthInTokens - length of the document, in tokens

noTimes

public static final int noTimes(int[] blocksOfTerm1,
                                int[] blocksOfTerm2,
                                int windowSize,
                                int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.

Parameters:
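blocksOfTerm1 - block occurrences for the first term
blocksOfTerm2 - block occurrences for the second term
windowSize - size of each window
documentLengthInTokens - length of the document, in tokens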

noTimes

public static final int noTimes(int[][] blocksForEachTerm,
                                int windowSize,
                                int documentLengthInTokens)
Counts the number of blocks where all given terms occur within a block of length windowSize, in a document of length documentLengthInTokens, given the blocks for each term.

Parameters:
blocksForEachTerm - array of int[] of blocks for each term
windowSize - size of each window
documentLengthInTokens - length of the document, in tokens
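
As an illustration of the kind of count described above, the following is a minimal sketch and not Terrier's implementation. It assumes that windows slide one token at a time, that a document shorter than windowSize forms a single window, and that the block ids are zero-based token positions. The class and method names (AllTermsWindowSketch, countWindowsWithAllTerms) are hypothetical.

```java
final class AllTermsWindowSketch {

    /**
     * Counts the sliding windows that contain at least one occurrence of every term.
     * Assumption: window w covers token positions w .. w + windowSize - 1.
     */
    static int countWindowsWithAllTerms(int[][] positionsPerTerm, int windowSize, int docLen) {
        // Assumption: a document shorter than the window forms exactly one window.
        int numWindows = docLen < windowSize ? 1 : docLen - windowSize + 1;

        // termsSeen[w] = number of distinct terms occurring at least once in window w
        int[] termsSeen = new int[numWindows];

        for (int[] positions : positionsPerTerm) {
            boolean[] hit = new boolean[numWindows];
            for (int p : positions) {
                // A position p lies in every window starting at p - windowSize + 1 .. p.
                int first = Math.max(0, p - windowSize + 1);
                int last = Math.min(numWindows - 1, p);
                for (int w = first; w <= last; w++) {
                    hit[w] = true;
                }
            }
            for (int w = 0; w < numWindows; w++) {
                if (hit[w]) {
                    termsSeen[w]++;
                }
            }
        }

        int count = 0;
        for (int seen : termsSeen) {
            if (seen == positionsPerTerm.length) {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions, two terms at positions {0, 5} and {2} in an 8-token document with windowSize 3 share only the window starting at position 0, so the sketch returns 1.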

windowsForTerms

public static final void windowsForTerms(int[] blocksOfTerm,
                                         int start,
                                         int end,
                                         int windowSize,
                                         int numberOfNGrams,
                                         int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. The start and end parameters control how much of the blocksOfTerm array is examined.

Parameters:
blocksOfTerm - block occurrences for term
start - start index to consider in blocksOfTerm
end - end index to consider in blocksOfTerm
windowSize - size of each window
numberOfNGrams - number of windows in document
windows_for_term - array of length numberOfNGrams

windowsForTerms

public static final void windowsForTerms(int[] blocksOfTerm,
                                         int windowSize,
                                         int numberOfNGrams,
                                         int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. To control how much of the array is examined, see windowsForTerms(int[], int, int, int, int, int[]).

Parameters:
blocksOfTerm - block occurrences for term
windowSize - size of each window
numberOfNGrams - number of windows in document
windows_for_term - array of length numberOfNGrams
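
To make the per-window bookkeeping concrete, here is a small sketch of how such an array could be filled. It is not the library's code. It assumes that window w covers positions w .. w + windowSize - 1, that the output array starts zeroed, and that each occurrence increments every window containing it. The names WindowOccurrenceSketch and fillWindowCounts are hypothetical.

```java
final class WindowOccurrenceSketch {

    /**
     * Adds, for each window, the number of term occurrences that fall inside it.
     * Assumption: counts has length numberOfWindows and is zero-initialised by the caller.
     */
    static void fillWindowCounts(int[] positions, int windowSize, int numberOfWindows, int[] counts) {
        for (int p : positions) {
            // Position p belongs to the windows starting at p - windowSize + 1 .. p,
            // clamped to the valid window range.
            int first = Math.max(0, p - windowSize + 1);
            int last = Math.min(numberOfWindows - 1, p);
            for (int w = first; w <= last; w++) {
                counts[w]++;
            }
        }
    }
}
```

With positions {1, 2}, windowSize 2 and four windows, the sketch leaves counts as {1, 2, 1, 0}: the window starting at position 1 contains both occurrences.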

noTimesSameOrder

public static final int noTimesSameOrder(int[] term0Positions,
                                         int[] term1Positions,
                                         int windowSize,
                                         int documentLengthInTokens)
Returns the number of windows in which both terms occur, in the order specified. New version, implemented 10/6/2010 by craigm.


noTimesNEW

public static final int noTimesNEW(int[] term0Positions,
                                   int[] term1Positions,
                                   int windowSize,
                                   int documentLengthInTokens)
Returns the number of windows in which both terms occur, in the order specified. New version, implemented 10/6/2010 by craigm.


noTimesSameOrder

public static final int noTimesSameOrder(int[] term0Positions,
                                         int pos0,
                                         int length0,
                                         int[] term1Positions,
                                         int pos1,
                                         int length1,
                                         int windowSize,
                                         int documentLengthInTokens)
Returns the number of windows in which both terms occur, in the order specified. New version, implemented 10/6/2010 by craigm.
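
One possible reading of "in the order specified" is that a window counts when some occurrence of the first term precedes some occurrence of the second term inside that window. The brute-force sketch below illustrates that reading only. The exact semantics of the real method (for example, whether the terms must be adjacent) are not stated here, and the names OrderedWindowSketch and countOrderedWindows are hypothetical.

```java
final class OrderedWindowSketch {

    /**
     * Counts windows in which term 0 occurs strictly before term 1.
     * Assumptions: window w covers positions w .. w + windowSize - 1, and a
     * document shorter than windowSize forms a single window.
     */
    static int countOrderedWindows(int[] term0Positions, int[] term1Positions,
                                   int windowSize, int docLen) {
        int numWindows = docLen < windowSize ? 1 : docLen - windowSize + 1;
        int count = 0;
        for (int w = 0; w < numWindows; w++) {
            int end = w + windowSize; // exclusive upper bound of the window

            // Earliest occurrence of term 0 inside the window, or -1 if none.
            int earliest0 = -1;
            for (int p : term0Positions) {
                if (p >= w && p < end && (earliest0 == -1 || p < earliest0)) {
                    earliest0 = p;
                }
            }
            if (earliest0 == -1) {
                continue;
            }

            // The window qualifies if term 1 occurs after that earliest occurrence.
            for (int p : term1Positions) {
                if (p > earliest0 && p < end) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }
}
```

This runs in time proportional to the number of windows times the number of positions, which is acceptable for illustration but slower than a merged scan over sorted positions.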


countTrue

protected static int countTrue(boolean[] in)

noTimesSameOrder

@Deprecated
public static final int noTimesSameOrder(int[][] blocksOfAllTerms1,
                                         int documentLengthInTokens)
Deprecated. 

Number of blocks where terms occur in an adjacent manner. Do not use this method; it has no concept of windows.


noTimesSameOrderOLD

public static final int noTimesSameOrderOLD(int[] blocksOfTerm1,
                                            int[] blocksofTerm2,
                                            int windowSize,
                                            int documentLengthInTokens)
number of blocks where


findSmallest

public static final int findSmallest(int[] x,
                                     int[] y)
Finds the smallest difference between an element of one array and an element of the other.
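
A common way to compute such a value is a two-pointer scan over sorted copies of both arrays. The sketch below shows that technique under the assumptions that "difference" means absolute difference, that the values are non-negative positions, and that -1 is returned when either array is empty. It is not taken from the Terrier source, and the names SmallestGapSketch and smallestGap are hypothetical.

```java
import java.util.Arrays;

final class SmallestGapSketch {

    /**
     * Returns the smallest absolute difference between any element of x and
     * any element of y, or -1 if either array is empty (an assumed convention).
     */
    static int smallestGap(int[] x, int[] y) {
        if (x.length == 0 || y.length == 0) {
            return -1;
        }
        // Sort copies so the caller's arrays are left untouched.
        int[] a = x.clone();
        int[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);

        int i = 0;
        int j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            int diff = Math.abs(a[i] - b[j]);
            if (diff < best) {
                best = diff;
            }
            // Advance the pointer at the smaller value; only that move can shrink the gap.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }
}
```

For x = {1, 5, 9} and y = {7, 12}, the closest pairs are (5, 7) and (9, 7), so the sketch returns 2.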



Terrier 3.5. Copyright © 2004-2011 University of Glasgow