Package org.terrier.utility
Class Distance
- java.lang.Object
-
- org.terrier.utility.Distance
-
public class Distance extends java.lang.Object
Class containing useful utility methods for counting the number of occurrences of two terms within windows, etc.
- Since:
- 3.0
- Author:
- David Hannah and Craig Macdonald
-
-
Constructor Summary
Constructor Description
Distance()
-
Method Summary
Modifier and Type Method Description
protected static int
countTrue(boolean[] in)
static int
findSmallest(int[] x, int[] y)
Find smallest difference between two elements of two arrays
static int
noTimes(int[][] blocksForEachTerm, int windowSize, int documentLengthInTokens)
Counts the number of blocks where all given terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given
static int
noTimes(int[] blocksOfTerm1, int[] blocksOfTerm2, int windowSize, int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given
static int
noTimes(int[] blocksOfTerm1, int start1, int end1, int[] blocksOfTerm2, int start2, int end2, int windowSize, int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given
static int
noTimesNEW(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
Returns the number of windows that have both terms occurring, in the order specified.
static int
noTimesSameOrder(int[][] blocksOfAllTerms1, int documentLengthInTokens)
Deprecated.
static int
noTimesSameOrder(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
static int
noTimesSameOrder(int[] posTerm1, int start1, int end1, int[] posTerm2, int start2, int end2, int windowSize, int documentLength)
static int
noTimesSameOrderOLD(int[] blocksOfTerm1, int[] blocksofTerm2, int windowSize, int documentLengthInTokens)
number of blocks where
static void
windowsForTerms(int[] blocksOfTerm, int windowSize, int numberOfNGrams, int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
static void
windowsForTerms(int[] blocksOfTerm, int start, int end, int windowSize, int numberOfNGrams, int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term.
-
-
-
Method Detail
-
noTimes
public static final int noTimes(int[] blocksOfTerm1, int start1, int end1, int[] blocksOfTerm2, int start2, int end2, int windowSize, int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given.
- Parameters:
blocksOfTerm1 -
start1 - The start index for the correct blockIds in blocksOfTerm1
end1 - The end index for the correct blockIds in blocksOfTerm1
blocksOfTerm2 -
start2 - The start index for the correct blockIds in blocksOfTerm2
end2 - The end index for the correct blockIds in blocksOfTerm2
windowSize -
documentLengthInTokens -
-
noTimes
public static final int noTimes(int[] blocksOfTerm1, int[] blocksOfTerm2, int windowSize, int documentLengthInTokens)
Counts the number of blocks where two terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given.
- Parameters:
blocksOfTerm1 -
blocksOfTerm2 -
windowSize -
documentLengthInTokens -
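The description above can be read as counting sliding windows in which both terms appear. The following is a minimal sketch of that reading, not Terrier's implementation. It assumes that positions are 0-based token offsets, that one window starts at each token position, and that a document shorter than the window forms a single window. The class and method names are hypothetical.

```java
/** Illustrative sketch only; names and window semantics are assumptions. */
final class WindowCooccurrenceSketch {

    /**
     * Counts sliding windows of windowSize tokens that contain at least
     * one position of each of the two terms.
     */
    static int countWindows(int[] pos1, int[] pos2, int windowSize, int docLength) {
        // Assumption: a document shorter than the window yields one window.
        int numberOfWindows = docLength < windowSize ? 1 : docLength - windowSize + 1;
        int count = 0;
        for (int start = 0; start < numberOfWindows; start++) {
            int end = start + windowSize; // exclusive upper bound of the window
            if (anyInRange(pos1, start, end) && anyInRange(pos2, start, end)) {
                count++;
            }
        }
        return count;
    }

    /** Returns true if any position p satisfies start <= p < end. */
    private static boolean anyInRange(int[] positions, int start, int end) {
        for (int p : positions) {
            if (p >= start && p < end) {
                return true;
            }
        }
        return false;
    }
}
```

With positions {1, 7} and {3}, a window size of 4 and a document of 10 tokens, only the windows starting at tokens 0 and 1 contain both terms, so the sketch returns 2.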
-
noTimes
public static final int noTimes(int[][] blocksForEachTerm, int windowSize, int documentLengthInTokens)
Counts the number of blocks where all given terms occur within a block of windowSize in length, in a document of length documentLengthInTokens, where the blocks for the terms are as given.
- Parameters:
blocksForEachTerm - array of int[] of blocks for each term
windowSize -
documentLengthInTokens -
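For more than two terms, one way to picture the count is to mark, for each term, the windows that contain it, intersect those marks, and count the surviving windows. The sketch below illustrates that idea under the same assumptions as any sliding-window reading (0-based positions, one window per start offset). It is not Terrier's code, and its countTrue helper is a hypothetical stand-in that simply counts true entries.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and window semantics are assumptions. */
final class MultiTermWindowSketch {

    /** Counts sliding windows that contain at least one position of every term. */
    static int countWindows(int[][] positionsPerTerm, int windowSize, int docLength) {
        int numberOfWindows = docLength < windowSize ? 1 : docLength - windowSize + 1;
        boolean[] allPresent = new boolean[numberOfWindows];
        Arrays.fill(allPresent, true);
        for (int[] positions : positionsPerTerm) {
            boolean[] present = new boolean[numberOfWindows];
            for (int p : positions) {
                // Windows starting in [p - windowSize + 1, p] cover position p.
                int first = Math.max(0, p - windowSize + 1);
                int last = Math.min(p, numberOfWindows - 1);
                for (int w = first; w <= last; w++) {
                    present[w] = true;
                }
            }
            // Keep only windows that also contain this term.
            for (int w = 0; w < numberOfWindows; w++) {
                allPresent[w] &= present[w];
            }
        }
        return countTrue(allPresent);
    }

    /** Hypothetical helper: number of true entries in the array. */
    static int countTrue(boolean[] in) {
        int count = 0;
        for (boolean b : in) {
            if (b) {
                count++;
            }
        }
        return count;
    }
}
```

Note that with an empty array of terms this sketch counts every window, which may or may not match what a caller expects.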
-
windowsForTerms
public static final void windowsForTerms(int[] blocksOfTerm, int start, int end, int windowSize, int numberOfNGrams, int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. Only the entries of blocksOfTerm between the start and end indices are considered.
- Parameters:
blocksOfTerm - block occurrences for term
start - start index to consider in blocksOfTerm
end - end index to consider in blocksOfTerm
windowSize - size of each window
numberOfNGrams - number of windows in document
windows_for_term - array of length numberOfNGrams
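To make the per-window counting concrete, here is a small sketch of how an output array of length numberOfNGrams could be filled. It is an assumption-laden illustration rather than the library's code: positions are taken to be 0-based token offsets, window w is taken to cover tokens w to w + windowSize - 1, and the end index is treated as exclusive. The class and method names are hypothetical.

```java
/** Illustrative sketch only; names and index conventions are assumptions. */
final class WindowCountsSketch {

    /**
     * Adds, for each considered position, one occurrence to every window
     * that covers it. The caller supplies a zeroed array of length
     * numberOfWindows.
     */
    static void fillWindowCounts(int[] positions, int start, int end,
                                 int windowSize, int numberOfWindows,
                                 int[] windowCounts) {
        for (int i = start; i < end; i++) { // assumption: end is exclusive
            int p = positions[i];
            // Windows starting in [p - windowSize + 1, p] cover position p.
            int first = Math.max(0, p - windowSize + 1);
            int last = Math.min(p, numberOfWindows - 1);
            for (int w = first; w <= last; w++) {
                windowCounts[w]++;
            }
        }
    }
}
```

For positions {1, 2, 7}, a window size of 4 and 7 windows (a 10-token document), the array ends up as [2, 2, 1, 0, 1, 1, 1]. Restricting the call to indices 1 to 3 skips the first position and gives [1, 1, 1, 0, 1, 1, 1].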
-
windowsForTerms
public static final void windowsForTerms(int[] blocksOfTerm, int windowSize, int numberOfNGrams, int[] windows_for_term)
Sets the number of occurrences of a term in each window, given the specified window size, the number of n-grams in the document, and the blocks of the term. To control how much of the array is examined, see windowsForTerms(int[], int, int, int, int, int[]).
- Parameters:
blocksOfTerm - block occurrences for term
windowSize - size of each window
numberOfNGrams - number of windows in document
windows_for_term - array of length numberOfNGrams
-
noTimesSameOrder
public static final int noTimesSameOrder(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
-
noTimesSameOrder
public static final int noTimesSameOrder(int[] posTerm1, int start1, int end1, int[] posTerm2, int start2, int end2, int windowSize, int documentLength)
-
noTimesNEW
public static final int noTimesNEW(int[] term0Positions, int[] term1Positions, int windowSize, int documentLengthInTokens)
Returns the number of windows that have both terms occurring, in the order specified. New version, implemented 10/6/2010 by craigm.
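The phrase "in the order specified" is not spelled out further here. One plausible reading, sketched below, is that a window counts when it contains an occurrence of the first term followed later, inside the same window, by an occurrence of the second term. This is only an interpretation with hypothetical names, using a brute-force scan for clarity, and it should not be taken as the behaviour of noTimesNEW.

```java
/** Illustrative sketch only; the ordering semantics are an assumption. */
final class OrderedWindowSketch {

    /** Counts sliding windows where some term0 position precedes some term1 position. */
    static int countOrderedWindows(int[] term0, int[] term1, int windowSize, int docLength) {
        int numberOfWindows = docLength < windowSize ? 1 : docLength - windowSize + 1;
        int count = 0;
        for (int start = 0; start < numberOfWindows; start++) {
            if (hasOrderedPair(term0, term1, start, start + windowSize)) {
                count++;
            }
        }
        return count;
    }

    /** True if there are p0 in term0 and p1 in term1 with start <= p0 < p1 < end. */
    private static boolean hasOrderedPair(int[] term0, int[] term1, int start, int end) {
        for (int p0 : term0) {
            if (p0 < start || p0 >= end) {
                continue; // term0 occurrence lies outside this window
            }
            for (int p1 : term1) {
                if (p1 > p0 && p1 < end) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Under this reading the argument order matters: with a window size of 4 in a 10-token document, positions {1} followed by {3} give 2 windows, while {3} followed by {1} give 0.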
-
countTrue
protected static final int countTrue(boolean[] in)
-
noTimesSameOrder
@Deprecated public static final int noTimesSameOrder(int[][] blocksOfAllTerms1, int documentLengthInTokens)
Deprecated. Number of blocks where terms occur in an adjacent manner. Don't use this method; it has no concept of windows.
-
noTimesSameOrderOLD
public static final int noTimesSameOrderOLD(int[] blocksOfTerm1, int[] blocksofTerm2, int windowSize, int documentLengthInTokens)
number of blocks where
-
findSmallest
public static final int findSmallest(int[] x, int[] y)
Finds the smallest difference between two elements of two arrays.
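As an illustration of the general technique, the sketch below finds the smallest absolute difference between an element of one array and an element of the other by sorting copies of both and walking them with two indices. Whether findSmallest uses an absolute difference, and how it treats empty arrays, is not stated in this page, so both choices here are assumptions and the names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; absolute-difference semantics are an assumption. */
final class SmallestDifferenceSketch {

    /** Returns min |a - b| over a in x and b in y; both arrays must be non-empty. */
    static int smallestDifference(int[] x, int[] y) {
        if (x.length == 0 || y.length == 0) {
            throw new IllegalArgumentException("both arrays must be non-empty");
        }
        // Sort copies so the caller's arrays are left untouched.
        int[] a = x.clone();
        int[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            // Assumes values are small enough (such as token positions) not to overflow.
            int diff = Math.abs(a[i] - b[j]);
            if (diff < best) {
                best = diff;
            }
            // Advance the index pointing at the smaller value.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }
}
```

For x = {1, 10, 20} and y = {4, 13, 25} the sketch returns 3. Sorting first keeps the scan linear after an O(n log n) sort, instead of comparing every pair.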
-
-