org.terrier.matching.dsms
Class PhraseScoreModifier

java.lang.Object
  extended by org.terrier.matching.dsms.PhraseScoreModifier
All Implemented Interfaces:
java.lang.Cloneable, DocumentScoreModifier

public class PhraseScoreModifier
extends java.lang.Object
implements DocumentScoreModifier

Modifies the scores of documents according to whether or not they contain a given phrase.

Author:
Vassilis Plachouras, Craig Macdonald

Field Summary
protected static int BLOCK_SIZE
          Number of tokens in one block.
protected  int blockDistance
          The maximum distance, in blocks, that is allowed between the phrase terms.
protected static org.apache.log4j.Logger logger
          the logger for this class
protected  java.util.List<Query> phraseTerms
          A list of the phrase terms.
protected  boolean required
          Indicates whether the phrase should appear in the retrieved documents, or not.
 
Constructor Summary
PhraseScoreModifier(java.util.List<Query> pTerms)
          Constructs a phrase score modifier for a given set of query terms.
PhraseScoreModifier(java.util.List<Query> pTerms, boolean r)
          Constructs a phrase score modifier for a given set of query terms and whether the phrase is required.
PhraseScoreModifier(java.util.List<Query> pTerms, boolean r, int bDist)
          Constructs a phrase score modifier for a given set of query terms, whether they are required to appear in a document, and the allowed distance between the phrase terms.
PhraseScoreModifier(java.util.List<Query> pTerms, int bDist)
          Constructs a phrase score modifier for a given set of query terms and the allowed distance between them.
 
Method Summary
 java.lang.Object clone()
          Clones this DSM.
 java.lang.String getName()
          Returns the name of the modifier.
 boolean modifyScores(Index index, MatchingQueryTerms terms, ResultSet set)
          Modifies the scores of documents according to whether or not a given phrase occurs in them.
protected  int[] range(int[] array, int floor, int ceiling)
          Performs a binary search in an array and returns the indices of the elements that are lower than the given floor and higher than the given ceiling.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
the logger for this class


blockDistance

protected int blockDistance
The maximum distance, in blocks, that is allowed between the phrase terms. The default value of one corresponds to phrase search, while any higher value enables proximity search.
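
The difference between a distance of one and a higher distance can be illustrated with a minimal sketch. This is not Terrier's implementation: the class ProximitySketch, the method withinDistance, and the assumption that a match means the second term follows the first by between 1 and blockDistance blocks are all hypothetical, chosen only to show the idea.

```java
// Hypothetical sketch, not part of Terrier.
final class ProximitySketch {

    /**
     * Returns true if some position in 'second' follows some position in
     * 'first' by at least 1 and at most maxDistance blocks.
     * Both arrays are assumed to be sorted in ascending order.
     */
    static boolean withinDistance(int[] first, int[] second, int maxDistance) {
        int j = 0;
        for (int p : first) {
            // Advance to the first occurrence of the second term after p.
            while (j < second.length && second[j] <= p) {
                j++;
            }
            // The closest following occurrence decides the match for p.
            if (j < second.length && second[j] - p <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] first = {3, 10};
        int[] second = {5, 20};
        // Distance 1 behaves like phrase search: the terms must be adjacent.
        System.out.println(withinDistance(first, second, 1)); // false
        // Distance 2 behaves like proximity search: a gap of one block is allowed.
        System.out.println(withinDistance(first, second, 2)); // true
    }
}
```

With a distance of one, only adjacent occurrences match; raising the distance admits occurrences that are further apart.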


phraseTerms

protected java.util.List<Query> phraseTerms
A list of the phrase terms.


required

protected boolean required
Indicates whether the phrase should appear in the retrieved documents, or not. The default value is true.


BLOCK_SIZE

protected static final int BLOCK_SIZE
Number of tokens in one block, as defined in ApplicationSetup.BLOCK_SIZE.

Constructor Detail

PhraseScoreModifier

public PhraseScoreModifier(java.util.List<Query> pTerms)
Constructs a phrase score modifier for a given set of query terms.

Parameters:
pTerms - List the query terms that make up the phrase.

PhraseScoreModifier

public PhraseScoreModifier(java.util.List<Query> pTerms,
                           int bDist)
Constructs a phrase score modifier for a given set of query terms and the allowed distance between them.

Parameters:
pTerms - List the query terms that make up the phrase.
bDist - int the allowed distance between phrase terms.

PhraseScoreModifier

public PhraseScoreModifier(java.util.List<Query> pTerms,
                           boolean r)
Constructs a phrase score modifier for a given set of query terms and whether the phrase is required.

Parameters:
pTerms - List the query terms that make up the phrase.
r - boolean indicates whether the phrase is required.

PhraseScoreModifier

public PhraseScoreModifier(java.util.List<Query> pTerms,
                           boolean r,
                           int bDist)
Constructs a phrase score modifier for a given set of query terms, whether they are required to appear in a document, and the allowed distance between the phrase terms.

Parameters:
pTerms - List the query terms that make up the phrase.
r - boolean indicates whether the phrase is required.
bDist - int the allowed distance between the phrase terms.
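
The four overloads differ only in which settings the caller supplies. The sketch below uses a hypothetical stand-in class, PhraseSettingsSketch, to show how the documented defaults (required is true, blockDistance is one) would apply when an argument is omitted. It is not the Terrier class: it stores the terms as String rather than Query, and the constructor chaining is an assumption about how the defaults might be filled in.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT org.terrier.matching.dsms.PhraseScoreModifier.
// It only mirrors the four constructor signatures and the documented defaults.
final class PhraseSettingsSketch {
    final List<String> phraseTerms; // Terrier uses List<Query>; String is a simplification
    final boolean required;         // documented default: true
    final int blockDistance;        // documented default: 1 (phrase search)

    PhraseSettingsSketch(List<String> pTerms) {
        this(pTerms, true, 1);
    }

    PhraseSettingsSketch(List<String> pTerms, int bDist) {
        this(pTerms, true, bDist);
    }

    PhraseSettingsSketch(List<String> pTerms, boolean r) {
        this(pTerms, r, 1);
    }

    PhraseSettingsSketch(List<String> pTerms, boolean r, int bDist) {
        this.phraseTerms = pTerms;
        this.required = r;
        this.blockDistance = bDist;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("information", "retrieval");
        PhraseSettingsSketch phrase = new PhraseSettingsSketch(terms);
        PhraseSettingsSketch proximity = new PhraseSettingsSketch(terms, 5);
        System.out.println(phrase.required + " " + phrase.blockDistance);       // true 1
        System.out.println(proximity.required + " " + proximity.blockDistance); // true 5
    }
}
```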
Method Detail

getName

public java.lang.String getName()
Returns the name of the modifier.

Specified by:
getName in interface DocumentScoreModifier
Returns:
String the name of the modifier.

clone

public java.lang.Object clone()
Clones this DSM. Note that phraseTerms is shallow copied, because Strings are immutable.

Specified by:
clone in interface DocumentScoreModifier
Overrides:
clone in class java.lang.Object

modifyScores

public boolean modifyScores(Index index,
                            MatchingQueryTerms terms,
                            ResultSet set)
Modifies the scores of documents according to whether or not a given phrase occurs in them.

Specified by:
modifyScores in interface DocumentScoreModifier
Parameters:
index - Index the data structures to use.
terms - MatchingQueryTerms the terms to be matched for the query. These are not necessarily the phrase terms, but all the terms of the query.
set - ResultSet the result set for the query.
Returns:
true if any scores have been altered
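
The contract of the return value can be sketched on plain arrays. The class PhraseFilterSketch and the method applyPhraseFilter are hypothetical and do not use the Index, MatchingQueryTerms, or ResultSet types. The choice of Double.NEGATIVE_INFINITY as the score given to a filtered document is an assumption made for illustration, not a statement about what Terrier writes.

```java
// Hypothetical sketch, not part of Terrier.
final class PhraseFilterSketch {

    /**
     * Filters scores according to phrase presence.
     * If 'required' is true, documents without the phrase are filtered;
     * if 'required' is false, documents with the phrase are filtered.
     * Returns true if at least one document was filtered.
     */
    static boolean applyPhraseFilter(double[] scores, boolean[] containsPhrase, boolean required) {
        boolean altered = false;
        for (int i = 0; i < scores.length; i++) {
            if (containsPhrase[i] != required) {
                // Assumed sentinel for a filtered document.
                scores[i] = Double.NEGATIVE_INFINITY;
                altered = true;
            }
        }
        return altered;
    }

    public static void main(String[] args) {
        double[] scores = {1.5, 2.0, 0.7};
        boolean[] containsPhrase = {true, false, true};
        boolean altered = applyPhraseFilter(scores, containsPhrase, true);
        System.out.println(altered);   // true
        System.out.println(scores[1]); // -Infinity
    }
}
```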

range

protected int[] range(int[] array,
                      int floor,
                      int ceiling)
Performs a binary search in an array and returns the indices of the elements that are lower than the given floor and higher than the given ceiling. This method is based on code from http://www.tbray.org/ongoing/org/tbray/ongoing/BinarySearch.java. It has been corrected for the binary search bug described at http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Parameters:
array - the array to search in
floor - the lower limit of the range we want to check for.
ceiling - the upper limit of the range we want to check for.
Returns:
int[] an array of two integers. The first is the index of the array element that is lower than the floor, and the second is the index of the array element that is higher than the ceiling.
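
A two-sided binary search of this kind can be sketched as follows. This is a minimal illustration, not Terrier's code: the class RangeSketch is hypothetical, and the conventions of returning -1 when no element is lower than the floor and array.length when no element is higher than the ceiling are assumptions. The midpoint is computed as low + (high - low) / 2 so that the sum of the two bounds cannot overflow, which is the bug the second link describes.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of Terrier.
final class RangeSketch {

    /**
     * For a sorted array, returns { i, j } where i is the index of the last
     * element lower than 'floor' (or -1 if there is none) and j is the index
     * of the first element higher than 'ceiling' (or array.length if none).
     */
    static int[] range(int[] array, int floor, int ceiling) {
        int[] answer = new int[2];

        // Find the last element that is lower than the floor.
        int low = -1;
        int high = array.length;
        while (high - low > 1) {
            int probe = low + (high - low) / 2; // overflow-safe midpoint
            if (array[probe] < floor) {
                low = probe;
            } else {
                high = probe;
            }
        }
        answer[0] = low;

        // Find the first element that is higher than the ceiling.
        low = -1;
        high = array.length;
        while (high - low > 1) {
            int probe = low + (high - low) / 2;
            if (array[probe] > ceiling) {
                high = probe;
            } else {
                low = probe;
            }
        }
        answer[1] = high;
        return answer;
    }

    public static void main(String[] args) {
        int[] positions = {2, 4, 4, 7, 9};
        System.out.println(Arrays.toString(range(positions, 4, 7)));  // [0, 4]
        System.out.println(Arrays.toString(range(positions, 1, 10))); // [-1, 5]
    }
}
```

Under these conventions, the elements that fall within the floor and ceiling are exactly those strictly between the two returned indices.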


Terrier 3.5. Copyright © 2004-2011 University of Glasgow