org.terrier.querying
Class DFRBagExpansionTerms

java.lang.Object
  extended by org.terrier.querying.ExpansionTerms
      extended by org.terrier.querying.DFRBagExpansionTerms

public class DFRBagExpansionTerms
extends ExpansionTerms

This class implements a data structure of terms in the top-retrieved documents. In particular, this implementation treats the entire feedback set as a bag of words, and weights term occurrences in this bag.
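As a rough illustration of the bag-of-words treatment described above, the following sketch pools term occurrences from several feedback documents into a single bag. It is not Terrier code: the class FeedbackBagSketch, its method signatures, and the use of java.util maps in place of the Trove map are assumptions made for illustration, and each term id is assumed to appear at most once per inserted document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the bookkeeping this class performs; not Terrier code. */
class FeedbackBagSketch {
    // termid -> frequency pooled over every feedback document
    private final Map<Integer, Double> pooledFrequency = new HashMap<>();
    // termid -> number of feedback documents that contain the term
    private final Map<Integer, Integer> documentFrequency = new HashMap<>();
    private double totalDocumentLength = 0;
    private int feedbackDocumentCount = 0;

    /** Adds one document, given parallel arrays of term ids and within-document frequencies. */
    void insertDocument(int[] termIds, int[] frequencies) {
        for (int i = 0; i < termIds.length; i++) {
            pooledFrequency.merge(termIds[i], (double) frequencies[i], Double::sum);
            documentFrequency.merge(termIds[i], 1, Integer::sum);
            totalDocumentLength += frequencies[i];
        }
        feedbackDocumentCount++;
    }

    double getFrequency(int termId) { return pooledFrequency.getOrDefault(termId, 0.0); }

    int getDocumentFrequency(int termId) { return documentFrequency.getOrDefault(termId, 0); }

    double getTotalDocumentLength() { return totalDocumentLength; }

    int getNumberOfUniqueTerms() { return pooledFrequency.size(); }

    int getFeedbackDocumentCount() { return feedbackDocumentCount; }
}
```

Once the documents are pooled, any weighting model only needs the pooled frequency, the document frequency, and the total length of the bag; the boundaries between individual feedback documents no longer matter.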

Author:
Gianni Amati, Ben He, Vassilis Plachouras, Craig Macdonald

Nested Class Summary
 
Nested classes/interfaces inherited from class org.terrier.querying.ExpansionTerms
ExpansionTerms.ExpansionTerm
 
Field Summary
protected  double averageDocumentLength
          The average document length in the collection.
protected  PostingIndex<BitIndexPointer> directIndex
           
protected  DocumentIndex documentIndex
           
protected  int feedbackDocumentCount
           
protected  Lexicon<java.lang.String> lexicon
          The lexicon used for retrieval.
 double normaliser
          The parameter-free term weight normaliser.
protected  int numberOfDocuments
          The number of documents in the collection.
protected  long numberOfTokens
          The number of tokens in the collection.
protected  gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> terms
          The terms in the top-retrieved documents.
protected  double totalDocumentLength
          The number of tokens in the X top-ranked documents.
 
Fields inherited from class org.terrier.querying.ExpansionTerms
EXPANSIONTERM_DESC_SCORE_SORTER, model, originalTermFreqs, originalTermids
 
Constructor Summary
DFRBagExpansionTerms(CollectionStatistics collStats, Lexicon<java.lang.String> _lexicon, PostingIndex<BitIndexPointer> _directIndex, DocumentIndex _documentIndex)
          Constructs an instance of DFRBagExpansionTerms.
 
Method Summary
 void assignWeights(QueryExpansionModel QEModel)
          Assigns weights to the terms stored in the terms map.
 void deleteTerm(int termid)
          Remove the records for a given term
 double getDocumentFrequency(int termId)
          Returns the number of the top-ranked documents a given term occurs in.
 SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms)
          This method implements the functionality of assigning expansion weights to the terms in the top-retrieved documents, and returns the most informative terms among them.
protected  SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms, QueryExpansionModel QEModel)
           
 double getExpansionProbability(int termId)
          Returns the probability of a given termid occurring in the expansion documents.
 gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
          Returns expanded terms
 double getExpansionWeight(int termId)
          Returns the weight of a term with the given term identifier.
 double getExpansionWeight(int termId, QueryExpansionModel model)
          Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
 double getExpansionWeight(java.lang.String term)
          Returns the weight of a given term.
 double getExpansionWeight(java.lang.String term, QueryExpansionModel model)
          Returns the weight of a given term, computed by the specified query expansion model.
 double getFrequency(int termId)
          Returns the frequency of a given term in the top-ranked documents.
 double getFrequency(java.lang.String term)
          Returns the frequency of a given term in the top-ranked documents.
 int getNumberOfUniqueTerms()
          Returns the number of unique terms found in all the top-ranked documents.
 double getOriginalExpansionWeight(java.lang.String term)
          Returns the un-normalised weight of a given term.
 int[] getTermIds()
          Returns the termids of all terms found in the top-ranked documents
 void insertDocument(FeedbackDocument doc)
          Adds the feedback document to the feedback set.
 void insertDocument(int docid, int rank, double score)
          Adds the feedback document from the index given a docid
protected  void insertTerm(int termID, double withinDocumentFrequency)
          Adds a term from the X top-retrieved documents as a candidate expansion term.
 void setTotalDocumentLength(double totalLength)
          Allows the totalDocumentLength to be set after the fact
 
Methods inherited from class org.terrier.querying.ExpansionTerms
setModel, setOriginalQueryTerms
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

terms

protected gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> terms
The terms in the top-retrieved documents.


lexicon

protected Lexicon<java.lang.String> lexicon
The lexicon used for retrieval.


directIndex

protected PostingIndex<BitIndexPointer> directIndex

documentIndex

protected DocumentIndex documentIndex

numberOfDocuments

protected int numberOfDocuments
The number of documents in the collection.


numberOfTokens

protected long numberOfTokens
The number of tokens in the collection.


averageDocumentLength

protected double averageDocumentLength
The average document length in the collection.


totalDocumentLength

protected double totalDocumentLength
The number of tokens in the X top-ranked documents.


normaliser

public double normaliser
The parameter-free term weight normaliser.


feedbackDocumentCount

protected int feedbackDocumentCount

Constructor Detail

DFRBagExpansionTerms

public DFRBagExpansionTerms(CollectionStatistics collStats,
                            Lexicon<java.lang.String> _lexicon,
                            PostingIndex<BitIndexPointer> _directIndex,
                            DocumentIndex _documentIndex)
Constructs an instance of DFRBagExpansionTerms.

Parameters:
collStats - Statistics of the used corpora
_lexicon - Lexicon The lexicon used for retrieval.
_directIndex - DirectIndex to use for finding terms for documents
_documentIndex - DocumentIndex to use for finding statistics about documents

Method Detail

setTotalDocumentLength

public void setTotalDocumentLength(double totalLength)
Allows the totalDocumentLength to be set after the fact


getTermIds

public int[] getTermIds()
Returns the termids of all terms found in the top-ranked documents


getNumberOfUniqueTerms

public int getNumberOfUniqueTerms()
Returns the number of unique terms found in all the top-ranked documents.

Specified by:
getNumberOfUniqueTerms in class ExpansionTerms

getExpansionTerms

public gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
Returns expanded terms

Returns:
terms

getExpandedTerms

public SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms)
This method assigns expansion weights to the terms in the top-retrieved documents and returns the most informative terms among them. Conservative Query Expansion (ConservativeQE) is used if the number of expanded terms is set to 0. In this case, no new query terms are added to the query; only the existing ones are reweighted.

Specified by:
getExpandedTerms in class ExpansionTerms
Parameters:
numberOfExpandedTerms - int The number of terms to extract from the top-retrieved documents. ConservativeQE is used if this parameter is set to 0.
Returns:
weighted query terms
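
The selection behaviour described for getExpandedTerms can be sketched as follows. This is a simplified illustration and not the Terrier implementation: the class ExpansionSelectionSketch, the select method, and the use of plain maps of termid to weight are all hypothetical, and the handling of original query terms in the non-conservative case is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of top-N term selection with a conservative mode; not Terrier code. */
class ExpansionSelectionSketch {
    /**
     * Returns termid -> weight for the selected terms, in descending weight order.
     * When numberOfExpandedTerms is 0 (the conservative case), only terms that are
     * already in the original query are kept, so no new terms are introduced.
     */
    static Map<Integer, Double> select(Map<Integer, Double> weights,
                                       Set<Integer> originalTermIds,
                                       int numberOfExpandedTerms) {
        boolean conservative = numberOfExpandedTerms == 0;
        List<Map.Entry<Integer, Double>> candidates = new ArrayList<>();
        for (Map.Entry<Integer, Double> entry : weights.entrySet()) {
            if (conservative && !originalTermIds.contains(entry.getKey())) {
                continue; // conservative mode never adds a new term
            }
            candidates.add(entry);
        }
        // Highest weight first.
        candidates.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        int limit = conservative
                ? candidates.size()
                : Math.min(numberOfExpandedTerms, candidates.size());
        Map<Integer, Double> selected = new LinkedHashMap<>();
        for (int i = 0; i < limit; i++) {
            selected.put(candidates.get(i).getKey(), candidates.get(i).getValue());
        }
        return selected;
    }
}
```

In this sketch, a request for two terms returns the two highest-weighted candidates, while a request for zero terms returns only the original query terms with their new weights.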

getExpandedTerms

protected SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms,
                                             QueryExpansionModel QEModel)

deleteTerm

public void deleteTerm(int termid)
Remove the records for a given term


getExpansionWeight

public double getExpansionWeight(java.lang.String term,
                                 QueryExpansionModel model)
Returns the weight of a given term, computed by the specified query expansion model.

Parameters:
term - String the term to get the weight for.
model - QueryExpansionModel the used query expansion model.
Returns:
double the weight of the specified term.

getExpansionWeight

public double getExpansionWeight(java.lang.String term)
Returns the weight of a given term.

Parameters:
term - String the term to get the weight for.
Returns:
double the weight of the specified term.

getOriginalExpansionWeight

public double getOriginalExpansionWeight(java.lang.String term)
Returns the un-normalised weight of a given term.

Parameters:
term - String the given term.
Returns:
The un-normalised term weight.

getFrequency

public double getFrequency(java.lang.String term)
Returns the frequency of a given term in the top-ranked documents.

Parameters:
term - String the term to get the frequency for.
Returns:
double the frequency of the specified term in the top-ranked documents.

getFrequency

public double getFrequency(int termId)
Returns the frequency of a given term in the top-ranked documents.

Parameters:
termId - int the id of the term to get the frequency for.
Returns:
double the frequency of the specified term in the top-ranked documents.

getDocumentFrequency

public double getDocumentFrequency(int termId)
Returns the number of the top-ranked documents a given term occurs in.

Parameters:
termId - int the id of the term to get the document frequency for.
Returns:
double the document frequency of the specified term in the top-ranked documents.

assignWeights

public void assignWeights(QueryExpansionModel QEModel)
Assigns weights to the terms stored in the terms map.

Parameters:
QEModel - QueryExpansionModel the used query expansion model.

getExpansionWeight

public double getExpansionWeight(int termId,
                                 QueryExpansionModel model)
Returns the weight of a term with the given term identifier, computed by the specified query expansion model.

Parameters:
termId - int the term identifier to get the weight for.
model - QueryExpansionModel the used query expansion model.
Returns:
double the weight of the specified term.

getExpansionWeight

public double getExpansionWeight(int termId)
Returns the weight of a term with the given term identifier.

Parameters:
termId - int the term identifier to get the weight for.
Returns:
double the weight of the specified term.

getExpansionProbability

public double getExpansionProbability(int termId)
Returns the probability of a given termid occurring in the expansion documents. The probability is the term's frequency in the expansion documents divided by the total length of all the expansion documents.

Parameters:
termId - int the term identifier to obtain the probability
Returns:
double the probability of the term
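
The quotient described for getExpansionProbability amounts to a single division. The sketch below is a minimal, hypothetical illustration (the class and method names are not part of Terrier), and it assumes the numerator is the term's pooled frequency in the expansion documents.

```java
/** Hypothetical sketch of the expansion probability quotient; not Terrier code. */
class ExpansionProbabilitySketch {
    /**
     * Divides a term's frequency in the feedback set by the total number of
     * tokens in the feedback set. Rejecting a non-positive total is a choice
     * made for this sketch only.
     */
    static double expansionProbability(double frequencyInFeedbackSet,
                                       double totalDocumentLength) {
        if (totalDocumentLength <= 0) {
            throw new IllegalArgumentException("totalDocumentLength must be positive");
        }
        return frequencyInFeedbackSet / totalDocumentLength;
    }
}
```

For example, a term that occurs 3 times in a feedback set of 12 tokens would have a probability of 0.25 under this reading.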

insertDocument

public void insertDocument(FeedbackDocument doc)
                    throws java.io.IOException
Adds the feedback document to the feedback set.

Specified by:
insertDocument in class ExpansionTerms
Throws:
java.io.IOException

insertDocument

public void insertDocument(int docid,
                           int rank,
                           double score)
                    throws java.io.IOException
Adds the feedback document from the index given a docid

Throws:
java.io.IOException

insertTerm

protected void insertTerm(int termID,
                          double withinDocumentFrequency)
Adds a term from the X top-retrieved documents as a candidate expansion term.

Parameters:
termID - int the integer identifier of a term
withinDocumentFrequency - double the within document frequency of a term


Terrier 3.5. Copyright © 2004-2011 University of Glasgow