java.lang.Object
- org.terrier.querying.ExpansionTerms
- - org.terrier.querying.DFRBagExpansionTerms

```
public class DFRBagExpansionTerms
extends ExpansionTerms
```
This class implements a data structure of terms in the top-retrieved documents. In particular, this implementation treats the entire feedback set as a bag of words, and weights term occurrences in this bag.
Properties:
- expansion.mindocuments - the minimum number of documents a term must occur in before it is considered informative. Defaults to 2. For more information, see Giambattista Amati: Information Theoretic Approach to Information Extraction. FQAS 2006: 519-529. DOI 10.1007/11766254_44
Author:

Gianni Amati, Ben He, Vassilis Plachouras, Craig Macdonald
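
The bag-of-words treatment in the class description can be illustrated with a minimal, self-contained sketch. It is not the Terrier implementation: `FeedbackBag`, its parallel-array `insertDocument` signature, and `isCandidate` are hypothetical names introduced here, and only the accessor names mirror this class. The sketch also assumes that the expansion.mindocuments threshold is compared against the number of feedback documents that contain a term.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not org.terrier.querying.DFRBagExpansionTerms.
class FeedbackBag {
    // termid -> { occurrences across the whole bag, number of feedback documents containing the term }
    private final Map<Integer, double[]> stats = new HashMap<>();
    private double totalDocumentLength = 0;

    // Adds one feedback document, given as parallel arrays of termids and
    // within-document frequencies (each termid is assumed to appear once per document).
    void insertDocument(int[] termIds, int[] frequencies) {
        for (int i = 0; i < termIds.length; i++) {
            double[] s = stats.computeIfAbsent(termIds[i], k -> new double[2]);
            s[0] += frequencies[i];   // occurrences in the bag
            s[1] += 1;                // documents containing the term
            totalDocumentLength += frequencies[i];
        }
    }

    // Occurrences of the term across all feedback documents (0 if unseen in this sketch).
    double getFrequency(int termId) {
        double[] s = stats.get(termId);
        return s == null ? 0 : s[0];
    }

    // Number of feedback documents containing the term (0 if unseen in this sketch).
    double getDocumentFrequency(int termId) {
        double[] s = stats.get(termId);
        return s == null ? 0 : s[1];
    }

    int getNumberOfUniqueTerms() {
        return stats.size();
    }

    double getTotalDocumentLength() {
        return totalDocumentLength;
    }

    // Assumed reading of expansion.mindocuments: keep terms seen in at least minDocuments documents.
    boolean isCandidate(int termId, int minDocuments) {
        return getDocumentFrequency(termId) >= minDocuments;
    }

    public static void main(String[] args) {
        FeedbackBag bag = new FeedbackBag();
        bag.insertDocument(new int[] {1, 2}, new int[] {3, 1});
        bag.insertDocument(new int[] {1, 3}, new int[] {2, 4});
        System.out.println("frequency of term 1: " + bag.getFrequency(1));
        System.out.println("term 2 is a candidate: " + bag.isCandidate(2, 2));
    }
}
```

In this sketch a term that occurs in only one of the two feedback documents is rejected under a threshold of 2, however often it occurs in that document.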

Nested Class Summary
- Nested classes/interfaces inherited from class org.terrier.querying.ExpansionTerms
  ExpansionTerms.ExpansionTerm

Field Summary

Fields
Modifier and Type	Field	Description
`protected double`	`averageDocumentLength`	The average document length in the collection.
`protected PostingIndex<?>`	`directIndex`
`protected DocumentIndex`	`documentIndex`
`protected int`	`feedbackDocumentCount`
`protected Lexicon<java.lang.String>`	`lexicon`	The lexicon used for retrieval.
`protected static org.slf4j.Logger`	`logger`	The logger used
`double`	`normaliser`	The parameter-free term weight normaliser.
`protected int`	`numberOfDocuments`	The number of documents in the collection.
`protected long`	`numberOfTokens`	The number of tokens in the collection.
`protected gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm>`	`terms`	The terms in the top-retrieved documents.
`protected double`	`totalDocumentLength`	The number of tokens in the X top ranked documents.

Fields inherited from class org.terrier.querying.ExpansionTerms
EXPANSIONTERM_DESC_SCORE_SORTER, model, originalTermFreqs, originalTermids

Constructor Summary

Constructors
Constructor	Description
`DFRBagExpansionTerms(CollectionStatistics collStats, Lexicon<java.lang.String> _lexicon, PostingIndex<?> _directIndex, DocumentIndex _documentIndex)`	Constructs an instance of ExpansionTerms.

Method Summary

Modifier and Type	Method	Description
`void`	`assignWeights(QueryExpansionModel QEModel)`	Assigns weights to the terms stored in the terms map.
`void`	`deleteTerm(int termid)`	Remove the records for a given term
`double`	`getDocumentFrequency(int termId)`	Returns the number of the top-ranked documents a given term occurs in.
`SingleTermQuery[]`	`getExpandedTerms(int numberOfExpandedTerms)`	This method implements the functionality of assigning expansion weights to the terms in the top-retrieved documents, and returns the most informative terms among them.
`protected SingleTermQuery[]`	`getExpandedTerms(int numberOfExpandedTerms, QueryExpansionModel QEModel)`
`double`	`getExpansionProbability(int termId)`	Returns the probability of a given termid occurring in the expansion documents.
`gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm>`	`getExpansionTerms()`	Returns expanded terms
`double`	`getExpansionWeight(int termId)`	Returns the weight of a term with the given term identifier.
`double`	`getExpansionWeight(int termId, QueryExpansionModel model)`	Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
`double`	`getExpansionWeight(java.lang.String term)`	Returns the weight of a given term.
`double`	`getExpansionWeight(java.lang.String term, QueryExpansionModel model)`	Returns the weight of a given term, computed by the specified query expansion model.
`double`	`getFrequency(int termId)`	Returns the frequency of a given term in the top-ranked documents.
`double`	`getFrequency(java.lang.String term)`	Returns the frequency of a given term in the top-ranked documents.
`int`	`getNumberOfUniqueTerms()`	Returns the number of unique terms found in all the top-ranked documents
`double`	`getOriginalExpansionWeight(java.lang.String term)`	Returns the un-normalised weight of a given term.
`int[]`	`getTermIds()`	Returns the termids of all terms found in the top-ranked documents
`void`	`insertDocument(int docid, int rank, double score)`	Adds the feedback document from the index given a docid
`void`	`insertDocument(FeedbackDocument doc)`	Adds the feedback document to the feedback set.
`protected void`	`insertTerm(int termID, double withinDocumentFrequency)`	Adds a term from the X top-retrieved documents as a candidate expansion term.
`void`	`setOriginalQueryTerms(MatchingQueryTerms query)`	Set the original query terms.
`void`	`setTotalDocumentLength(double totalLength)`	Allows the totalDocumentLength to be set after the fact

Methods inherited from class org.terrier.querying.ExpansionTerms
getOriginalTermIds, setModel

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - logger
```
protected static org.slf4j.Logger logger
```
    The logger used
  - terms
```
protected gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> terms
```
    The terms in the top-retrieved documents.
  - lexicon
```
protected Lexicon<java.lang.String> lexicon
```
    The lexicon used for retrieval.
  - directIndex
```
protected PostingIndex<?> directIndex
```
  - documentIndex
```
protected DocumentIndex documentIndex
```
  - numberOfDocuments
```
protected int numberOfDocuments
```
    The number of documents in the collection.
  - numberOfTokens
```
protected long numberOfTokens
```
    The number of tokens in the collection.
  - averageDocumentLength
```
protected double averageDocumentLength
```
    The average document length in the collection.
  - totalDocumentLength
```
protected double totalDocumentLength
```
    The number of tokens in the X top ranked documents.
  - normaliser
```
public double normaliser
```
    The parameter-free term weight normaliser.
  - feedbackDocumentCount
```
protected int feedbackDocumentCount
```
- Constructor Detail
  - DFRBagExpansionTerms
```
public DFRBagExpansionTerms(CollectionStatistics collStats,
                            Lexicon<java.lang.String> _lexicon,
                            PostingIndex<?> _directIndex,
                            DocumentIndex _documentIndex)
```
    Constructs an instance of ExpansionTerms.
    
    Parameters:
    
    collStats - Statistics of the collection in use
    
    _lexicon - Lexicon The lexicon used for retrieval.
    
    _directIndex - DirectIndex to use for finding terms for documents
    
    _documentIndex - DocumentIndex to use for finding statistics about documents
- Method Detail
  - setTotalDocumentLength
```
public void setTotalDocumentLength(double totalLength)
```
    Allows the totalDocumentLength to be set after the fact
  - setOriginalQueryTerms
```
public void setOriginalQueryTerms(MatchingQueryTerms query)
```
    Description copied from class: ExpansionTerms
    
    Set the original query terms.
    
    Overrides:
    
    setOriginalQueryTerms in class ExpansionTerms
    
    Parameters:
    
    query - The original query.
  - getTermIds
```
public int[] getTermIds()
```
    Returns the termids of all terms found in the top-ranked documents
  - getNumberOfUniqueTerms
```
public int getNumberOfUniqueTerms()
```
    Returns the number of unique terms found in all the top-ranked documents
    
    Specified by:
    
    getNumberOfUniqueTerms in class ExpansionTerms
  - getExpansionTerms
```
public gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
```
    Returns expanded terms
    
    Returns:
    
    terms
  - getExpandedTerms
```
public SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms)
```
    This method assigns expansion weights to the terms in the top-retrieved documents and returns the most informative terms among them. Conservative Query Expansion (ConservativeQE) is used if the number of expanded terms is set to 0. In this case, no new terms are added to the query; the existing ones are only reweighted.
    
    Specified by:
    
    getExpandedTerms in class ExpansionTerms
    
    Parameters:
    
    numberOfExpandedTerms - int The number of terms to extract from the top-retrieved documents. ConservativeQE is used if this parameter is set to 0.
    
    Returns:
    
    weighted query terms
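
    The selection behaviour described for this method, including the ConservativeQE case, might be sketched as follows. This is an illustration under stated assumptions, not the Terrier code: `ExpansionSelector` and `select` are hypothetical names, terms are represented as plain strings with precomputed weights, and ties between equal weights are left in unspecified order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of org.terrier.querying.
class ExpansionSelector {
    /**
     * Returns term -> weight, ordered by descending weight.
     * numberOfExpandedTerms > 0: the highest-weighted candidates, up to that count.
     * numberOfExpandedTerms == 0: conservative mode, only terms already in the original query.
     */
    static Map<String, Double> select(Map<String, Double> weights,
                                      Set<String> originalTerms,
                                      int numberOfExpandedTerms) {
        // Rank all candidates by weight, most informative first.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(weights.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        Map<String, Double> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : ranked) {
            boolean keep = (numberOfExpandedTerms == 0)
                    ? originalTerms.contains(e.getKey())        // reweight only, add nothing new
                    : selected.size() < numberOfExpandedTerms;  // take the top-N candidates
            if (keep) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        return selected;
    }
}
```

    With a count of 0, the result in this sketch can only contain original query terms, each carrying its new weight.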
  - getExpandedTerms
```
protected SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms,
                                             QueryExpansionModel QEModel)
```
  - deleteTerm
```
public void deleteTerm(int termid)
```
    Remove the records for a given term
  - getExpansionWeight
```
public double getExpansionWeight(java.lang.String term,
                                 QueryExpansionModel model)
```
    Returns the weight of a given term, computed by the specified query expansion model.
    
    Parameters:
    
    term - String the term to get the weight for.
    
    model - QueryExpansionModel the used query expansion model.
    
    Returns:
    
    double the weight of the specified term.
  - getExpansionWeight
```
public double getExpansionWeight(java.lang.String term)
```
    Returns the weight of a given term.
    
    Parameters:
    
    term - String the term to get the weight for.
    
    Returns:
    
    double the weight of the specified term.
  - getOriginalExpansionWeight
```
public double getOriginalExpansionWeight(java.lang.String term)
```
    Returns the un-normalised weight of a given term.
    
    Parameters:
    
    term - String the given term.
    
    Returns:
    
    The un-normalised term weight.
  - getFrequency
```
public double getFrequency(java.lang.String term)
```
    Returns the frequency of a given term in the top-ranked documents.
    
    Parameters:
    
    term - String the term to get the frequency for.
    
    Returns:
    
    double the frequency of the specified term in the top-ranked documents.
  - getFrequency
```
public double getFrequency(int termId)
```
    Returns the frequency of a given term in the top-ranked documents.
    
    Parameters:
    
    termId - int the id of the term to get the frequency for.
    
    Returns:
    
    double the frequency of the specified term in the top-ranked documents.
  - getDocumentFrequency
```
public double getDocumentFrequency(int termId)
```
    Returns the number of the top-ranked documents a given term occurs in.
    
    Parameters:
    
    termId - int the id of the term to get the document frequency for.
    
    Returns:
    
    double the document frequency of the specified term in the top-ranked documents.
  - assignWeights
```
public void assignWeights(QueryExpansionModel QEModel)
```
    Assigns weights to the terms stored in the terms map.
    
    Parameters:
    
    QEModel - QueryExpansionModel the used query expansion model.
  - getExpansionWeight
```
public double getExpansionWeight(int termId,
                                 QueryExpansionModel model)
```
    Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
    
    Parameters:
    
    termId - int the term identifier to get the weight for.
    
    model - QueryExpansionModel the used query expansion model.
    
    Returns:
    
    double the weight of the specified term.
  - getExpansionWeight
```
public double getExpansionWeight(int termId)
```
    Returns the weight of a term with the given term identifier.
    
    Parameters:
    
    termId - int the term identifier to get the weight for.
    
    Returns:
    
    double the weight of the specified term.
  - getExpansionProbability
```
public double getExpansionProbability(int termId)
```
    Returns the probability of a given termid occurring in the expansion documents. The value is the term's document frequency in the expansion documents divided by the total length of all the expansion documents.
    
    Parameters:
    
    termId - int the term identifier to obtain the probability for
    
    Returns:
    
    double the probability of the term
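
    The quotient described for this method can be worked through with a small sketch. `ExpansionProbability` and `of` are hypothetical names, the numerator is passed in as whichever frequency statistic is stored for the term, and rejecting a non-positive total length is an assumption of the sketch rather than documented behaviour.

```java
// Hypothetical helper for illustration only; not part of org.terrier.querying.
class ExpansionProbability {
    // Divides a term's frequency statistic by the total length of the expansion documents.
    static double of(double frequency, double totalDocumentLength) {
        if (totalDocumentLength <= 0) {
            // Assumption of this sketch: an empty feedback set has no defined probability.
            throw new IllegalArgumentException("totalDocumentLength must be positive");
        }
        return frequency / totalDocumentLength;
    }

    public static void main(String[] args) {
        // A statistic of 6 over expansion documents totalling 300 tokens.
        System.out.println(ExpansionProbability.of(6, 300));
    }
}
```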
  - insertDocument
```
public void insertDocument(FeedbackDocument doc)
                    throws java.io.IOException
```
    Adds the feedback document to the feedback set.
    
    Specified by:
    
    insertDocument in class ExpansionTerms
    
    Throws:
    
    java.io.IOException
  - insertDocument
```
public void insertDocument(int docid,
                           int rank,
                           double score)
                    throws java.io.IOException
```
    Adds the feedback document from the index given a docid
    
    Throws:
    
    java.io.IOException
  - insertTerm
```
protected void insertTerm(int termID,
                          double withinDocumentFrequency)
```
    Adds a term from the X top-retrieved documents as a candidate expansion term.
    
    Parameters:
    
    termID - int the integer identifier of a term
    
    withinDocumentFrequency - double the within document frequency of a term
