Package org.terrier.querying
Class DFRBagExpansionTerms
- java.lang.Object
-
- org.terrier.querying.ExpansionTerms
-
- org.terrier.querying.DFRBagExpansionTerms
-
public class DFRBagExpansionTerms extends ExpansionTerms
This class implements a data structure of terms in the top-retrieved documents. In particular, this implementation treats the entire feedback set as a bag of words, and weights term occurrences in this bag.
Properties:
- expansion.mindocuments - the minimum number of documents a term must occur in before it can be considered informative. Defaults to 2. For more information, see Giambattista Amati: Information Theoretic Approach to Information Extraction. FQAS 2006: 519-529, DOI 10.1007/11766254_44.
- Author:
- Gianni Amati, Ben He, Vassilis Plachouras, Craig Macdonald
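The bag-of-words treatment described above can be illustrated with a minimal, self-contained sketch. It does not use the Terrier API: the class name FeedbackBagSketch, its methods, and the use of String terms in place of integer termids are all hypothetical, chosen only to show how occurrences from every feedback document are pooled into one bag and how a minimum-documents threshold (as with expansion.mindocuments) could filter candidates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Terrier. */
class FeedbackBagSketch {
    // term -> total occurrences across all feedback documents (the "bag")
    private final Map<String, Double> frequency = new HashMap<>();
    // term -> number of feedback documents that contain the term
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    // total number of tokens seen across all feedback documents
    private double totalDocumentLength = 0;

    /** Adds one feedback document, given as term -> within-document frequency. */
    void insertDocument(Map<String, Integer> termCounts) {
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            frequency.merge(e.getKey(), e.getValue().doubleValue(), Double::sum);
            documentFrequency.merge(e.getKey(), 1, Integer::sum);
            totalDocumentLength += e.getValue();
        }
    }

    /** Returns, in alphabetical order, the terms found in at least minDocuments feedback documents. */
    List<String> candidates(int minDocuments) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(documentFrequency).entrySet()) {
            if (e.getValue() >= minDocuments) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    double getFrequency(String term) {
        return frequency.getOrDefault(term, 0.0);
    }

    double getTotalDocumentLength() {
        return totalDocumentLength;
    }

    public static void main(String[] args) {
        FeedbackBagSketch bag = new FeedbackBagSketch();
        bag.insertDocument(Map.of("terrier", 3, "index", 1));
        bag.insertDocument(Map.of("terrier", 2, "query", 4));
        System.out.println(bag.candidates(2));          // only "terrier" occurs in two documents
        System.out.println(bag.getFrequency("terrier")); // pooled count over both documents
    }
}
```

In this sketch a term's statistics are independent of which document contributed them, which is the sense in which the feedback set is treated as a single bag.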
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.terrier.querying.ExpansionTerms
ExpansionTerms.ExpansionTerm
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected double averageDocumentLength
- The average document length in the collection.
protected PostingIndex<?> directIndex
protected DocumentIndex documentIndex
protected int feedbackDocumentCount
protected Lexicon<java.lang.String> lexicon
- The lexicon used for retrieval.
protected static org.slf4j.Logger logger
- The logger used.
double normaliser
- The parameter-free term weight normaliser.
protected int numberOfDocuments
- The number of documents in the collection.
protected long numberOfTokens
- The number of tokens in the collection.
protected gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> terms
- The terms in the top-retrieved documents.
protected double totalDocumentLength
- The number of tokens in the X top-ranked documents.
-
Fields inherited from class org.terrier.querying.ExpansionTerms
EXPANSIONTERM_DESC_SCORE_SORTER, model, originalTermFreqs, originalTermids
-
-
Constructor Summary
Constructors
Constructor, Description
DFRBagExpansionTerms(CollectionStatistics collStats, Lexicon<java.lang.String> _lexicon, PostingIndex<?> _directIndex, DocumentIndex _documentIndex)
- Constructs an instance of DFRBagExpansionTerms.
-
Method Summary
Modifier and Type, Method, Description
void assignWeights(QueryExpansionModel QEModel)
- Assigns weights to the terms stored in the terms map.
void deleteTerm(int termid)
- Removes the records for a given term.
double getDocumentFrequency(int termId)
- Returns the number of top-ranked documents in which a given term occurs.
SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms)
- This method implements the functionality of assigning expansion weights to the terms in the top-retrieved documents, and returns the most informative terms among them.
protected SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms, QueryExpansionModel QEModel)
double getExpansionProbability(int termId)
- Returns the probability of a given termid occurring in the expansion documents.
gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
- Returns the expanded terms.
double getExpansionWeight(int termId)
- Returns the weight of a term with the given term identifier.
double getExpansionWeight(int termId, QueryExpansionModel model)
- Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
double getExpansionWeight(java.lang.String term)
- Returns the weight of a given term.
double getExpansionWeight(java.lang.String term, QueryExpansionModel model)
- Returns the weight of a given term, computed by the specified query expansion model.
double getFrequency(int termId)
- Returns the frequency of a given term in the top-ranked documents.
double getFrequency(java.lang.String term)
- Returns the frequency of a given term in the top-ranked documents.
int getNumberOfUniqueTerms()
- Returns the number of unique terms found in all the top-ranked documents.
double getOriginalExpansionWeight(java.lang.String term)
- Returns the un-normalised weight of a given term.
int[] getTermIds()
- Returns the termids of all terms found in the top-ranked documents.
void insertDocument(int docid, int rank, double score)
- Adds the feedback document from the index, given its docid.
void insertDocument(FeedbackDocument doc)
- Adds the feedback document to the feedback set.
protected void insertTerm(int termID, double withinDocumentFrequency)
- Adds a term from the X top-retrieved documents as a candidate expansion term.
void setOriginalQueryTerms(MatchingQueryTerms query)
- Sets the original query terms.
void setTotalDocumentLength(double totalLength)
- Allows the totalDocumentLength to be set after the fact.
-
Methods inherited from class org.terrier.querying.ExpansionTerms
getOriginalTermIds, setModel
-
-
-
-
Field Detail
-
logger
protected static org.slf4j.Logger logger
The logger used.
-
terms
protected gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> terms
The terms in the top-retrieved documents.
-
lexicon
protected Lexicon<java.lang.String> lexicon
The lexicon used for retrieval.
-
directIndex
protected PostingIndex<?> directIndex
-
documentIndex
protected DocumentIndex documentIndex
-
numberOfDocuments
protected int numberOfDocuments
The number of documents in the collection.
-
numberOfTokens
protected long numberOfTokens
The number of tokens in the collection.
-
averageDocumentLength
protected double averageDocumentLength
The average document length in the collection.
-
totalDocumentLength
protected double totalDocumentLength
The number of tokens in the X top-ranked documents.
-
normaliser
public double normaliser
The parameter-free term weight normaliser.
-
feedbackDocumentCount
protected int feedbackDocumentCount
-
-
Constructor Detail
-
DFRBagExpansionTerms
public DFRBagExpansionTerms(CollectionStatistics collStats, Lexicon<java.lang.String> _lexicon, PostingIndex<?> _directIndex, DocumentIndex _documentIndex)
Constructs an instance of DFRBagExpansionTerms.
- Parameters:
collStats
- Statistics of the collection in use.
_lexicon
- Lexicon the lexicon used for retrieval.
_directIndex
- DirectIndex to use for finding the terms of documents.
_documentIndex
- DocumentIndex to use for finding statistics about documents.
-
-
Method Detail
-
setTotalDocumentLength
public void setTotalDocumentLength(double totalLength)
Allows the totalDocumentLength to be set after the fact
-
setOriginalQueryTerms
public void setOriginalQueryTerms(MatchingQueryTerms query)
Description copied from class: ExpansionTerms
Set the original query terms.
- Overrides:
setOriginalQueryTerms in class ExpansionTerms
- Parameters:
query
- The original query.
-
getTermIds
public int[] getTermIds()
Returns the termids of all terms found in the top-ranked documents
-
getNumberOfUniqueTerms
public int getNumberOfUniqueTerms()
Returns the number of unique terms found in all the top-ranked documents.
- Specified by:
getNumberOfUniqueTerms in class ExpansionTerms
-
getExpansionTerms
public gnu.trove.TIntObjectHashMap<ExpansionTerms.ExpansionTerm> getExpansionTerms()
Returns the expanded terms.
- Returns:
- terms
-
getExpandedTerms
public SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms)
This method implements the functionality of assigning expansion weights to the terms in the top-retrieved documents, and returns the most informative terms among them. Conservative Query Expansion (ConservativeQE) is used if the number of expanded terms is set to 0. In this case, no new query terms are added to the query; only the existing ones are reweighted.
- Specified by:
getExpandedTerms in class ExpansionTerms
- Parameters:
numberOfExpandedTerms
- int The number of terms to extract from the top-retrieved documents. ConservativeQE is used if this parameter is set to 0.
- Returns:
- weighted query terms
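The selection behaviour described above, including the ConservativeQE case, can be sketched as follows. This is an illustration under stated assumptions, not the Terrier implementation: the class ExpandedTermSelectionSketch, the method select, and the use of String terms with precomputed weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of Terrier. */
class ExpandedTermSelectionSketch {
    /**
     * weights: candidate term -> expansion weight.
     * originalTerms: the terms of the original query.
     * Returns the selected terms with their weights, highest weight first.
     */
    static Map<String, Double> select(Map<String, Double> weights,
                                      Set<String> originalTerms,
                                      int numberOfExpandedTerms) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(weights.entrySet());
        // Sort by descending weight; break ties alphabetically for determinism.
        sorted.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        Map<String, Double> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : sorted) {
            if (numberOfExpandedTerms == 0) {
                // Conservative QE: add no new terms, only reweight existing query terms.
                if (originalTerms.contains(e.getKey())) {
                    selected.put(e.getKey(), e.getValue());
                }
            } else if (selected.size() < numberOfExpandedTerms) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("retrieval", 0.9, "terrier", 0.5, "index", 0.7);
        System.out.println(select(weights, Set.of("terrier"), 2)); // two highest-weighted terms
        System.out.println(select(weights, Set.of("terrier"), 0)); // only the original query term
    }
}
```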
-
getExpandedTerms
protected SingleTermQuery[] getExpandedTerms(int numberOfExpandedTerms, QueryExpansionModel QEModel)
-
deleteTerm
public void deleteTerm(int termid)
Removes the records for a given term.
-
getExpansionWeight
public double getExpansionWeight(java.lang.String term, QueryExpansionModel model)
Returns the weight of a given term, computed by the specified query expansion model.
- Parameters:
term
- String the term to get the weight for.
model
- QueryExpansionModel the query expansion model to use.
- Returns:
- double the weight of the specified term.
-
getExpansionWeight
public double getExpansionWeight(java.lang.String term)
Returns the weight of a given term.
- Parameters:
term
- String the term to get the weight for.
- Returns:
- double the weight of the specified term.
-
getOriginalExpansionWeight
public double getOriginalExpansionWeight(java.lang.String term)
Returns the un-normalised weight of a given term.
- Parameters:
term
- String the given term.
- Returns:
- The un-normalised term weight.
-
getFrequency
public double getFrequency(java.lang.String term)
Returns the frequency of a given term in the top-ranked documents.
- Parameters:
term
- String the term to get the frequency for.
- Returns:
- double the frequency of the specified term in the top-ranked documents.
-
getFrequency
public double getFrequency(int termId)
Returns the frequency of a given term in the top-ranked documents.
- Parameters:
termId
- int the id of the term to get the frequency for.
- Returns:
- double the frequency of the specified term in the top-ranked documents.
-
getDocumentFrequency
public double getDocumentFrequency(int termId)
Returns the number of top-ranked documents in which a given term occurs.
- Parameters:
termId
- int the id of the term to get the document frequency for.
- Returns:
- double the document frequency of the specified term in the top-ranked documents.
-
assignWeights
public void assignWeights(QueryExpansionModel QEModel)
Assigns weights to the terms stored in the terms map.
- Parameters:
QEModel
- QueryExpansionModel the query expansion model to use.
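This page does not state how the normaliser field is computed, so the following is only a sketch of one common scheme: dividing every raw weight by the largest raw weight, so that the top term has weight 1. That this matches the behaviour of assignWeights is an assumption; the class WeightNormalisationSketch and its method normalise are hypothetical and do not use the Terrier API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Terrier. */
class WeightNormalisationSketch {
    /**
     * Divides every weight by the largest weight (an assumed normalisation scheme).
     * If the largest weight is not positive, the weights are copied unchanged.
     */
    static Map<String, Double> normalise(Map<String, Double> rawWeights) {
        double max = 0;
        for (double w : rawWeights.values()) {
            max = Math.max(max, w);
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : rawWeights.entrySet()) {
            out.put(e.getKey(), max > 0 ? e.getValue() / max : e.getValue());
        }
        return out;
    }
}
```

Under this scheme, the raw value would correspond to what getOriginalExpansionWeight calls the un-normalised weight, and the divided value to the normalised one; again, this correspondence is assumed rather than confirmed by the page.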
-
getExpansionWeight
public double getExpansionWeight(int termId, QueryExpansionModel model)
Returns the weight of a term with the given term identifier, computed by the specified query expansion model.
- Parameters:
termId
- int the identifier of the term to get the weight for.
model
- QueryExpansionModel the query expansion model to use.
- Returns:
- double the weight of the specified term.
-
getExpansionWeight
public double getExpansionWeight(int termId)
Returns the weight of a term with the given term identifier.
- Parameters:
termId
- int the identifier of the term to get the weight for.
- Returns:
- double the weight of the specified term.
-
getExpansionProbability
public double getExpansionProbability(int termId)
Returns the probability of a given termid occurring in the expansion documents, computed as the document frequency in the expansion documents divided by the total length of all the expansion documents.
- Parameters:
termId
- int the identifier of the term to obtain the probability for.
- Returns:
- double the probability of the term.
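As a minimal sketch of the quotient described above: a per-term count in the expansion documents is divided by the total number of tokens in those documents. The class ExpansionProbabilitySketch and its method probability are hypothetical. Which count the real method uses as numerator, and what it returns for an unknown termid, are not confirmed by this page, so the sketch takes the count as a plain argument and rejects a non-positive total length.

```java
/** Hypothetical illustration only; not part of Terrier. */
class ExpansionProbabilitySketch {
    /**
     * count: the term's count in the expansion documents (the numerator named in the description).
     * totalDocumentLength: the total number of tokens in the expansion documents.
     */
    static double probability(double count, double totalDocumentLength) {
        if (totalDocumentLength <= 0) {
            throw new IllegalArgumentException("totalDocumentLength must be positive");
        }
        return count / totalDocumentLength;
    }
}
```

For example, a count of 5 against a total length of 20 tokens gives 0.25.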
-
insertDocument
public void insertDocument(FeedbackDocument doc) throws java.io.IOException
Adds the feedback document to the feedback set.
- Specified by:
insertDocument in class ExpansionTerms
- Throws:
java.io.IOException
-
insertDocument
public void insertDocument(int docid, int rank, double score) throws java.io.IOException
Adds the feedback document from the index, given its docid.
- Throws:
java.io.IOException
-
insertTerm
protected void insertTerm(int termID, double withinDocumentFrequency)
Adds a term from the X top-retrieved documents as a candidate expansion term.
- Parameters:
termID
- int the integer identifier of a term.
withinDocumentFrequency
- double the within-document frequency of a term.
-
-