Query Language

Terrier offers two query languages: a high-level, user-facing query language, and a low-level query language for developers, which is expressed in terms of matching operations (matching ops). All user queries are rewritten into matching operations. The matching op query language borrows from the Indri and Galago query languages.

User Query Language

Terrier offers a flexible user query language for searching with phrases or fields, or for specifying that terms are required to appear in the retrieved documents.

Some examples of Terrier's query language are the following:

Combinations of the different constructs are possible as well. For example, the query term1 term2 -"term1 term2" would retrieve all the documents that contain at least one of the terms term1 and term2, but not the documents where the phrase "term1 term2" appears.
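The semantics of that combined query can be illustrated with a minimal Python sketch over toy documents. This is not Terrier code: the whitespace tokeniser, the helper names and the sample documents are all assumptions made for illustration, and no ranking is performed.

```python
def tokens(text):
    """Lowercase whitespace tokenisation; a stand-in for a real tokeniser."""
    return text.lower().split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur contiguously and in order in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def matches(doc, optional_terms, excluded_phrase):
    """Emulate: term1 term2 -"term1 term2" on a single toy document."""
    doc_tokens = tokens(doc)
    has_any_term = any(t in doc_tokens for t in optional_terms)
    has_phrase = contains_phrase(doc_tokens, tokens(excluded_phrase))
    return has_any_term and not has_phrase


# Hypothetical documents, used only to exercise the logic.
docs = {
    "d1": "information retrieval systems",   # contains the phrase: excluded
    "d2": "retrieval of information",        # terms present, phrase absent: kept
    "d3": "database systems",                # neither term present: not retrieved
}
hits = [d for d, text in docs.items()
        if matches(text, ["information", "retrieval"], "information retrieval")]
print(hits)  # ['d2']
```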

Note that in some configurations, the Terrier query language may not be available by default. In particular, when batch processing queries from a file using a class that extends TRECQuery, the queries are pre-processed by a tokeniser that may remove the query language characters (e.g. brackets and colons). To use the Terrier query language in this case, you should use SingleLineTRECQuery and set SingleLineTRECQuery.tokenise to false in the terrier.properties file.
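As a configuration sketch, the relevant lines in terrier.properties might look as follows. The SingleLineTRECQuery.tokenise property comes from the text above; selecting the topics parser through trec.topics.parser is an assumption about your setup and should be checked against the documentation for your Terrier version.

```properties
# Assumption: the topics parser is selected with trec.topics.parser
trec.topics.parser=SingleLineTRECQuery
# Keep query language characters (e.g. brackets and colons) intact
SingleLineTRECQuery.tokenise=false
```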

Matching Op Query Language

In general, this follows a subset of the Indri query language. There are two types of operators. Semantic operators cause a posting list with particular semantics to be generated at matching time. Syntactic operators, in contrast, are syntactic sugar defined within the query language, which allow attributes of the semantic operators to be changed.

The semantic operators will be familiar to those who have used Indri or Galago:

There are currently two syntactic operators:

Note that semantic operators cannot contain syntactic operators.

Using the Matching Op Query Language

You can use the matchingop query language in the interactive querying command by passing the -m option. The prompt will be matchop query>, as shown in the example below:

$ bin/terrier interactive -m
Setting TERRIER_HOME to /home/Terrier
23:33:14.496 [main] INFO  o.t.structures.CompressingMetaIndex - Structure meta reading lookup file into memory
23:33:14.503 [main] INFO  o.t.structures.CompressingMetaIndex - Structure meta loading data file into memory
matchop query> #combine:0=0.85:1=0.15:2=0.05(#combine(dramatise personae) #1(dramatise personae) #uw8(dramatise personae))
etc

Similarly, the batchretrieve command also takes a -m option, whereby the queries are assumed to be in the matchingop query language.

$ cat mytopics
1 terrier #1(information retrieval)
2 systems
$ bin/terrier batchretrieve -s -m -t mytopics

where -m specifies that the matchingop query language will be used, and -s specifies that topics are in single-line format.
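Writing such queries by hand for many topics is error-prone, so it can help to generate them. The following Python sketch builds a weighted query of the same shape as the interactive example above and formats single-line topics. The function names, the default weights and the window size are illustrative assumptions, not part of Terrier.

```python
def weighted_proximity_query(terms, weights=(0.85, 0.15, 0.05), window=8):
    """Build a matchop query string that combines the plain terms, an exact
    phrase (#1) and an unordered window (#uwN) over the same terms."""
    body = " ".join(terms)
    # Weights are attached to #combine as index=weight pairs, e.g. 0=0.85
    weight_spec = ":".join(f"{i}={w}" for i, w in enumerate(weights))
    parts = [f"#combine({body})", f"#1({body})", f"#uw{window}({body})"]
    return f"#combine:{weight_spec}({' '.join(parts)})"


def topic_line(qid, query):
    """Format one single-line topic: the query id, a space, then the query."""
    return f"{qid} {query}"


# Hypothetical topics, mirroring the layout of the mytopics file above.
topics = {
    "1": weighted_proximity_query(["information", "retrieval"]),
    "2": "systems",
}
lines = [topic_line(qid, query) for qid, query in topics.items()]
print("\n".join(lines))
```

The printed lines can be saved to a file and passed to batchretrieve with the -s and -m options described above.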

Matching Op Query Language Specification

Each top-level match operator must resolve to an expression that can be defined as a posting list during Matching. The PostingListManager is responsible for opening the correct postings from the inverted index for each matching operator.

Some match operators may require a particular type of input posting. For instance, a #uwN operator requires that each input operator generates postings that implement BlockPosting (i.e. the index must have been created with position information). Moreover, each match operator may return a different type of posting: for instance, the IterablePosting created by a #uwN operator does not return the position information, although that from a #1 does.
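The difference in output posting type can be sketched in Python for a single document and two terms. This is a toy model, not Terrier's implementation: positions are plain integer lists, the function names are hypothetical, and the window-counting convention used here (count occurrences of the first term that have the second term within the window) is only one simple choice that may differ from what Terrier computes.

```python
def phrase_positions(pos_a, pos_b):
    """Positions at which term a is immediately followed by term b.
    Models a #1 operator, whose output keeps position information."""
    following = set(pos_b)
    return [p for p in pos_a if p + 1 in following]


def unordered_window_count(pos_a, pos_b, n):
    """Number of occurrences of term a that have some occurrence of term b
    within a window of n tokens, in either order. Models a #uwN operator,
    whose output is only a frequency."""
    return sum(1 for p in pos_a if any(abs(p - q) < n for q in pos_b))


# Hypothetical term positions within one document.
dramatise = [3, 20]
personae = [4, 18, 25]

print(phrase_positions(dramatise, personae))           # [3]
print(unordered_window_count(dramatise, personae, 8))  # 2
```

Both functions need position lists as input, which corresponds to the requirement that the input postings implement BlockPosting.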

Operator    Class              Type  Input Posting Type         Output Posting Type
term        SingleTermOp       -     Any                        (as input postings)
term.FIELD  SingleTermOp       -     Fields (FieldPosting)      Frequency
#band       AndQueryOp         AND   Any                        Binary (i.e. frequency=1)
#uwN        UnorderedWindowOp  AND   Positional (BlockPosting)  Frequency
#1          PhraseOp           AND   Positional (BlockPosting)  Positions
#syn        SynonymOp          OR    Any                        (depends on input postings)
#prefix     PrefixTermOp       OR    Any                        (depends on input postings)
#fuzzy      FuzzyTermOp        OR    Any                        (depends on input postings)
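The input requirements in the table can be used to check, before running a query, whether an index is able to support a given operator. The Python sketch below transcribes a subset of the rows into a lookup table. The dictionary, the function names and the "any"/"positions" labels are illustrative assumptions and do not correspond to any Terrier API.

```python
import re

# Input posting requirement per operator, transcribed from a subset of the
# rows in the table above. "#uw" stands for any #uwN operator.
REQUIRED_INPUT = {
    "#band": "any",
    "#uw": "positions",
    "#1": "positions",
    "#syn": "any",
    "#prefix": "any",
}


def required_input(op):
    """Return the input requirement for an operator name such as '#uw8'."""
    # Strip the window size so that '#uw8' and '#uw12' share one entry.
    name = re.sub(r"^(#uw)\d+$", r"\1", op)
    return REQUIRED_INPUT[name]


def supported(op, index_has_positions):
    """True if an index with or without positions can serve this operator."""
    return required_input(op) == "any" or index_has_positions


print(supported("#uw8", index_has_positions=False))  # False
print(supported("#syn", index_has_positions=False))  # True
```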

On the other hand, the syntactic operators (such as #combine and #tag) are defined solely in the matchop query parser, and hence there is no equivalent matchop class. As these cannot result in a single posting list, their positioning within a matchop is restricted. For instance, all of the following queries are invalid:


Webpage: http://terrier.org
Contact: School of Computing Science
Copyright (C) 2004-2019 University of Glasgow. All Rights Reserved.