Terrier Desktop Search

Desktop Terrier is an example application we have provided with Terrier for two purposes:

Running Desktop Terrier

The first time you use Desktop Terrier, you can choose to automatically index the documentation that comes with the Terrier distribution, or select the folders you want Terrier to index. Once you have selected all the folders to be indexed, press the Create Index button.

Indexing

Indexing is the process in which Terrier examines all the files in the folders you specified, reads each document it is able to parse, and records the words it finds in each one. The indexing tab has two buttons: the first opens a dialog for selecting the folders you wish to index, and the second starts the indexing process.

When you click the "Select Folders" button, a dialog opens in which you can select a set of folders to index. The application searches these folders recursively and indexes the documents from which it can extract meaningful text. It chooses a parser for each file according to the file's extension; if there is no parser for a particular file type, the application assumes that it cannot extract text from the file and ignores it. The association between file extensions and parsers is set with the property indexing.simplefilecollection.extensionsparsers, which defaults to the following value:

  txt:FileDocument,text:FileDocument,tex:FileDocument,bib:FileDocument,
  pdf:PDFDocument,html:HTMLDocument,htm:HTMLDocument,xhtml:HTMLDocument,
  xml:HTMLDocument,doc:MSWordDocument,ppt:MSPowerpointDocument,xls:MSExcelDocument
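The lookup described above can be sketched in Java as follows. This is a hypothetical illustration, not Terrier's own code: the class ExtensionParserMap and its methods are invented for this example, and matching extensions case-insensitively is an assumption that may differ from Terrier's behavior. It parses a value in the ext:ClassName format, resolves a file name to a parser class by its extension, and walks a folder recursively, keeping only the files that have a parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Terrier's implementation: resolves files to
// parser class names using an "ext:ClassName,ext:ClassName" property value.
class ExtensionParserMap {
    private final Map<String, String> parsers = new HashMap<>();

    ExtensionParserMap(String propertyValue) {
        for (String entry : propertyValue.split(",")) {
            // Split on the first ':' only; the class name follows it.
            String[] kv = entry.trim().split(":", 2);
            String ext = kv[0].trim();
            if (kv.length == 2 && !ext.isEmpty()) {
                // Assumption: extensions are matched case-insensitively.
                parsers.put(ext.toLowerCase(Locale.ROOT), kv[1].trim());
            }
        }
    }

    // Returns the parser class for a file, or empty if the file would be ignored.
    Optional<String> parserFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension, so no parser
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(parsers.get(ext));
    }

    // Walks the folder recursively and keeps only regular files that have a parser.
    List<Path> indexableFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> parserFor(p.getFileName().toString()).isPresent())
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the default value, for example, a file named Report.pdf resolves to PDFDocument, while photo.png or a file with no extension resolves to nothing and is skipped.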

Additional parsers can be added by providing classes that implement the interface uk.ac.gla.terrier.indexing.Document and updating the value of the property indexing.simplefilecollection.extensionsparsers. For example, if we add a class called com.acme.PSDocument that extracts the text from PostScript documents, we would append the following entry to the value of indexing.simplefilecollection.extensionsparsers, separated from the existing entries by a comma:

ps:com.acme.PSDocument
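As a rough idea of what the text-extraction part of such a class might do, the Java sketch below pulls words out of simple PostScript files by collecting the contents of (...) string literals, which is where such files usually keep the text drawn by operators like show. It is a hypothetical and deliberately naive example: NaivePostScriptText is an invented name, it does not implement uk.ac.gla.terrier.indexing.Document (a real com.acme.PSDocument would have to expose its terms through that interface), and it ignores hex strings, font encodings and text built up by procedures, so it will miss text in many real-world PostScript files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, deliberately naive sketch: NOT Terrier's Document interface.
// It collects the text of PostScript string literals "(...)" and splits it into terms.
class NaivePostScriptText {

    // Returns the decoded contents of every top-level (...) string literal.
    static List<String> stringLiterals(String ps) {
        List<String> literals = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // nesting level of unescaped parentheses inside a literal
        for (int i = 0; i < ps.length(); i++) {
            char c = ps.charAt(i);
            if (depth == 0) {
                if (c == '%') {
                    // Comment outside a string: skip to the end of the line.
                    while (i + 1 < ps.length() && ps.charAt(i + 1) != '\n') i++;
                } else if (c == '(') {
                    depth = 1;
                    current.setLength(0);
                }
                continue;
            }
            if (c == '\\' && i + 1 < ps.length()) {
                char next = ps.charAt(++i);
                if (next >= '0' && next <= '7') {
                    // Octal escape with one to three digits, e.g. \351.
                    int value = next - '0';
                    for (int n = 1; n < 3 && i + 1 < ps.length()
                            && ps.charAt(i + 1) >= '0' && ps.charAt(i + 1) <= '7'; n++) {
                        value = value * 8 + (ps.charAt(++i) - '0');
                    }
                    current.append((char) value);
                } else if ("nrtbf".indexOf(next) >= 0) {
                    current.append(' '); // whitespace escapes only separate words here
                } else if (next != '\n') {
                    current.append(next); // \( \) \\ keep the character; backslash-newline is dropped
                }
            } else if (c == '(') {
                depth++;
                current.append(c);
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    literals.add(current.toString());
                } else {
                    current.append(c);
                }
            } else {
                current.append(c);
            }
        }
        return literals;
    }

    // Splits the literal text into lower-case terms made of letters and digits.
    static List<String> terms(String ps) {
        List<String> terms = new ArrayList<>();
        for (String literal : stringLiterals(ps)) {
            for (String token : literal.split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    terms.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return terms;
    }
}
```

For a line such as 72 700 moveto (Hello, World) show, the sketch yields the terms hello and world; comments and operators outside string literals are skipped.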

Once you have selected the folders to index, click the "Index" button to start the indexing process. Its progress is shown in the output area in the lower part of the window. When indexing finishes, the focus moves to the search tab automatically. The time taken to index depends on the types of documents you selected, how many files there are, and how much memory (RAM) your computer has; for example, parsing PDF documents can slow down the indexing process.

You can now use the Search tab of Desktop Terrier to search for documents. Enter terms that you think your document may contain in the text box beside the Search button, and press Search. The documents Terrier considers relevant are displayed in the table below; double-click a row to open that document. The second column shows the type of each document; the label and color used for each type are specified by the properties desktopsearch.filetype.types and desktopsearch.filetype.colors, whose default values are the following:


desktopsearch.filetype.types=
	txt:Text,
	text:Text,
	tex:TeX,
	bib:Bib,
	pdf:PDF,
	html:HTML,
	htm:HTML,
	xhtml:XHTML,
	xml:XML,
	doc:Word,
	ppt:Powerpoint,
	xls:Excel

desktopsearch.filetype.colors=
	Text:(221 221 221),
	TeX:(221 221 221),
	Bib:(221 221 221),
	PDF:(236 67 69),
	HTML:(177 228 250),
	Word:(100 100 255),
	Powerpoint:(250 110 49),
	Excel:(38 183 78),
	XHTML:(177 228 250),
	XML:(177 228 250)

Note that the line breaks and spaces in the property values above are shown only for readability and should not be included in the properties file. If we add the class com.acme.PSDocument, we would also need to append the following to the value of desktopsearch.filetype.types:

ps:Postscript,
eps:EncapsulatedPS

and append to the value of desktopsearch.filetype.colors the RGB values of the colors we would like to show for PostScript files:

Postscript:(180 180 180),
EncapsulatedPS:(180 210 180)
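To show how the two properties work together, here is a hypothetical Java sketch (FileTypeStyles is an invented class, not part of Desktop Terrier) that maps a file extension to its type label and then the label to a java.awt.Color, using single-line property values that include the PostScript additions. Matching extensions case-insensitively is an assumption made for the example.

```java
import java.awt.Color;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Desktop Terrier's code: combines
// desktopsearch.filetype.types (ext:Label) and desktopsearch.filetype.colors
// (Label:(r g b)) to decide how the type column is labeled and colored.
class FileTypeStyles {
    private final Map<String, String> labels = new HashMap<>();
    private final Map<String, Color> colors = new HashMap<>();

    FileTypeStyles(String typesValue, String colorsValue) {
        for (String entry : typesValue.split(",")) {
            String[] kv = entry.trim().split(":", 2);
            if (kv.length == 2) {
                // Assumption: extensions are matched case-insensitively.
                labels.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
            }
        }
        for (String entry : colorsValue.split(",")) {
            String[] kv = entry.trim().split(":", 2);
            if (kv.length != 2) {
                continue;
            }
            // "(180 210 180)" -> ["180", "210", "180"]
            String[] rgb = kv[1].replaceAll("[()]", " ").trim().split("\\s+");
            if (rgb.length == 3) {
                colors.put(kv[0].trim(), new Color(Integer.parseInt(rgb[0]),
                        Integer.parseInt(rgb[1]), Integer.parseInt(rgb[2])));
            }
        }
    }

    // Label for the type column, or null if the extension is not listed.
    String labelFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return labels.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Color associated with the file's label, or null if either lookup fails.
    Color colorFor(String fileName) {
        String label = labelFor(fileName);
        return label == null ? null : colors.get(label);
    }
}
```

With these values, a file named paper.ps is labeled Postscript and gets the color (180 180 180), while a file whose extension is missing from desktopsearch.filetype.types gets neither a label nor a color.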

More Help

Desktop Terrier has a Help file, available from the Help menu.

Advanced Options

If you have trouble using Desktop Terrier, for example if the application closes unexpectedly, start it with the --debug option, e.g.:

bin/desktop_terrier.sh --debug (Linux, Mac OS X)
bin\desktop_terrier.bat --debug (Windows)

If you use Desktop Terrier regularly, you may wish to have Terrier re-index your documents automatically at set times. You can do this by scheduling Terrier to run with the --runindex option:

bin/desktop_terrier.sh --reindex (Linux, Mac OS X)
bin\desktop_terrier.bat --reindex (Windows)

This command must be scheduled with your operating system's scheduler: on Unix, use the crontab utility; on Windows, use Scheduled Tasks, which can be found in the Control Panel.