Terrier Users : Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
What information does each index file store?
Posted by: deeper2
Date: December 07, 2017 12:34PM

Dear Sir,
I have run indexing with Terrier and want to know what information each index file stores. I cannot seem to find this in the Terrier docs or in posts on this forum.

data.docid---------only docid?
data.if------------inverted files
data.lex-----------lexicon files
data.lexhash-------hash value of each term in lexicon?
data.lexid---------id of each term in lexicon
data.properties----properties
docpointers.col----map of docid to doctitle?

Hoping for your reply!

Re: What information does each index file store?
Posted by: craigm
Date: December 07, 2017 06:08PM

The .docid file is called the document index in later versions of Terrier; essentially, it stores the document lengths.
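
To illustrate why a per-document length table is useful, here is a minimal sketch assuming a hypothetical fixed-width layout of one 4-byte big-endian integer per document, stored in docid order. This layout and the helper names (write_doc_lengths, doc_length, average_doc_length) are illustrative assumptions, not Terrier's actual on-disk format or API.

```python
import struct

# Hypothetical layout (an assumption, not Terrier's real format):
# one 4-byte big-endian unsigned int per document, in docid order,
# so the length of document d sits at byte offset d * 4.
RECORD = struct.Struct(">I")


def write_doc_lengths(lengths):
    """Serialise a list of document lengths, indexed by docid."""
    return b"".join(RECORD.pack(n) for n in lengths)


def doc_length(blob, docid):
    """Random-access lookup of one document's length by its docid."""
    (n,) = RECORD.unpack_from(blob, docid * RECORD.size)
    return n


def average_doc_length(blob):
    """Collection-wide average length, as needed by length-normalising models such as BM25."""
    count = len(blob) // RECORD.size
    total = sum(RECORD.unpack_from(blob, i * RECORD.size)[0] for i in range(count))
    return total / count if count else 0.0


blob = write_doc_lengths([120, 45, 300])
print(doc_length(blob, 2))        # 300
print(average_doc_length(blob))   # 155.0
```

Fixed-width records keep the lookup O(1): the docid alone determines the byte offset, so no search is needed at query time.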

The .lexhash file stores the offset of each starting letter in the lexicon, to speed up searches.
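
The idea behind a per-letter offset table can be sketched as follows, assuming the lexicon is a sorted list of non-empty terms. Because sorting groups all terms with the same first character together, a small table of (start, end) positions per starting character lets a lookup binary-search only that slice instead of the whole lexicon. The function names and the use of the sorted position as a term id are hypothetical choices for this sketch, not Terrier's implementation.

```python
from bisect import bisect_left


def build_letter_offsets(sorted_terms):
    """Map each first character to the (start, end) slice of the sorted lexicon.

    Assumes sorted_terms is sorted and contains no empty strings.
    """
    offsets = {}
    for i, term in enumerate(sorted_terms):
        c = term[0]
        if c not in offsets:
            offsets[c] = [i, i + 1]   # first term starting with c
        else:
            offsets[c][1] = i + 1     # extend the slice end
    return {c: (s, e) for c, (s, e) in offsets.items()}


def find_term(sorted_terms, offsets, term):
    """Return the term's position in the lexicon (its id in this sketch), or -1 if absent."""
    if not term or term[0] not in offsets:
        return -1
    lo, hi = offsets[term[0]]
    # Binary search restricted to terms sharing the first character.
    i = bisect_left(sorted_terms, term, lo, hi)
    return i if i < hi and sorted_terms[i] == term else -1


lexicon = ["apple", "apply", "banana", "band", "cat"]
offsets = build_letter_offsets(lexicon)
print(offsets)                              # {'a': (0, 2), 'b': (2, 4), 'c': (4, 5)}
print(find_term(lexicon, offsets, "band"))  # 3
print(find_term(lexicon, offsets, "dog"))   # -1
```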

docpointers.col wasn't of much use; you can ignore it ;-)

Again, these files are from a very old version of Terrier. The filenames are more meaningful in newer versions.

Craig
