Terrier Users :  Terrier Forum terrier.org
General discussion about using/developing applications using Terrier 
Adding the abstract
Posted by: petra1
Date: August 13, 2017 05:27PM

Hello,
I am trying to add an abstract to the web search. Simple indexing and retrieval work, but when I try to save an abstract I get a NullPointerException.

I have just tried to replicate the example at [terrier.org] and use FULLTEXT as the abstract.

Here are the settings:

querying.postprocesses.order=QueryExpansion
querying.postprocesses.controls=qe:QueryExpansion

TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.skip=
TrecDocTags.process=DOC,DOCNO,DOCNAME,ENTITIES,TYPED_ENTITIES,RELATIONS,FULLTEXT,TEXT,LEMMAS
TrecDocTags.casesensitive=false

FieldTags.process=ENTITIES,TYPED_ENTITIES,RELATIONS,FULLTEXT,TEXT,LEMMAS

TaggedDocument.abstracts=title,body
TaggedDocument.abstracts.tags=FULLTEXT,ELSE
TaggedDocument.abstracts.tags.casesensitive=false
TaggedDocument.abstracts.lengths=256,2048

indexer.meta.forward.keys=DOCNO,title,body
indexer.meta.forward.keylens=26,256,2048
indexer.meta.reverse.keys=DOCNO

trec.encoding=UTF-8
termpipelines=Stopwords
trec.collection.class=TRECCollection
string.use_utf=true
indexing.simplefilecollection.recurse=true
matching.trecresults.format=DOCNO
trec.model=TF_IDF
tokeniser=UTFTokeniser


And this is the sample file:

<DOC>
<DOCNO>1</DOCNO>
<DOCNAME>Name 1</DOCNAME>
<TEXT>some text here</TEXT>
<FULLTEXT>other text here</FULLTEXT>
<LEMMAS>lemmatized text here</LEMMAS>
<ENTITIES>some text entities here</ENTITIES>
<RELATIONS>some entity relations here</RELATIONS>
</DOC>
<DOC>
<DOCNO>2</DOCNO>
<DOCNAME>Name 2</DOCNAME>
<TEXT>some text here</TEXT>
<FULLTEXT>other text here</FULLTEXT>
<LEMMAS>lemmatized text here</LEMMAS>
<ENTITIES>some text entities here</ENTITIES>
<RELATIONS>some entity relations here</RELATIONS>
</DOC>

Am I missing something? Is there a way to get this running?

Thank you,
Petra

Re: Adding the abstract
Posted by: Maram
Date: September 06, 2017 10:13PM

Hi Petra,

I noticed you are storing the body of the document using these settings:
indexer.meta.forward.keys=DOCNO,title,body
indexer.meta.forward.keylens=26,256,2048
indexer.meta.reverse.keys=DOCNO

Did that work for you? Did you manage to get the body of a document given its docid? Could you share how you did that, please?

Thanks a lot in advance!
Maram

Re: Adding the abstract
Posted by: Maram
Date: September 06, 2017 10:15PM

By the way, judging from the configuration you listed, you did not include the abstract in the meta index, which the example you shared does. Here is what the Terrier page you linked says:

# We also need to tell the indexer to store the abstracts generated
# In addition to the docno, we also need to move the 'title' and 'abstract' abstracts generated to the meta index
indexer.meta.forward.keys=docno,title,abstract
# The maximum lengths for the meta index entries.
indexer.meta.forward.keylens=26,256,2048
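
The consistency between the abstract settings and the meta index settings can be checked mechanically before indexing. Below is a minimal sketch in Python. It is not part of Terrier, and `parse_properties`, `split_list`, and `check_abstract_config` are hypothetical helper names. It assumes that every name in TaggedDocument.abstracts needs a same-named entry in indexer.meta.forward.keys, and that the related comma-separated lists must have the same number of entries. Names are compared exactly; whether Terrier treats them case-sensitively is not established in this thread.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def split_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_abstract_config(props):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    names = split_list(props.get("TaggedDocument.abstracts", ""))
    tags = split_list(props.get("TaggedDocument.abstracts.tags", ""))
    lengths = split_list(props.get("TaggedDocument.abstracts.lengths", ""))
    keys = split_list(props.get("indexer.meta.forward.keys", ""))
    keylens = split_list(props.get("indexer.meta.forward.keylens", ""))

    # Assumption: one tag and one length per abstract name.
    if not (len(names) == len(tags) == len(lengths)):
        problems.append("abstracts, abstracts.tags and abstracts.lengths differ in length")
    # Assumption: one maximum length per forward meta key.
    if len(keys) != len(keylens):
        problems.append("meta.forward.keys and meta.forward.keylens differ in length")
    # Assumption: each abstract must be stored under a same-named meta key.
    for name in names:
        if name not in keys:
            problems.append("abstract '%s' is not in indexer.meta.forward.keys" % name)
    return problems


# The relevant settings from the first post in this thread.
config = """
TaggedDocument.abstracts=title,body
TaggedDocument.abstracts.tags=FULLTEXT,ELSE
TaggedDocument.abstracts.lengths=256,2048
indexer.meta.forward.keys=DOCNO,title,body
indexer.meta.forward.keylens=26,256,2048
"""

print(check_abstract_config(parse_properties(config)))
```

Under these assumptions, the settings from the first post pass the check, because the abstracts there are named title and body and both names appear in the forward keys. A configuration that named an abstract without listing it in indexer.meta.forward.keys would be reported.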

Re: Adding the abstract
Posted by: petra1
Date: September 08, 2017 08:30AM

Hi Maram,
Yes, I was able to get this running. The problem was that TYPED_ENTITIES was listed among the fields to process but was missing from the XML sample.
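
A mismatch like this can be spotted before indexing by comparing the configured tag list against a sample document. Below is a minimal sketch in Python. It is not part of Terrier, and `missing_tags` is a hypothetical helper name. It assumes tags are matched case-insensitively (as with TrecDocTags.casesensitive=false) and only looks at simple opening tags without attributes. It does not explain why Terrier fails, it only lists configured tags that the sample never uses.

```python
import re


def missing_tags(process_value, sample_doc):
    """Return configured tags that never appear as an opening tag in sample_doc."""
    # Collect opening tags such as <TEXT>; closing tags like </TEXT> do not match.
    present = {tag.upper() for tag in re.findall(r"<([A-Za-z_]\w*)>", sample_doc)}
    configured = [t.strip() for t in process_value.split(",") if t.strip()]
    return [tag for tag in configured if tag.upper() not in present]


# FieldTags.process value and the first sample document from the first post.
field_tags = "ENTITIES,TYPED_ENTITIES,RELATIONS,FULLTEXT,TEXT,LEMMAS"
sample = """<DOC>
<DOCNO>1</DOCNO>
<DOCNAME>Name 1</DOCNAME>
<TEXT>some text here</TEXT>
<FULLTEXT>other text here</FULLTEXT>
<LEMMAS>lemmatized text here</LEMMAS>
<ENTITIES>some text entities here</ENTITIES>
<RELATIONS>some entity relations here</RELATIONS>
</DOC>"""

print(missing_tags(field_tags, sample))  # ['TYPED_ENTITIES']
```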

This setting worked for me (I am using fulltext instead of abstract):
TaggedDocument.abstracts=title,body
TaggedDocument.abstracts.tags=DOCNO,FULLTEXT
TaggedDocument.abstracts.tags.casesensitive=false
TaggedDocument.abstracts.lengths=26,2000
indexer.meta.forward.keys=docno,title,body
indexer.meta.forward.keylens=26,26,2000

Best,
Petra


