  1. Terrier Core
  2. TR-561

What is the technical structure of the TREC (Text REtrieval Conference) file format?

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.1
    • Fix Version/s: None
    • Component/s: .applications, .indexing, tests
    • Labels:
      None

      Description

      I have been wondering about the internal structure of the TREC file format. Is it just an XML file with a .trec extension, or is there more to it? When converting multiple .xml files into a single .trec file, do I just need to combine all the XML files into one file, with the contents of each original XML file wrapped like this:

          <DOC>
          <DOCNO>document number</DOCNO>
          <TEXT> content</TEXT>
          </DOC>

      and then rename its extension, or does something special need to be done?

      I need this information so that the converted collection can be used for indexing in Terrier.

      Please help.
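
For illustration, the conversion described above could be sketched as follows. This is only a minimal sketch under stated assumptions, not Terrier's own tooling: it assumes each source file is well-formed XML, that the file name (without extension) is an acceptable DOCNO, and that only the text content of each file needs to be kept. The helper names `xml_to_trec_record` and `combine_to_trec` are hypothetical. Note that the output is a plain concatenation of <DOC> records with no enclosing root element, so the combined file is not itself well-formed XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_to_trec_record(docno, xml_path):
    """Flatten one XML file into a single TREC-style <DOC> record (hypothetical helper)."""
    root = ET.parse(xml_path).getroot()
    # Keep only the text nodes and collapse runs of whitespace.
    text = " ".join(" ".join(root.itertext()).split())
    # Assumption: the text is written as-is; whether characters such as
    # '<' or '&' need escaping depends on the indexer's parser.
    return f"<DOC>\n<DOCNO>{docno}</DOCNO>\n<TEXT>\n{text}\n</TEXT>\n</DOC>\n"


def combine_to_trec(xml_paths, out_path):
    """Write one <DOC> record per input file; return the number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in xml_paths:
            path = Path(path)
            # Assumption: the file stem is unique and usable as the DOCNO.
            out.write(xml_to_trec_record(path.stem, path))
            count += 1
    return count
```

A call such as `combine_to_trec(sorted(Path("xml_dir").glob("*.xml")), "collection.trec")` would then produce one output file, where `xml_dir` and `collection.trec` are placeholder names.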

          Activity

          There are no comments yet on this issue.

            People

            • Assignee:
              craigm Craig Macdonald
            • Reporter:
              Rocky Xanadul Irfan Ullah
            • Watchers:
              1