Terrier Core / TR-320

Block indexing with integer compression fails for large documents (blocks.max)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1
    • Component/s: .structures
    • Labels:
      None

      Description

      From Ben He, UCAS

      I am having a problem with the PFD compression that you may want to look at.

      The collection I was using is WT10G. Following the documentation at http://terrier.org/docs/v4.0/compression.html, I added the following lines to the .properties file:

      indexing.direct.compression.configuration=org.terrier.structures.integer.IntegerCodecCompressionConfiguration

      index.direct.compression.integer.chunk.size=1024
      index.direct.compression.integer.fields.codec=LemireNewPFDVBCodec
      index.direct.compression.integer.blocks.codec=LemireNewPFDVBCodec
      index.direct.compression.integer.ids.codec=LemireNewPFDVBCodec
      index.direct.compression.integer.tfs.codec=LemireNewPFDVBCodec
      indexing.inverted.compression.configuration=org.terrier.structures.integer.IntegerCodecCompressionConfiguration

      index.inverted.compression.integer.chunk.size=1024
      index.inverted.compression.integer.ids.codec=LemireNewPFDVBCodec
      index.inverted.compression.integer.tfs.codec=LemireNewPFDVBCodec
      index.inverted.compression.integer.fields.codec=LemireNewPFDVBCodec
      index.inverted.compression.integer.blocks.codec=LemireNewPFDVBCodec
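The properties above set a chunk size and a codec for the same four posting streams (ids, tfs, fields, blocks) on both the direct and the inverted file. As a minimal sketch of that key layout, the hypothetical helper `CompressionConfigCheck` (not part of Terrier) loads such a file with `java.util.Properties` and lists which of those keys are absent for one structure. It only checks the keys used in this report and does not reproduce any validation Terrier itself performs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Terrier: checks that the keys used in this
// report are present for one structure ("direct" or "inverted").
class CompressionConfigCheck {
    // The four posting streams whose codecs the report configures.
    static final String[] STREAMS = {"ids", "tfs", "fields", "blocks"};

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the expected keys that have no value in props.
    static List<String> missingKeys(Properties props, String structure) {
        List<String> expected = new ArrayList<>();
        expected.add("indexing." + structure + ".compression.configuration");
        String prefix = "index." + structure + ".compression.integer.";
        expected.add(prefix + "chunk.size");
        for (String stream : STREAMS) {
            expected.add(prefix + stream + ".codec");
        }
        List<String> missing = new ArrayList<>();
        for (String key : expected) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The inverted-file settings from the report, minus the blocks codec.
        String text = String.join("\n",
            "indexing.inverted.compression.configuration=org.terrier.structures.integer.IntegerCodecCompressionConfiguration",
            "index.inverted.compression.integer.chunk.size=1024",
            "index.inverted.compression.integer.ids.codec=LemireNewPFDVBCodec",
            "index.inverted.compression.integer.tfs.codec=LemireNewPFDVBCodec",
            "index.inverted.compression.integer.fields.codec=LemireNewPFDVBCodec");
        System.out.println(missingKeys(parse(text), "inverted"));
    }
}
```

Because the sample in `main` deliberately leaves out the blocks codec, it prints `[index.inverted.compression.integer.blocks.codec]`.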

      I set the package of the compression configuration class to "org.terrier.structures.integer", because "org.terrier.structures.indexing", as given in the documentation, throws a ClassNotFoundException.
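A ClassNotFoundException here simply means the JVM found no class under the fully qualified name that was configured. As a minimal sketch using only the standard `Class.forName`, the hypothetical diagnostic `ConfigClassResolver` (not part of Terrier) tries each candidate package in turn, which is one way to check which prefix a given Terrier jar actually provides.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic, not part of Terrier: finds which package actually
// provides a class, trying each candidate prefix in order.
class ConfigClassResolver {
    static Optional<Class<?>> resolve(String simpleName, List<String> prefixes) {
        for (String prefix : prefixes) {
            try {
                return Optional.<Class<?>>of(Class.forName(prefix + "." + simpleName));
            } catch (ClassNotFoundException e) {
                // No such class under this prefix; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of(
            "org.terrier.structures.indexing", // prefix given in the v4.0 docs
            "org.terrier.structures.integer"); // prefix that worked in this report
        Optional<Class<?>> found = resolve("IntegerCodecCompressionConfiguration", prefixes);
        System.out.println(found.map(Class::getName).orElse("not found on the classpath"));
    }
}
```

Without Terrier on the classpath, `main` reports that neither name can be loaded; with a Terrier 4.0 jar, the report suggests only the `org.terrier.structures.integer` name would resolve.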

      Two-pass indexing works fine without blocks, but with block indexing enabled it fails while traversing the inverted file, ending with an ArrayIndexOutOfBoundsException.
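The report includes no stack trace and does not describe the fix, so the following is only a hypothetical illustration of how this class of bug can arise. It assumes that decoded block positions are copied into a reusable `int[]` sized for typical documents; a large document whose term has more positions than that array holds would then index past its end. The hypothetical `BlockBuffer` (not Terrier code) shows both the failing fixed-size copy and a version that grows the array first. The issue title's mention of `blocks.max`, which in Terrier limits how many blocks are recorded for a document, suggests the real limit was tied to that setting, but that is an inference.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Terrier's code or its actual fix.
class BlockBuffer {
    private int[] positions;

    BlockBuffer(int initialCapacity) {
        positions = new int[initialCapacity];
    }

    // Failure mode: copying into a fixed-size array indexes past its end
    // once a posting has more positions than the array holds.
    static void naiveLoad(int[] fixed, int[] decoded, int count) {
        for (int i = 0; i < count; i++) {
            fixed[i] = decoded[i]; // ArrayIndexOutOfBoundsException if count > fixed.length
        }
    }

    // Grows the array (at least doubling) so it can hold `needed` entries.
    private void ensureCapacity(int needed) {
        if (needed > positions.length) {
            positions = Arrays.copyOf(positions, Math.max(needed, positions.length * 2));
        }
    }

    // Copies `count` decoded positions, growing the array first if needed.
    int load(int[] decoded, int count) {
        ensureCapacity(count);
        System.arraycopy(decoded, 0, positions, 0, count);
        return count;
    }

    int get(int i) { return positions[i]; }

    int capacity() { return positions.length; }

    public static void main(String[] args) {
        int[] decoded = new int[10];
        for (int i = 0; i < decoded.length; i++) {
            decoded[i] = i * 3; // positions 0, 3, ..., 27
        }
        BlockBuffer buf = new BlockBuffer(4); // deliberately smaller than the posting
        buf.load(decoded, decoded.length);
        System.out.println(buf.get(9) + " " + buf.capacity()); // prints "27 10"
    }
}
```

Growing on demand keeps the common case cheap while tolerating the rare very large document; an alternative under the same assumption would be to size the array once from the configured per-document block limit.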


            People

            • Assignee: Craig Macdonald (craigm)
            • Reporter: Craig Macdonald (craigm)
            • Watchers: 0
