  1. Terrier Core
  2. TR-311

New integer compression techniques for the direct and inverted index structures

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None

      Description

      Attached are the files that enable modern integer compression techniques for the inverted index in Terrier.

      Files:
      matteo_compression.jar: the code
      matteo_compression_test.jar: unit testing
      JavaFastPFOR_Terrier.jar: a MODIFIED JavaFastPFOR library that also contains Kamikaze. Add this to the build path.

      Required modifications to the rest of the code:

      1) In org.terrier.compression, make BitInBase public
      2) In the test class org.terrier.tests.ShakespeareEndToEndTest, always use PostingIndex and PostingIndexInputStream instead of InvertedIndex and InvertedIndexInputStream
      3) Replace PostingTestUtils with the attached file (it contains some extra methods)

      The main entry point for this library is probably the InvertedIndexRecompresser utility, which recompresses a classical inverted index file using the modern integer compression techniques specified via a configuration file. See the Javadoc for usage details.
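
      For readers unfamiliar with this family of codecs, the general idea can be sketched as follows. This is only an illustration and is not code from the attached jars, from Terrier, or from JavaFastPFOR: the class name GapBitPacking and its output layout are hypothetical. It stores a sorted docid list as d-gaps and packs every gap with the same fixed bit width, which is the basic step that PFOR-style schemes build on (they additionally handle outlier values as exceptions, which this sketch does not).

```java
import java.util.Arrays;

/** Illustrative sketch only: d-gaps plus fixed-width bit packing. Hypothetical class, not Terrier or JavaFastPFOR code. */
final class GapBitPacking {

    /** Encodes strictly increasing, non-negative docids. Assumed layout: [count, bitWidth, packed words...]. */
    static int[] encode(int[] docids) {
        int n = docids.length;
        int[] gaps = new int[n];
        int prev = 0;
        int maxGap = 0;
        for (int i = 0; i < n; i++) {
            gaps[i] = docids[i] - prev;          // the first gap is the docid itself
            prev = docids[i];
            maxGap = Math.max(maxGap, gaps[i]);
        }
        // Bits needed for the largest gap (at least 1, so an empty or all-zero block still works).
        int width = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxGap));
        int words = (int) (((long) n * width + 31) / 32);
        int[] out = new int[2 + words];
        out[0] = n;
        out[1] = width;
        long bitPos = 0;
        for (int gap : gaps) {
            int word = 2 + (int) (bitPos >>> 5);
            int offset = (int) (bitPos & 31);
            out[word] |= gap << offset;           // low part of the gap
            if (offset + width > 32) {
                out[word + 1] |= gap >>> (32 - offset); // part that spills into the next word
            }
            bitPos += width;
        }
        return out;
    }

    /** Reverses encode(): unpacks the gaps and prefix-sums them back into docids. */
    static int[] decode(int[] packed) {
        int n = packed[0];
        int width = packed[1];                    // at most 31 for non-negative gaps
        int mask = (1 << width) - 1;
        int[] docids = new int[n];
        int prev = 0;
        long bitPos = 0;
        for (int i = 0; i < n; i++) {
            int word = 2 + (int) (bitPos >>> 5);
            int offset = (int) (bitPos & 31);
            int gap = packed[word] >>> offset;
            if (offset + width > 32) {
                gap |= packed[word + 1] << (32 - offset);
            }
            prev += gap & mask;
            docids[i] = prev;
            bitPos += width;
        }
        return docids;
    }

    public static void main(String[] args) {
        int[] docids = {3, 7, 8, 15, 1000};
        int[] packed = encode(docids);
        System.out.println(Arrays.toString(decode(packed))); // prints [3, 7, 8, 15, 1000]
    }
}
```

      In this example the largest gap is 985, so every gap is stored in 10 bits. A single large gap inflates the width for the whole block, which is the weakness that exception-based schemes such as PFOR are designed to address.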

            Activity

            catena.matteo Matteo Catena created issue -
            catena.matteo Matteo Catena made changes -
            Field Original Value New Value
            Attachment PostingTestUtils.java [ 10388 ]
            catena.matteo Matteo Catena made changes -
            Attachment matteo_compression_test.jar [ 10389 ]
            catena.matteo Matteo Catena made changes -
            Description [ previous text: opening sentence read "Attached the file for enabling ...", without the closing paragraph on InvertedIndexRecompresser ] [ current text, as shown under Description above ]
            craigm Craig Macdonald made changes -
            Fix Version/s 4.0 [ 10050 ]
            craigm Craig Macdonald made changes -
            Link This issue is related to TREC-368 [ TREC-368 ]
            craigm Craig Macdonald made changes -
            craigm Craig Macdonald made changes -
            Attachment matteo_compression.jar [ 10406 ]
            craigm Craig Macdonald made changes -
            Link This issue is blocked by TREC-368 [ TREC-368 ]
            craigm Craig Macdonald made changes -
            Link This issue is related to TREC-387 [ TREC-387 ]
            craigm Craig Macdonald made changes -
            Summary [ New integer compression techniques for the inverted index ] [ New integer compression techniques for the direct and inverted index structures ]
            craigm Craig Macdonald made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            richardm Richard McCreadie made changes -
            Project TREC [ 10010 ] Terrier Core [ 10000 ]
            Key TREC-334 TR-311
            Workflow jira [ 10725 ] Terrier Open Source [ 10874 ]
            Affects Version/s 3.6 [ 10060 ]
            Affects Version/s 3.6 [ 10061 ]
            Component/s Core [ 10020 ]
            Fix Version/s 4.0 [ 10051 ]
            Fix Version/s 4.0 [ 10050 ]

              People

              • Assignee:
                craigm Craig Macdonald
                Reporter:
                catena.matteo Matteo Catena
              • Watchers:
                0
