Terrier Core / TR-311

New integer compression techniques for the direct and inverted index structures

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels: None

      Description

      Attached are the files that enable modern integer compression techniques for the inverted index in Terrier.

      Files:
      matteo_compression.jar: the code
      matteo_compression_test.jar: unit testing
      JavaFastPFOR_Terrier.jar: a MODIFIED version of the JavaFastPFOR library, which also contains Kamikaze. Add this to the build path.

      Required modifications to the rest of the code:

      1) In org.terrier.compression, make BitInBase public
      2) In (test) org.terrier.tests.ShakespeareEndToEndTest, always use PostingIndex and PostingIndexInputStream instead of InvertedIndex and InvertedIndexInputStream
      3) Replace PostingTestUtils with the attached file (it contains some extra methods)

      The main entry point for this library is probably the InvertedIndexRecompresser utility, which recompresses a classical inverted index file using the modern integer compression techniques specified in a configuration file. See the Javadoc for usage details.
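
      For readers unfamiliar with this family of codecs, here is a minimal, self-contained sketch of the general idea behind block-based integer compression of posting lists: docids are turned into gaps, and each block of gaps is packed using the smallest fixed bit width that fits its largest value. This is NOT the attached code and does not use the JavaFastPFOR or Terrier APIs; the class and method names (BlockPackingSketch, toGaps, pack, unpack) are hypothetical, and the sketch assumes strictly increasing, non-negative docids.

```java
import java.util.Arrays;

/** Illustrative only: gap encoding plus fixed-width bit packing of one block of docids. */
final class BlockPackingSketch {

    /** Converts strictly increasing docids to gaps; the first gap is the first docid. */
    static int[] toGaps(int[] docids) {
        int[] gaps = new int[docids.length];
        int prev = 0;
        for (int i = 0; i < docids.length; i++) {
            gaps[i] = docids[i] - prev;
            prev = docids[i];
        }
        return gaps;
    }

    /** Rebuilds docids from gaps with a running sum. */
    static int[] fromGaps(int[] gaps) {
        int[] docids = new int[gaps.length];
        int sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            docids[i] = sum;
        }
        return docids;
    }

    /** Bits needed for the largest value; OR-ing keeps the highest set bit of any value. */
    static int bitWidth(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs each value into exactly `width` bits (width <= 32), low bits first. */
    static long[] pack(int[] values, int width) {
        long[] words = new long[(values.length * width + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * width;
            int word = bitPos >>> 6;   // index of the 64-bit word
            int offset = bitPos & 63;  // bit offset inside that word
            long v = values[i] & 0xFFFFFFFFL;
            words[word] |= v << offset;
            if (offset + width > 64) { // value straddles two words
                words[word + 1] |= v >>> (64 - offset);
            }
        }
        return words;
    }

    /** Inverse of pack: reads `count` values of `width` bits each. */
    static int[] unpack(long[] words, int width, int count) {
        int[] values = new int[count];
        long mask = (1L << width) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * width;
            int word = bitPos >>> 6;
            int offset = bitPos & 63;
            long v = words[word] >>> offset;
            if (offset + width > 64) {
                v |= words[word + 1] << (64 - offset);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docids = {3, 7, 8, 15, 40, 41, 100};
        int[] gaps = toGaps(docids);
        int width = bitWidth(gaps);
        long[] packed = pack(gaps, width);
        int[] restored = fromGaps(unpack(packed, width, gaps.length));
        System.out.println("bit width: " + width + ", words: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(docids, restored));
    }
}
```

      Real codecs in this family typically add exception handling for outliers and vectorised unpacking; the sketch only shows why gap-encoded postings pack into far fewer bits than raw 32-bit integers.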

              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                catena.matteo Matteo Catena
