  1. Terrier Core
  2. TR-55

Singlepass indexing efficiency hindered by getMemoryConsumption() calls

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .indexing
    • Labels:
      None

      Description

      Terrier 2.2.1 was reported to index WT10G in the following times:
      two pass: 62.5 min
      singlepass: 34.7 min
      two pass + blocks: 2 hours 18 min
      singlepass + blocks: 53.1 min

      It would appear that these times are no longer being achieved for Terrier 3. Profiling is required.

          Activity

          craigm Craig Macdonald added a comment -

          Most time was spent summing up the memory consumption of the MemoryPostings object:

          rank self accum count trace method
          1 41.13% 41.13% 32556 300488 uk.ac.gla.terrier.structures.indexing.singlepass.MemoryPostings.getMemoryConsumption

          TRACE 300488:
          uk.ac.gla.terrier.structures.indexing.singlepass.MemoryPostings.getMemoryConsumption(MemoryPostings.java:128)
          uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.checkFlush(BasicSinglePassIndexer.java:287)
          uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.indexDocument(BasicSinglePassIndexer.java:320)
          uk.ac.gla.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:224)

          This was caused by TREC-43.

          craigm Craig Macdonald added a comment -

          Instead of calculating memory usage on every call of getMemoryConsumption(), I now keep track of consumption as postings objects are added or updated. For WT2G indexing, this takes indexing time down from 989.647 seconds to 388.993 seconds (both runs with profiling enabled).
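
          The running-total idea described above can be sketched roughly as follows. This is only an illustration, not the code committed to Terrier: the class TrackedPostings, its per-term and per-posting byte costs, and the method recomputeMemoryConsumption() are all hypothetical names and values chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an in-memory postings store; NOT Terrier's MemoryPostings. */
class TrackedPostings {
    /** Assumed fixed cost of one posting entry (illustrative value only). */
    static final long BYTES_PER_POSTING = 8;
    /** Assumed fixed overhead of one new term (illustrative value only). */
    static final long BYTES_PER_TERM = 32;

    private final Map<String, Integer> postingCounts = new HashMap<>();
    /** Running total, updated on every add instead of being recomputed. */
    private long memoryConsumption = 0;

    /** Adds one posting for the term and adjusts the running total. */
    void add(String term) {
        Integer previous = postingCounts.get(term);
        if (previous == null) {
            postingCounts.put(term, 1);
            memoryConsumption += BYTES_PER_TERM + BYTES_PER_POSTING;
        } else {
            postingCounts.put(term, previous + 1);
            memoryConsumption += BYTES_PER_POSTING;
        }
    }

    /** Constant time: returns the running total. */
    long getMemoryConsumption() {
        return memoryConsumption;
    }

    /** Linear in the number of terms: the summing pattern that is costly when called per document. */
    long recomputeMemoryConsumption() {
        long total = 0;
        for (int count : postingCounts.values()) {
            total += BYTES_PER_TERM + BYTES_PER_POSTING * count;
        }
        return total;
    }

    /** Empties the store, as after a flush to disk, and resets the running total. */
    void clear() {
        postingCounts.clear();
        memoryConsumption = 0;
    }
}
```

          In this sketch a per-document flush check can call getMemoryConsumption() cheaply, while recomputeMemoryConsumption() remains available to verify that the running total has not drifted.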

          craigm Craig Macdonald added a comment -

          Changes committed to SVN.

          ounis Iadh Ounis added a comment - - edited

          Nicola also reported slow indexing with WT2G. He said he could not achieve the timings mentioned on the Terrier web page on his laptop with Terrier3. Is this due to the same issue?

          craigm Craig Macdonald added a comment -

          Indeed, it was he who alerted me that there might be a problem here. I won't tell him how to fix it, as he doesn't have to index too regularly anyway.

          ounis Iadh Ounis added a comment -

          I told him to speak to you about the issue. Good (that the bug was fixed). You might be right: we do need some sort of unit testing after all.


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              craigm Craig Macdonald
            • Watchers:
              0
