Terrier Core / TR-44

In singlepass indexing, checking for enough free memory is insufficient

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .indexing
    • Labels:
      None

      Description

      Single-pass indexing tries to use as much memory as possible for mini-inverted indices (flushes). It uses some Java code to estimate how much memory is left.
      When the JVM has allocated all of its memory and only 70% of it is free, flush() is called.

      However, Java's memory management isn't reliable enough for this. We can easily get out-of-memory errors, particularly in Hadoop mode, and often for block indexing. Java 6 makes the problem worse, as it throws OutOfMemoryError earlier than Java 5 would.
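
The free-memory heuristic described above can be sketched roughly as follows. This is an illustration only, not the Terrier code: the class name `FreeMemoryHeuristic`, the `minFreeFraction` parameter, and the reading of the threshold as "flush once the free fraction drops below a configured value" are all assumptions. Only the `java.lang.Runtime` calls are real JDK API.

```java
/** Hypothetical sketch of a free-memory check built on java.lang.Runtime. */
final class FreeMemoryHeuristic {
    /** Assumed knob: flush once the free fraction of the heap drops below this value. */
    private final double minFreeFraction;

    FreeMemoryHeuristic(double minFreeFraction) {
        this.minFreeFraction = minFreeFraction;
    }

    /** Pure decision function, so the logic can be exercised without a real heap. */
    static boolean shouldFlush(long maxBytes, long totalBytes, long freeBytes,
                               double minFreeFraction) {
        // The JVM can no longer grow the heap once total has reached max.
        boolean heapFullyAllocated = totalBytes >= maxBytes;
        double freeFraction = (double) freeBytes / totalBytes;
        return heapFullyAllocated && freeFraction < minFreeFraction;
    }

    /** Asks the running JVM; freeMemory() is only an approximation. */
    boolean shouldFlush() {
        Runtime rt = Runtime.getRuntime();
        return shouldFlush(rt.maxMemory(), rt.totalMemory(), rt.freeMemory(),
                           minFreeFraction);
    }
}
```

Because `Runtime.freeMemory()` is an approximation that depends on when the garbage collector last ran, a check like this can report plenty of free memory shortly before an allocation fails, which is the weakness this issue describes.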

        Attachments

          Activity

          craigm Craig Macdonald created issue -
          craigm Craig Macdonald added a comment -

          Proposed solution: instead of checking how much memory is free, check how much memory is known to be in use by the in-memory mini inverted index. Set a threshold, e.g. 300MB.

          Future work: set this as a % of the max JVM size?
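
A minimal sketch of the proposed approach, assuming the indexer can estimate how many bytes each batch of postings adds to the in-memory index. The class `UsedMemoryThreshold` and its methods are hypothetical names chosen for illustration; they are not taken from the attached patch.

```java
/** Hypothetical tracker for the bytes held by the in-memory mini inverted index. */
final class UsedMemoryThreshold {
    private final long thresholdBytes;
    private long usedBytes = 0;

    UsedMemoryThreshold(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Record the estimated size of newly added postings. */
    void add(long bytes) {
        usedBytes += bytes;
    }

    /** True once the tracked usage has reached the configured threshold. */
    boolean shouldFlush() {
        return usedBytes >= thresholdBytes;
    }

    /** Call after the in-memory index has been flushed to disk. */
    void reset() {
        usedBytes = 0;
    }
}
```

With the 300MB figure from the comment above, the tracker would be created as `new UsedMemoryThreshold(300L * 1024 * 1024)`. The decision then depends only on bookkeeping the indexer controls, not on what the JVM reports as free.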

          craigm Craig Macdonald added a comment -

          Initial patch for single-pass and Hadoop mode indexing, for Terrier 2. Initial experiments show that this makes indexing more resilient.

          craigm Craig Macdonald made changes -
          Field Original Value New Value
          Attachment singlepass-used-memory.patch [ 10144 ]
          craigm Craig Macdonald made changes -
          Assignee Iadh Ounis [ ounis ] Craig Macdonald [ craigm ]
          craigm Craig Macdonald made changes -
          Component/s Core [ 10020 ]
          rodrygo Rodrygo L. T. Santos added a comment -

          Which mechanism are you using to measure the available memory? Is it different from this one?

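          import java.lang.management.ManagementFactory;
          import java.lang.management.MemoryMXBean;
          import java.lang.management.MemoryUsage;

          MemoryMXBean mxBean = ManagementFactory.getMemoryMXBean();
          MemoryUsage mu = mxBean.getHeapMemoryUsage();
          long used = mu.getUsed() / 1048576; // heap currently in use, in MB
          long max = mu.getMax() / 1048576;   // maximum heap, in MB (getMax() is -1 if undefined)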

          craigm Craig Macdonald added a comment -

          Hi Rodrygo, thanks for your interest.

          We currently have a MemoryChecker interface. The implementation I'm using is based on the java.lang.Runtime object:
          http://trmaster/cgi-bin/viewvc/trunk/src/uk/ac/gla/terrier/utility/RuntimeMemoryChecker.java

          Would you be able to provide an implementation based on the bean interface you have found? I don't know whether the statistics that implementation provides are the same as those from the Runtime interface. I have checked the JDK source, and the bean is not just a wrapper for the Runtime object.

          C
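
For illustration, a bean-based checker of the kind requested above might look roughly like this. The `HeapChecker` interface is a hypothetical stand-in, since the methods of the actual MemoryChecker interface are not shown in this issue, and the `maxUsedFraction` threshold is likewise an assumption. The `java.lang.management` calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical stand-in for MemoryChecker; the real method names may differ. */
interface HeapChecker {
    boolean isMemoryLow();
}

/** Sketch of a checker backed by MemoryMXBean instead of java.lang.Runtime. */
final class MXBeanHeapChecker implements HeapChecker {
    private final MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
    /** Assumed knob: report low memory once used/max exceeds this fraction. */
    private final double maxUsedFraction;

    MXBeanHeapChecker(double maxUsedFraction) {
        this.maxUsedFraction = maxUsedFraction;
    }

    /** Pure decision function, kept separate so it can be tested directly. */
    static boolean isLow(long usedBytes, long maxBytes, double maxUsedFraction) {
        if (maxBytes <= 0) {
            return false; // getMax() returns -1 when the maximum is undefined
        }
        return (double) usedBytes / maxBytes > maxUsedFraction;
    }

    @Override
    public boolean isMemoryLow() {
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return isLow(heap.getUsed(), heap.getMax(), maxUsedFraction);
    }
}
```

Whether these figures behave any better than the Runtime ones under memory pressure is exactly the open question raised in the comment, so this sketch makes no claim either way.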

          craigm Craig Macdonald added a comment -

          I committed an improved version to trunk.

          craigm Craig Macdonald made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          craigm Craig Macdonald made changes -
          Affects Version/s 3.0 [ 10020 ]
          Fix Version/s 3.0 [ 10020 ]
          craigm Craig Macdonald made changes -
          Project TREC [ 10010 ] Terrier Core [ 10000 ]
          Key TREC-43 TR-44
          Workflow jira [ 10112 ] Terrier Open Source [ 10302 ]
          Affects Version/s 3.0 [ 10030 ]
          Affects Version/s 3.0 [ 10020 ]
          Component/s .indexing [ 10002 ]
          Component/s Core [ 10020 ]
          Fix Version/s 3.0 [ 10030 ]
          Fix Version/s 3.0 [ 10020 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              craigm Craig Macdonald
            • Watchers:
              0