[TR-44] In singlepass indexing, checking for enough free memory is insufficient Created: 29/Jun/09  Updated: 05/Mar/10  Resolved: 16/Jul/09

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: 3.0
Fix Version/s: 3.0

Type: Improvement Priority: Major
Reporter: Craig Macdonald Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: File singlepass-used-memory.patch    

 Description   
Single-pass indexing tries to use as much memory as possible for its mini-inverted indices (flushes). It uses some Java code to estimate how much memory is left: once the JVM has allocated all the memory available to it and only 70% of that is free, it calls flush().

However, Java's memory management isn't reliable. We can easily get out-of-memory errors, particularly in Hadoop mode, and often for block indexing. Java 6 makes this problem worse, as it throws OutOfMemoryError earlier than Java 5 would.
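
For reference, the free-memory check described above might look roughly like the sketch below. The class and method names are hypothetical, and the exact comparison Terrier performs is not shown in this ticket; the 70% figure is passed in as a parameter rather than assumed. Only the java.lang.Runtime calls are real API.

```java
// Hypothetical sketch of a free-memory flush heuristic; not Terrier's actual code.
final class FreeMemoryHeuristic {

    /**
     * Pure decision rule, kept separate from the JVM so it can be tested:
     * flush once the heap has grown to its maximum and the free portion
     * is at or below the given fraction of the allocated heap.
     */
    static boolean shouldFlush(long totalBytes, long maxBytes,
                               long freeBytes, double freeFraction) {
        boolean heapFullyGrown = totalBytes >= maxBytes;
        boolean lowOnFree = freeBytes <= (long) (freeFraction * totalBytes);
        return heapFullyGrown && lowOnFree;
    }

    /** The same rule, fed with the figures the running JVM reports. */
    static boolean shouldFlushNow(double freeFraction) {
        Runtime rt = Runtime.getRuntime();
        return shouldFlush(rt.totalMemory(), rt.maxMemory(),
                           rt.freeMemory(), freeFraction);
    }
}
```

The weakness is that freeMemory() is only an approximation and does not count garbage that has yet to be collected, so the decision depends on when the collector last ran.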


 Comments   
Comment by Craig Macdonald [ 29/Jun/09 ]

Proposed solution: instead of checking how much memory is free, check how much memory you know you have used for the in-memory mini-inverted index. Set a threshold, e.g. 300MB.
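
A minimal sketch of that idea, assuming the indexer can estimate the size of each posting it adds. The class name and methods are hypothetical and are not taken from the attached patch; only the 300MB default comes from this comment.

```java
// Hypothetical sketch: count the bytes we know we have used, flush at a fixed threshold.
final class UsedMemoryFlushPolicy {
    /** Default threshold suggested in the ticket: 300MB. */
    static final long DEFAULT_THRESHOLD_BYTES = 300L * 1024 * 1024;

    private final long thresholdBytes;
    private long usedBytes = 0;

    UsedMemoryFlushPolicy(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Records an estimate of newly used bytes; returns true when a flush is due. */
    boolean add(long estimatedBytes) {
        usedBytes += estimatedBytes;
        return usedBytes >= thresholdBytes;
    }

    /** Call after the in-memory index has been flushed to disk. */
    void reset() {
        usedBytes = 0;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

Because the counter depends only on what the indexer itself has stored, the flush decision no longer varies with garbage collector timing.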

Future work: set this as a percentage of the maximum JVM memory size?
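
One way this could work, sketched under assumptions: derive the threshold from Runtime.maxMemory() and fall back to a fixed value when the JVM reports no limit. The class name, the fraction parameter and the fallback are all hypothetical.

```java
// Hypothetical sketch: derive the flush threshold from the maximum heap size.
final class HeapFractionThreshold {

    /** Pure computation, so it can be checked without depending on the real JVM. */
    static long fromMaxHeap(long maxHeapBytes, double fraction, long fallbackBytes) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        if (maxHeapBytes <= 0 || maxHeapBytes == Long.MAX_VALUE) {
            return fallbackBytes;
        }
        return (long) (maxHeapBytes * fraction);
    }

    /** Threshold for the JVM this code is running in. */
    static long forThisJvm(double fraction, long fallbackBytes) {
        return fromMaxHeap(Runtime.getRuntime().maxMemory(), fraction, fallbackBytes);
    }
}
```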

Comment by Craig Macdonald [ 29/Jun/09 ]

Initial patch for single-pass and Hadoop mode indexing, for Terrier 2. Initial experiments show that this makes indexing more resilient.

Comment by Rodrygo L. T. Santos [ 30/Jun/09 ]

Which mechanism are you using to measure the available memory? Is it different from this one?

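import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

MemoryMXBean mxBean = ManagementFactory.getMemoryMXBean();
MemoryUsage mu = mxBean.getHeapMemoryUsage();
long used = mu.getUsed() / 1048576; // used heap, in MB
long max = mu.getMax() / 1048576;   // maximum heap, in MB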

Comment by Craig Macdonald [ 01/Jul/09 ]

Hi Rodrygo, thanks for your interest.

We currently have a MemoryChecker interface. The implementation I'm using is based on the java.lang.Runtime object:
http://trmaster/cgi-bin/viewvc/trunk/src/uk/ac/gla/terrier/utility/RuntimeMemoryChecker.java

Would you be able to provide an implementation based on the bean interface you have found? I don't know whether the statistics it provides are the same as those from the Runtime interface. I have checked the JDK source, and it's not the case that the bean is just a wrapper for the Runtime object.
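
As a starting point, a bean-based checker could look something like the sketch below. The HeapChecker interface is a hypothetical stand-in, since Terrier's MemoryChecker interface is not shown in this ticket, and the used-fraction rule is an assumption. The two static helpers return the "used" figure from each source so the statistics can be compared side by side.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical stand-in for the MemoryChecker interface mentioned above. */
interface HeapChecker {
    /** Returns true when memory is considered low and a flush should happen. */
    boolean checkMemory();
}

/** Sketch of a checker backed by MemoryMXBean instead of java.lang.Runtime. */
final class MXBeanHeapChecker implements HeapChecker {
    private final double maxUsedFraction;

    MXBeanHeapChecker(double maxUsedFraction) {
        this.maxUsedFraction = maxUsedFraction;
    }

    @Override
    public boolean checkMemory() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined; fall back to committed.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return heap.getUsed() >= (long) (maxUsedFraction * limit);
    }

    /** Used heap as derived from the Runtime object. */
    static long runtimeUsedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Used heap as reported by the memory bean. */
    static long beanUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }
}
```

Logging runtimeUsedBytes() and beanUsedBytes() during an indexing run would show how far the two sources diverge in practice.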

C

Comment by Craig Macdonald [ 16/Jul/09 ]

I committed an improved version to trunk.

Generated at Sat Dec 16 18:48:59 GMT 2017 using JIRA 7.1.1#71004-sha1:d6b2c0d9b7051e9fb5e4eb8ce177ca56d91d7bd8.