  1. Terrier Core
  2. TR-563

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

    Details

      Description

      As suggested by Sir Craig, I am using the command-line interface of the binary version of Terrier for batch indexing, in order to understand its retrieval platform. While attempting to index INEX's Amazon/LibraryThing Collection, which has 2.8 million XML documents (distributed among 1100 folders), the command line came up with the following exception after indexing only 7% of the collection:

      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
              at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:172)
              at gnu.trove.THash.postInsertHook(THash.java:359)
              at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
              at org.terrier.structures.indexing.LexiconMap$1.execute(LexiconMap.java:98)
              at org.terrier.structures.indexing.LexiconMap$1.execute(LexiconMap.java:92)
              at gnu.trove.TObjectIntHashMap.forEachEntry(TObjectIntHashMap.java:426)
              at org.terrier.structures.indexing.DocumentPostingList.forEachTerm(DocumentPostingList.java:135)
              at org.terrier.structures.indexing.LexiconMap.insert(LexiconMap.java:92)
              at org.terrier.structures.indexing.LexiconBuilder.addDocumentTerms(LexiconBuilder.java:368)
              at org.terrier.structures.indexing.classical.BasicIndexer.indexDocument(BasicIndexer.java:393)
              at org.terrier.structures.indexing.classical.BasicIndexer.createDirectIndex(BasicIndexer.java:283)
              at org.terrier.structures.indexing.Indexer.index(Indexer.java:346)
              at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:154)
              at org.terrier.applications.BatchIndexing$Command.run(BatchIndexing.java:113)
              at org.terrier.applications.CLITool$CLIParsedCLITool.run(CLITool.java:155)
              at org.terrier.applications.CLITool.main(CLITool.java:316)

      Any suggestions on how to resolve this issue? How can we increase the Java heap space?

      Please help!

          Activity

          Rocky Xanadul Irfan Ullah added a comment -

          Hi
          While reading the terrier.properties.sample file, I came across this setting:

          memory.heap.usage=0.85

          I think we can increase the heap size here, but I don't understand what 0.85 means. Is it a fraction of the total memory size, or something else?

          Thanks in advance

          Rocky Xanadul Irfan Ullah added a comment -

          I increased the memory for Java through the Java Runtime Environment Settings in the Control Panel, using -Xmx2048m under the Runtime Parameters option. After processing the files (for a much smaller subset of the collection mentioned above), the exception message changed to:

          15:04:25.616 [main] INFO o.t.structures.indexing.Indexer - Collection #0 took 677 seconds to index (57296 documents)
          15:04:25.725 [main] INFO o.t.s.indexing.LexiconBuilder - 29 lexicons to merge
          15:04:27.897 [main] INFO o.t.s.indexing.LexiconBuilder - Optimising structure lexicon
          15:04:27.897 [main] INFO o.t.s.i.FSOMapFileLexiconUtilities - Optimising lexicon with 326998 entries
          15:04:28.850 [main] INFO o.t.structures.indexing.Indexer - Started building the inverted index...
          15:04:28.850 [main] INFO o.t.structures.indexing.Indexer - Started building the inverted index...
          15:04:28.865 [main] INFO o.t.s.i.c.InvertedIndexBuilder - Iteration 1 of 1 iterations
          Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                  at org.terrier.structures.indexing.classical.InvertedIndexBuilder.createPointerForTerm(InvertedIndexBuilder.java:404)
                  at org.terrier.structures.indexing.classical.InvertedIndexBuilder.scanLexiconForPointers(InvertedIndexBuilder.java:438)
                  at org.terrier.structures.indexing.classical.InvertedIndexBuilder.createInvertedIndex(InvertedIndexBuilder.java:274)
                  at org.terrier.structures.indexing.classical.BasicIndexer.createInvertedIndex(BasicIndexer.java:438)
                  at org.terrier.structures.indexing.Indexer.index(Indexer.java:347)
                  at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:154)
                  at org.terrier.applications.BatchIndexing$Command.run(BatchIndexing.java:113)
                  at org.terrier.applications.CLITool$CLIParsedCLITool.run(CLITool.java:155)
                  at org.terrier.applications.CLITool.main(CLITool.java:316)

          While checking the CPU usage during processing, at the very last stage, when all the files are processed, the CPU usage jumps to 86% and then fluctuates among high numbers. Then suddenly it drops and the above exception appears.

          Any suggestions?
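A heap setting made in one place does not necessarily reach the JVM that runs the indexer, so it can help to check what a given JVM actually received. The following is a minimal sketch, not part of Terrier; the class name `HeapReport` and the helper `toMiB` are hypothetical. It relies only on the standard `java.lang.Runtime` API.

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will attempt to use (governed by -Xmx).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        // totalMemory() - freeMemory(): heap currently occupied by objects.
        System.out.println("used heap (MiB):      " + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it as `java -Xmx2048m HeapReport` should report a maximum heap close to 2048 MiB; the exact figure can be slightly lower depending on the garbage collector. If the value seen by the indexing process is much smaller, the setting is probably not being applied to that process.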

          craigm Craig Macdonald added a comment -

          Dear Irfan,

          There is a wiki page about memory problems with Terrier - see http://ir.dcs.gla.ac.uk/wiki/Terrier/MemoryIssues - I found this by simply googling 'terrier memory'.

          The key messages are:

          • use a dedicated server and set the memory as high as possible using the TERRIER_HEAP_MEM variable. We routinely index using 32GB of memory
          • use the single-pass indexer (`bin/terrier batchindexing -j`), which detects low memory and flushes to disk

          Craig
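To illustrate the general idea behind "detects low memory and flushes to disk", here is a minimal sketch. It is not Terrier's implementation: the class `LowMemoryFlushSketch`, the method `isLowMemory`, and the use of 0.85 as a fraction of the maximum heap are all assumptions made for illustration. The 0.85 value only mirrors the one quoted earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class LowMemoryFlushSketch {
    /** True when used heap exceeds the given fraction of the maximum heap. */
    static boolean isLowMemory(long usedBytes, long maxBytes, double threshold) {
        return (double) usedBytes / (double) maxBytes > threshold;
    }

    /** Heap currently occupied by objects in this JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        double threshold = 0.85; // assumed meaning: fraction of max heap
        long maxHeap = Runtime.getRuntime().maxMemory();
        List<String> buffer = new ArrayList<>(); // stands in for in-memory postings
        int flushes = 0;

        for (int doc = 0; doc < 100_000; doc++) {
            buffer.add("term" + doc);
            if (isLowMemory(usedHeap(), maxHeap, threshold)) {
                // A real indexer would write the buffered data to disk here
                // and merge the on-disk runs at the end.
                buffer.clear();
                flushes++;
            }
        }
        System.out.println("flushes: " + flushes);
    }
}
```

The point of the design is that memory use is bounded by the threshold rather than by the collection size, at the cost of extra disk I/O and a final merge step.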

          craigm Craig Macdonald added a comment -

          I should add that Terrier 5.2 will have a more robust classical indexer with respect to memory usage.


            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              Rocky Xanadul Irfan Ullah