Terrier Core / TR-537

NPE in parallel indexing with SimpleFileCollection

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.1
    • Component/s: .indexing
    • Labels: None

      Description

      I'm indexing the LDC NYT collection, which stores one document per XML file. XMLCollection fails on it because of a DTD request that doesn't resolve, so I'm indexing it with SimpleFileCollection instead. With parallel indexing (-p), the run dies with an NPE that looks like it comes from a missing constructor somewhere:

      $ ./bin/terrier batchindexing -b -p
      Setting TERRIER_HOME to /Users/soboroff/terrier/terrier-nyt
      22:28:40.834 [ForkJoinPool.commonPool-worker-3] ERROR o.terrier.indexing.CollectionFactory - ERROR: First Collection class named org.terrier.indexing.SimpleFileCollection - requested constructor not found
      java.lang.NoSuchMethodException: org.terrier.indexing.SimpleFileCollection.<init>(java.util.List, java.lang.String, java.lang.String, java.lang.String)
      at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_141]
      at java.lang.Class.getConstructor(Class.java:1825) ~[na:1.8.0_141]
      at org.terrier.indexing.CollectionFactory.loadCollections(CollectionFactory.java:97) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.indexing.CollectionFactory.loadCollection(CollectionFactory.java:76) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing.loadCollection(ThreadedBatchIndexing.java:83) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:126) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:122) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [na:1.8.0_141]
      at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) [na:1.8.0_141]
      at java.util.stream.AbstractTask.compute(AbstractTask.java:316) [na:1.8.0_141]
      at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [na:1.8.0_141]
      22:28:40.834 [ForkJoinPool.commonPool-worker-1] ERROR o.terrier.indexing.CollectionFactory - ERROR: First Collection class named org.terrier.indexing.SimpleFileCollection - requested constructor not found
      java.lang.NoSuchMethodException: org.terrier.indexing.SimpleFileCollection.<init>(java.util.List, java.lang.String, java.lang.String, java.lang.String)
      at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_141]
      at java.lang.Class.getConstructor(Class.java:1825) ~[na:1.8.0_141]
      at org.terrier.indexing.CollectionFactory.loadCollections(CollectionFactory.java:97) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.indexing.CollectionFactory.loadCollection(CollectionFactory.java:76) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing.loadCollection(ThreadedBatchIndexing.java:83) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:126) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:122) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [na:1.8.0_141]
      at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) [na:1.8.0_141]
      at java.util.stream.AbstractTask.compute(AbstractTask.java:316) [na:1.8.0_141]
      at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool$WorkQueue.pollAndExecCC(ForkJoinPool.java:1190) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:1879) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2045) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:404) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) [na:1.8.0_141]
      at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) [na:1.8.0_141]
      at org.terrier.applications.ThreadedBatchIndexing.lambda$index$0(ThreadedBatchIndexing.java:166) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[na:1.8.0_141]
      22:28:40.834 [ForkJoinPool.commonPool-worker-2] ERROR o.terrier.indexing.CollectionFactory - ERROR: First Collection class named org.terrier.indexing.SimpleFileCollection - requested constructor not found
      java.lang.NoSuchMethodException: org.terrier.indexing.SimpleFileCollection.<init>(java.util.List, java.lang.String, java.lang.String, java.lang.String)
      at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_141]
      at java.lang.Class.getConstructor(Class.java:1825) ~[na:1.8.0_141]
      at org.terrier.indexing.CollectionFactory.loadCollections(CollectionFactory.java:97) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.indexing.CollectionFactory.loadCollection(CollectionFactory.java:76) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing.loadCollection(ThreadedBatchIndexing.java:83) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:126) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:122) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [na:1.8.0_141]
      at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [na:1.8.0_141]
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) [na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) [na:1.8.0_141]
      at java.util.stream.AbstractTask.compute(AbstractTask.java:316) [na:1.8.0_141]
      at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [na:1.8.0_141]
      at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [na:1.8.0_141]
      22:28:40.839 [ForkJoinPool.commonPool-worker-3] ERROR o.t.a.ThreadedBatchIndexing - Collection class named SimpleFileCollection not found, aborting
      22:28:40.839 [ForkJoinPool.commonPool-worker-1] ERROR o.t.a.ThreadedBatchIndexing - Collection class named SimpleFileCollection not found, aborting
      22:28:40.839 [ForkJoinPool.commonPool-worker-2] ERROR o.t.a.ThreadedBatchIndexing - Collection class named SimpleFileCollection not found, aborting
      22:28:40.982 [main] ERROR o.t.a.ThreadedBatchIndexing - Problem occurred during parallel indexing
      java.util.concurrent.ExecutionException: java.lang.NullPointerException
      at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006) ~[na:1.8.0_141]
      at org.terrier.applications.ThreadedBatchIndexing.index(ThreadedBatchIndexing.java:166) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.BatchIndexing$Command.run(BatchIndexing.java:102) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.CLITool$CLIParsedCLITool.run(CLITool.java:130) [terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.CLITool.main(CLITool.java:244) [terrier-project-5.0-jar-with-dependencies.jar:na]
      Caused by: java.lang.NullPointerException: null
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_141]
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_141]
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_141]
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005) ~[na:1.8.0_141]
      ... 4 common frames omitted
      Caused by: java.lang.NullPointerException: null
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_141]
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_141]
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_141]
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735) ~[na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714) ~[na:1.8.0_141]
      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[na:1.8.0_141]
      at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[na:1.8.0_141]
      at org.terrier.applications.ThreadedBatchIndexing.lambda$index$0(ThreadedBatchIndexing.java:166) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[na:1.8.0_141]
      at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[na:1.8.0_141]
      Caused by: java.lang.NullPointerException: null
      at org.terrier.structures.indexing.Indexer.index(Indexer.java:339) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:154) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:131) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at org.terrier.applications.ThreadedBatchIndexing$1.apply(ThreadedBatchIndexing.java:122) ~[terrier-project-5.0-jar-with-dependencies.jar:na]
      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_141]
      at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[na:1.8.0_141]
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_141]
      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) ~[na:1.8.0_141]
      at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) ~[na:1.8.0_141]
      at java.util.stream.AbstractTask.compute(AbstractTask.java:316) ~[na:1.8.0_141]
      at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) ~[na:1.8.0_141]
      ... 4 common frames omitted
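
One plausible reading of the log above: the reflective lookup for a (List, String, String, String) constructor fails, no collection object is created, and a null reference is dereferenced later in Indexer.index. The sketch below is not Terrier's code. It only illustrates, under that assumption, how a reflective loader can fail at the lookup site instead of handing back null. The class names ConstructorLookupSketch, OneArgCollection and FourArgCollection are hypothetical, and the three String arguments are passed as empty strings because the issue does not say what they mean.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

public class ConstructorLookupSketch {

    /** Hypothetical stand-in for a collection class that lacks the four-argument constructor. */
    public static class OneArgCollection {
        public OneArgCollection(List<String> files) { }
    }

    /** Hypothetical stand-in that declares the signature named in the log. */
    public static class FourArgCollection {
        final List<String> files;

        public FourArgCollection(List<String> files, String a, String b, String c) {
            this.files = files;
        }
    }

    /**
     * Looks up the public (List, String, String, String) constructor and calls it.
     * Throws instead of returning null, so a missing constructor is reported
     * where the lookup happens rather than as a later NullPointerException.
     */
    public static Object load(Class<?> cls, List<String> files) {
        try {
            Constructor<?> ctor =
                cls.getConstructor(List.class, String.class, String.class, String.class);
            return ctor.newInstance(files, "", "", "");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(cls.getName() + " lacks the required constructor", e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("doc1.xml");
        System.out.println(load(FourArgCollection.class, files).getClass().getSimpleName());
        try {
            load(OneArgCollection.class, files);
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```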

            Activity

            craigm Craig Macdonald added a comment -

            Thanks. Trivial fix, but we need a unit test that checks ALL Collection classes for conformity. Note to self: Constructor APIs are BAD.

            craigm Craig Macdonald added a comment -

            tagging for 5.1

            craigm Craig Macdonald added a comment -

            Fixed and unit test introduced to check Collection constructors. Constructor APIs are /bad/.
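
For readers wondering what a constructor conformity check could look like, here is a rough sketch. It is not the unit test that was added to Terrier. It assumes the required signature is the (List, String, String, String) one from the NoSuchMethodException in the description, and the classes CollectionConstructorCheck, Conforming and NonConforming are hypothetical placeholders for real Collection implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionConstructorCheck {

    /** Hypothetical class that declares the expected public constructor. */
    public static class Conforming {
        public Conforming(List<String> files, String a, String b, String c) { }
    }

    /** Hypothetical class that only offers a different constructor. */
    public static class NonConforming {
        public NonConforming(String file) { }
    }

    /** True if cls declares a public (List, String, String, String) constructor. */
    public static boolean hasRequiredConstructor(Class<?> cls) {
        try {
            cls.getConstructor(List.class, String.class, String.class, String.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns the simple names of all classes that fail the check. */
    public static List<String> nonConforming(List<Class<?>> classes) {
        List<String> missing = new ArrayList<>();
        for (Class<?> cls : classes) {
            if (!hasRequiredConstructor(cls)) {
                missing.add(cls.getSimpleName());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Class<?>> classes = Arrays.asList(Conforming.class, NonConforming.class);
        System.out.println(nonConforming(classes)); // prints [NonConforming]
    }
}
```

A real test would assert that the returned list is empty for every Collection class on the classpath, so a class added without the constructor breaks the build rather than a parallel indexing run.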


              People

              • Assignee:
                craigm Craig Macdonald
              • Reporter:
                isoboroff Ian Soboroff
              • Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: