Terrier Core / TR-94

BitPostingIndexInputFormat tries to use negative offsets

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .structures
    • Labels:
      None

      Description

      INFO - Calculating splits of structure inverted
      INFO - File 0 approx splits=37.04791417717934
      INFO - File 1 approx splits=11.056029200553894
      INFO - File 2 approx splits=19.52261757850647
      INFO - File 3 approx splits=11.2651377171278
      INFO - File 4 approx splits=8.67573507130146
      INFO - File 5 approx splits=9.554178163409233
      INFO - File 6 approx splits=6.9783875644207
      INFO - File 7 approx splits=8.02258075773716
      INFO - File 8 approx splits=7.250798925757408
      INFO - File 9 approx splits=3.25227153301239
      INFO - File 10 approx splits=3.524603247642517
      INFO - File 11 approx splits=8.75064267218113
      INFO - File 12 approx splits=12.29665707051754
      INFO - File 13 approx splits=6.377697631716728
      INFO - File 14 approx splits=4.812018930912018
      INFO - File 15 approx splits=15.15673378109932
      INFO - File 16 approx splits=1.0457783639431
      INFO - File 17 approx splits=10.512042790651321
      INFO - File 18 approx splits=21.401717394590378
      INFO - File 19 approx splits=10.833241820335388
      INFO - File 20 approx splits=3.486497238278389
      INFO - File 21 approx splits=4.725020796060562
      INFO - File 22 approx splits=7.931180149316788
      INFO - File 23 approx splits=0.6753774285316467
      INFO - File 24 approx splits=1.1868641674518585
      INFO - File 25 approx splits=0.801939070224762
      Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Negative offset is not supported. File: /Indices/ClueWeb09/TREC-B/classical/data.inverted.bf25
              at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:722)
              at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
              at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
              at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

              at org.apache.hadoop.ipc.Client.call(Client.java:715)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
              at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
              at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
              at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
              at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
              at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:318)
              at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:137)
              at org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat.getSplits(BitPostingIndexInputFormat.java:233)
              at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
              at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
              at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce$Inv2DirectMultiReduceJob.runJob(Inv2DirectMultiReduce.java:166)
              at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce.invertStructure(Inv2DirectMultiReduce.java:338)
              at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce.main(Inv2DirectMultiReduce.java:282)

        Attachments

          Activity

          craigm Craig Macdonald added a comment - - edited

          This issue was caused by an earlier private issue. I have totally reworked the algorithm that creates the splits. Empirical evidence suggests it now works as expected.
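
For illustration only, here is a minimal sketch of a split calculation that cannot produce a negative offset. It is not the reworked Terrier algorithm, whose details are not given in this issue. The class name `SplitSketch`, the method `splitRanges`, and the fixed target split size are all hypothetical. The sketch assumes splits are plain byte ranges; it walks forward from offset 0 and clamps the last range to the file length, so a file smaller than one split (such as the files reporting fewer than 1.0 approximate splits in the log) still yields exactly one range starting at 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Terrier or Hadoop.
final class SplitSketch {

    /**
     * Returns {offset, length} pairs that cover [0, fileLength) in order.
     * Offsets only ever grow from 0, so none can be negative.
     */
    static List<long[]> splitRanges(long fileLength, long targetSplitSize) {
        if (fileLength < 0 || targetSplitSize <= 0) {
            throw new IllegalArgumentException(
                "fileLength must be >= 0 and targetSplitSize must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            // Clamp the final range so it never runs past the end of the file.
            long length = Math.min(targetSplitSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
            offset += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 100-byte file with a 40-byte target split size.
        for (long[] range : splitRanges(100, 40)) {
            System.out.println("offset=" + range[0] + " length=" + range[1]);
        }
    }
}
```

A real input format would also need to align each range to posting-list boundaries, which this sketch leaves out.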


            People

            • Assignee:
              craigm Craig Macdonald
            • Reporter:
              craigm Craig Macdonald
            • Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: