Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 3.0
- Fix Version/s: 3.0
- Component/s: .structures
- Labels: None
Description
INFO - Calculating splits of structure inverted
INFO - File 0 approx splits=37.04791417717934
INFO - File 1 approx splits=11.056029200553894
INFO - File 2 approx splits=19.52261757850647
INFO - File 3 approx splits=11.2651377171278
INFO - File 4 approx splits=8.67573507130146
INFO - File 5 approx splits=9.554178163409233
INFO - File 6 approx splits=6.9783875644207
INFO - File 7 approx splits=8.02258075773716
INFO - File 8 approx splits=7.250798925757408
INFO - File 9 approx splits=3.25227153301239
INFO - File 10 approx splits=3.524603247642517
INFO - File 11 approx splits=8.75064267218113
INFO - File 12 approx splits=12.29665707051754
INFO - File 13 approx splits=6.377697631716728
INFO - File 14 approx splits=4.812018930912018
INFO - File 15 approx splits=15.15673378109932
INFO - File 16 approx splits=1.0457783639431
INFO - File 17 approx splits=10.512042790651321
INFO - File 18 approx splits=21.401717394590378
INFO - File 19 approx splits=10.833241820335388
INFO - File 20 approx splits=3.486497238278389
INFO - File 21 approx splits=4.725020796060562
INFO - File 22 approx splits=7.931180149316788
INFO - File 23 approx splits=0.6753774285316467
INFO - File 24 approx splits=1.1868641674518585
INFO - File 25 approx splits=0.801939070224762
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Negative offset is not supported. File: /Indices/ClueWeb09/TREC-B/classical/data.inverted.bf25
at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:722)
at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:703)
at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:318)
at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:137)
at org.terrier.structures.indexing.singlepass.hadoop.BitPostingIndexInputFormat.getSplits(BitPostingIndexInputFormat.java:233)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce$Inv2DirectMultiReduceJob.runJob(Inv2DirectMultiReduce.java:166)
at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce.invertStructure(Inv2DirectMultiReduce.java:338)
at org.terrier.structures.indexing.singlepass.hadoop.Inv2DirectMultiReduce.main(Inv2DirectMultiReduce.java:282)
This issue was raised from an earlier private issue. I have completely reworked the algorithm that creates the splits. Empirical evidence suggests it now works as expected.
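The stack trace shows that BitPostingIndexInputFormat.getSplits passed a negative start offset to DistributedFileSystem.getFileBlockLocations, which HDFS rejects. The fractional "approx splits" values in the log suggest the split boundaries were derived from floating-point arithmetic, where rounding can push a computed start offset below zero. A minimal, hypothetical sketch (this is not Terrier's actual code, and splitStart is an illustrative helper) of how such an offset can go negative and how clamping avoids the error:

```java
public class SplitOffsetDemo {
    /**
     * Hypothetical reproduction: dividing a file into numSplits pieces of
     * approximately equal size, then deriving a split's start offset by
     * working backwards from the end of the file. When the rounded-up
     * split size overshoots the file length, the earliest split's start
     * comes out negative, which getFileBlockLocations rejects.
     */
    static long splitStart(long fileLength, int numSplits, int splitIndex) {
        // Round the per-split size up, as integer split counts require.
        long splitSize = (long) Math.ceil((double) fileLength / numSplits);
        // Naive computation: can be negative for the first splits.
        long naiveStart = fileLength - (long) (numSplits - splitIndex) * splitSize;
        // Defensive clamp: HDFS does not support negative offsets.
        return Math.max(0L, naiveStart);
    }

    public static void main(String[] args) {
        // File of 10 bytes split 4 ways: splitSize = ceil(10/4) = 3, so
        // split 0 would naively start at 10 - 4*3 = -2; clamped to 0.
        System.out.println(SplitOffsetDemo.splitStart(10L, 4, 0)); // prints 0
        System.out.println(SplitOffsetDemo.splitStart(10L, 4, 3)); // prints 7
    }
}
```

The real fix, per the comment above, reworked the split-calculation algorithm entirely rather than just clamping, but the sketch shows why a negative offset can emerge from this style of arithmetic.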