Class StringComparator

  • All Implemented Interfaces:
    java.util.Comparator<java.lang.String>

    public class StringComparator
    extends java.lang.Object
    implements java.util.Comparator<java.lang.String>
    Compares two strings that may consist of fixed-length fields separated by a non-word character (e.g. a dash), with a last field that corresponds to an integer. Two examples of such strings are XXX-XXX-012389 and XXX-XXX-1242 (XXX-XXX-1242 < XXX-XXX-012389 when compared using this comparator, because the last field is compared as a number).

    This class is primarily used for comparing docnos, especially for TREC-like collections. The docnos in the DocumentIndex are expected to be sorted in an order compatible with this comparator.

    Sorting Algorithm:

    • Split strings on non word characters
    • For each field, left-most first:
      1. If both fields contain only digits, compare them as numbers; return if not equal
      2. Otherwise, compare them as strings; return if not equal
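
    The steps above can be sketched roughly as follows. This is an illustrative sketch, not the Terrier implementation: the class name FieldwiseComparatorSketch and the helper isDigits are hypothetical, and two details are assumptions that the description does not specify, namely the use of BigInteger for the numeric comparison (so that long digit runs cannot overflow) and the rule that, when all shared fields are equal, the string with fewer fields sorts first.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the field-wise comparison described above. */
public class FieldwiseComparatorSketch implements Comparator<String> {

    @Override
    public int compare(String s1, String s2) {
        // Split on runs of non-word characters (anything outside [A-Za-z0-9_]).
        String[] f1 = s1.split("\\W+");
        String[] f2 = s2.split("\\W+");

        // Walk the fields left-most first and stop at the first difference.
        int shared = Math.min(f1.length, f2.length);
        for (int i = 0; i < shared; i++) {
            int c;
            if (isDigits(f1[i]) && isDigits(f2[i])) {
                // Both fields are purely numeric: compare by value, so that
                // "1242" sorts before "012389".
                c = new BigInteger(f1[i]).compareTo(new BigInteger(f2[i]));
            } else {
                // At least one field is not purely numeric: compare as strings.
                c = f1[i].compareTo(f2[i]);
            }
            if (c != 0) {
                return Integer.signum(c); // normalise to -1 or 1
            }
        }
        // Assumption: with all shared fields equal, fewer fields sorts first.
        return Integer.compare(f1.length, f2.length);
    }

    /** Returns true if the field is non-empty and contains only ASCII digits. */
    private static boolean isDigits(String field) {
        if (field.isEmpty()) {
            return false;
        }
        for (int i = 0; i < field.length(); i++) {
            char ch = field.charAt(i);
            if (ch < '0' || ch > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] docnos = {"XXX-XXX-012389", "XXX-XXX-1242", "AAA-XXX-5"};
        Arrays.sort(docnos, new FieldwiseComparatorSketch());
        // prints [AAA-XXX-5, XXX-XXX-1242, XXX-XXX-012389]
        System.out.println(Arrays.toString(docnos));
    }
}
```

    A plain String.compareTo would place XXX-XXX-012389 before XXX-XXX-1242, because '0' sorts before '1'. Comparing the last field by numeric value is what produces the ordering given in the class description.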
    Author:
    Vassilis Plachouras, Craig Macdonald
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static StringComparator Me
      An instantiation of this class.
    • Method Summary

      Modifier and Type Method Description
      int compare​(java.lang.String s1, java.lang.String s2)
      Compares two Strings, which have a number of fields that are separated by one or more non-alphanumeric characters.
      static int compareObjects​(java.lang.Object o1, java.lang.Object o2)
      A static access method that avoids having to instantiate a comparator. This has the same parameters, return value and implementation as compare(Object,Object).
      static int compareStrings​(java.lang.String s1, java.lang.String s2)
      A static access method that avoids having to instantiate a comparator. This has the same parameters, return value and implementation as compare(String,String).
      static void main​(java.lang.String[] args)
      Displays the result of comparing two strings given as command-line arguments.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Comparator

        equals, reversed, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
    • Field Detail

    • Constructor Detail

      • StringComparator

        public StringComparator()
    • Method Detail

      • compare

        public int compare​(java.lang.String s1,
                           java.lang.String s2)
        Compares two Strings, which have a number of fields that are separated by one or more non-alphanumeric characters.
        Specified by:
        compare in interface java.util.Comparator<java.lang.String>
        Parameters:
        s1 - the first string object to compare.
        s2 - the second string object to compare.
        Returns:
        -1, 0, or 1 if the first argument is less than, equal to, or greater than the second argument, respectively.
      • compareObjects

        public static int compareObjects​(java.lang.Object o1,
                                         java.lang.Object o2)
        A static access method that avoids having to instantiate a comparator. This has the same parameters, return value and implementation as compare(Object,Object).
        Since:
        1.1.0
      • compareStrings

        public static int compareStrings​(java.lang.String s1,
                                         java.lang.String s2)
        A static access method that avoids having to instantiate a comparator. This has the same parameters, return value and implementation as compare(String,String).
        Since:
        1.1.0
      • main

        public static void main​(java.lang.String[] args)
        Displays the result of comparing two strings given as command-line arguments.