The Trap of Case-Insensitive String
Let's start from an example. What's the expect result of the following simple code?
What's the value of "isEqual" variable, "true" or "false"? If you try to compiler and run the above code, I believe, 99.9999 times out of 100, the value of "isEqual" is "true". What's the chance for the remaining 0.0001?
OK, let's run one more example. Copy/past the following code, and run it.
If your program runs into strange behaviors, and you happen to find the String.toUpperCase() or String.toLowerCase() calls in your code, you may think about the trap and above note.
Good luck!
Postscript: You may also want to read more about how to check the trap with a simple shell script.
// Example One String lower = "Simple Smiles!"; String upper = "SIMPLE SMILES!"; int lowerHashCode = lower.toUpperCase().hashCode(); int upperHashCode = upper.toUpperCase().hashCode(); boolean isEqual = (lowerHashCode == upperHashCode); System.out.println("The hash codes of the two case-insensitive " + "strings are the same: " + isEqual);
What's the value of "isEqual" variable, "true" or "false"? If you try to compiler and run the above code, I believe, 99.9999 times out of 100, the value of "isEqual" is "true". What's the chance for the remaining 0.0001?
OK, let's run one more example. Copy/past the following code, and run it.
// Example Two // // To illustrate the unexpected behaviors of case-insensitive strings // import java.util.Locale; public class InsensitiveCases { public static void main(String[] args) throws Exception { String lower = "Simple Smiles!"; String upper = "SIMPLE SMILES!"; for (Locale locale : Locale.getAvailableLocales()) { // reset the default locale Locale.setDefault(locale); System.out.println("In locale: " + locale); // check this local System.out.println("\tcomparing in lower case\n\t\t" + intuitiveCompare(lower.toLowerCase(), upper.toLowerCase())); System.out.println("\tcomparing in upper case\n\t\t" + intuitiveCompare(lower.toUpperCase(), upper.toUpperCase())); System.out.println("\thashcoding in lower case\n\t\t" + intuitiveHashCode(lower.toUpperCase(), upper.toUpperCase())); System.out.println("\thashcoding in upper case\n\t\t" + intuitiveHashCode(lower.toUpperCase(), upper.toUpperCase())); } } private static String intuitiveCompare(String lower, String upper) { int diff = lower.compareTo(upper); if (diff == 0) { return "lower == upper"; } else if (diff > 0) { return "lower > upper"; } return "lower < upper"; } private static String intuitiveHashCode(String lower, String upper) { int lowerCode = lower.hashCode(); int upperCode = upper.hashCode(); if (lowerCode == upperCode) { return "lower hash code == upper hash code "; } return "lower hash code != upper hash code "; } }You may find the record from the log:
In locale: tr_TR comparing in lower case lower < upper comparing in upper case lower > upper hashcoding in lower case lower hash code != upper hash code hashcoding in upper case lower hash code != upper hash codeIt's interesting that, in locale tr_TR, if we convert the strings into lower case, "Simple Smiles!" < "SIMPLE SMILES!"; while we convert the strings into upper case, the result is completely different, "Simple Smiles!" > "SIMPLE SMILES!". Of course, the hash code cannot be equal any more. What's wrong with it? We can get the answer from the spec of String.toUpperCase() or String.toLowerCase().
Note: This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags. For instance, "title".toUpperCase() in a Turkish locale returns "T\u0130TLE", where '\u0130' is the LATIN CAPITAL LETTER I WITH DOT ABOVE character. To obtain correct results for locale insensitive strings, use toUpperCase(Locale.ENGLISH).It's an important note to operate on case-insensitive strings. However, it's not easy to pay enough attention to this note in practice. So we may be find the following code here and there:
// Example Three public int hashCode() { return theStringObject.toUpperCase().hashCode(); } OR // Example Four theStringObject.toUpperCase().equals("MYSTRING"); OR // Example Five theStringObject.toUpperCase().compareTo("MYSTRING");From the above note, we know that we cannot expect the above three example always work correctly. To avoid the unexpected behaviors, pay attention to the above note, especially, "To obtain correct results for locale insensitive strings, use toUpperCase(Locale.ENGLISH)." Let's try revise the above three examples:
// Example Three: Revised public int hashCode() { return theStringObject.toUpperCase(Locale.ENGLISH).hashCode(); } OR // Example Four: Revised theStringObject.toUpperCase(Locale.ENGLISH).equals("MYSTRING"); OR // Example Five: Revised theStringObject.toUpperCase(Locale.ENGLISH).compareTo("MYSTRING");
If your program runs into strange behaviors, and you happen to find the String.toUpperCase() or String.toLowerCase() calls in your code, you may think about the trap and above note.
Good luck!
Postscript: You may also want to read more about how to check the trap with a simple shell script.