The data type char is primitive and has no methods. For this reason, the Character class in the java.lang core package includes a large number of methods that are useful for dealing with single characters; many of these methods are static. This class includes methods for testing, such as whether a character is a digit, a letter, or a special character.
All test methods start with the prefix is and return a boolean. Methods are also available for converting, for example, to uppercase or lowercase. A few of these methods are listed after the first example below.
All these methods “know” about the properties of each Unicode character. Furthermore, the code point of each Unicode character is always the same, no matter whether a program is executed in Germany or Mongolia.
Note: The term “letter” describes more than well-known letters like “a” or “?.” Unicode contains more than 100,000 characters, including hundreds of letters and numbers.
In the following example, we’ll declare a method that will run through a string and test if the string consists only of digits. Although such functionality is useful in practice, Java Platform, Standard Edition (Java SE) doesn’t provide a simple method for it.
public class IsNumeric {

  /**
   * Returns {@code true} if the String contains only Unicode digits.
   * An empty string or {@code null} leads to {@code false}.
   *
   * @param string Input String.
   * @return {@code true} if string is numeric, {@code false} otherwise.
   */
  public static boolean isNumeric( String string ) {
    // Convention of this example: null and "" are not numeric
    if ( string == null || string.length() == 0 )
      return false;

    // The first character that is not a digit ends the search
    for ( int i = 0; i < string.length(); i++ )
      if ( ! Character.isDigit( string.charAt( i ) ) )
        return false;

    return true;
  }

  public static void main( String[] args ) {
    System.out.println( isNumeric( "1234" ) ); // true
    System.out.println( isNumeric( "12.4" ) ); // false
    System.out.println( isNumeric( "-123" ) ); // false
  }
}
Our method defines that null and an empty string aren’t considered numeric. You could just as well specify that null should lead to an exception and that an empty string counts as numeric. Conventions like these are up to the author of the library, and different utility libraries with such helper functions handle these cases differently.
Our example uses two String methods: length() returns the length of a string, and charAt(int) returns the character at the desired position. A loop iterates over the string and tests each character with isDigit(...). If a character is not a digit, return false immediately exits the loop and the method. If the loop runs to completion, return true reports that every character was a digit.
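The same check can also be written with a stream instead of an explicit loop. The following is a minimal sketch, not part of the original example; the class name IsNumericStream is hypothetical, and the conventions for null and the empty string are kept as above. The explicit emptiness check is still needed because allMatch(...) reports true for an empty stream.

```java
class IsNumericStream {

  // Hypothetical stream-based variant of isNumeric(...) with the same conventions
  static boolean isNumeric( String string ) {
    if ( string == null || string.isEmpty() )
      return false;   // allMatch(...) alone would return true for ""

    // chars() delivers the UTF-16 code units, just as charAt(int) does
    return string.chars().allMatch( Character::isDigit );
  }

  public static void main( String[] args ) {
    System.out.println( isNumeric( "1234" ) );               // true
    System.out.println( isNumeric( "12.4" ) );               // false
    System.out.println( isNumeric( "" ) );                   // false
    System.out.println( isNumeric( "\u0661\u0662\u0663" ) ); // true, Arabic-Indic digits
  }
}
```

The last call shows that isDigit(...) accepts decimal digits from other scripts too, which may or may not be what a caller expects from a method named isNumeric.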
final class java.lang.Character
implements Serializable, Comparable<Character>
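static boolean isDigit(char ch)
Is it a digit? This includes 0 through 9 as well as the decimal digits of other scripts.
static boolean isLetter(char ch)
Is it a letter?
static boolean isLetterOrDigit(char ch)
Is it an alphanumeric character?
static boolean isLowerCase(char ch)
static boolean isUpperCase(char ch)
Is it a lowercase letter or an uppercase letter?
static boolean isWhitespace(char ch)
Is it a space, line feed, return, or tab (i.e., whitespace)?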
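To see how these tests behave on a few sample characters, a small sketch can help. The class name CharacterTests and the helper describe(char) are hypothetical and only serve as illustration; the category labels are our own wording, not official Unicode terminology.

```java
class CharacterTests {

  // Hypothetical helper: names the first matching category of a character
  static String describe( char c ) {
    if ( Character.isDigit( c ) )
      return "digit";
    if ( Character.isLetter( c ) ) {
      if ( Character.isUpperCase( c ) )
        return "uppercase letter";
      if ( Character.isLowerCase( c ) )
        return "lowercase letter";
      return "other letter";
    }
    if ( Character.isWhitespace( c ) )
      return "whitespace";
    return "other";
  }

  public static void main( String[] args ) {
    System.out.println( describe( '7' ) );      // digit
    System.out.println( describe( '\u0663' ) ); // digit (Arabic-Indic three)
    System.out.println( describe( 'A' ) );      // uppercase letter
    System.out.println( describe( '\u00DF' ) ); // lowercase letter (sharp s)
    System.out.println( describe( '\u4E2D' ) ); // other letter (CJK ideograph)
    System.out.println( describe( '\t' ) );     // whitespace
    System.out.println( describe( '_' ) );      // other
  }
}
```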
To convert a character to uppercase/lowercase, the Character class declares the methods toUpperCase(...) and toLowerCase(...). The is*(...) methods that carry out the testing are often used when a string is traversed.
Our next example asks a user to enter a string. Valid letters should be converted to uppercase, and any whitespace should be replaced with an underscore. To traverse the input, we’ll again use the String methods length() and charAt(int).
String input = new java.util.Scanner( System.in ).nextLine();
for ( int i = 0; i < input.length(); i++ ) {
  char c = input.charAt( i );
  if ( Character.isWhitespace( c ) )
    System.out.print( '_' );
  else if ( Character.isLetter( c ) )
    System.out.print( Character.toUpperCase( c ) );
  // all other characters are skipped
}
For example, for the input “honiara brotherhood guesthouse1,” the output is “HONIARA_BROTHERHOOD_GUESTHOUSE.” The “1” disappears because it’s neither whitespace nor a letter.
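If the result is needed as a string rather than as console output, the same logic can be wrapped in a method. This is a sketch under our own naming: the class UppercaseUnderscore and the method transform(String) are hypothetical and don't appear in the original example.

```java
class UppercaseUnderscore {

  // Hypothetical helper: uppercases letters, replaces whitespace with '_',
  // and drops every other character
  static String transform( String input ) {
    StringBuilder result = new StringBuilder( input.length() );
    for ( int i = 0; i < input.length(); i++ ) {
      char c = input.charAt( i );
      if ( Character.isWhitespace( c ) )
        result.append( '_' );
      else if ( Character.isLetter( c ) )
        result.append( Character.toUpperCase( c ) );
    }
    return result.toString();
  }

  public static void main( String[] args ) {
    // HONIARA_BROTHERHOOD_GUESTHOUSE
    System.out.println( transform( "honiara brotherhood guesthouse1" ) );
  }
}
```

Returning a string instead of printing makes the behavior easy to check in a unit test, which is hard to do with Scanner input and System.out.print(...).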
final class java.lang.Character
implements Serializable, Comparable<Character>
static char toUpperCase(char ch)
static char toLowerCase(char ch)
The static methods return the matching uppercase or lowercase letter.
Note: The methods toUpperCase(...) and toLowerCase(...) exist twice: once as static methods on Character, in which case they accept exactly one char as argument, and once as object methods on String objects. Care should be taken with Character.toUpperCase('ß') because the result is “ß,” whereas the String method "ß".toUpperCase() returns “SS,” that is, a string extended by one character. Even though there’s now an uppercase “ß” (Unicode U+1E9E), Java still returns Unicode U+00DF, not U+1E9E, for Character.toUpperCase('ß').
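The difference between the two toUpperCase(...) variants can be observed directly. The following sketch uses a hypothetical class name, SharpS, and writes the character as the Unicode escape \u00DF so that the source file encoding doesn't matter. Passing Locale.ROOT is our choice to keep the String conversion independent of the default locale.

```java
import java.util.Locale;

class SharpS {

  public static void main( String[] args ) {
    char sharpS = '\u00DF';   // the lowercase sharp s

    // The char variant can only return a single char and leaves it unchanged
    char upperChar = Character.toUpperCase( sharpS );
    System.out.println( upperChar == sharpS );          // true
    System.out.println( upperChar == '\u1E9E' );        // false

    // The String variant may change the length of the string
    String upperString = "\u00DF".toUpperCase( Locale.ROOT );
    System.out.println( upperString );                  // SS
    System.out.println( upperString.length() );         // 2
  }
}
```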
To convert a Unicode character to a string, you can use the overloaded static String method valueOf(char). A comparable method also exists in Character, namely, the static method toString(char). Both methods are limited in that the Unicode character can be only 2 bytes long. The static method Character.toString(int) creates a string for any Unicode character, and so, Character.toString(128123) results in a string with a ghost.
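As a brief illustration of the three conversions, consider the following sketch. The class name CharToString is hypothetical, and the code assumes Java 11 or later because Character.toString(int) isn't available in older versions.

```java
class CharToString {

  public static void main( String[] args ) {
    // Both methods turn a single char into a string of length 1
    String a = String.valueOf( 'A' );
    String b = Character.toString( 'A' );
    System.out.println( a.equals( b ) );     // true

    // A code point beyond U+FFFF doesn't fit into a char
    // and needs the int variant
    String ghost = Character.toString( 128123 );   // U+1F47B
    System.out.println( ghost.length() );                            // 2
    System.out.println( ghost.codePointCount( 0, ghost.length() ) ); // 1
  }
}
```

The length of 2 shows that the ghost is stored as a surrogate pair, that is, as two char values, even though it is a single Unicode character.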
When characters come from a user input, you are often required to convert them to numbers. The digit '5' is to become the numeric value 5. According to old hacker traditions, the solution was always to subtract the value of '0'. The ASCII zero '0' has the char value 48, and '1' then has the value 49, until '9' finally reaches 57. Logically, '5' - '0' = 53 - 48 = 5. The solution has the disadvantage of only working for ASCII digits.
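The subtraction and its limitation can be sketched as follows. The class SubtractZero and the method asciiDigitValue(char) are hypothetical names chosen for this illustration.

```java
class SubtractZero {

  // Hypothetical helper: only meaningful for the ASCII digits '0' to '9'
  static int asciiDigitValue( char c ) {
    return c - '0';
  }

  public static void main( String[] args ) {
    System.out.println( asciiDigitValue( '5' ) );      // 5, because 53 - 48 = 5

    // U+096B is the Devanagari digit five; the subtraction
    // yields the distance between the code points, not 5
    System.out.println( asciiDigitValue( '\u096B' ) ); // 2363
  }
}
```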
A neat Java solution is to convert the char to a string and then parse the string with an Integer method, for example, in the following way:
char c = '5';
int i = Integer.parseInt( String.valueOf(c) ); // 5
The parseInt(...) method is fully internationalized and also converts decimal numbers from other scripts, such as Hindi/Sanskrit:
System.out.println( Integer.parseInt( "५" ) ); // 5
This method works but isn’t efficient for single characters in loops. Two other ways, using static methods from the Character class, are available.
The Character method getNumericValue(char) returns the numeric value of a digit. Of course, this method has been internationalized too. Consider the following example:
int i = Character.getNumericValue( '5' );
System.out.println( i ); // 5
System.out.println( Character.getNumericValue( '५' ) ); // 5
The method is much more powerful than parseInt(...) because it knows the actual “value” of all Unicode characters, including, for example, Roman numerals (I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, L, C, D, and M), which are placed in the Unicode alphabet starting from \u2160:
System.out.println( Character.getNumericValue( '\u216f' ) ); // 1000
The Integer.parseInt(...) method can’t handle \u216f; thus, Integer.parseInt("\u216f") throws a NumberFormatException.
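The following sketch places both approaches side by side. The class name NumericValues and the helper parses(String) are hypothetical. The last two lines show a behavior of getNumericValue(char) that is easy to overlook: Latin letters have the values 10 to 35, and characters without a numeric value yield -1.

```java
class NumericValues {

  // Hypothetical helper: reports whether Integer.parseInt(...) accepts the string
  static boolean parses( String s ) {
    try {
      Integer.parseInt( s );
      return true;
    }
    catch ( NumberFormatException e ) {
      return false;
    }
  }

  public static void main( String[] args ) {
    // Devanagari digit five: both approaches work
    System.out.println( Integer.parseInt( "\u096B" ) );          // 5
    System.out.println( Character.getNumericValue( '\u096B' ) ); // 5

    // Roman numeral one thousand: only getNumericValue(...) knows the value
    System.out.println( Character.getNumericValue( '\u216F' ) ); // 1000
    System.out.println( parses( "\u216F" ) );                    // false

    // Letters count as digits of higher number systems
    System.out.println( Character.getNumericValue( 'a' ) );      // 10
    System.out.println( Character.getNumericValue( '?' ) );      // -1
  }
}
```

Because of the values for letters, getNumericValue(char) alone isn't a validity check for decimal digits; combining it with isDigit(...) is one option.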
The Character class also has conversion methods for digits with respect to any base, and vice versa.
final class java.lang.Character
implements Serializable, Comparable<Character>
static int digit(char ch, int radix)
Returns the numeric value that the character ch has under the base radix; base 10 is the most common. For example, Character.digit('f', 16) is equal to 15. Any number system with a base between Character.MIN_RADIX (2) and Character.MAX_RADIX (36) is allowed. If no conversion is possible, the return value is -1.
static char forDigit(int digit, int radix)
Converts a numeric value to a character. For example, Character.forDigit(6, 8) is “6,” and Character.forDigit(12, 16) is “c.”
The following example converts a string of digits into an integer:
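A few calls illustrate both directions, including the failure cases. The class name DigitAndForDigit is hypothetical.

```java
class DigitAndForDigit {

  public static void main( String[] args ) {
    // Character to value
    System.out.println( Character.digit( 'f', 16 ) );   // 15
    System.out.println( Character.digit( 'F', 16 ) );   // 15, case doesn't matter
    System.out.println( Character.digit( '8', 8 ) );    // -1, no octal digit

    // Value to character
    System.out.println( Character.forDigit( 12, 16 ) ); // c
    System.out.println( Character.forDigit( 6, 8 ) );   // 6

    // An invalid digit for the radix yields the null character
    System.out.println( Character.forDigit( 8, 8 ) == '\0' );  // true

    // Both methods are inverse to each other for valid digits
    System.out.println( Character.digit( Character.forDigit( 35, 36 ), 36 ) ); // 35
  }
}
```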
char[] chars = { '3', '4', '0' };
int result = 0;
for ( char c : chars ) {
  // Shift the previous digits one decimal place and add the new digit
  result = result * 10 + Character.digit( c, 10 );
  System.out.println( result );
}
The output is 3, 34, and 340.
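The loop can be generalized to other number systems and extended with a check of the return value -1. The following is only a sketch: the class ParseDigits and the method parse(String, int) are hypothetical names, the method ignores signs and overflow, and it returns 0 for an empty string. For production code, Integer.parseInt(String, int) is the more robust choice.

```java
class ParseDigits {

  // Hypothetical helper: converts a string of digits in the given radix to an int.
  // No sign handling and no overflow check.
  static int parse( String digits, int radix ) {
    int result = 0;
    for ( int i = 0; i < digits.length(); i++ ) {
      int value = Character.digit( digits.charAt( i ), radix );
      if ( value == -1 )
        throw new NumberFormatException(
            "No base-" + radix + " digit at index " + i );
      result = result * radix + value;
    }
    return result;
  }

  public static void main( String[] args ) {
    System.out.println( parse( "340", 10 ) );  // 340
    System.out.println( parse( "ff", 16 ) );   // 255
    System.out.println( parse( "101", 2 ) );   // 5
  }
}
```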
Editor’s note: This post has been adapted from a section of the book Java: The Comprehensive Guide by Christian Ullenboom.