A second look at char.IsSymbol()

Let us begin by examining a rather simple looking code.

var input = "abc#ef";
var result = input.Any(char.IsSymbol);

What would the output of the above code ? Let’s hit F5 and check it.

False

Surprised ?? One should not feel guilty if he is surprised. It is rather surprising one does not look behind to understand what exactly char.IsSymbol does. After all, it is one of the rather underused method.

So why this pecular behaior ? What exactly is a Symbol according to the char.IsSymbol() method. The answer lies in the documentation of the method.

Valid symbols are members of the following categories in UnicodeCategory: >MathSymbol, CurrencySymbol, ModifierSymbol, and OtherSymbol.

The character ‘#’ naturally doesn’t fall under the required categories. Now, with that understanding, let us examine few other characters.

var charList = new[]{'!','@','$','*','+','%','-'};
foreach(var ch in charList)
{
Console.WriteLine($"{ch} = IsSymbol:{char.IsSymbol(ch)}");
}

The output again, has few curious facts to verify. Let’s check the output first.

! = IsSymbol:False
@ = IsSymbol:False
$ = IsSymbol:True
* = IsSymbol:False
+ = IsSymbol:True
% = IsSymbol:False
- = IsSymbol:False

Some of the results are self-explanatory, but what looks interesting for us would be the characters "*","-", and "%". All three of them looks to fall under Mathematical symbols. This might raise eyebrows on why they weren’t recognized as Symbols.

The answer lies in the UnicodeCategory of the character. Let us change the code a bit to include the unicode category as well for each character.

var charList = new[]{'!','@','$','*','+','%','-'};
foreach(var ch in charList)
{
Console.WriteLine($"{ch} = IsSymbol:{char.IsSymbol(ch)}"
+ $"UnicodeCategory:{Char.GetUnicodeCategory(ch)}");
}

Before further discussion let us examine the output as well.

! = IsSymbol:FalseUnicodeCategory:OtherPunctuation
@ = IsSymbol:FalseUnicodeCategory:OtherPunctuation
$ = IsSymbol:TrueUnicodeCategory:CurrencySymbol
* = IsSymbol:FalseUnicodeCategory:OtherPunctuation
+ = IsSymbol:TrueUnicodeCategory:MathSymbol
% = IsSymbol:FalseUnicodeCategory:OtherPunctuation
- = IsSymbol:FalseUnicodeCategory:DashPunctuation

The answer to previous question now stares on us. The characters "*,%,-" lies under the OtherPunctuation and DashPunctuation Categories.

That explains the behavior of char.IsSymbol(). In most cases, it would be better to use Regex for validating passwords or other strings that needs to be validated for special characters.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s