Regex: Captures versus Groups - what's the deal? (Part 1)

Published: Thu 13 July 2017
By EWS


I love the .NET Framework, and I love C# too. Hmm... I suppose I should be more honest with myself and say that I love the tooling that makes the .NET Framework and C# lovable (Visual Studio and ReSharper). As much as the tooling makes development an absolute delight, I sometimes marvel at how Microsoft has conspired to force developers to jump through hoops (well, I guess not everything is a conspiracy).

Take regular expressions, for example. In *nix, they're ubiquitous:

$ echo "And the man said: veni vidi vici" | sed 's/\(v[a-z]\+\)/\1.../g'
And the man said: veni... vidi... vici...

In C#, however, finding a really simple interface for doing a regular expression match isn't an intuitive journey, at least not for folks who don't use .NET regular expressions on a regular basis.

That's because the way in which the API gets used isn't (IMHO) very intuitive. At first glance, for example, this would seem to demonstrate that you can do pretty much what was done in the sed example above without too much fuss:

using System.Text.RegularExpressions;

foreach (var m in Regex.Matches("And the man said: veni vidi vici", @"v\w+"))
    Console.WriteLine(m + "...");

This works:

$ csharp content/code/netre-sed-in-cs.cs
veni...
vidi...
vici...

The problem is that a surprise awaits us if we try to treat each item as a string in other ways, because what's returned on each iteration isn't a string:

using System.Text.RegularExpressions;

foreach (var m in Regex.Matches("And the man said: veni vidi vici", @"v\w+"))
    if (m.Contains("e"))
        Console.WriteLine("That must be veni...");

This doesn't work:

$ csharp content/code/netre-sed-in-cs-not-string.cs
(2,11): error CS1928: Type `object' does not contain a member `Contains' and the best extension method overload `System.Linq.Enumerable.Contains<string>(this System.Collections.Generic.IEnumerable<string>, string)' has some invalid arguments
/usr/lib/mono/gac/System.Core/4.0.0.0__b77a5c561934e089/System.Core.dll (Location of the symbol related to previous error)
(2,11): error CS1929: Extension method instance type `object' cannot be converted to `System.Collections.Generic.IEnumerable<string>'

So there are a couple of what I like to call 'magic nuggets' (my own made-up term borrowed from magic numbers [1]) at play here:

  1. We're to know that in order to "search"-match on a string, the method to use is Matches. Is this intuitive? Arguably, yes, it is. There's another method, Match, which has a very similar name and, as we would expect, returns only one match: the first one found. So far, so good;
  2. The Matches method returns a MatchCollection, which implements IEnumerable, but only the non-generic version and not IEnumerable<Match>. Enumerating it with var therefore yields plain objects rather than Match instances, a surprise that runs counter to the principle of least surprise.
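
To make the two points above concrete, here is a minimal sketch of one way to get at typed Match objects. It assumes the same input string and pattern as the earlier examples; the variable names (input, pattern, first, words) are mine, and this is only one possible workaround, not necessarily the route the later parts take. Naming the element type in the foreach makes the compiler cast each object to Match, and Cast<Match>() from System.Linq does the same for LINQ queries:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same input and pattern as the examples above (names are illustrative).
string input = "And the man said: veni vidi vici";
string pattern = @"v\w+";

// Match (singular) returns only the first match found.
Match first = Regex.Match(input, pattern);
Console.WriteLine(first.Value);                    // prints "veni"

// Declaring the loop variable as Match instead of var makes foreach
// cast each enumerated object to Match, so Value is a real string.
foreach (Match m in Regex.Matches(input, pattern))
    if (m.Value.Contains("e"))
        Console.WriteLine("That must be " + m.Value + "...");

// Cast<Match>() turns the non-generic sequence into IEnumerable<Match>,
// which makes the usual LINQ operators available.
var words = Regex.Matches(input, pattern)
                 .Cast<Match>()
                 .Select(x => x.Value)
                 .ToList();
Console.WriteLine(string.Join(", ", words));       // prints "veni, vidi, vici"
```

The explicit Match in the foreach is the smaller change; Cast<Match>() is handier when the matches feed into further LINQ calls.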

In subsequent parts we'll dive into the Match type and how the API allows one to get at the bits and pieces (and why things aren't particularly discoverable).

[1] Specifically, the definition: "Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants".
