In this lesson, we will look at additional features and different grouping types.
We have a group of symbols from which we choose either ta or tu.
Suppose we want to find only those substrings in which the left and right parts match: ta-ta and tu-tu.
Let us try adding another "or" condition to our expression, so that the same group appears on both sides of the hyphen. This does not give us what we wanted, because mixed pairs such as ta-tu match as well.
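A minimal sketch of this attempt in Python. The pattern (ta|tu)-(ta|tu) is an assumption reconstructed from the description, not necessarily the exact expression used in the lesson:

```python
import re

# Assumed pattern: the same "or" group repeated on both sides of the hyphen.
PATTERN = re.compile(r"(ta|tu)-(ta|tu)")
TEXT = "ta-tu ta-ta tu-tu"

# finditer + group(0) returns whole matches rather than tuples of groups.
matches = [m.group(0) for m in PATTERN.finditer(TEXT)]
print(matches)  # ['ta-tu', 'ta-ta', 'tu-tu']
```

The two groups are independent of each other, so the mixed pair ta-tu is matched along with the symmetric ones.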
This is where backreferencing helps. It works as follows: we use the special notation \1, which means that the characters matched by the first group must appear again at this position. Thus, we find only substrings with the same left and right parts.
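As an illustration, here is a small Python sketch of the backreference. The pattern (ta|tu)-\1 is an assumed reconstruction based on the text above:

```python
import re

# \1 must repeat exactly what the first group matched.
PATTERN = re.compile(r"(ta|tu)-\1")
TEXT = "ta-tu ta-ta tu-tu"

matches = [m.group(0) for m in PATTERN.finditer(TEXT)]
print(matches)  # ['ta-ta', 'tu-tu']
```

The mixed pair ta-tu is skipped because the right part differs from what the group captured on the left.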
By default, every group is captured: its match is written to a dedicated memory area and labeled with a number, counting from 1 in the order of the opening brackets.
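A quantifier on the group does not change how the backreference works. The quantifier is not part of the backreference: \1 stands for a single occurrence of the group held in the memory area, not for the whole repeated sequence. Which occurrence is kept depends on the engine; many engines keep the text of the last repetition.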
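The following Python sketch shows how a quantified group interacts with \1. The pattern (ta|tu)+-\1 and the test strings are hypothetical, chosen only for illustration. In Python's re module, a repeated group keeps the text of its last repetition, and other engines may differ:

```python
import re

# Hypothetical pattern: the group is repeated, then referenced once.
PATTERN = re.compile(r"(ta|tu)+-\1")

# The group runs twice ("ta", then "tu"); only one occurrence is stored.
same_as_last = PATTERN.fullmatch("tatu-tu")
same_as_first = PATTERN.fullmatch("tatu-ta")

print(same_as_last.group(1))   # tu
print(same_as_first)           # None
```

The backreference repeats one stored occurrence, never the whole sequence tatu.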
When an expression has multiple groups, it is inconvenient to keep track of them by number. It is much easier to use names. To name a group, add ?<name> right after its opening bracket.
Now you can refer to the group by the name group1 when you work with the match in your code.
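A short sketch of named groups in Python. Note that Python's re module spells the syntax differently from the ?<name> form above: a named group is written (?P<name>...) and a named backreference is written (?P=name). The pattern itself is an assumption for illustration:

```python
import re

# (?P<group1>...) names the group; (?P=group1) refers back to it by name.
PATTERN = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")
TEXT = "ta-tu ta-ta tu-tu"

found = [(m.group(0), m.group("group1")) for m in PATTERN.finditer(TEXT)]
print(found)  # [('ta-ta', 'ta'), ('tu-tu', 'tu')]
```

A named group still receives a number, so m.group(1) returns the same text as m.group("group1").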
We can turn off backreferencing for a group by putting ?: right after its opening bracket.
Sample string: ta-tu ta-ta tu-tu
After that, the group is no longer saved to the memory area. Referring to it, for example with \1, can cause an error because the group does not exist in memory.
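A sketch of this behavior in Python, with assumed patterns. Python's re module rejects a backreference to a missing group when the expression is compiled; other engines may report the problem differently:

```python
import re

# Neither group is saved, so the pattern has no numbered groups at all.
NON_CAPTURING = re.compile(r"(?:ta|tu)-(?:ta|tu)")
group_count = NON_CAPTURING.groups
print(group_count)  # 0

# \1 now points at a group that does not exist.
try:
    re.compile(r"(?:ta|tu)-\1")
    backreference_error = None
except re.error as exc:
    backreference_error = str(exc)

print(backreference_error)
```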
This approach makes the regular expression harder to read, but it can work faster because the matches do not have to be stored. It is a good fit if:
- You have a lot of groups and do not need to refer to them
- You want to save memory and keep these groups from interfering with the numbering of other groups
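The effect on numbering can be sketched in Python as follows. The patterns are hypothetical examples:

```python
import re

ALL_CAPTURING = re.compile(r"(ta|tu)-(ta|tu)")
RIGHT_ONLY = re.compile(r"(?:ta|tu)-(ta|tu)")

m_all = ALL_CAPTURING.search("ta-tu")
m_right = RIGHT_ONLY.search("ta-tu")

# With two capturing groups, the right part is group 2.
print(ALL_CAPTURING.groups, m_all.groups())   # 2 ('ta', 'tu')
# With the left group switched off, the right part becomes group 1.
print(RIGHT_ONLY.groups, m_right.groups())    # 1 ('tu',)
```

Only the groups we actually need take up a number, which keeps later references such as \1 short and predictable.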
Another interesting kind of grouping without backreferencing is atomic grouping.
For atomic grouping, we use ?> instead of ?:.
If we remove ?>, the regex will find two substrings.
When we add the atomic grouping characters, ?>, the following happens: we first find a, then bc, and then fail to match cc. Normally, in the example above, the search would roll back to a and continue checking from the alternative b, since the alternation character | is present. Then we would get to cc, and the check would succeed.
But with atomic grouping, the return along the string back to a is disabled. The search continues moving along the string, and only through the alternative x do we find a match.
Once the first alternative from the atomic group (?>bc|b|x) matches, the other alternatives from this group are no longer considered. If the rest of the expression then fails, the search restarts at the next character of the analyzed string, from the first character of the regular expression.
With atomic grouping, we would only be able to find a match for the whole string if we added another c to it.
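The difference can be sketched in Python. The full pattern a(?>bc|b|x)cc is an assumption built around the group (?>bc|b|x) shown above, and the test strings are hypothetical. Python's re module accepts (?>...) only from version 3.11, so for older versions the sketch falls back to a common emulation: a lookahead is itself atomic, so capturing inside it and consuming the capture with \1 behaves like an atomic group.

```python
import re

# Ordinary non-capturing group: the engine may go back and try "b" after "bc".
BACKTRACKING = re.compile(r"a(?:bc|b|x)cc")

try:
    # Native atomic group (Python 3.11 and later).
    ATOMIC = re.compile(r"a(?>bc|b|x)cc")
except re.error:
    # Emulation for older versions: lookahead + backreference.
    ATOMIC = re.compile(r"a(?=(bc|b|x))\1cc")


def is_found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(is_found(BACKTRACKING, "abcc"))  # True: bc fails before cc, b succeeds
print(is_found(ATOMIC, "abcc"))        # False: bc is kept, cc cannot match
print(is_found(ATOMIC, "axcc"))        # True: x is the first alternative to match
print(is_found(ATOMIC, "abccc"))       # True: bc is kept and cc still fits
```

With the extra c, the first alternative bc already leaves enough characters for cc, so no rollback is needed.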