Let's look at some additional features and different types of grouping.
We have a group of symbols from which we choose either ta or tu:
/(ta|tu)/
ta-tu ta-ta tu-tu
Suppose we want to find only those substrings in which the left and right parts match: ta-ta and tu-tu. Let's try adding a second group with the same "or" condition to our expression; we'll see that this doesn't give us what we wanted, because every combination matches:
/(ta|tu)-(ta|tu)/
ta-tu ta-ta tu-tu
This is where backreferencing helps. It works as follows. We use the special notation \1, which says that the same text that was captured by the first group (and we have only one group) must appear again at the position of \1. Thus, only substrings with the same left and right parts will be found:
/(ta|tu)-\1/
ta-tu ta-ta tu-tu
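The difference between the two patterns can be checked with a short script. This is a minimal sketch in Python's built-in re module (the choice of language and the variable names are assumptions for illustration; the lesson itself is not tied to any language):

```python
import re

text = "ta-tu ta-ta tu-tu"

# Two independent groups: each side may be "ta" or "tu", so every pair matches.
independent = [m.group(0) for m in re.finditer(r"(ta|tu)-(ta|tu)", text)]

# \1 must repeat exactly what group 1 captured, so only equal pairs match.
same_sides = [m.group(0) for m in re.finditer(r"(ta|tu)-\1", text)]

print(independent)  # ['ta-tu', 'ta-ta', 'tu-tu']
print(same_sides)   # ['ta-ta', 'tu-tu']
```

The raw string prefix r"..." keeps Python from interpreting \1 itself before the regex engine sees it.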
By default, all character groups that we create are written to a special memory area and numbered from \1 to \9 in the order of their opening parentheses (many engines also allow higher numbers). If we add a quantifier to the group, it doesn't affect the result for our string: the quantifier isn't part of the backreference, and when a group is repeated, only the text captured by its last repetition is kept in the memory region:
/(ta|tu)+-\1/
ta-tu ta-ta tu-tu
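To see what a quantified group actually stores, it helps to try a string in which the group repeats. The sketch below uses Python's re module; the test strings tatu-tu and tatu-ta are made up for this illustration and are not from the lesson:

```python
import re

pattern = re.compile(r"(ta|tu)+-\1")

# The group repeats twice ("ta", then "tu"); the backreference sees the last one.
m = pattern.search("tatu-tu")
repeated_match = m.group(0)  # the whole matched substring
last_capture = m.group(1)    # the text that \1 had to repeat

# Here the right side equals the first repetition, not the last, so nothing matches.
no_match = pattern.search("tatu-ta")

print(repeated_match, last_capture)  # tatu-tu tu
print(no_match)                      # None
```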
If you use multiple groups, it's not very convenient to remember them by number. It's much easier to use names. To do this, add ?<name> right after the opening parenthesis and refer to the group with \k<name>:
/(?<group1>ta|tu)-\k<group1>/
ta-tu ta-ta tu-tu
Now you can refer to the group by the name group1, both inside the pattern and when you work with the contents of group1 in your code.
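How a named group is read in code depends on the language. As one possible illustration, here is a sketch in Python, whose re module spells the same construct differently: (?P<group1>...) defines the group and (?P=group1) is the backreference. The variable names are assumptions for this example.

```python
import re

# Python spelling of /(?<group1>ta|tu)-\k<group1>/
pattern = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")

text = "ta-tu ta-ta tu-tu"

# Read the captured text by name instead of by number.
captured = [m.group("group1") for m in pattern.finditer(text)]
print(captured)  # ['ta', 'tu']

# groupdict() returns all named groups of a match as a dictionary.
first = pattern.search(text)
print(first.group(0), first.groupdict())  # ta-ta {'group1': 'ta'}
```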
We can turn off capturing, and with it backreferencing, by putting ?: at the start of our group:
/(?:ta|tu)-\1/
ta-tu ta-ta tu-tu
After that, the group won't be saved to the special memory area, so \1 no longer refers to anything; most engines report an error for such a reference since the group doesn't exist in memory. With this approach the regular expression becomes somewhat harder to read, but it can work a little faster. This approach is useful if you have a lot of groups and don't need their contents, or if you want to avoid capturing them so that they don't take up space and don't interfere with the numbering of other groups.
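The effect of ?: can be observed directly. The following sketch assumes Python's re module, where a reference to a missing group is rejected when the pattern is compiled; other engines may behave differently. The pattern ending in -ta is chosen only for this illustration.

```python
import re

text = "ta-tu ta-ta tu-tu"

# Same match, but only the capturing version stores the group's text.
capturing = re.search(r"(ta|tu)-ta", text)
non_capturing = re.search(r"(?:ta|tu)-ta", text)

print(capturing.groups())      # ('ta',)
print(non_capturing.groups())  # ()

# With no capturing group in the pattern, \1 has nothing to refer to.
try:
    re.compile(r"(?:ta|tu)-\1")
    compiled = True
except re.error:
    compiled = False

print(compiled)  # False
```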
Another interesting kind of grouping without backreferencing is called atomic grouping. Note that atomic grouping isn't supported in some popular programming languages, including JavaScript and Python before version 3.11. However, it can be emulated there with existing constructs.
For atomic grouping, we use ?> instead of ?: at the start of the group:
/a(?>bc|b|x)cc/
abccaxcc
Let's have a look at how it works. If we remove ?>, the regex will find two substrings, abcc and axcc:
/a(bc|b|x)cc/
abccaxcc
When we add the atomic grouping characters ?>, the following happens: first a is found, then bc, and then the engine looks for cc, but only one c is left before the next a. With an ordinary group, the search would roll back to the position after a and try the next alternative, b, since the alternation character | is present. After that we would get to cc and the check would succeed.
But with atomic grouping, the return into the group is disabled: bc has already matched, so the alternatives b and x are not tried at this position, and abcc is not found. In the second half of the string, only x fits after a, cc follows it, and axcc is found.
Once the first match from the atomic group (?>bc|b|x) is found, the other variants from this group don't get considered. If the rest of the pattern then fails, the search moves to the next character of the analyzed string and starts again from the first character of the regular expression.
With atomic grouping, we would only be able to match the first part of the string as well if we added another c to it:
/a(?>bc|b|x)cc/
abcccaxcc
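In languages without ?>, one commonly used emulation is a lookahead that captures the alternative, followed by a backreference that consumes it: a(?=(bc|b|x))\1cc. The sketch below assumes Python's re module and relies on the engine not backtracking into a lookahead once it has succeeded; it is a workaround, not something this lesson prescribes, and the helper name whole_matches is hypothetical.

```python
import re

def whole_matches(pattern, text):
    """Return every matched substring (group 0) of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Ordinary group: the engine may backtrack from "bc" to "b".
plain = r"a(bc|b|x)cc"

# Emulated atomic group: the lookahead fixes the first alternative that fits,
# and \1 then consumes exactly that text without trying the other alternatives.
emulated = r"a(?=(bc|b|x))\1cc"

print(whole_matches(plain, "abccaxcc"))      # ['abcc', 'axcc']
print(whole_matches(emulated, "abccaxcc"))   # ['axcc']
print(whole_matches(emulated, "abcccaxcc"))  # ['abccc', 'axcc']
```

On Python 3.11 or later, the pattern a(?>bc|b|x)cc can be used directly instead of the emulation.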