Regexp anchors in Ruby
Some Rubyists, when faced with the task of matching against the
beginning or the end of a string, are prone to using ^
and $
in
their regular expressions. Most of the time the code will seem to work properly,
but… these anchors don’t actually match a string’s beginning and
end - they match a line’s beginning and end. Consider the
following example:
string = 'username'
string[/^username$/] # matches (as expected)
string = "some injection\nusername"
string[/^username$/] # matches again(WAT???)
The anchors for beginning and end of a string are actually \A
and
\z
(there’s also a similar \Z
anchor, but it’s rarely used in
practice):
string = "some injection\nusername"
string[/\Ausername\z/] # don't match
In an actual application the line string[/^username$/]
is a recipe for
disaster. That’s why Rails 4 started raising exceptions when ^
and
$
are used in validates :something, format: { with: /.../ }
.
By the way, this isn’t something specific to Ruby at all - \A
and \z
are not the same
thing as ^
and $
in most programming languages that have Perl-style regular expressions.
There’s something peculiar in Ruby, though - it automatically uses
multiline mode (which enables the aforementioned behaviour of
having ^
and $
match per line) for regular expressions. Other
languages support it as well, but usually you need to enable it
yourself, since it’s not consider a particularly intuitive
default. For example - by default Perl, Java and C# treat ^
and $
as
beginning/end of string until you explicitly enable multiline match mode
(/m
). In Ruby /m
simply allows .
to match newlines.
I guess people, who’ve recently switched to Ruby from another language, would be most susceptible to writing potentially dangerous code like this.
That’s all for today folks. I hope you’ll find this article useful. As usual I’m looking forward to hearing your thoughts here and on Twitter!