Regexp anchors in Ruby
Some Rubyists, when faced with the task of matching against the
beginning or the end of a string, are prone to using ^ and $ in
their regular expressions. Most of the time the code will seem to work properly,
but… these anchors don’t actually match a string’s beginning and
end - they match a line’s beginning and end. Consider the
following example:
string = 'username'
string[/^username$/] # matches (as expected)
string = "some injection\nusername"
string[/^username$/] # matches again (WAT???)
The anchors for beginning and end of a string are actually \A and
\z (there’s also a similar \Z anchor, which additionally matches just
before a trailing newline, but it’s rarely used in practice):
string = "some injection\nusername"
string[/\Ausername\z/] # doesn't match (returns nil)
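To make the difference between \z and \Z concrete, here is a small sketch in plain Ruby. The variable names are mine and exist only for illustration; the point is simply that \Z tolerates one trailing newline while \z does not, and that neither lets an extra line slip through.

```ruby
# \z matches only at the very end of the string.
# \Z also matches just before a single trailing newline.
with_newline = "username\n"

strict  = with_newline.match?(/\Ausername\z/) # false: the "\n" is left over
lenient = with_newline.match?(/\Ausername\Z/) # true: one trailing "\n" is tolerated

# Neither anchor accepts additional lines in front of the match.
injected = "some injection\nusername"
injected_strict  = injected.match?(/\Ausername\z/) # false
injected_lenient = injected.match?(/\Ausername\Z/) # false
```

If you want an exact match of the whole string, \z is the safer choice of the two.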
In an actual application the line string[/^username$/] is a recipe for
disaster. That’s why Rails 4 started raising an ArgumentError when ^ and
$ are used in validates :something, format: { with: /.../ } (unless you
explicitly pass multiline: true).
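Here is a rough sketch of why this matters, without involving Rails at all. The constants and the two validator methods below are hypothetical names made up for this example; they only mimic what a format check might look like.

```ruby
# Hypothetical format checks, named for illustration only.
UNSAFE_FORMAT = /^[a-z0-9_]+$/   # anchored to any single line
SAFE_FORMAT   = /\A[a-z0-9_]+\z/ # anchored to the whole string

def unsafe_valid?(input)
  input.match?(UNSAFE_FORMAT)
end

def safe_valid?(input)
  input.match?(SAFE_FORMAT)
end

# The last line looks like a harmless username,
# but the first line carries the real payload.
payload = "<script>alert(1)</script>\nbob"

unsafe_result = unsafe_valid?(payload) # true: the line "bob" satisfies ^...$
safe_result   = safe_valid?(payload)   # false: the string as a whole does not match
plain_result  = safe_valid?("bob")     # true: ordinary input still passes
```

The unsafe check happily accepts the payload, because it only needs one line of the input to look right.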
By the way, this isn’t something specific to Ruby at all - \A and \z are not the same
thing as ^ and $ in most programming languages that have Perl-style regular expressions.
There’s something peculiar in Ruby, though - it automatically uses
multiline mode (which enables the aforementioned behaviour of
having ^ and $ match per line) for regular expressions. Other
languages support it as well, but usually you need to enable it
yourself, since it’s not considered a particularly intuitive
default. For example, by default Perl, Java and C# treat ^ and $ as
beginning/end of string until you explicitly enable multiline match mode
(/m). In Ruby /m simply allows . to match newlines.
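A quick sketch of what /m does (and doesn’t do) in Ruby. The sample string and variable names are just assumptions for the sake of the example.

```ruby
text = "first\nsecond"

# ^ and $ are line anchors in Ruby even without any flag.
line_anchored = text[/^second$/]          # "second"

# Without /m the dot stops at a newline; with /m it matches it.
dot_default   = text[/first.second/]      # nil
dot_multiline = text[/first.second/m]     # "first\nsecond"

# /m has no effect on the string anchors \A and \z.
string_anchored = text[/\Asecond\z/m]     # nil
```

So in Ruby the /m flag is about the dot, not about the anchors.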
I guess people who’ve recently switched to Ruby from another language would be the most susceptible to writing potentially dangerous code like this.
That’s all for today, folks. I hope you’ll find this article useful. As usual, I’m looking forward to hearing your thoughts here and on Twitter!