One of the things that bothered me initially in OCaml was the poor support for working with regular expressions in the standard library. Technically speaking, there’s no support for them at all!

What do I mean by this? Well, there’s the older Str library that provides support for regular expressions, but it’s:

  • not really a part of the standard library (it’s bundled with OCaml, but not part of Stdlib)
  • unaware of Unicode, as it treats strings as sequences of bytes
  • very confusingly named (when I see something named Str I’m thinking of strings)

Note: Use #require "str";; in the top-level to load Str.

Here’s a trivial example using it:

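let text = "hello123world" in
let re = Str.regexp "[0-9]+" in
(* string_match only tries to match at the given position, so we use
   search_forward to find the first match anywhere in the string *)
match Str.search_forward re text 0 with
| _ -> Printf.printf "Matched: %s\n" (Str.matched_string text)
| exception Not_found -> Printf.printf "No match\n";;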

Str.global_replace (Str.regexp "[0-9]+") "#" "hello123world456";;
- : string = "hello#world#"

let re = Str.regexp {|hello \([A-Za-z]+\)|} in
Str.replace_first re {|\1|} "hello world";;
- : string = "world"

Str.split (Str.regexp "[ \t]+") "hello world";;
- : string list = ["hello"; "world"]

I hope the examples are self-explanatory. Str’s API is quite similar to what you’d find in most imperative languages: functions like Str.matched_string read hidden global state left behind by the last match, which is part of the reason the library is frowned upon.
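To make that imperative style concrete, here is a minimal sketch of collecting every match with Str. The helper name all_matches is made up for this illustration, and the loop assumes the regular expression never matches the empty string (otherwise it would not advance).

```ocaml
(* Requires the str library (#require "str";; in the top-level). *)

(* Hypothetical helper: collect every match of [re] in [s], left to right.
   Assumes [re] never matches the empty string. *)
let all_matches re s =
  let rec loop pos acc =
    match Str.search_forward re s pos with
    | _ ->
      (* matched_string and match_end read the state set by the last search *)
      let m = Str.matched_string s in
      loop (Str.match_end ()) (m :: acc)
    | exception Not_found -> List.rev acc
  in
  loop 0 []

let () =
  all_matches (Str.regexp "[0-9]+") "a1 b22 c333"
  |> List.iter (Printf.printf "Match: %s\n")
```

Note how the result of each search has to be read back through separate calls before the next search overwrites it.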

Tip: If you find string literals like {|foo bar|} strange, please consult this article. They are useful when dealing with regular expressions to avoid additional escaping of \. If we used a regular string instead of {|hello \([A-Za-z]+\)|} it would be "hello \\([A-Za-z]+\\)".

I won’t dwell much on Str as few people use it these days, especially if they need to do more complex tasks with regular expressions. Enter the Re library. Before we do something with Re we’ll need to install it:

opam install re

One interesting thing about Re is that it supports various flavors of regular expressions:

  • Perl-style regular expressions (module Re.Perl);
  • Posix extended regular expressions (module Re.Posix);
  • Emacs-style regular expressions (module Re.Emacs);
  • Shell-style file globbing (module Re.Glob).

Okay, shell globbing is not exactly regular expressions, and I’m not sure who would want to use Emacs style regular expressions outside Emacs, but you sure have options! I’m a big fan of Perl’s regular expressions, so I’ll stick with them going forward.
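As a rough illustration of those flavors, here is a small sketch that builds the same "digits, dash, lowercase letters" pattern with the Perl and Posix parsers, plus a shell glob. The patterns and sample inputs are my own picks for the example, not anything prescribed by the library.

```ocaml
(* Assumes the re package is installed (#require "re";; in the top-level). *)

(* The same pattern expressed in two syntaxes. *)
let perl = Re.Perl.re {|\d+-[a-z]+|} |> Re.compile
let posix = Re.Posix.re "[0-9]+-[a-z]+" |> Re.compile

(* A shell glob; ~anchored:true makes it match whole strings only. *)
let glob = Re.Glob.glob ~anchored:true "*.ml" |> Re.compile

let () =
  Printf.printf "perl:  %b\n" (Re.execp perl "42-abc");
  Printf.printf "posix: %b\n" (Re.execp posix "42-abc");
  Printf.printf "glob:  %b\n" (Re.execp glob "main.ml")
```

Whatever the surface syntax, each parser produces a plain Re.t value, so everything after Re.compile works the same way.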

Note: Str supports only one syntax, which resembles POSIX basic regular expressions: groups and alternation are written \( \) and \|, so patterns usually involve quite a lot of escaping.

Now, let’s see it in action (I encourage you to try the examples below in utop):

#require "re";;

(* basic matching *)
let re = Re.Perl.re "[0-9]+" |> Re.compile in
let text = "hello123world" in
match Re.exec_opt re text with
| Some group -> Printf.printf "Matched: %s\n" (Re.Group.get group 0)
| None -> Printf.printf "No match\n"
;;

(* replace matches *)
let replace_digits str =
  let re = Re.Perl.re "[0-9]+" |> Re.compile in
  Re.replace_string re ~by:"#" str
;;

print_endline (replace_digits "hello123world456");;

(* use matching groups *)
let re = Re.Perl.re {|(\w+)-(\d+)|} |> Re.compile in
match Re.exec_opt re "item-42" with
| Some group ->
    let name = Re.Group.get group 1 in
    let number = Re.Group.get group 2 in
    Printf.printf "name: %s, number: %s\n" name number
| None -> print_endline "No match"
;;

(* composable regular expressions *)
let word = Re.rep1 Re.wordc;;
let dash = Re.char '-' ;;
let digits = Re.rep1 Re.digit;;

let re =
  Re.seq [word; dash; digits] |> Re.compile
;;

let input = "hello-123" in
match Re.exec_opt re input with
| Some g -> print_endline ("Matched: " ^ Re.Group.get g 0)
| None -> print_endline "No match"
;;

(* iterate over all matches *)
let re = Re.Perl.re "\\d+" |> Re.compile;;

let all_matches str =
  Re.all re str
  |> List.iter (fun g -> Printf.printf "Match: %s\n" (Re.Group.get g 0))
;;

all_matches "a1 b22 c333";;
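For completeness, here is a small sketch of two more helpers that mirror the Str examples from earlier: Re.split for splitting on a separator and Re.matches for getting the matched substrings directly as a list. The variable names are just for illustration.

```ocaml
(* Assumes the re package is installed (#require "re";; in the top-level). *)

(* A regular OCaml string is used here so that \t is a literal tab. *)
let spaces = Re.Perl.re "[ \t]+" |> Re.compile
let digits = Re.Perl.re {|\d+|} |> Re.compile

(* Split on runs of spaces and tabs. *)
let words = Re.split spaces "hello world"

(* Collect the text of every match, without going through groups. *)
let numbers = Re.matches digits "a1 b22 c333"

let () =
  List.iter print_endline words;
  List.iter print_endline numbers
```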

I hope it’s clear that Re allows you to program in a more functional way. I’ve barely scratched the surface here, as the library has a pretty big API that everyone serious about it should eventually explore. Below is a list of its most useful combinators:

| Combinator | Meaning |
|---|---|
| Re.char c | Match a single char |
| Re.str s | Match exact string |
| Re.alt [r1; r2] | Alternation (r1 \| r2) |
| Re.seq [r1; r2] | Concatenation (r1 r2) |
| Re.rep r | Zero or more (r*) |
| Re.rep1 r | One or more (r+) |
| Re.opt r | Optional (r?) |
| Re.group r | Capture group |
| Re.compile | Compile the regex |
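Here is a short sketch that puts several of these combinators together, including capture groups. The parse function and the "item"/"order" format are invented for the example; Re.bos and Re.eos anchor the pattern to the beginning and end of the string.

```ocaml
(* Assumes the re package is installed (#require "re";; in the top-level). *)

(* Matches e.g. "item-42" or "#order-7": an optional '#', a known
   prefix (group 1), a dash, and a run of digits (group 2). *)
let re =
  Re.seq [
    Re.bos;
    Re.opt (Re.char '#');
    Re.group (Re.alt [Re.str "item"; Re.str "order"]);
    Re.char '-';
    Re.group (Re.rep1 Re.digit);
    Re.eos;
  ]
  |> Re.compile

(* Hypothetical helper: return the prefix and the number, if any.
   int_of_string would fail on absurdly long digit runs; ignored here. *)
let parse s =
  match Re.exec_opt re s with
  | Some g -> Some (Re.Group.get g 1, int_of_string (Re.Group.get g 2))
  | None -> None

let () =
  match parse "#order-7" with
  | Some (kind, n) -> Printf.printf "kind: %s, number: %d\n" kind n
  | None -> print_endline "No match"
```

Since every piece is an ordinary value, sub-patterns can be named, reused, and combined without any string concatenation or escaping.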

And here’s a brief comparison of Str vs Re:

| Feature | Str (legacy) | Re (modern) |
|---|---|---|
| Availability | Built-in (kind of) | External library (re package) |
| API Style | Imperative, stateful | Functional, composable |
| Regex Flavor | POSIX-like | Multiple backends (Perl, Str, Emacs, etc.) |
| Unicode support | Poor | Better (though OCaml string handling is limited) |
| Match iteration | Awkward (search_forward loop) | Elegant (Re.all, Re.matches) |
| Replacement | String template or function (global_substitute) | Function or string |
| Error messages | Vague | Clear, structured |
| Composability | Poor (regexp strings only) | Excellent (regex combinators like seq, alt) |
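To illustrate the "Replacement" row, here is a minimal sketch of replacing matches with a function via Re.replace. The double_numbers helper is made up for the example and assumes the matched numbers fit in an int.

```ocaml
(* Assumes the re package is installed (#require "re";; in the top-level). *)

let re = Re.Perl.re {|\d+|} |> Re.compile

(* Hypothetical helper: double every number found in the string.
   The callback receives the match group and returns the replacement. *)
let double_numbers s =
  Re.replace re s ~f:(fun g ->
    let n = int_of_string (Re.Group.get g 0) in
    string_of_int (n * 2))

let () = print_endline (double_numbers "a1 b22 c333")
```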

To sum it up:

  • Use Str only if you want zero dependencies and can tolerate legacy, clunky APIs.
  • Use Re if you care about code clarity, safety, composability, and are okay with pulling in an external dependency (which you should be in 2025).

This article sat in my backlog for quite a while. Regular expressions were one of the most frustrating aspects of OCaml for me when I started to play with it (Perl and Ruby had really spoiled me on that front), but eventually I kind of got used to them, so I no longer felt much need to write the article. Still, I hope some newcomers to OCaml will find it useful!

That’s all I have for you today. Keep hacking!