patkua@work

Keeping regular expressions readable

It’s easy for a regular expression to become one long string full of round and square brackets, backslashes, and other seemingly random symbols. The result is concise, but often at the cost of readability. We’ve recently been using a pattern that works well for us: we break the expression down into its important constituents, still defined as a single string, but with a brief explanation for each part. Here’s a simple example:

Instead of

public static final String SPECIAL_PATTERN = "(\\w)+-([a-zA-Z0-9])+-([a-zA-Z])+/\\w+";

we modify the declaration slightly so that it becomes:

public static final String SPECIAL_PATTERN
  = "(\\w)+" // at least one letter, number or underscore
  + "-([a-zA-Z0-9])+" // dash, then at least one letter or number
  + "-([a-zA-Z])+" // dash, then at least one letter
  + "/\\w+"; // slash, then at least one letter, number or underscore
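
As a quick check that splitting the string changes nothing at runtime, here is a minimal sketch. It assumes the split parts are meant to mirror the single-line pattern exactly; the class name SpecialPatternDemo, the matches helper and the sample inputs are made up for illustration and are not from our codebase.

```java
import java.util.regex.Pattern;

public class SpecialPatternDemo {
    // Single-line form of the pattern.
    static final String ONE_LINE = "(\\w)+-([a-zA-Z0-9])+-([a-zA-Z])+/\\w+";

    // Split form: same characters, one commented part per line (assumed to mirror ONE_LINE).
    static final String SPLIT
        = "(\\w)+"           // at least one letter, number or underscore
        + "-([a-zA-Z0-9])+"  // dash, then at least one letter or number
        + "-([a-zA-Z])+"     // dash, then at least one letter
        + "/\\w+";           // slash, then at least one letter, number or underscore

    // Hypothetical helper: true when the whole input matches the split pattern.
    static boolean matches(String input) {
        return Pattern.matches(SPLIT, input);
    }

    public static void main(String[] args) {
        System.out.println(ONE_LINE.equals(SPLIT));       // true
        System.out.println(matches("ab_1-x9-Yz/file_2")); // true
        System.out.println(matches("ab_1-x9-Y2/file_2")); // false: third part allows letters only
    }
}
```

Because every operand is a string literal, the compiler folds the concatenation into one constant, so the comments cost nothing when the code runs.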

In our situation, I think applying Extract Constant or Extract Variable to each part would have reduced readability, so this approach is a nice trade-off between conciseness and readability.
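
For comparison, this is roughly what the Extract Constant route might look like. It is only a sketch: the constant names and the class name ExtractedConstantsDemo are hypothetical, and the pattern is the single-line one from the example above.

```java
import java.util.regex.Pattern;

public class ExtractedConstantsDemo {
    // Hypothetical names, one constant per extracted part.
    static final String WORD_CHARS = "(\\w)+";
    static final String LETTERS_OR_DIGITS = "([a-zA-Z0-9])+";
    static final String LETTERS = "([a-zA-Z])+";

    // The reader now has to look up three other declarations to see the full expression.
    static final String SPECIAL_PATTERN =
        WORD_CHARS + "-" + LETTERS_OR_DIGITS + "-" + LETTERS + "/\\w+";

    public static void main(String[] args) {
        System.out.println(SPECIAL_PATTERN);
        System.out.println(Pattern.matches(SPECIAL_PATTERN, "ab_1-x9-Yz/file_2")); // true
    }
}
```

The resulting string is identical, but the expression itself is spread over several declarations, which is the readability cost we wanted to avoid for a pattern this small.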
