Sunday, February 14, 2010

Regular simplification

Original post date: Sun Nov 19 12:00:00 2006
Hey, I should do this more often! While writing the previous item, I thought it would be nice to format the regular expression a bit by indenting it and adding some comments. This was the result:

# gobble any spaces
(\s*)
# main group
(?:
 # keyword - gather letters until first space or =
 ([^=\s]*)
 # option group
 (?:
  # option 1 - attribute has no quotes
  (?:
   # gobble spaces and =
   (\s*?=\s*?)
   # value - gather anything that's not a space or quote
   ([^\s"]+)
  )
  |
  # option 2 - attribute has quotes
  (?:
   # gobble spaces and =
   (\s*?=\s*?)
   # value - gather anything between two quotes
   (".*?")
  )
 )?
)
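
A layout like this does not have to stay a comment-only exercise: some regex engines accept it as-is. As a minimal sketch, assuming Python's `re` module (the post does not say which engine the expression was written for), the `re.VERBOSE` flag ignores whitespace and `#` comments inside the pattern. The name `ATTRIBUTE` and the sample inputs are made up for illustration.

```python
import re

# Verbose mode: whitespace and "#" comments inside the pattern are ignored,
# so the indented, commented layout can be compiled directly.
ATTRIBUTE = re.compile(r"""
    # gobble any spaces
    (\s*)
    # main group
    (?:
      # keyword - gather letters until first space or =
      ([^=\s]*)
      # option group
      (?:
        # option 1 - attribute has no quotes
        (?:
          # gobble spaces and =
          (\s*?=\s*?)
          # value - gather anything that is not a space or quote
          ([^\s"]+)
        )
        |
        # option 2 - attribute has quotes
        (?:
          # gobble spaces and =
          (\s*?=\s*?)
          # value - gather anything between two quotes
          (".*?")
        )
      )?
    )
""", re.VERBOSE)

# Hypothetical sample inputs: unquoted value, quoted value, bare keyword.
for text in ['  width=100', 'title="hello world"', 'checked']:
    print(repr(text), ATTRIBUTE.match(text).groups())
```

Note that the unquoted value ends up in group 4 and the quoted value in group 6, because each option carries its own copy of the "spaces and =" group.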
  
While setting this up, I saw that the expression was more complicated than it needed to be, so I started pruning: the outer main group was redundant, and the "gobble spaces and =" part was duplicated in both options. When I put the resulting expression back, everything still worked, so I must have done something right. The indented version looks like this:

# gobble any spaces
(\s*)
# keyword - gather letters until first space or =
([^=\s]*)
# optional group containing the value
(?:
 # gobble spaces and =
 (\s*?=\s*?)
 # option group
 (?:
  # value - (option 1) gather anything that's not a space or quote
  ([^\s"]+)
 |
  # value - (option 2) gather anything between two quotes
  (".*?")
 )
)?
  
Collapsed onto a single line, the resulting expression looks like this:

(\s*)([^=\s]*)(?:(\s*?=\s*?)(?:([^\s"]+)|(".*?")))?
  
A bit shorter than the first version, but still way too cryptic in my book.
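
"Everything still worked" can also be checked a little more systematically. The following is a rough sketch, again assuming Python's `re` module, that collapses each of the two indented versions above onto one line and compares what they capture. The names `ORIGINAL`, `SIMPLIFIED`, `parse` and `SAMPLES` are hypothetical, and the sample strings are invented test inputs, not data from the original program.

```python
import re

# The two indented versions above, each collapsed onto one line.
ORIGINAL = re.compile(
    r'(\s*)(?:([^=\s]*)(?:(?:(\s*?=\s*?)([^\s"]+))|(?:(\s*?=\s*?)(".*?")))?)'
)
SIMPLIFIED = re.compile(
    r'(\s*)([^=\s]*)(?:(\s*?=\s*?)(?:([^\s"]+)|(".*?")))?'
)

def parse(pattern, value_groups, text):
    """Return (keyword, value, matched_text) for the attribute at the start of text.

    value_groups lists the group numbers that can hold the value; the first
    one that took part in the match wins, and the value is None if none did.
    """
    m = pattern.match(text)  # always matches: every part may be empty
    value = next(
        (m.group(g) for g in value_groups if m.group(g) is not None), None
    )
    return m.group(2), value, m.group(0)

# Invented inputs covering unquoted, quoted, missing and malformed values.
SAMPLES = [
    'width=100',
    '  title="hello world"',
    'checked',
    'a = "b c" d=e',
    'name =value rest',
    'x=',
    'x="unterminated',
]

for sample in SAMPLES:
    old = parse(ORIGINAL, (4, 6), sample)    # values live in groups 4 and 6
    new = parse(SIMPLIFIED, (4, 5), sample)  # values live in groups 4 and 5
    assert old == new, (sample, old, new)
    print(repr(sample), new)
```

The one thing that does change is the group numbering: the quoted value moves from group 6 to group 5, so any code that reads the groups by number has to follow along.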
