Regex Quiz and Solution

In all your answers, explain what you are thinking so that I can give you partial credit in case you don't get it exactly right. In 2, 3, 4, be careful to use a backslash to “escape” characters that you mean literally, but have special meanings in regex.

1. What is the output of the following and why?

In [6]:
import re
s = '0 1a 1b 2 3 99 -1.23  777 .25 0. 2.0 3456. 45.6789 -66.0'
x = r'(-?[0-9]+\.[0-9]*)'  # raw string, so \. reaches the regex engine unchanged
re.findall(x,s)
Out[6]:
['-1.23', '0.', '2.0', '3456.', '45.6789', '-66.0']

Comments:

The expression matches substrings that look like floating-point numbers. A decimal point and at least one digit before it are mandatory, which is why the plain integers and '.25' are skipped. Digits after the decimal point are optional, as is a leading minus sign, so '0.' and '3456.' still match.
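To make the reading of the pattern concrete, here is a minimal sketch that spells out the same expression piece by piece using re.VERBOSE. The name float_pattern is only illustrative, and the capturing group is dropped because re.findall() returns whole matches when the pattern has no group.

```python
import re

s = '0 1a 1b 2 3 99 -1.23  777 .25 0. 2.0 3456. 45.6789 -66.0'

# Same pattern as in the quiz, annotated one piece per line.
# (float_pattern is an illustrative name, not part of the original answer.)
float_pattern = re.compile(r'''
    -?        # optional leading minus sign
    [0-9]+    # one or more digits before the point (mandatory)
    \.        # a literal decimal point (mandatory)
    [0-9]*    # zero or more digits after the point
''', re.VERBOSE)

matches = float_pattern.findall(s)
print(matches)
```

The result should be the same list as Out[6].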

2. Compose a regular expression that will match each word in a string that begins with a d or a D.

In [7]:
x = '\\b[Dd].+?\\b'
s = 'Definitely depressed, David is down in the dungeon.'
re.findall( x, s )
Out[7]:
['Definitely', 'depressed', 'David', 'down', 'dungeon']

Comments:

Starting at a word boundary (\b), the match begins with a D or d, continues lazily through at least one arbitrary character, and stops at the next word boundary.
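One edge case worth checking: because .+? needs at least one more character, a one-letter word such as 'D' drags the following space into the match. The sketch below compares the quiz pattern with an alternative, \b[Dd]\w*, on a made-up sentence. Both the sentence and the alternative pattern are assumptions for illustration, not part of the original solution.

```python
import re

# Hypothetical test sentence that contains the one-letter word 'D'.
s = 'Plan D depends on David.'

# Pattern from the answer above: lazy "any character" up to a word boundary.
lazy_dot = re.findall(r'\b[Dd].+?\b', s)

# Alternative sketch: D or d followed by zero or more word characters.
word_chars = re.findall(r'\b[Dd]\w*', s)

print(lazy_dot)    # ['D ', 'depends', 'David']
print(word_chars)  # ['D', 'depends', 'David']
```

On the sentence used in the quiz, both patterns return the same five words.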

3. Compose a regular expression that will match any of the following 'Monday' , 'Tuesday' , 'Wednesday' , 'Thursday' , 'Friday' , 'Saturday' , 'Sunday', but not 'Toomanylettersday' nor 'Dday'.

In [9]:
s = 'Monday Tuesday, Wednesday: Thursday Friday Saturday Sunday Toomanylettersday Dday, Funday'
x = '\\b[A-Z][a-z]{2,5}day\\b'
re.findall( x, s )
Out[9]:
['Monday',
 'Tuesday',
 'Wednesday',
 'Thursday',
 'Friday',
 'Saturday',
 'Sunday',
 'Funday']
In [10]:
x = '(\\bMonday\\b|\\bTuesday\\b|\\bWednesday\\b|\\bThursday\\b|\\bFriday\\b|\\bSaturday\\b|\\bSunday\\b)'
re.findall( x, s )
Out[10]:
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

Comments:

There are many possibilities here. The second option above is probably better: if the intent is to match only the days of the week that exist, we might as well list them, since there are only 7. The first option also matches non-existent day names such as 'Funday'.
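The explicit list can also be written more compactly by factoring out the shared \b anchors and the common 'day' suffix. This is one possible sketch, not the only way to do it. The non-capturing group (?:...) keeps re.findall() returning whole matches, and day_pattern is an illustrative name.

```python
import re

s = 'Monday Tuesday, Wednesday: Thursday Friday Saturday Sunday Toomanylettersday Dday, Funday'

# One prefix per real day name, followed by the shared suffix 'day'.
day_pattern = r'\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b'

days = re.findall(day_pattern, s)
print(days)
```

This should give the same seven names as Out[10], with 'Funday', 'Dday', and 'Toomanylettersday' excluded.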

4. Compose a regular expression that will match all text enclosed by parentheses (like this) even if the text between parentheses includes newline characters. The results of re.findall() should not include the parentheses themselves.

In [12]:
s = 'I (personally) would like to totally eliminate all unnecessary parentheses (those \ncurved bracket things).'
s
Out[12]:
'I (personally) would like to totally eliminate all unnecessary parentheses (those \ncurved bracket things).'
In [14]:
x = r'\(([^\)]*?)\)'  # raw string, so the backslash escapes reach the regex engine unchanged
for item in re.findall( x, s ):
    print('!' + item)
!personally
!those 
curved bracket things

Comments:

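We are looking for the match to begin with a literal ( and end with a literal ). In between, the group captures any run of characters other than a close parenthesis. Since the quantifier is *, that run may be empty (zero or more characters, not at least one). The negated character class matches newlines too, which is what lets a match span lines. It also stops at the first ), so we never get everything between the first ( and the last ) as a single match; the lazy ? is harmless here but not strictly required.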
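As a quick check of how the negated character class behaves, the sketch below runs the pattern with and without the lazy ? on the same string, and also on an empty pair of parentheses. The variable names and the 'f()' test string are made up for illustration.

```python
import re

s = 'I (personally) would like to totally eliminate all unnecessary parentheses (those \ncurved bracket things).'

# Lazy and greedy versions of the negated-class pattern.
lazy = re.findall(r'\(([^)]*?)\)', s)
greedy = re.findall(r'\(([^)]*)\)', s)

# The * quantifier allows an empty capture between the parentheses.
empty = re.findall(r'\(([^)]*)\)', 'f()')

print(lazy == greedy)  # True
print(empty)           # ['']
```

Both versions return the same two captures because [^)] cannot run past a close parenthesis.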

The following simpler attempt doesn't meet the specification because, by default, . matches any character except a newline, so the second parenthesized span is missed.

In [15]:
x = r'\((.+?)\)'  # raw string; . still does not match the newline
re.findall( x, s )
Out[15]:
['personally']
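If the dot-based pattern is preferred, one way to make it meet the specification is the re.DOTALL flag, which lets . match newlines as well. A minimal sketch, with found as an illustrative variable name:

```python
import re

s = 'I (personally) would like to totally eliminate all unnecessary parentheses (those \ncurved bracket things).'

# re.DOTALL makes . match any character, including '\n'.
# The lazy +? is essential here, since . can now run past a ).
found = re.findall(r'\((.+?)\)', s, flags=re.DOTALL)
print(found)
```

Unlike the negated-class version, this pattern requires at least one character between the parentheses, so an empty pair '()' would not match on its own.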