Dive Into Python (16)

English book URL:
http://diveintopython.org/toc/index.html

Chapter 17.Dynamic functions

17.1.Diving in
The rules for making singular nouns into plural nouns are varied and complex.

If you grew up in an English-speaking country or learned English in a formal school setting, you're probably familiar with the basic rules:

1.If a word ends in S, X, or Z, add ES. “Bass” becomes “basses”, “fax” becomes “faxes”, and “waltz” becomes “waltzes”.

2.If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What's a noisy H? One that gets combined with other letters to make a sound that you can hear. So “coach” becomes “coaches” and “rash” becomes “rashes”, because you can hear the CH and SH sounds when you say them. But “cheetah” becomes “cheetahs”, because the H is silent.

3.If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So “vacancy” becomes “vacancies”, but “day” becomes “days”.

4.If all else fails, just add S and hope for the best.

5.There are a lot of exceptions. “Man” becomes “men” and “woman” becomes “women”, but “human” becomes “humans”. “Mouse” becomes “mice” and “louse” becomes “lice”, but “house” becomes “houses”. “Knife” becomes “knives” and “wife” becomes “wives”, but “lowlife” becomes “lowlifes”. And don't even get me started on words that are their own plural, like “sheep”, “deer”, and “haiku”.

17.2.plural.py, stage 1
So you're looking at words, which at least in English are strings of characters. And you have rules that say you need to find different combinations of characters, and then do different things to them. This sounds like a job for regular expressions.

example 17.1.plural1.py
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'

The square brackets mean “match exactly one of these characters”. So [sxz] means “s, or x, or z”, but only one of them. The $ should be familiar; it matches the end of string. So you're checking to see if noun ends with s, x, or z.
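As a quick check of the character class and the $ anchor, the following sketch filters a handful of the sample words from section 17.1. It is written in Python 3 syntax (the listings in this chapter are Python 2), and the names words and ends_in_sxz are illustrative, not part of plural1.py.

```python
import re

# Sample words borrowed from the rules in section 17.1.
words = ["bass", "fax", "waltz", "coach", "day"]

# '[sxz]$' matches only when the last character is s, x, or z.
ends_in_sxz = [w for w in words if re.search("[sxz]$", w)]

print(ends_in_sxz)  # ['bass', 'fax', 'waltz']
```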

example 17.2.Introducing re.sub
>>> import re
>>> re.search('[abc]','Mark')
<_sre.SRE_Match object at 0x0142F870>
>>> re.sub('[abc]','o','Mark')
'Mork'
>>> re.sub('[abc]','o','rock')
'rook'
>>> re.sub('[abc]','o','caps')
'oops'

You might think this would turn caps into oaps, but it doesn't. re.sub replaces all of the matches, not just the first one. So this regular expression turns caps into oops, because both the c and the a get turned into o.
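If you do want only the first match replaced, re.sub accepts an optional count argument. A minimal sketch in Python 3 syntax (the variable names are illustrative):

```python
import re

# Default behaviour: every match is replaced.
all_replaced = re.sub("[abc]", "o", "caps")

# count=1 limits the substitution to the first match only.
first_only = re.sub("[abc]", "o", "caps", count=1)

print(all_replaced)  # oops
print(first_only)    # oaps
```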

example 17.3.Back to plural1.py
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'

Look closely at the second condition (the one that tests for h); this is another new variation. The ^ as the first character inside the square brackets means something special: negation. [^abc] means “any single character except a, b, or c”. So [^aeioudgkprt] means any character except a, e, i, o, u, d, g, k, p, r, or t. Then that character needs to be followed by h, followed by end of string. You're looking for words that end in H where the H can be heard.

The third condition follows the same pattern: match words that end in Y, where the character before the Y is not a, e, i, o, or u. You're looking for words that end in Y that sounds like I.
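To see the three patterns working together, here is the stage 1 function again as a Python 3 sketch, run against some of the sample words from section 17.1. The samples and results names are illustrative.

```python
import re

def plural(noun):
    if re.search("[sxz]$", noun):
        return re.sub("$", "es", noun)
    elif re.search("[^aeioudgkprt]h$", noun):
        return re.sub("$", "es", noun)
    elif re.search("[^aeiou]y$", noun):
        return re.sub("y$", "ies", noun)
    else:
        return noun + "s"

samples = ["fax", "coach", "cheetah", "vacancy", "day"]
results = [plural(word) for word in samples]
print(results)
# ['faxes', 'coaches', 'cheetahs', 'vacancies', 'days']
```

Irregular nouns are not handled: “man” matches none of the three patterns, so it falls through to the default rule and comes back as “mans”.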

example 17.4.More on negation regular expressions
>>> import re
>>> re.search('[^aeiou]y$','vacancy')
<_sre.SRE_Match object at 0x0142FA30>
>>> re.search('[^aeiou]y$','boy')
>>> re.search('[^aeiou]y$','day')
>>> re.search('[^aeiou]y$','pita')

vacancy matches this regular expression, because it ends in cy, and c is not a, e, i, o, or u.

boy does not match, because it ends in oy, and you specifically said that the character before the y could not be o. day does not match, because it ends in ay. pita does not match, because it does not end in y.

example 17.5.More on re.sub
>>> re.sub('y$','ies','vacancy')
'vacancies'
>>> re.sub('y$','ies','agency')
'agencies'
>>> re.sub('([^aeiou])y$',r'\1ies','vacancy')
'vacancies'

The last call combines the two regular expressions (one to test whether the rule applies, and another to apply it) into a single regular expression. Most of it should look familiar: you're using a remembered group, which you learned in Section 7.6, “Case study: Parsing Phone Numbers”, to remember the character before the y. Then in the substitution string, you use a new syntax, \1, which means “hey, that first group you remembered? put it here”. In this case, you remember the c before the y, and then when you do the substitution, you substitute c in place of c, and ies in place of y. (If you have more than one remembered group, you can use \2 and \3 and so on.)
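The sketch below, in Python 3 syntax, shows what the combined expression does when the pattern does not match, and how a second remembered group is referenced. The two-word example and the names combined and swapped are illustrative, not from plural.py.

```python
import re

combined = "([^aeiou])y$"

# When the pattern matches, \1 re-inserts the remembered consonant.
print(re.sub(combined, r"\1ies", "vacancy"))  # vacancies

# When it does not match, re.sub returns the string unchanged.
print(re.sub(combined, r"\1ies", "day"))      # day

# With two groups, \1 and \2 refer to them in order.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)                                # world hello
```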

17.3.plural.py, stage 2

example 17.6.plural2.py
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return 1

def apply_default(noun):
    return noun + 's'

rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )

def plural(noun):
    for matchesRule, applyRule in rules:
        if matchesRule(noun):
            return applyRule(noun)

This version looks more complicated (it's certainly longer), but it does exactly the same thing: try to match four different rules, in order, and apply the appropriate regular expression when a match is found. The difference is that each individual match and apply rule is defined in its own function, and the functions are then listed in this rules variable, which is a tuple of tuples.
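One consequence of keeping the rules in a data structure is that a new rule can be added without touching plural itself. The Python 3 sketch below adds a hypothetical rule for nouns that are their own plural; UNCHANGED, match_unchanged, and apply_unchanged are invented for illustration, and only two of the original rules are repeated to keep it short.

```python
import re

def match_sxz(noun):
    return re.search("[sxz]$", noun)

def apply_sxz(noun):
    return re.sub("$", "es", noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + "s"

# Hypothetical extra rule: nouns that are their own plural.
UNCHANGED = {"sheep", "deer", "haiku"}

def match_unchanged(noun):
    return noun in UNCHANGED

def apply_unchanged(noun):
    return noun

rules = (
    (match_unchanged, apply_unchanged),  # new rule, checked first
    (match_sxz, apply_sxz),
    (match_default, apply_default),
)

def plural(noun):
    # Unchanged from stage 2: try each rule in order.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural("sheep"), plural("fax"), plural("cat"))  # sheep faxes cats
```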

example 17.7.Unrolling the plural function
def plural(noun):
    if match_sxz(noun):
        return apply_sxz(noun)
    if match_h(noun):
        return apply_h(noun)
    if match_y(noun):
        return apply_y(noun)
    if match_default(noun):
        return apply_default(noun)

17.4.plural.py, stage 3

example 17.8.plural3.py
import re

rules = \
  (
    (
     lambda word: re.search('[sxz]$', word),
     lambda word: re.sub('$', 'es', word)
    ),
    (
     lambda word: re.search('[^aeioudgkprt]h$', word),
     lambda word: re.sub('$', 'es', word)
    ),
    (
     lambda word: re.search('[^aeiou]y$', word),
     lambda word: re.sub('y$', 'ies', word)
    ),
    (
     lambda word: re.search('$', word),
     lambda word: re.sub('$', 's', word)
    )
  )

def plural(noun):
    for matchesRule, applyRule in rules:
        if matchesRule(noun):
            return applyRule(noun)

This is the same set of rules as you defined in stage 2. The only difference is that instead of defining named functions like match_sxz and apply_sxz, you have “inlined” those function definitions directly into the rules list itself, using lambda functions.
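A small Python 3 sketch of that equivalence: a named function and a lambda with the same body give the same answers. The name match_sxz_inline is illustrative.

```python
import re

# Named function, in the style of stage 2.
def match_sxz(noun):
    return re.search("[sxz]$", noun)

# The same logic as an anonymous function, in the style of stage 3.
match_sxz_inline = lambda word: re.search("[sxz]$", word)

# bool() collapses the match object (or None) to True or False.
named = [bool(match_sxz(w)) for w in ("fax", "day")]
inline = [bool(match_sxz_inline(w)) for w in ("fax", "day")]
print(named, inline)  # [True, False] [True, False]
```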

17.5.plural.py, stage 4

example 17.9.plural4.py

import re

def buildMatchAndApplyFunctions((pattern, search, replace)):
    matchFunction = lambda word: re.search(pattern, word)
    applyFunction = lambda word: re.sub(search, replace, word)
    return (matchFunction, applyFunction)

buildMatchAndApplyFunctions is a function that builds other functions dynamically. It takes pattern, search and replace (actually it takes a tuple, but more on that in a minute), and you can build the match function using the lambda syntax to be a function that takes one parameter (word) and calls re.search with the pattern that was passed to the buildMatchAndApplyFunctions function, and the word that was passed to the match function you're building. Whoa.
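To make “builds other functions dynamically” concrete, the sketch below calls a builder twice and gets two independent pairs of functions. It assumes Python 3, which no longer allows a tuple in the parameter list (removed by PEP 3113), so it takes three separate arguments; the snake_case names are illustrative.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    # Each call creates a fresh pair of functions that remember
    # the pattern, search, and replace values passed in here.
    def match_function(word):
        return re.search(pattern, word)

    def apply_function(word):
        return re.sub(search, replace, word)

    return match_function, apply_function

match_y, apply_y = build_match_and_apply_functions("[^aeiou]y$", "y$", "ies")
match_s, apply_s = build_match_and_apply_functions("[sxz]$", "$", "es")

print(bool(match_y("vacancy")), apply_y("vacancy"))  # True vacancies
print(bool(match_s("vacancy")))                      # False
```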

example 17.10.plural4.py continued
patterns = \
  (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's')
  )
rules = map(buildMatchAndApplyFunctions, patterns)

This line is magic. It takes the list of strings in patterns and turns them into a list of functions. How? By mapping the strings to the buildMatchAndApplyFunctions function, which just happens to take three strings as parameters and return a tuple of two functions. This means that rules ends up being exactly the same as the previous example: a list of tuples, where each tuple is a pair of functions, where the first function is the match function that calls re.search, and the second function is the apply function that calls re.sub.
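Under Python 3 the same idea needs two small adjustments, sketched here as an assumption-laden port, not the book's code: map returns a one-shot iterator, so the rules are collected into a list, and each pattern tuple is expanded with * because the builder takes three separate arguments.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    match_function = lambda word: re.search(pattern, word)
    apply_function = lambda word: re.sub(search, replace, word)
    return match_function, apply_function

patterns = (
    ("[sxz]$", "$", "es"),
    ("[^aeioudgkprt]h$", "$", "es"),
    ("(qu|[^aeiou])y$", "y$", "ies"),
    ("$", "$", "s"),
)

# A list (not a bare map object) so the rules can be reused on every call.
rules = [build_match_and_apply_functions(*p) for p in patterns]

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural("soliloquy"))  # soliloquies
print(plural("boy"))        # boys
```

The qu alternative in the third pattern lets a word such as “soliloquy” match even though the letter directly before the y is a vowel.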

example 17.11.Unrolling the rules definition

example 17.12.plural4.py, finishing up

example 17.13.Another look at buildMatchAndApplyFunctions

example 17.14.Expanding tuples when calling functions
>>> def foo((a,b,c)):
...     print c
...     print b
...     print a
...
>>> parameters = ('apple','bear','catnap')
>>> foo(parameters)
catnap
bear
apple
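The def foo((a,b,c)) form above is Python 2 only. In Python 3 a comparable effect can be had by expanding the tuple at the call site with *, or by unpacking inside the function. A minimal sketch (foo_unpacked is a hypothetical name):

```python
def foo(a, b, c):
    print(c)
    print(b)
    print(a)

def foo_unpacked(params):
    # Unpack inside the body instead of in the signature.
    a, b, c = params
    return (c, b, a)

parameters = ("apple", "bear", "catnap")

# The * operator expands the tuple into three positional arguments.
foo(*parameters)

reversed_fields = foo_unpacked(parameters)
print(reversed_fields)  # ('catnap', 'bear', 'apple')
```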

17.6.plural.py, stage 5
First, let's create a text file that contains the rules you want. No fancy data structures, just space- (or tab-)delimited strings in three columns. You'll call it rules.en; “en” stands for English. These are the rules for pluralizing English nouns. You could add other rule files for other languages later.

example 17.15.rules.en
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s

example 17.16.plural5.py
import re
import string

def buildRule((pattern, search, replace)):
    return lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):
    lines = file('rules.%s' % language).readlines()
    patterns = map(string.split, lines)
    rules = map(buildRule, patterns)
    for rule in rules:
        result = rule(noun)
        if result: return result

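buildRule now returns a single function: lambda word: re.search(pattern, word) and re.sub(search, replace, word). If re.search finds no match, the and expression evaluates to None; otherwise it evaluates to the result of re.sub. This lets you accomplish the same thing as having two functions, but you need to call it differently: plural calls each rule once and returns the first result that is not empty.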
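Here is a Python 3 sketch of the combined rule function. To stay self-contained it reads the rules from an in-memory io.StringIO object instead of a rules.en file on disk; RULES_EN and build_rule are illustrative names.

```python
import io
import re

# Stand-in for the rules.en file so the sketch needs no file on disk.
RULES_EN = io.StringIO(
    "[sxz]$ $ es\n"
    "[^aeioudgkprt]h$ $ es\n"
    "[^aeiou]y$ y$ ies\n"
    "$ $ s\n"
)

def build_rule(pattern, search, replace):
    # 'and' gives None when re.search fails, otherwise the re.sub result.
    return lambda word: re.search(pattern, word) and re.sub(search, replace, word)

rules = [build_rule(*line.split()) for line in RULES_EN]

def plural(noun):
    for rule in rules:
        result = rule(noun)
        if result:
            return result

print(rules[0]("day"))   # None
print(rules[0]("fax"))   # faxes
print(plural("agency"))  # agencies
```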

17.7.plural.py, stage 6

example 17.17.plural6.py
import re

def rules(language):
    for line in file('rules.%s' % language):
        pattern, search, replace = line.split()
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):
    for applyRule in rules(language):
        result = applyRule(noun)
        if result: return result

This uses a technique called generators, which I'm not even going to try to explain until you look at a simpler example first.

example 17.18.Introducing generators
>>> def make_counter(x):
...     print 'entering make_counter'
...     while 1:
...         yield x
...         print 'incrementing x'
...         x = x + 1
...
>>> counter = make_counter(2)
>>> counter
<generator object make_counter at 0x01367508>
>>> counter.next()
entering make_counter
2
>>> counter.next()
incrementing x
3
>>> counter.next()
incrementing x
4

The presence of the yield keyword in make_counter means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of x.

The make_counter function returns a generator object.

The first time you call the next() method on the generator object, it executes the code in make_counter up to the first yield statement, and then returns the value that was yielded. In this case, that will be 2, because you originally created the generator by calling make_counter(2).

Repeatedly calling next() on the generator object resumes where you left off and continues until you hit the next yield statement. The next line of code waiting to be executed is the print statement that prints incrementing x, and then after that the x = x + 1 statement that actually increments it. Then you loop through the while loop again, and the first thing you do is yield x, which returns the current value of x (now 3).

Since make_counter sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing x and spitting out values. But let's look at more productive uses of generators instead.
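The session above is Python 2. In Python 3 the generator's next() method is spelled __next__(), and the usual way to advance a generator is the built-in next() function. A minimal sketch of the same counter without the print statements:

```python
def make_counter(x):
    while True:
        yield x
        x = x + 1

counter = make_counter(2)

# Python 3 spells counter.next() as next(counter).
first = next(counter)
second = next(counter)
third = next(counter)
print(first, second, third)  # 2 3 4
```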

example 17.19.Using generators instead of recursion
def fibonacci(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a+b

The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with 0 and 1, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: a starts at 0, and b starts at 1.

a is the current number in the sequence, so yield it.

b is the next number in the sequence, so assign that to a, but also calculate the next value (a+b) and assign that to b for later use. Note that this happens in parallel; if a is 3 and b is 5, then a, b = b, a+b will set a to 5 (the previous value of b) and b to 8 (the sum of the previous values of a and b).
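The parallel assignment can be checked in isolation. This Python 3 sketch compares it with a step-by-step version that needs a temporary variable; x, y, and old_x are illustrative names.

```python
a, b = 3, 5

# The right-hand side is evaluated first, using the old a and b.
a, b = b, a + b
print(a, b)  # 5 8

# Doing the same thing step by step needs a temporary variable.
x, y = 3, 5
old_x = x
x = y
y = old_x + y
print(x, y)  # 5 8
```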

example 17.20.Generators in for loops
>>> for n in fibonacci(1000):
...     print n,
...
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

You can use a generator like fibonacci in a for loop directly. The for loop will create the generator object and successively call the next() method to get values to assign to the for loop index variable (n).

Each time through the for loop, n gets a new value from the yield statement in fibonacci, and all you do is print it out. Once fibonacci runs out of numbers (a reaches or exceeds max, which in this case is 1000), the for loop exits gracefully.
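A for loop is not the only consumer of a generator. In this Python 3 sketch, list() drains the generator completely, while itertools.islice takes only the first few values; the limits 50 and 5 are arbitrary choices for illustration.

```python
from itertools import islice

def fibonacci(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

# list() drives the generator to exhaustion, just as a for loop would.
small = list(fibonacci(50))
print(small)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# islice takes only the first few values without computing the rest.
first_five = list(islice(fibonacci(1000), 5))
print(first_five)  # [0, 1, 1, 2, 3]
```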

example 17.21.Generators that generate dynamic functions
def rules(language):
    for line in file('rules.%s' % language):
        pattern, search, replace = line.split()
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):
    for applyRule in rules(language):
        result = applyRule(noun)
        if result: return result

What do you yield? A function, built dynamically with lambda, that is actually a closure (it uses the local variables pattern, search, and replace as constants). In other words, rules is a generator that spits out rule functions.
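A Python 3 sketch of the same generator follows, with two assumptions of mine: the rules come from an in-memory io.StringIO object, not a rules.en file, and each lambda binds pattern, search, and replace through default arguments. The second change is a precaution. Every lambda yielded from one loop shares the loop's variables, so a rule is only guaranteed to see its own values if it is used before the generator advances, as plural does. Default arguments freeze the values at the moment each rule is created.

```python
import io
import re

RULES_EN = (
    "[sxz]$ $ es\n"
    "[^aeioudgkprt]h$ $ es\n"
    "[^aeiou]y$ y$ ies\n"
    "$ $ s\n"
)

def rules(lines):
    for line in lines:
        pattern, search, replace = line.split()
        # Default arguments capture the current values for this rule.
        yield lambda word, p=pattern, s=search, r=replace: (
            re.search(p, word) and re.sub(s, r, word))

def plural(noun, lines):
    # Rules are built one at a time; later lines are never parsed
    # if an earlier rule already matched.
    for apply_rule in rules(lines):
        result = apply_rule(noun)
        if result:
            return result

print(plural("vacancy", io.StringIO(RULES_EN)))  # vacancies

# Collecting every rule first is also safe with the default arguments.
all_rules = list(rules(io.StringIO(RULES_EN)))
print(all_rules[0]("fax"))                       # faxes
```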