I am having trouble coming up with a Python regular expression to extract specific values.
The page I am trying to parse contains a number of productIds, which appear in the following format:
\"productId\":\"111111\"
I need to extract all the values, 111111 in this case.
Solution

import re

t = "\"productId\":\"111111\""
m = re.match(r"\W*productId[^:]*:\D*(\d+)", t)
if m:
    print(m.group(1))
The pattern means: match any leading non-word characters (\W*), then productId followed by any non-colon characters ([^:]*) and a :. Then skip any non-digits (\D*) and capture the digits that follow ((\d+)).
Output
111111
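Note that re.match only tries the pattern at the start of the string, so it returns at most one value. Since the question asks for all the values on the page, re.findall is probably the better fit. Below is a minimal sketch, assuming the page text is already held in a string; the `page` variable and its contents are made up for illustration, and the pattern is a simplified variant of the one above rather than the exact original.

```python
import re

# Hypothetical page fragment (an assumption, not real data); the
# backslash-escaped quotes mirror the format shown in the question.
page = r'{\"productId\":\"111111\",\"name\":\"a\"},{\"productId\":\"222222\"}'

# \W* skips the quotes, backslashes, and colon between the key and its digits;
# (\d+) captures the digits. findall returns every captured group in order.
product_ids = re.findall(r'productId\W*(\d+)', page)
print(product_ids)
```

Because \W* cannot cross letters or digits, this variant only captures a number that directly follows the productId key, which avoids picking up a later number if some productId value were non-numeric.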