There are 2000 lines of links to crawl. They fall into the following three types, distinguished by the keyword after the domain:
https://www.coursera.org/course/inforisk
https://www.coursera.org/specializations/cloudcomputing
https://www.coursera.org/learn/python-data
I need a regular expression in Python to use as the condition of an if statement. The keywords are "/course/", "/specializations/" and "/learn/". The structure is as follows:
if the link on a line matches /course/:
    ...
elif the link on a line matches /specializations/:
    ...
else (the link on a line matches /learn/):
    ...
I am a Python beginner. I have read a little about regular expressions but am not very familiar with them. Could someone write a regular expression that can serve as the matching condition? Recommendations for regular expression learning materials would also be welcome. Thanks!
Reply content:
Why not just check with in?
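def check_url(word, url):
    # True when the keyword appears anywhere in the URL
    return word in url

# urls is the list of link lines, assumed to be loaded elsewhere
for u in urls:
    if check_url('/course/', u):
        pass  # do something
    elif check_url('/specializations/', u):
        pass  # do something
    elif check_url('/learn/', u):
        pass  # do something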
import re

raw = '''https://www.coursera.org/course/inforisk
https://www.coursera.org/specializations/cloudcomputing
https://www.coursera.org/learn/python-data'''

def check_func(url, key):
    # key must be the first path segment, followed by at least one more character
    return re.search(r'^https://www\.coursera\.org/%s/.+' % key, url)

for url in raw.split('\n'):
    if check_func(url, 'course'):
        print("I'm course")
    elif check_func(url, 'specializations'):
        print("I'm specializations")
    elif check_func(url, 'learn'):
        print("I'm learn")
A regular expression is used here. In ".+", the "." matches any single character (except a newline) and the "+" means one or more repetitions, so ".+" matches one or more arbitrary characters.
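To make the "one or more" part concrete, here is a minimal sketch. The pattern and the second URL are made up for illustration and are not from the original question.

```python
import re

# Illustrative pattern: the literal text "/learn/" followed by one or more characters.
pattern = re.compile(r"/learn/.+")

# Something follows "/learn/", so ".+" has characters to consume.
has_slug = pattern.search("https://www.coursera.org/learn/python-data")

# Nothing follows "/learn/", so ".+" cannot match and the search fails.
no_slug = pattern.search("https://www.coursera.org/learn/")

print(has_slug is not None)  # True
print(no_slug is not None)   # False
```

If an empty tail should also count as a match, ".*" (zero or more) would be the usual choice instead of ".+".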
In Python, re is the module for regular expressions. re.search returns a match object if the pattern is found anywhere in the string; otherwise it returns None.
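Because None is falsy and a match object is truthy, the result of re.search can be used directly as an if condition. A small sketch, using one of the URLs from the question:

```python
import re

url = "https://www.coursera.org/specializations/cloudcomputing"

found = re.search(r"/specializations/", url)  # match object
missing = re.search(r"/learn/", url)          # None

if found:
    # group(0) is the text that the whole pattern matched
    print(found.group(0))

if not missing:
    print("no /learn/ in this URL")
```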
The re module also has other functions, such as re.findall, re.match, re.sub (for replacement), and so on.
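A quick sketch of how these three differ from re.search, using the URLs from the question. The variable names are only for illustration.

```python
import re

raw = "\n".join([
    "https://www.coursera.org/course/inforisk",
    "https://www.coursera.org/specializations/cloudcomputing",
    "https://www.coursera.org/learn/python-data",
])
url = "https://www.coursera.org/learn/python-data"

# re.findall returns all non-overlapping matches; with one group, it returns the group text.
kinds = re.findall(r"coursera\.org/(\w+)/", raw)

# re.match only succeeds if the pattern matches at the very start of the string.
at_start = re.match(r"https://", url)
not_at_start = re.match(r"/learn/", url)

# re.sub replaces every match of the pattern with the replacement string.
stripped = re.sub(r"^https://", "", url)

print(kinds)
print(at_start is not None, not_at_start is not None)
print(stripped)
```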
You cannot simply use in with the bare keywords. For example, if the URL is https://www.coursera.org/course/specializations, both specializations and course would be found in it.
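One way to avoid that ambiguity is to anchor the pattern at the start of the URL and capture only the first path segment. This is a sketch, not the only way to do it. The function name link_type and the "about" URL in the comment are made up for illustration.

```python
import re

url = "https://www.coursera.org/course/specializations"

# With bare keywords, both membership tests succeed for this URL.
print("course" in url, "specializations" in url)

# Anchored pattern: only the segment right after the domain decides the type.
TYPE_PATTERN = re.compile(
    r"https://www\.coursera\.org/(course|specializations|learn)/"
)

def link_type(link):
    """Return 'course', 'specializations' or 'learn', or None for other links."""
    m = TYPE_PATTERN.match(link)
    return m.group(1) if m else None

print(link_type(url))  # course
# A link such as https://www.coursera.org/about would give None.
```

The returned string can then drive the if/elif/else branches directly, so each link is classified with a single regex call.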
Hope this helps.