python - Scrapy: avoid the crawler logging out
I am using the Scrapy library to facilitate crawling a website.
The website uses authentication, and I am able to log in through its login page using Scrapy.
The site also has a URL that logs the user out and destroys the session.
How can I ensure that Scrapy avoids the logout page while crawling?
If you are using link extractors and don't want to follow a particular "logout" link, you can set the deny
property:
rules = [Rule(SgmlLinkExtractor(deny=[r'logout/']), follow=True)]
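For context, here is a minimal sketch of a complete CrawlSpider using such a rule. The spider name, domain, and start URL are placeholders, and LinkExtractor is the current name for SgmlLinkExtractor in recent Scrapy versions:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class SiteSpider(CrawlSpider):
    # Placeholder name, domain and start URL -- adjust for the real site
    name = 'site'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    # Follow every link except any URL matching 'logout/'
    rules = [
        Rule(LinkExtractor(deny=[r'logout/']), callback='parse_item', follow=True),
    ]

    def parse_item(self, response):
        # item extraction would go here
        pass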
Another option is to check response.url
inside the spider's parse
method:
def parse(self, response):
    if 'logout' in response.url:
        return
    # extract items
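If you are issuing requests manually rather than using link extractors, the same check can be applied both to the response and to links before they are requested. A rough sketch, with the spider name and URLs as placeholders:

import scrapy


class SiteSpider(scrapy.Spider):
    # Placeholder name and start URL for illustration
    name = 'site'
    start_urls = ['http://example.com/']

    def parse(self, response):
        # Skip processing if this response happens to be the logout page
        if 'logout' in response.url:
            return
        # ... extract items here ...
        for href in response.css('a::attr(href)').extract():
            # Skip logout links before requesting them, so the session
            # is never destroyed in the first place
            if 'logout' in href:
                continue
            yield scrapy.Request(response.urljoin(href), callback=self.parse)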
Hope this helps.