Python - Scrapy: avoid crawler logging out


I am using the Scrapy library to facilitate crawling a website.

The website uses authentication, and I can log in to the page using Scrapy.

The page has a URL that logs out the user and destroys the session.

How do I ensure that Scrapy avoids the logout page when crawling?

If you are using link extractors and don't want to follow a particular "logout" link, you can set the deny property:

rules = [Rule(SgmlLinkExtractor(deny=[r'logout/']), follow=True),]
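To see which URLs a deny pattern like this would exclude, here is a minimal sketch using only the standard library. It assumes each deny entry is treated as a regular expression searched against the absolute URL, which is how Scrapy's link extractors are documented to behave. The example.com URLs and the helper name is_denied are made up for illustration.

```python
import re

# Same pattern as in the rule above.
DENY_PATTERNS = [re.compile(r'logout/')]


def is_denied(url):
    """Return True if any deny pattern is found anywhere in the URL."""
    return any(pattern.search(url) for pattern in DENY_PATTERNS)


# Hypothetical URLs, used only to exercise the pattern.
urls = [
    'http://example.com/accounts/logout/',
    'http://example.com/logout',  # no trailing slash
    'http://example.com/profile/',
]

for url in urls:
    print(url, is_denied(url))
```

Only the first URL is excluded. The second one slips through because the pattern requires a trailing slash, so a broader pattern such as r'logout' may be safer if the site links to its logout page both ways.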

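Another option is to check response.url inside the spider's parse method. Note that by the time parse runs the logout page has already been requested, so this only skips processing of that page: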

def parse(self, response):
    if 'logout' in response.url:
        return

    # extract items
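If links are followed manually rather than through rules, the same check can be applied before a request is yielded, so the logout URL is never fetched at all. The following is a minimal sketch, assuming the logout page has a path segment that is exactly "logout". The helper name is_logout_url and the example.com URLs are hypothetical and not part of Scrapy.

```python
from urllib.parse import urlparse


def is_logout_url(url):
    """Return True if any path segment of the URL is exactly 'logout'."""
    segments = urlparse(url).path.lower().split('/')
    return 'logout' in segments


# Hypothetical links, as they might be extracted from a page.
candidate_links = [
    'http://example.com/items/1',
    'http://example.com/accounts/logout/',
    'http://example.com/search?q=logout',
]

# Keep only the links that are safe to request.
safe_links = [url for url in candidate_links if not is_logout_url(url)]
print(safe_links)
```

Checking only the path keeps a page such as a search for "logout" crawlable, which a plain substring test on the whole URL would reject. Inside a spider, requests would be yielded only for the links that pass this filter.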

Hope that helps.

