Why does a Scrapy simulated login to Douban raise NotImplementedError?

xieyao (first answer) Solved: change callback = self.parse_login to callback = self.parse_page and the login succeeds. See the screenshot.
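For context on why changing the callback matters: the NotImplementedError comes from Scrapy's base Spider.parse, which raises it unless the subclass overrides it, and any Request without an explicit callback falls back to parse. A minimal plain-Python sketch of that dispatch pattern (no Scrapy needed; the class and method names here are illustrative, not the real Scrapy internals):

```python
class BaseSpider:
    def parse(self, response):
        # mirrors scrapy.Spider.parse: subclasses are expected to override it
        raise NotImplementedError('%s.parse not defined' % type(self).__name__)

    def dispatch(self, response, callback=None):
        # a Request carrying no callback falls back to the spider's parse
        return (callback or self.parse)(response)

class LoginSpider(BaseSpider):
    def parse_page(self, response):
        return 'parsed ' + response

spider = LoginSpider()
# explicit callback: handled by parse_page, no error
print(spider.dispatch('page-after-login', callback=spider.parse_page))
try:
    # no callback and no parse override: the base parse raises
    spider.dispatch('page-after-login')
except NotImplementedError:
    print('NotImplementedError raised')
```

So a request whose callback points at a method that does not exist (or defaults to an unoverridden parse) is exactly what produces the error in the question.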

By the way, what does the after_login function do? It reads the links from start_urls and generates a Request for each one via make_requests_from_url, so the next callback in the chain (the parse_page function) receives Requests produced by after_login. after_login seems most useful when you need to crawl several links. (I don't fully understand it either.) That's all.
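As the answer mentions, older Scrapy versions expose a make_requests_from_url helper on the spider; its documented behavior is roughly the sketch below (plain-Python stand-ins, not the real Scrapy classes):

```python
class Request:
    # stand-in for scrapy.http.Request with only the fields we need here
    def __init__(self, url, dont_filter=False, callback=None):
        self.url = url
        self.dont_filter = dont_filter
        self.callback = callback

def make_requests_from_url(url):
    # the (since-deprecated) Spider helper: wrap a start URL in a Request
    # with dont_filter=True; with no callback set, the response later goes
    # to the spider's parse method
    return Request(url, dont_filter=True)

requests = [make_requests_from_url(u)
            for u in ['http://movie.douban.com/', 'http://www.douban.com/']]
print([r.url for r in requests])
print(requests[0].dont_filter)
```

This is why yielding requests for every entry in start_urls, as after_login does, reproduces what the default start_requests would have done, just at a later point in the login flow.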
■ Reply from another user
Thanks to 勿语星空丶 for the answer; for the concrete change, see that answer and the asker's comments under it.

PS. Modifying the start_requests method did solve the problem, but it raised a new question; leaving a marker here, to be annotated once I understand it.

Addendum: the asker also tested with a CrawlSpider, which solved the problem as well, but the spider did not follow further pages according to the Rule. Why? The re-edited DoubanSpider.py is below (some values were lost from the original post and are noted in comments):

```python
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from douban.items import DoubanItem
from scrapy.http import Request, FormRequest
from scrapy import log

class doubanSpider(CrawlSpider):
    name = 'douban'
    allowed_domains = []  # list contents lost in the original post
    start_urls = []       # list contents lost in the original post
    rules = (
        Rule(SgmlLinkExtractor(allow=('/subject/.+?/\?from=showing$',)),
             #callback='parse_item',
             follow=True),
    )
    # Crawls the movies currently showing on movie.douban.com; the data is
    # slightly off, but that is not the point here.

    def __init__(self):
        super(doubanSpider, self).__init__()
        self.http_user = '******@hotmail.com'
        self.http_pass = '******'
        # login form
        self.formdata = {'source': 'index_nav',
                         'form_email': self.http_user,
                         'form_password': self.http_pass}
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
                        'Accept-Encoding': 'gzip, deflate',
                        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
                        'Connection': 'keep-alive'}

    def start_requests(self):
        return  # the returned Request list was lost in the original post

    def post_login(self, response):
        print 'Preparing login'
        # the start of this statement was garbled in the post; reconstructed
        # as the FormRequest.from_response(...) call its arguments imply
        return [FormRequest.from_response(response,
                                          meta={'cookiejar': response.meta['cookiejar']},
                                          headers=self.headers,
                                          formdata=self.formdata,
                                          callback=self.after_login,
                                          dont_filter=True)]

    def after_login(self, response):
        print 'after_login'
        print response.url
        for url in self.start_urls:
            yield FormRequest(url,
                              meta={'cookiejar': response.meta['cookiejar']},
                              headers=self.headers,
                              callback=self.parse_item)

    def parse_item(self, response):
        print 'parse_item'
        sel = Selector(response)
        answer = sel.xpath('//a/span/text()').extract()
        print response.url
        items = []
        item = DoubanItem()
        item['answer'] = answer  # field name lost in the original post; 'answer' is a guess
        items.append(item)
        return items
```
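On the addendum's open question: a likely explanation is that CrawlSpider applies its Rules inside its built-in parse callback, so responses that after_login routes straight to parse_item never reach the rule machinery. A toy model of that routing (this is an assumption about the cause, sketched without Scrapy; the class and link values are illustrative):

```python
class ToyCrawlSpider:
    # links an SgmlLinkExtractor-style rule would extract from the page
    rule_links = ['/subject/123/?from=showing']

    def parse(self, response):
        # CrawlSpider.parse yields items AND follow-up requests for rule links
        yield self.make_item(response)
        for link in self.rule_links:
            yield ('follow', link)

    def parse_item(self, response):
        # a plain callback: items only, the rules never get a chance to run
        yield self.make_item(response)

    def make_item(self, response):
        return ('item', response)

spider = ToyCrawlSpider()
via_parse_item = list(spider.parse_item('start-page'))
via_parse = list(spider.parse('start-page'))
print(via_parse_item)  # only the item
print(via_parse)       # the item plus a rule-followed link
```

If that is indeed the cause, letting the post-login requests use the spider's default callback (i.e. not setting callback=self.parse_item in after_login) should allow the Rules to fire.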

