Selenium+Scrapy Manhuadui Comic Crawler

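Scraped from Manhuadui (漫画堆). The project exists only as an exercise in integrating the Scrapy framework with Selenium; because of potential copyright disputes, the crawled results are not published.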

The code uses 妄想学生会 (wangxiangxueshenghui) as its example. To crawl a different comic, edit line 10 of manhua.py

start_urls = ['https://www.manhuadui.com/manhua/wangxiangxueshenghui/']

and line 13

Rule(LinkExtractor(allow=r'https://www.manhuadui.com/manhua/wangxiangxueshenghui/\d+.html'), callback="parse_first_page", follow=False),

replacing the URL in each with the link to the comic you want.
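Because both lines share the same comic URL, one way to avoid editing them separately is to derive both from a single slug. The following is only a sketch, not the project's actual code: the helper name build_targets and the BASE_URL constant are hypothetical, and the pattern escapes the literal dots, which is slightly stricter than the rule shown above.

```python
import re

# Assumption: every comic lives under this prefix, as in the URLs shown above.
BASE_URL = "https://www.manhuadui.com/manhua/"


def build_targets(slug):
    """Return (start_url, allow_pattern) for a comic slug.

    The slug is the last path segment of the comic's page,
    for example 'wangxiangxueshenghui'.
    """
    start_url = f"{BASE_URL}{slug}/"
    # re.escape keeps the dots literal; \d+ matches the numeric chapter id.
    allow_pattern = re.escape(start_url) + r"\d+\.html"
    return start_url, allow_pattern


start_url, allow_pattern = build_targets("wangxiangxueshenghui")
```

In manhua.py the two values could then be used as start_urls = [start_url] and LinkExtractor(allow=allow_pattern), assuming the spider is otherwise unchanged.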

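Downloaded images are saved in the project's root directory, with file names of the form 漫画名-第几话第几张图.jpg (comic name, followed by the chapter number and the image number within that chapter).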
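As an illustration of that naming scheme, here is a minimal sketch of how such a path could be built. The function image_path and its parameters are hypothetical, and the exact layout of the chapter and image labels is an assumption based on the description above, not taken from the project's code.

```python
from pathlib import Path


def image_path(comic, chapter, index, root="."):
    """Build the save path for one image.

    comic   -- comic name as it should appear in the file name
    chapter -- chapter number
    index   -- position of the image within the chapter
    root    -- directory to save into (the project root by default)
    """
    # Assumed layout: <comic>-第<chapter>话第<index>张图.jpg
    name = f"{comic}-第{chapter}话第{index}张图.jpg"
    return Path(root) / name


example = image_path("妄想学生会", 1, 3)
```

Returning a Path rather than a string keeps the directory handling portable across operating systems.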

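Run from the command line (the bracketed part is optional; it sets Scrapy's LOG_FILE setting so the log is written to manhua.log)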

scrapy crawl manhua [-s LOG_FILE=manhua.log]

or run start.py directly.
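The contents of start.py are not shown here. A launcher of this kind is commonly written around scrapy.cmdline.execute, so the following is a sketch under that assumption rather than the project's actual file; the helper build_crawl_argv is hypothetical.

```python
def build_crawl_argv(spider="manhua", log_file=None):
    """Build the argument list equivalent to the command shown above."""
    argv = ["scrapy", "crawl", spider]
    if log_file:
        # -s NAME=VALUE overrides a Scrapy setting for this run.
        argv += ["-s", f"LOG_FILE={log_file}"]
    return argv


def main():
    # Imported here so the helper above stays usable without Scrapy installed.
    from scrapy import cmdline

    # Must be run from the project directory so Scrapy can find scrapy.cfg.
    cmdline.execute(build_crawl_argv(log_file="manhua.log"))
```

Calling main() at the bottom of the file would start the crawl; it is left uncalled here so the snippet can be imported without side effects.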