I've been lazy and busy lately and haven't felt like doing anything, so it's been a while since the last update. Honestly I'd rather write a bit of a diary, but I never know where to start with that, and I have no idea how I found time for this odds-and-ends project either.
Writing up a junk-food scraping log like this is so much easier than writing a paper. (And a lot more fun, too.)
The lookup site is here: 主机掌中宝 (eshop-switch.com). It shows the lowest regional discounts on digital Switch games, but it can't cover everyone's needs. I, for one, want to see the games with a high original price and a deep discount, so I dusted off what little scraping skill I have.
I. Prepare the database
First create a MySQL database; let's just call it ns. Then create a data table, call it ns_discount. Both can be set up directly with SQL statements.
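If the database itself doesn't exist yet, one statement takes care of it (a minimal sketch; the utf8mb4 default is my own choice, matching the connection charset the pipeline uses later):

CREATE DATABASE IF NOT EXISTS ns DEFAULT CHARACTER SET utf8mb4;

Then create the table inside it: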
CREATE TABLE IF NOT EXISTS `ns_discount` (
    `GAME_ID` INT NOT NULL AUTO_INCREMENT,
    `GAME` VARCHAR(100),
    `GAME_EN` VARCHAR(100),
    `HIT_COUNT` INT,
    `ID` INT,
    `IMAGE` VARCHAR(200),
    `IMAGE_M` VARCHAR(200),
    `PRICE` FLOAT,
    `REGION_NAME` VARCHAR(8),
    `SALE` INT,
    `TAGS` VARCHAR(20),
    PRIMARY KEY (`GAME_ID`)
);
II. Prepare the spider
Create a new project with Scrapy, cd into it, and generate a spider:
scrapy startproject ns
cd ns
scrapy genspider nsp eshop-switch.com
Only four places need changes; everything else can stay at the framework defaults.
1 Edit nsp.py
import scrapy
from scrapy.http import FormRequest
from ns.items import NsItem

class NspSpider(scrapy.Spider):
    name = 'nsp'
    allowed_domains = ['eshop-switch.com']

    def start_requests(self):
        # The listing is served by a POST endpoint; walk through all pages
        url = 'http://www.eshop-switch.com/game/queryGame'
        for page in range(1, 233):
            data = {
                'current_page': str(page),
                'order_by': '0',
                'search': '',
                'tag': '',
                'page_size': '24',
            }
            yield FormRequest(url, formdata=data, callback=self.parse_page)

    def parse_page(self, response):
        # The response is JSON; 'list' holds one dict per discounted game
        sale_list = response.json()['list']
        for i in sale_list:
            item = NsItem()
            for key in ['SALE', 'HIT_COUNT', 'GAME', 'IMAGE', 'REGION_NAME',
                        'PRICE', 'GAME_EN', 'IMAGE_M', 'ID', 'TAGS']:
                try:
                    item[key] = i[key]
                except KeyError:
                    # Some entries are missing fields; just skip those keys
                    pass
            yield item
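If you want to sanity-check the endpoint before running the whole spider, a quick standalone request works too. This is a sketch assuming the endpoint behaves the same outside Scrapy; the form fields are the ones used above:

import requests

# Fetch page 1 and peek at the first entry to confirm the field names
resp = requests.post(
    'http://www.eshop-switch.com/game/queryGame',
    data={'current_page': '1', 'order_by': '0', 'search': '',
          'tag': '', 'page_size': '24'},
)
print(resp.json()['list'][0])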
2 Edit items.py
import scrapy

class NsItem(scrapy.Item):
    SALE = scrapy.Field()
    HIT_COUNT = scrapy.Field()
    GAME = scrapy.Field()
    IMAGE = scrapy.Field()
    REGION_NAME = scrapy.Field()
    PRICE = scrapy.Field()
    GAME_EN = scrapy.Field()
    IMAGE_M = scrapy.Field()
    ID = scrapy.Field()
    TAGS = scrapy.Field()
3 Edit pipelines.py
from itemadapter import ItemAdapter
import pymysql

class NsPipeline:
    def __init__(self):
        self.connection = pymysql.connect(
            host='127.0.0.1',
            user='root',
            password='hahahahahaha',
            db='ns',
            charset='utf8mb4',
        )
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        # Parameterized INSERT: let the driver handle quoting and escaping
        columns = ', '.join(item.keys())
        placeholders = ', '.join(['%s'] * len(item))
        insert_sql = 'INSERT INTO ns_discount ({}) VALUES ({})'.format(columns, placeholders)
        self.cursor.execute(insert_sql, list(item.values()))
        self.connection.commit()
        return item

    def close_spider(self, spider):
        self.connection.close()

Using %s placeholders instead of gluing quoted strings together means pymysql escapes the values itself, so game titles containing apostrophes survive intact.
4 Enable the pipeline in settings.py
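A minimal sketch of the relevant lines in settings.py (the priority 300 is just the usual Scrapy template default; any value works when there is only one pipeline):

ITEM_PIPELINES = {
    'ns.pipelines.NsPipeline': 300,
}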
Then run the spider and the data lands in the database.
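From the project root, that's just:

scrapy crawl nsp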
III. Query the results
Finally, run a query to see whether anything has an original price above 200 and a current price below 50:
SELECT
    n.`ID`,
    n.`GAME`,
    n.`PRICE`,
    n.`SALE`,
    ROUND(n.`PRICE` / (1 - n.`SALE` / 100), 2) AS origin,
    n.`REGION_NAME`,
    n.`TAGS`
FROM `ns_discount` AS n
WHERE n.`PRICE` < 50
  AND ROUND(n.`PRICE` / (1 - n.`SALE` / 100), 2) > 200
ORDER BY origin DESC, n.`SALE` DESC;
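A quick sanity check on the origin formula: reading SALE as the percent off, a game currently selling for 40 with SALE = 80 works back to 40 / (1 - 80/100) = 200, exactly on the boundary of this filter.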