The CPA exams are finally over, so I have some time to relax and learn a bit of what I actually enjoy!
I still remember that my online Python course had only gotten to Chapter 2; I really have to thank Coursera for not deleting my account.
Let me warm up with the little crawlers I learned back then and see whether I've forgotten everything:
1 Crawl all the article titles and links on my blog and print them:
import requests
from bs4 import BeautifulSoup

url = 'https://www.imtrq.com/page/'
index = 1
for page in range(1, 17):            # archive pages 1-16, same range as the second script
    r = requests.get(url + str(page))
    soup = BeautifulSoup(r.text, 'lxml')
    archives = soup.find_all('h1', 'entry-title')   # each post title is an <h1 class="entry-title">
    for item in archives:
        # item.string is the title text, item.contents[0] is the <a> that wraps it
        print(index, item.string, item.contents[0].get('href'))
        index += 1
Result:
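A quick aside: the same titles and links can also be grabbed with a CSS selector that targets the <a> inside each title directly. This is just a sketch of the idea; the 1-16 page range is assumed from the second script below, not something the original code guarantees.

import requests
from bs4 import BeautifulSoup

# Sketch only: select the <a> inside each h1.entry-title with a CSS selector.
url = 'https://www.imtrq.com/page/'
index = 1
for page in range(1, 17):            # assumed 16 archive pages
    soup = BeautifulSoup(requests.get(url + str(page)).text, 'lxml')
    for link in soup.select('h1.entry-title a'):
        print(index, link.get_text(strip=True), link.get('href'))
        index += 1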
2 Crawl the article thumbnails on my blog and save them locally
import requests
from bs4 import BeautifulSoup

url = 'https://www.imtrq.com/page/'
path = '/Users/XXXXX/Documents/photo/'
index = 1
for page in range(1, 17):
    response = requests.get(url + str(page))
    soup = BeautifulSoup(response.text, 'lxml')
    # WordPress gives each post thumbnail this set of classes
    fimg = soup.find_all('img', 'attachment-post-thumbnail size-post-thumbnail wp-post-image')
    for item in fimg:
        img = requests.get(item.get('src'))
        with open(path + str(index) + '.jpg', 'wb') as file:
            file.write(img.content)   # the with block flushes and closes the file automatically
        index += 1
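For bigger images it can be safer to stream the download and keep whatever extension the URL carries instead of assuming .jpg. A minimal sketch of that variation, using a helper I've called save_image (the name, the timeout and the chunk size are mine, not from the script above):

import os
import requests

def save_image(img_url, dest_dir, index):
    # Sketch only: stream the download and keep the extension from the URL,
    # falling back to .jpg when the URL has none.
    ext = os.path.splitext(img_url.split('?')[0])[1] or '.jpg'
    resp = requests.get(img_url, stream=True, timeout=10)
    resp.raise_for_status()            # bail out on broken image links
    with open(os.path.join(dest_dir, str(index) + ext), 'wb') as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)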
Comments | 1 comment
傲娇的小基基 (blog author)
Before I read the official BeautifulSoup documentation, I thought the only way to pull the links was with regular expressions, and the code was twice as long, haha.
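For comparison, here is a rough sketch of what that regex-only version of the title/link extraction might look like; the pattern is only a guess at typical WordPress markup and is not taken from the post.

import re
import requests

# Sketch only: pull the title links with a regular expression instead of BeautifulSoup.
# The pattern guesses at the markup of <h1 class="entry-title"> blocks.
html = requests.get('https://www.imtrq.com/page/1').text
pattern = re.compile(r'<h1 class="entry-title">\s*<a href="([^"]+)"[^>]*>(.*?)</a>', re.S)
for href, title in pattern.findall(html):
    print(title, href)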