Web Scraping Study (12): Scraping shicimingju.com (诗词名句网) and Saving the Text - 白红宇's Personal Blog


Published: 2021-06-29 14:38:54 | Category: Technical articles


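This post uses BeautifulSoup to scrape a book from shicimingju.com and save its text to a local file. It is for learning purposes only; using the content any other way risks copyright infringement.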

Result: (screenshot of the output; image not preserved)

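I'm a second-year computer science student teaching myself web scraping, so I haven't found a particularly optimized approach and the code may not be very good. Pointers from more experienced people are welcome. My QQ group is 970353786; I hope more people who enjoy learning Python will join and study together.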

Here's the code:

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 Edg/86.0.622.63'
    }
    url = 'https://www.shicimingju.com/book/hongloumeng.html'

    # Fetch the table-of-contents page and decode the bytes explicitly as UTF-8.
    page_text = requests.get(url=url, headers=headers).content.decode('utf-8')
    soup = BeautifulSoup(page_text, 'lxml')

    # The chapter links sit inside the element(s) with class "book-mulu".
    mulu = soup.find_all(attrs={'class': 'book-mulu'})
    # mulu = soup.select('.book-mulu')  # same lookup written as a CSS selector

    # The URL above is the 红楼梦 index page, so the output file is named to match.
    # The with-block closes the file even if a request fails partway through.
    with open('./红楼梦.txt', 'w', encoding='utf-8') as fp:
        for ul in mulu:
            for a in ul.find_all(name='a'):
                # get_text() also works when the link text is wrapped in nested tags.
                title = a.get_text(strip=True)
                new_url = 'https://www.shicimingju.com' + a['href']
                # Fetch the chapter page and parse it the same way.
                html = requests.get(url=new_url, headers=headers).content.decode('utf-8')
                new_soup = BeautifulSoup(html, 'lxml')
                for wenben in new_soup.find_all('div', {'class': 'chapter_content'}):
                    c = wenben.text
                    print(c)
                    fp.write(title + ':' + c + '\n')
                    print('Downloaded: ' + title)
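To try the parsing step without hitting the site, here is a small offline sketch. The sample HTML, the chapter paths in it, and the helper name `extract_chapter_links` are all made up for illustration; they only mimic the shape the script above expects (a `book-mulu` container holding `<a>` links). It shows that `find_all(attrs={'class': ...})` and `select('.book-mulu')` find the same element, and uses `urljoin` from the standard library as one possible alternative to gluing the base URL and `href` together by hand.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup (not copied from the real site): a "book-mulu"
# container holding chapter links, which is the shape the script relies on.
SAMPLE_HTML = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/hongloumeng/1.html">Chapter 1</a></li>
    <li><a href="/book/hongloumeng/2.html">Chapter 2</a></li>
  </ul>
</div>
"""

BASE_URL = 'https://www.shicimingju.com'


def extract_chapter_links(html, base_url=BASE_URL):
    """Return (title, absolute_url) pairs for every link inside .book-mulu."""
    # 'html.parser' ships with Python, so this sketch does not need lxml.
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.select('.book-mulu a'):
        # urljoin resolves the relative href against the base URL.
        links.append((a.get_text(strip=True), urljoin(base_url, a['href'])))
    return links


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# Two ways to look up the same container element.
by_attrs = soup.find_all(attrs={'class': 'book-mulu'})
by_css = soup.select('.book-mulu')

chapter_links = extract_chapter_links(SAMPLE_HTML)
for title, link in chapter_links:
    print(title, link)
```

Keeping the link extraction in its own function makes it easy to check against a saved page before running the full download loop.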

If you run into problems, find me in the group or leave a comment here.

Reposted from: https://chuanchuan.blog.csdn.net/article/details/113668163
