Web Scraping: Quickly Fetching a Website

大大源码 • March 13, 2023, 11:14 pm • Other

This example was written and run in PyCharm.

First, import the function this scraper needs:

from urllib.request import urlopen

Next, decide on the address of the site you want to fetch. Here I simply fetch the Baidu home page.

The full code is as follows:

from urllib.request import urlopen
# URL of the page to fetch
url = "http://www.baidu.com"
# Request the URL and get the response
resp = urlopen(url)
# Read the response body;
# decode() turns the raw bytes into text, here decoded as UTF-8
# print(resp.read().decode("utf-8"))
with open("mybaidu.html", mode="w", encoding="utf-8") as f:
    f.write(resp.read().decode("utf-8"))  # read the page source from the response and save it
print("Save complete")
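The script above assumes that the request succeeds and that the page is encoded as UTF-8. A slightly more defensive variant might look like the sketch below. The helper names fetch_html and save_page, the 10-second timeout, and the UTF-8 fallback are all assumptions made for illustration; they are not part of the original code.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10, fallback_encoding="utf-8"):
    """Return the body at `url` as text (hypothetical helper, for illustration)."""
    # The `with` block closes the connection even if reading fails.
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # Prefer the charset declared in the Content-Type header, if any;
        # otherwise fall back to the assumed default.
        encoding = resp.headers.get_content_charset() or fallback_encoding
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


def save_page(url, path, timeout=10):
    """Fetch `url`, write its text to `path` as UTF-8, return the character count."""
    html = fetch_html(url, timeout=timeout)
    with open(path, mode="w", encoding="utf-8") as f:
        return f.write(html)
```

Calling save_page("http://www.baidu.com", "mybaidu.html") should then produce the same kind of file as the original script. Network failures still surface as exceptions (for example urllib.error.URLError), so a caller that wants to retry or skip a page would need to catch them.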