
Several Ways to Send HTTP Requests in a Python Crawler

2023-07-22 19:43:05

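1. A simple example that sends a request and reads the page content with the urllib.request module:

# Import the module
import urllib.request
# Open the page to crawl
response = urllib.request.urlopen('http://www.baidu.com')
# Read the page source (returned as bytes)
html = response.read()
# Print what was read
print(html)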

Output:

b'\n\n\n\n\n\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93#form.bdsug{top:39px}.bdsug{display:none;position:absolute;width:535px;background:#fff;border:1pxsolid
... (remaining output omitted)

The example above fetches the Baidu home page with a GET request.
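The output above is a bytes object (note the b'...' prefix), so the Chinese text appears as escape sequences. A minimal sketch of turning it into readable text follows. The helper names decode_body and fetch_text are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption. Only the offline sample is executed here; fetch_text is defined but not called.

```python
import urllib.request


def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw response bytes into text, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared in Content-Type, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_body(response.read(), charset)


# These bytes are copied from the start of the output shown above.
sample = (
    b"\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b"
    b"\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93"
)
print(decode_body(sample))  # prints 百度一下，你就知道
```

Calling fetch_text('http://www.baidu.com') would perform the same GET request as the example above, but return a str instead of bytes.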

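The next example uses the urllib.request module to send a POST request and read the response:

# Import the modules
import urllib.parse
import urllib.request
# URL-encode the form data, then convert it to bytes using UTF-8
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# Open the target URL; passing data makes this a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()
# Print what was read
print(html)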

Output:

b'{\n"args":{},\n"data":"",\n"files":{},\n"form":{\n"word":"hello"\n},\n"headers":{\n"Accept-Encoding":"identity",\n"Content-Length":"10",\n"Content-Type":"application/x-www-form-urlencoded",\n"Host":"httpbin.org",\n"User-Agent":"Python-urllib/3.7",\n"X-Amzn-Trace-Id":"Root=1-5ec3f607-00f717e823a5c268fe0e0be8"\n},\n"json":null,\n"origin":"123.139.39.71",\n"url":"http://httpbin.org/post"\n}\n'
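To make the POST example easier to follow, the sketch below shows, without any network access, what urlencode actually produces and how the JSON reply can be parsed. The raw value is a trimmed copy of the output above (only the args, form, and url fields are kept), which is a simplification for illustration.

```python
import json
import urllib.parse

# The same form data as in the POST example above.
form = {"word": "hello"}
data = urllib.parse.urlencode(form).encode("utf-8")
print(data)       # prints b'word=hello'
print(len(data))  # prints 10, matching the Content-Length header above

# Trimmed copy of the httpbin.org reply shown above.
raw = b'{\n"args":{},\n"form":{\n"word":"hello"\n},\n"url":"http://httpbin.org/post"\n}\n'

# The body is JSON, so decode the bytes and parse them into a dict.
reply = json.loads(raw.decode("utf-8"))
print(reply["form"]["word"])  # prints hello
```

The "form" field in the reply echoes the data that was posted, which is a quick way to confirm the request body was encoded as intended.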

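2. The urllib3 module

Example code that sends a network request with the urllib3 module:

# Import the module
import urllib3
# Create a PoolManager, which handles connection pooling and thread safety
http = urllib3.PoolManager()
# Send a request to the page to crawl
response = http.request('GET', 'https://www.baidu.com/')
# Print the response body (bytes)
print(response.data)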

Output:

b'\r\n\r\n\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93\r\n\t\r\n\t\r\n\t\r\n\t
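As with urllib.request, response.data here is bytes. A minimal sketch of checking the status code and decoding the body with urllib3 follows. The function name fetch_text and the optional http parameter are hypothetical, and treating any status other than 200 as an error, as well as assuming a UTF-8 body, are simplifications.

```python
import urllib3


def fetch_text(url, http=None, charset="utf-8"):
    """GET a URL through a urllib3 pool and return the body as text."""
    if http is None:
        # Create a pool on demand; real crawlers usually reuse one pool.
        http = urllib3.PoolManager()
    response = http.request("GET", url, timeout=10.0)
    if response.status != 200:
        raise RuntimeError(f"unexpected HTTP status {response.status}")
    # Assumes the body is encoded as `charset` (UTF-8 by default).
    return response.data.decode(charset, errors="replace")
```

For example, fetch_text('https://www.baidu.com/') would return the page as a str. Passing an existing PoolManager as http lets several requests share the same connection pool.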
