Using urllib in Python 3: headers, proxies, timeouts, authentication, and exception handling

2024-03-12 11:44:03

urllib can fetch remote data so that you can save or process it. Below are several ways to fetch web resources with Python 3 that you can use as a reference.

1. The simplest case

import urllib.request

# urlopen() sends a GET request; read() returns the response body as bytes.
response = urllib.request.urlopen('http://python.org/')
html = response.read()
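
In practice you usually want text rather than bytes, and you want the connection closed when you are done. Here is a minimal sketch, assuming a hypothetical helper named `fetch_text` (it is not part of urllib) and a UTF-8 fallback when the server declares no charset:

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Fetch a URL and return its body decoded to str (hypothetical helper)."""
    # The response object is a context manager, so it is closed automatically.
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Use the charset from the Content-Type header when one is declared.
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)


# urlopen() also understands data: URLs, so the helper can be tried offline.
sample = fetch_text("data:text/plain;charset=utf-8,hello")
print(sample)
```

For a real page, call it as `fetch_text('http://python.org/')`.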

2. Using Request

import urllib.request

# A Request object describes the request; urlopen() sends it.
req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
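
A Request object is only a description of what will be sent, so you can inspect it before anything goes over the network. The sketch below builds a few requests and looks at them; nothing is sent, and the `data` and `method` values are illustrative only:

```python
import urllib.request

req = urllib.request.Request("http://python.org/")

# Nothing has been sent yet; the object only records what will be requested.
method = req.get_method()
full_url = req.full_url
host = req.host

# Attaching a body switches the default method from GET to POST.
post_req = urllib.request.Request("http://python.org/", data=b"key=value")
post_method = post_req.get_method()

# The method can also be set explicitly.
head_req = urllib.request.Request("http://python.org/", method="HEAD")
print(method, post_method, head_req.get_method())
```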

3. Sending data

#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
values = {
    'act': 'login',
    'login[email]': 'yzhang@i9i8.com',
    'login[password]': '123456'
}
# urlencode() returns a str, but the request body must be bytes in Python 3.
data = urllib.parse.urlencode(values).encode('utf-8')
# Passing data turns the request into a POST.
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode('utf8'))
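
To see what the encoding step produces, here is a small sketch that encodes a dictionary and attaches it once as a POST body and once as a GET query string. The field names and values are made up for illustration, and no request is sent:

```python
import urllib.parse
import urllib.request

fields = {"q": "a b", "k": "v&w"}  # illustrative values, not from the article

# urlencode() percent-escapes the values and joins the pairs with '&'.
encoded = urllib.parse.urlencode(fields)

# POST: the encoded string goes in the body and must be bytes.
post_req = urllib.request.Request("http://localhost/login.php", encoded.encode("utf-8"))

# GET: the same string is appended to the URL as a query string instead.
get_req = urllib.request.Request("http://localhost/login.php?" + encoded)

print(encoded)
print(post_req.get_method(), post_req.data)
print(get_req.get_method(), get_req.full_url)
```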

4. Sending data and headers

#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {
    'act': 'login',
    'login[email]': 'yzhang@i9i8.com',
    'login[password]': '123456'
}
headers = {'User-Agent': user_agent}
# The request body must be bytes, so encode the urlencoded string.
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode('utf8'))
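
Headers can be passed to the constructor or added later with `add_header`, and you can read them back from the Request before sending it. One detail worth knowing is how the names are stored. The sketch below only builds the object and sends nothing:

```python
import urllib.request

user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
req = urllib.request.Request(
    "http://localhost/login.php",
    headers={"User-Agent": user_agent},
)
req.add_header("Referer", "http://www.python.org/")

# Request normalizes header names with str.capitalize(), so look them up
# as 'User-agent' rather than 'User-Agent'.
stored_agent = req.get_header("User-agent")
has_referer = req.has_header("Referer")
print(stored_agent)
print(has_referer)
print(sorted(req.header_items()))
```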

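5. HTTP errors

#!/usr/bin/env python3
import urllib.error
import urllib.request

req = urllib.request.Request('https://www.nhooo.com')
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    # e.code is the HTTP status code; the error body can be read like a response.
    print(e.code)
    print(e.read().decode('utf8'))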
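
To watch an HTTPError happen without depending on a remote site, you can point urllib at a throwaway local server. This is only a sketch: the `AlwaysNotFound` handler and the `fetch_status` helper are hypothetical names made up for the demo, and it assumes a local port can be opened:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Throwaway local handler that answers every GET with a 404."""

    def do_GET(self):
        body = b"no such page"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_status(opener, url):
    """Return (status, body text) for both successful and HTTP error responses."""
    try:
        with opener.open(url, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # The error object carries the status code and the error page body.
        return e.code, e.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
# An empty ProxyHandler bypasses any proxy configured in the environment,
# so the request goes straight to the local test server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    status, body = fetch_status(opener, "http://127.0.0.1:%d/missing" % server.server_port)
finally:
    server.shutdown()
    server.server_close()
print(status, body)
```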

6. Exception handling, approach 1

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request("https://www.nhooo.com/")
# HTTPError is a subclass of URLError, so it has to be caught first.
try:
    response = urlopen(req)
except HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason:', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
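
The order of the two except clauses matters because HTTPError inherits from URLError. The sketch below shows the relationship with hand-built exceptions, so it needs no network; `describe` is a hypothetical helper and the error messages are made up:

```python
import io
from urllib.error import HTTPError, URLError


def describe(exc):
    """Map an exception to a short label, checking the subclass first."""
    if isinstance(exc, HTTPError):
        return "http error %d" % exc.code
    if isinstance(exc, URLError):
        return "unreachable: %s" % exc.reason
    return "other"


# The exceptions are built by hand here so that no network access is needed.
http_exc = HTTPError("http://localhost/", 404, "Not Found", None, io.BytesIO(b""))
url_exc = URLError("connection refused")

print(issubclass(HTTPError, URLError))
print(describe(http_exc))
print(describe(url_exc))
```

If the URLError branch came first, it would also swallow every HTTPError and the status code would never be reported.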

7. Exception handling, approach 2

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request("https://www.nhooo.com/")
try:
    response = urlopen(req)
except URLError as e:
    # In Python 3 an HTTPError also has a 'reason' attribute, so test for
    # 'code' first to tell HTTP errors apart from connection failures.
    if hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code:', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason:', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
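
When a single except clause distinguishes the two cases by attribute, it helps to know which attributes each exception actually has. In Python 3 an HTTPError exposes both `code` and `reason`, so `code` is the attribute to test first. A quick check with hand-built exceptions (the URL and messages are made up):

```python
import io
from urllib.error import HTTPError, URLError

# Hand-built exceptions, so the comparison runs without any network access.
http_exc = HTTPError("http://localhost/", 503, "Service Unavailable", None, io.BytesIO(b""))
url_exc = URLError("name resolution failed")

for exc in (http_exc, url_exc):
    print(type(exc).__name__, hasattr(exc, "code"), hasattr(exc, "reason"))

# An HTTPError exposes both attributes, so 'code' is the one that
# actually tells the two cases apart.
print(http_exc.code, http_exc.reason)
```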

8. HTTP authentication

#!/usr/bin/env python3
import urllib.request

# Create a password manager.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://www.nhooo.com/"
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# Create an "opener" (OpenerDirector instance).
opener = urllib.request.build_opener(handler)
# Use the opener to fetch a URL.
a_url = "https://www.nhooo.com/"
x = opener.open(a_url)
print(x.read())
# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
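
The handler only sends credentials after the server answers with a 401 challenge, and it asks the password manager which credentials fit the URL. The sketch below queries the manager directly to show that matching, then builds a Basic `Authorization` value by hand. `basic_auth_value` is a hypothetical helper, and the realm name, the extra URLs and the sample credentials passed to it are made up:

```python
import base64
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://www.nhooo.com/", "rekfan", "xxxxxx")

# Credentials registered for a URL also apply to everything beneath it,
# and with realm None they match whatever realm the server announces.
inside = password_mgr.find_user_password("Some Realm", "https://www.nhooo.com/a/b.html")
outside = password_mgr.find_user_password("Some Realm", "https://example.org/")


def basic_auth_value(user, password):
    """Build the value of a Basic 'Authorization' header (hypothetical helper)."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


print(inside, outside)
print(basic_auth_value("Aladdin", "open sesame"))
```

Basic authentication only base64-encodes the credentials, so it should be used over HTTPS.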

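9. Using a proxy

#!/usr/bin/env python3
import urllib.request

# ProxyHandler maps a URL scheme to the proxy used for it. urllib has no
# built-in SOCKS support, so a 'sock5' key is never applied to http(s)
# requests; this assumes an HTTP proxy listening on localhost:1080.
proxy = 'http://localhost:1080'
proxy_support = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

a = urllib.request.urlopen("https://www.nhooo.com").read().decode("utf8")
print(a)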
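
`install_opener` changes the behavior of every later `urlopen` call in the process. If you only want the proxy for some requests, you can keep the opener in a variable and call `opener.open(url)` instead. A sketch, where `make_proxy_opener` and `find_proxy_handler` are hypothetical helpers and the proxy address is a placeholder; nothing is fetched:

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return an opener that routes http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def find_proxy_handler(opener):
    """Return the ProxyHandler registered on an opener, or None."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.ProxyHandler):
            return handler
    return None


# 'http://localhost:1080' is a placeholder for a proxy you actually run.
proxied = make_proxy_opener("http://localhost:1080")
# An empty mapping switches proxies off, ignoring environment variables.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))

print(find_proxy_handler(proxied).proxies)
print(find_proxy_handler(direct))
# Proxies that urlopen() would pick up from the environment by default.
print(urllib.request.getproxies())
```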

10. Timeouts

#!/usr/bin/env python3
import socket
import urllib.request

# Timeout in seconds.
timeout = 2
socket.setdefaulttimeout(timeout)
# This call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module.
req = urllib.request.Request('https://www.nhooo.com/')
a = urllib.request.urlopen(req).read()
print(a)
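
`socket.setdefaulttimeout` affects every socket created afterwards in the process. `urlopen` and `opener.open` also accept a `timeout` argument that applies to a single call. The sketch below uses it against a local socket that never answers. `fetch_or_none` is a hypothetical helper, the 0.5 second value is arbitrary, and it assumes a local port can be opened:

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(opener, url, seconds):
    """Return the body bytes, or None if the request fails or times out."""
    try:
        # urllib.request.urlopen(url, timeout=seconds) accepts the same argument.
        with opener.open(url, timeout=seconds) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout):
        # A timeout may surface wrapped in URLError or as a bare socket.timeout,
        # depending on where it happens, so both are caught here.
        return None


# A listening socket that never answers stands in for a stalled server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Empty ProxyHandler: ignore environment proxies so the local socket is reached.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    result = fetch_or_none(opener, "http://127.0.0.1:%d/" % port, 0.5)
finally:
    listener.close()
print(result)
```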

Summary

That is all for this article. I hope it helps you learn or use Python; if you have questions, feel free to leave a comment.
