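Using urllib in Python 3: headers, proxies, timeouts, authentication, and exception handling
urllib can fetch remote data so that it can be saved locally. Below are several ways to fetch web resources with Python 3, offered as a reference.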
1. The simplest case
import urllib.request

response = urllib.request.urlopen('http://python.org/')
html = response.read()
2. Using Request
import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
3. Sending data
#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
values = {
    'act': 'login',
    'login[email]': 'yzhang@i9i8.com',
    'login[password]': '123456'
}
# urlencode() returns a str; the request body must be bytes in Python 3
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
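To see what a request like the one above carries, the following minimal sketch builds one without contacting any server. The form fields are placeholders chosen for illustration, not part of the original example.

```python
import urllib.parse
import urllib.request

# Placeholder form fields, for illustration only
values = {'act': 'login', 'user': 'alice'}

# urlencode() produces a query-string style str from the dict
encoded = urllib.parse.urlencode(values)
# Request bodies must be bytes in Python 3
data = encoded.encode('utf-8')

req_get = urllib.request.Request('http://localhost/login.php')
req_post = urllib.request.Request('http://localhost/login.php', data)

print(encoded)
print(req_get.get_method())   # no body, so the method is GET
print(req_post.get_method())  # a body is present, so the method is POST
```

Passing data is what switches the request from GET to POST; nothing is sent until urlopen() is called.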
4. Sending data and headers
#!/usr/bin/env python3
import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {
    'act': 'login',
    'login[email]': 'yzhang@i9i8.com',
    'login[password]': '123456'
}
headers = {'User-Agent': user_agent}
# urlencode() returns a str; the request body must be bytes in Python 3
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode("utf8"))
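Headers can be inspected on the Request object before anything is sent, which helps when checking what a script will transmit. This is a small sketch; the User-Agent string is a hypothetical value.

```python
import urllib.request

user_agent = 'Mozilla/5.0 (compatible; example-client)'  # hypothetical value
req = urllib.request.Request(
    'http://localhost/login.php',
    headers={'User-Agent': user_agent},
)
req.add_header('Referer', 'http://www.python.org/')

# Request stores header names via str.capitalize(),
# so 'User-Agent' is kept under the key 'User-agent'
print(req.header_items())
print(req.get_header('User-agent'))
print(req.get_header('User-Agent'))  # lookup is case-sensitive on the stored key
```

The capitalization only affects lookups on the Request object; HTTP header names themselves are case-insensitive on the wire.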
5. HTTP errors
#!/usr/bin/env python3
import urllib.error
import urllib.request

req = urllib.request.Request('https://www.nhooo.com')
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    print(e.code)
    print(e.read().decode("utf8"))
6. Exception handling, approach 1
#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request("https://www.nhooo.com/")
try:
    response = urlopen(req)
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason:', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
7. Exception handling, approach 2
#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request("https://www.nhooo.com/")
try:
    response = urlopen(req)
except URLError as e:
    # HTTPError is a subclass of URLError and also has a 'reason' attribute,
    # so test for 'code' first to tell the two cases apart
    if hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code:', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason:', e.reason)
else:
    print("good!")
    print(response.read().decode("utf8"))
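The two exception-handling patterns can be tried without depending on a remote site. The sketch below starts a throwaway local server that answers every request with 404; the handler class and the describe() helper are hypothetical names introduced for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, build_opener


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Empty proxy mapping: ignore proxy environment variables for this local demo
opener = build_opener(ProxyHandler({}))


def describe(url):
    """Return a short label for the outcome of fetching url."""
    try:
        with opener.open(url, timeout=5) as response:
            return 'ok %d' % response.status
    except HTTPError as e:  # must come before URLError
        return 'http error %d' % e.code
    except URLError as e:
        return 'unreachable: %s' % e.reason


server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = describe('http://127.0.0.1:%d/missing' % server.server_port)
finally:
    server.shutdown()
    server.server_close()

print(result)
print(issubclass(HTTPError, URLError))
```

Because HTTPError derives from URLError, swapping the order of the two except clauses would route the 404 into the URLError branch.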
8. HTTP authentication
#!/usr/bin/env python3
import urllib.request

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "https://www.nhooo.com/"
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx')
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
a_url = "https://www.nhooo.com/"
x = opener.open(a_url)
print(x.read())

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
a = urllib.request.urlopen(a_url).read().decode('utf8')
print(a)
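How the password manager decides which credentials apply can be checked offline with find_user_password(). This is a small sketch; the realm name and the second host are hypothetical and only serve the illustration.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Realm None registers these credentials as the default for any realm
password_mgr.add_password(None, 'https://www.nhooo.com/', 'rekfan', 'xxxxxx')

# A URL below the registered one matches, whatever realm the server announces
inside = password_mgr.find_user_password('Some Realm', 'https://www.nhooo.com/some/page')
# A different host does not match
outside = password_mgr.find_user_password('Some Realm', 'https://example.org/')

print(inside)
print(outside)
```

The credentials are only offered for URLs at or below the registered URL, which is why registering the top-level URL covers the whole site.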
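9. Using a proxy
#!/usr/bin/env python3
import urllib.request

# ProxyHandler maps URL schemes ('http', 'https') to proxy URLs.
# The standard library has no SOCKS support, so a 'sock5' key would never
# apply to http/https requests; this assumes an HTTP proxy on localhost:1080.
proxy_support = urllib.request.ProxyHandler({
    'http': 'http://localhost:1080',
    'https': 'http://localhost:1080'
})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
a = urllib.request.urlopen("https://www.nhooo.com").read().decode("utf8")
print(a)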
10. Timeouts
#!/usr/bin/env python3
import socket
import urllib.request

# timeout in seconds
timeout = 2
socket.setdefaulttimeout(timeout)

# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('https://www.nhooo.com/')
a = urllib.request.urlopen(req).read()
print(a)
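socket.setdefaulttimeout() affects every socket created afterwards in the process. urlopen() and opener.open() also accept a per-call timeout argument, sketched below against a local socket that never answers, standing in for a stalled server. The fetch() helper is a hypothetical name for this illustration.

```python
import socket
import urllib.error
import urllib.request

# Empty proxy mapping: ignore proxy environment variables for this local demo
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, seconds):
    """Return the response body, or None if the request times out or cannot connect."""
    try:
        with opener.open(url, timeout=seconds) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout):
        # A timeout while connecting is wrapped in URLError;
        # a timeout while waiting for the response surfaces as socket.timeout
        return None


# A listening socket that never accepts or replies
stalled = socket.socket()
stalled.bind(('127.0.0.1', 0))
stalled.listen(1)
try:
    body = fetch('http://127.0.0.1:%d/' % stalled.getsockname()[1], 0.5)
finally:
    stalled.close()

print(body)
```

The per-call form keeps the timeout next to the request it applies to and leaves other sockets in the program unaffected.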
Summary
That is all for this article. Hopefully it helps with learning or using Python; if you have questions, leave a comment to discuss.