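How to Parse Web Page Information with BeautifulSoup in Python
This article walks through an example of using BeautifulSoup in Python to parse information from a web page. It is shared here for reference. The details are as follows:
The Python code below finds all the links on a web page, then searches the span tags and prints the contents of every span whose class is titletext.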
# Import the library used to query a website
import urllib2
# Import the BeautifulSoup class to parse the data returned from the website
from BeautifulSoup import BeautifulSoup
# Specify the URL you want to query
url = "http://www.python.org"
# Query the website; 'page' is a file-like response object
page = urllib2.urlopen(url)
# Read the response body once so the same HTML can be both parsed and measured
html = page.read()
# Parse the HTML and store it in BeautifulSoup format
soup = BeautifulSoup(html)
# soup.head is the head tag and soup.head.title is the title tag
print soup.head
print soup.head.title
# To print the length of the page, call len on the HTML string
# (the response object itself has no length)
print len(html)
# Create a new variable to store the data you want to find
tags = soup.findAll('a')
# Print all the links
print tags
# Get all titles and print the contents of each title
titles = soup.findAll('span', attrs={'class': 'titletext'})
for title in titles:
    print title.contents
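The script above relies on urllib2 and the print statement, so it runs only on Python 2 with the old BeautifulSoup 3 package. For readers on Python 3, the same lookups can be sketched with BeautifulSoup 4. This is a minimal sketch under stated assumptions, not the original author's code: it assumes the bs4 package is installed, it parses a hypothetical inline SAMPLE_HTML string instead of fetching http://www.python.org so that it runs offline, and the helper name extract_links_and_titles is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="https://www.python.org/downloads/">Downloads</a>
    <span class="titletext">First title</span>
    <span class="titletext featured">Second title</span>
    <span class="note">Not a title</span>
  </body>
</html>
"""


def extract_links_and_titles(html):
    """Return (hrefs, titles) found in an HTML string (illustrative helper)."""
    # "html.parser" is the parser from the Python standard library.
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute.
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    # class_ matches any span whose class list includes "titletext".
    titles = [
        span.get_text(strip=True)
        for span in soup.find_all("span", class_="titletext")
    ]
    return hrefs, titles


hrefs, titles = extract_links_and_titles(SAMPLE_HTML)
print(hrefs)
print(titles)
```

To run the same helper against a live page, the HTML string could come from urllib.request.urlopen(url).read() in place of SAMPLE_HTML. Note that class_="titletext" also matches the span whose class attribute is "titletext featured", because BeautifulSoup 4 treats class as a multi-valued attribute.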