
How to Analyze Web Page Information with BeautifulSoup in Python

2024-04-02 14:25:03

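This article walks through an example of analyzing web page information with BeautifulSoup in Python, shared here for reference. The details are as follows: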

This Python code finds all the links on a web page, parses the span tags, and retrieves the contents of every span whose class contains titletext.

# This listing targets Python 2 (urllib2) and BeautifulSoup 3.

# Import the library used to query a website
import urllib2

# Import the BeautifulSoup class used to parse the data returned from the website
from BeautifulSoup import BeautifulSoup

# Specify the URL you want to query
url = "http://www.python.org"

# Query the website and keep the response object in the variable 'page'
page = urllib2.urlopen(url)

# Read the HTML out of the response so it can be both measured and parsed
html = page.read()

# Parse the HTML and store it in BeautifulSoup format
soup = BeautifulSoup(html)

# soup.head is the head tag and soup.head.title is the title tag
print soup.head
print soup.head.title

# To print the length of the page, use the len function on the HTML string
print len(html)

# Find every <a> tag and store the results in a new variable
tags = soup.findAll('a')

# Print all the links
print tags

# Get all spans with class 'titletext' and print the contents of each one
titles = soup.findAll('span', attrs={'class': 'titletext'})
for title in titles:
    print title.contents
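
The listing above relies on urllib2 and the BeautifulSoup 3 import path, both of which exist only on Python 2. As a rough sketch of the same three steps (page title, all links, spans with class titletext) on Python 3, the block below assumes the `bs4` package (Beautiful Soup 4) is installed. The names `SAMPLE_HTML` and `summarize` are hypothetical, and the markup is invented for illustration so that the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page (assumption).
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="https://www.python.org/downloads/">Downloads</a>
    <span class="titletext">First title</span>
    <span class="titletext highlight">Second title</span>
    <span class="other">Not a title</span>
  </body>
</html>
"""


def summarize(html):
    """Return the page title, the link targets, and the titletext span texts."""
    # "html.parser" is the parser bundled with Python, so no extra package is needed.
    soup = BeautifulSoup(html, "html.parser")

    # soup.title is None when the document has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None

    # href=True skips <a> tags that carry no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]

    # class_ matches any span that lists "titletext" among its classes.
    span_texts = [
        span.get_text(strip=True)
        for span in soup.find_all("span", class_="titletext")
    ]
    return title, links, span_texts


title, links, span_texts = summarize(SAMPLE_HTML)
print(title)
print(links)
print(span_texts)
```

To try this against a live page instead of the sample string, one option is to read the bytes with `urllib.request.urlopen(url).read()` and pass them to `summarize`. The span classes on a real site will differ, so the titletext filter may well return an empty list.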