python - How to select a class of div inside of a div with Beautiful Soup?
I have a bunch of div tags within div tags:
<div class="foo">
  <div class="bar">i want this</div>
  <div class="unwanted">not this</div>
</div>
<div class="bar">don't want either </div>
So I'm using Python and Beautiful Soup to separate stuff out. I need the "bar" class only when it is wrapped inside of a "foo" class div. Here's my code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'c:\test.htm'))
tag = soup.div
for each_div in soup.findAll('div',{'class':'foo'}):
    print(tag["bar"]).encode("utf-8")
Alternately, I tried:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'c:\test.htm'))
for each_div in soup.findAll('div',{'class':'foo'}):
    print(each_div.findAll('div',{'class':'bar'})).encode("utf-8")
What am I doing wrong? I would be just as happy with a simple print(each_div) if I could remove the div with class "unwanted" from the selection.
You can use find_all() to search for every <div> element that has foo as its class attribute, and for each one of them use find() to get the <div> that has bar as its class attribute, like:
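from bs4 import BeautifulSoup
import sys

# Parse the HTML file named on the command line
soup = BeautifulSoup(open(sys.argv[1], 'r'), 'html')
# For every <div class="foo">, print the first nested <div class="bar">
for foo in soup.find_all('div', attrs={'class': 'foo'}):
    bar = foo.find('div', attrs={'class': 'bar'})
    print(bar.text)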
Run it like:
python3 script.py htmlfile
That yields:
i want this
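The same find_all() plus find() approach can be tried without a file on disk. The following is only a sketch: the markup is the question's sample inlined as a string, the function name first_bar_texts is made up for illustration, and the parser is named explicitly as "html.parser". It also guards against find() returning None, which happens when a foo div has no bar div inside it.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, inlined so no file is needed.
HTML = """
<div class="foo">
  <div class="bar">i want this</div>
  <div class="unwanted">not this</div>
</div>
<div class="bar">don't want either</div>
"""


def first_bar_texts(html):
    """Return the text of the first div.bar found inside each div.foo.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for foo in soup.find_all("div", attrs={"class": "foo"}):
        bar = foo.find("div", attrs={"class": "bar"})
        if bar is not None:  # find() returns None when nothing matches
            texts.append(bar.get_text(strip=True))
    return texts


print(first_bar_texts(HTML))  # ['i want this']
```

The top-level bar div is skipped because find() is called on each foo tag, so only that tag's descendants are searched.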
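Update: Assuming that there can be several <div> elements with bar as their class attribute, the previous script won't work. It will only find the first one. In that case, you can get the descendants of each foo element and iterate over them, like: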
from bs4 import BeautifulSoup
import sys

soup = BeautifulSoup(open(sys.argv[1], 'r'), 'html')
for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # Walk every node nested under this foo div, at any depth
    foo_descendants = foo.descendants
    for d in foo_descendants:
        # Keep only <div> tags whose class list is exactly ['bar']
        if d.name == 'div' and d.get('class', '') == ['bar']:
            print(d.text)
With input like:
<div class="foo">
  <div class="bar">i want this</div>
  <div class="unwanted">not this</div>
  <div class="bar">also want this</div>
</div>
It will yield:
i want this
also want this
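As an alternative to walking the descendants by hand, Beautiful Soup's select() method accepts CSS selectors, and the descendant selector "div.foo div.bar" expresses the same idea in one line. This is a sketch, not the original answer's method: the function name bar_texts_in_foo is hypothetical, and the sample string combines the input above with the stray top-level bar div from the question. One difference to be aware of: the CSS class selector also matches tags that carry bar alongside other classes, whereas the comparison with ['bar'] above matches only tags whose sole class is bar.

```python
from bs4 import BeautifulSoup

# Sample markup: the input above plus the top-level "bar" div from the question.
HTML = """
<div class="foo">
  <div class="bar">i want this</div>
  <div class="unwanted">not this</div>
  <div class="bar">also want this</div>
</div>
<div class="bar">don't want either</div>
"""


def bar_texts_in_foo(html):
    """Return the text of every div.bar nested inside a div.foo.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "div.foo div.bar" matches each div.bar that is a descendant of a div.foo
    return [tag.get_text(strip=True) for tag in soup.select("div.foo div.bar")]


print(bar_texts_in_foo(HTML))  # ['i want this', 'also want this']
```

Calling foo.find_all('div', attrs={'class': 'bar'}) inside the loop over foo elements should give the same tags, since find_all() searches all descendants by default.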