python - How to select a class of div inside of a div with Beautiful Soup?


I have a bunch of div tags within div tags:

    <div class="foo">
        <div class="bar">i want this</div>
        <div class="unwanted">not this</div>
    </div>
    <div class="bar">don't want either</div>

So I'm using Python and Beautiful Soup to separate stuff out. I need the "bar" class only when it is wrapped inside of a "foo" class div. Here's my code:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open(r'c:\test.htm'))
    tag = soup.div
    for each_div in soup.findAll('div',{'class':'foo'}):
        print(tag["bar"]).encode("utf-8")

Alternately, I tried:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open(r'c:\test.htm'))
    for each_div in soup.findAll('div',{'class':'foo'}):
        print(each_div.findAll('div',{'class':'bar'})).encode("utf-8")

What am I doing wrong? I would be just as happy with a simple print(each_div) if I could remove the div with class "unwanted" from the selection.

You can use find_all() to search for every <div> element with the foo class and, for each one of them, use find() to get the <div> with the bar class, like:

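    from bs4 import BeautifulSoup
    import sys

    # Parse the HTML file named on the command line.
    with open(sys.argv[1], 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')

    # For each <div class="foo">, print the first <div class="bar"> inside it.
    for foo in soup.find_all('div', attrs={'class': 'foo'}):
        bar = foo.find('div', attrs={'class': 'bar'})
        print(bar.text)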

Run it like:

python3 script.py htmlfile 

That yields:

    i want this
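The same idea can be tried without a file on disk. The following is only a minimal sketch, assuming the markup from the question is held in a string; the function name first_bar_texts and the constant SAMPLE_HTML are made up for illustration and are not part of Beautiful Soup. It also skips any "foo" div that has no "bar" child, since find() returns None in that case.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (SAMPLE_HTML is a hypothetical name).
SAMPLE_HTML = """
<div class="foo">
    <div class="bar">i want this</div>
    <div class="unwanted">not this</div>
</div>
<div class="bar">don't want either</div>
"""


def first_bar_texts(html):
    """Return the text of the first div.bar found inside each div.foo."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for foo in soup.find_all("div", attrs={"class": "foo"}):
        bar = foo.find("div", attrs={"class": "bar"})
        # find() returns None when there is no match, so guard against it.
        if bar is not None:
            texts.append(bar.get_text(strip=True))
    return texts


if __name__ == "__main__":
    for text in first_bar_texts(SAMPLE_HTML):
        print(text)
```

The top-level "bar" div is never visited, because the inner search starts from each "foo" tag rather than from the whole document.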

Update: If there are several <div> elements with the bar class inside a foo <div>, the previous script won't work, because find() only returns the first one. Instead, you can get the descendants and iterate over them, like:

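    from bs4 import BeautifulSoup
    import sys

    # Parse the HTML file named on the command line.
    with open(sys.argv[1], 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')

    for foo in soup.find_all('div', attrs={'class': 'foo'}):
        foo_descendants = foo.descendants
        for d in foo_descendants:
            # Text nodes have no name, so only tags reach the class check.
            if d.name == 'div' and d.get('class', '') == ['bar']:
                print(d.text)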

With input like:

    <div class="foo">
        <div class="bar">i want this</div>
        <div class="unwanted">not this</div>
        <div class="bar">also want this</div>
    </div>

It will yield:

    i want this
    also want this
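As a possible alternative to walking the descendants by hand, the nested search can also be written with find_all() on each "foo" tag, or with a CSS selector through select(). This is a sketch rather than the answer's own method: the function names bars_via_find_all and bars_via_select and the constant MULTI_BAR_HTML are hypothetical. Note that class_='bar' and the selector div.bar match any div whose class list contains "bar", which is slightly looser than the exact == ['bar'] comparison.

```python
from bs4 import BeautifulSoup

# Input with two div.bar elements inside div.foo, plus one outside it
# (MULTI_BAR_HTML is a hypothetical name used only for this sketch).
MULTI_BAR_HTML = """
<div class="foo">
    <div class="bar">i want this</div>
    <div class="unwanted">not this</div>
    <div class="bar">also want this</div>
</div>
<div class="bar">don't want either</div>
"""


def bars_via_find_all(html):
    """Collect the text of every div.bar nested inside a div.foo."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for foo in soup.find_all("div", class_="foo"):
        # find_all() on a tag searches only that tag's descendants.
        for bar in foo.find_all("div", class_="bar"):
            texts.append(bar.get_text(strip=True))
    return texts


def bars_via_select(html):
    """Same result using a CSS descendant selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [bar.get_text(strip=True) for bar in soup.select("div.foo div.bar")]


if __name__ == "__main__":
    for text in bars_via_find_all(MULTI_BAR_HTML):
        print(text)
```

If only direct children should match, the selector could be narrowed to "div.foo > div.bar".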
