Get values from a table with BeautifulSoup (Python)

- February 15, 2010

I have a table from which I am extracting links and text, although I can only get one or the other. Any idea how to get both?

Essentially, I need to pull out the text "text extract here".

for tr in rows:
    cols = tr.findAll('td')
    count = len(cols)
    if len(cols) > 1:
        third_column = tr.findAll('td')[2].contents
        third_column_text = str(third_column)
        third_columnsoup = BeautifulSoup(third_column_text)

        # Issue starts here: how can I get either the text of the elm
        # <td>text here</td> or the href text <a href="somewhere.html">text here</a>?
        for elm in third_columnsoup.findAll("a"):
            #print elm.text, third_columnsoup
            item = { "code": random.upper(),
                     "name": elm.text }
            items.insert(item )

The HTML code is the following:

<table cellpadding="2" cellspacing="0" id="listresults">
    <tbody>
        <tr class="even">
            <td colspan="4">sort results: <a href="/~/search/af.aspx?some=lol&amp;category=all&amp;page=0&amp;string=&amp;s=a" rel="nofollow" title="sort results in alphabetical order">alphabetical</a>&nbsp;&nbsp;|&nbsp;&nbsp;<strong>rank</strong>&nbsp;&nbsp;<a href="/as.asp#rank">?</a></td>
        </tr>

        <tr class="even">
            <th>aaa</th>
            <th>vvv.</th>
            <th>gdfgd</th>
            <td></td>
        </tr>

        <tr class="odd">
            <td align="right" width="32">******</td>
            <td nowrap width="60"><a href="/aaa.html" title="more info , direct link meaning...">aaa</a></td>
            <td>text extract here</td>
            <td width="24"></td>
        </tr>

        <tr class="even">
            <td align="right" width="32">******</td>
            <td nowrap width="60"><a href="/somelink.html" title="more info , direct link meaning...">aaa</a></td>
            <td><a href="http://www.fdssfdfdsa.com/aaa">text extract here</a></td>
            <td width="24">
                <a href="/~/search/google.aspx?q=lhfjl&amp;f=a&amp;cx=partner-pub-2259206618774155:1712475319&amp;cof=forid:10&amp;ie=utf-8"><img border="0" height="21" src="/~/st/i/find2.gif" width="21"></a>
            </td>
        </tr>

        <tr>
            <td width="24"></td>
        </tr>

        <tr>
            <td align="center" colspan="4" style="padding-top:6pt">
            <b>note:</b> have 5575 other definitions <strong><a href="http://www.ddfsadfsa.com/aaa.html">aaa</a></strong> in our database</td>
        </tr>
    </tbody>
</table>

You can use the .text property on the td element:

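from bs4 import BeautifulSoup

html = """here goes html"""

soup = BeautifulSoup(html, 'html.parser')

# Walk every row and print the text of its third cell, if it has one.
for tr in soup.find_all('tr'):
    columns = tr.find_all('td')
    if len(columns) > 2:
        # .text returns the cell's text whether or not it is wrapped in <a>
        print(columns[2].text)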

This prints:

text extract here
text extract here
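Since the question asks for both the text and the link, here is a minimal sketch of one way to collect them together. It assumes bs4 is installed and uses a trimmed-down copy of the table from the question. The helper name extract_third_column and the "href" key are made up for illustration and do not come from the original code.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's table: only the two data rows.
html = """
<table id="listresults">
  <tr class="odd">
    <td>******</td>
    <td><a href="/aaa.html">aaa</a></td>
    <td>text extract here</td>
    <td></td>
  </tr>
  <tr class="even">
    <td>******</td>
    <td><a href="/somelink.html">aaa</a></td>
    <td><a href="http://www.fdssfdfdsa.com/aaa">text extract here</a></td>
    <td></td>
  </tr>
</table>
"""


def extract_third_column(markup):
    """Hypothetical helper: return one {"name", "href"} dict per row."""
    soup = BeautifulSoup(markup, "html.parser")
    items = []
    for tr in soup.find_all("tr"):
        columns = tr.find_all("td")
        if len(columns) > 2:
            cell = columns[2]
            # find() returns None when the cell holds plain text only
            link = cell.find("a")
            items.append({
                "name": cell.get_text(strip=True),
                "href": link.get("href") if link is not None else None,
            })
    return items


items = extract_third_column(html)
for item in items:
    print(item)
```

With this input, both rows yield the name "text extract here". The first row has href None and the second has the link target. Reading the text from the cell itself, rather than from the nested a tag, is what lets the same code handle both kinds of row.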

Hope that helps.
