html - Python and Selenium - Scrape data from multiple siblings
Okay, I'm new to Python and, of course, to Selenium. I'm trying to scrape a page's data, work with that data in Python, have Selenium click the links, store the times, etc.
The issue I've come across is that the page isn't formatted the way I'd like. Instead of having the links nested under their title, like this: title link1 link2 title2 link3 link4, I have this:
    <tr> <td>title</td> </tr>
    <tr> <td> <a href>link1</a> </td> </tr>
    <tr> <td> <a href>link2</a> </td> </tr>
    <tr> <td> <a href>link3</a> </td> </tr>
Here's the HTML I'm working with: http://pastebin.com/663t7mxc
What I'm trying to do is take all of the links and categorise them based on the title they come under, e.g.
    title
        link 1
        link 2
    title 2
        link 3
        link 4
        link 5
    title 3
        link 6
and so on.
Since the links aren't children of the same tag as the title, I'm finding it impossible to do.
This is what I have so far:
    from selenium import webdriver

    def test():
        print("testing")
        browser = webdriver.Chrome()
        browser.get("http://urlforpage.com")
        meetings = browser.find_elements_by_xpath('/html/body/div[2]/table[2]/tbody/tr/td')
        i = 0
        for meet in meetings:
            venue = meet.get_attribute("class")
            if venue == "bold":
                print("venue: " + str(i) + " " + meet.text)
                i += 1
            elif venue == "racing-insert-linked-events nextoff-inner-wrapper nextoff-scrollable-wrapper":
                print("links")
                print(venue.href)  # fails: venue is the class string, not an element

    test()
I'm pulling the title out based on its "bold" class. The issue is that I don't know how to pull the URL and link text from the links inside the other tags.
Any help appreciated. Thanks.
Trying to change as little of your code as possible, is this what you're after?
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    def test():
        print('testing')
        browser = webdriver.Chrome()
        browser.get('http://urlforpage.com')
        meetings = browser.find_elements_by_xpath('/html/body/div[2]/table[2]/tbody/tr/td')
        for meet in meetings:
            if meet.get_attribute('class') == 'bold':
                # Title cell: start of a new group
                print('venue: {venue}'.format(venue=meet.text))
            else:
                try:
                    anchor = meet.find_element_by_tag_name('a')
                    print('link: {link}, text: {text}'.format(link=anchor.get_attribute('href'), text=anchor.text))
                except NoSuchElementException:
                    pass  # Are you worried if it's neither a title (bold) nor contains an anchor?

    test()
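If you want the links actually grouped under each title rather than just printed, here is a minimal sketch of one way to do it. It leaves Selenium out and works on plain tuples, so the grouping can be tried on its own. The function name `group_links_by_title`, the `(kind, text, href)` tuple convention, and the sample URLs are all assumptions for illustration. In the loop above you would append `('title', meet.text, None)` in the bold branch and `('link', anchor.text, anchor.get_attribute('href'))` in the anchor branch to build the input.

```python
from collections import OrderedDict


def group_links_by_title(rows):
    """Group (text, href) pairs under the most recent title seen.

    rows is an iterable of (kind, text, href) tuples in page order,
    where kind is 'title' or 'link' (a hypothetical convention).
    """
    grouped = OrderedDict()
    current_title = None
    for kind, text, href in rows:
        if kind == 'title':
            # A title row starts a new group; following links belong to it.
            current_title = text
            grouped.setdefault(current_title, [])
        elif kind == 'link' and current_title is not None:
            grouped[current_title].append((text, href))
        # Links that appear before any title, and unknown kinds, are skipped.
    return grouped


# Made-up sample data standing in for what the Selenium loop would collect.
sample_rows = [
    ('title', 'title', None),
    ('link', 'link 1', 'http://example.com/1'),
    ('link', 'link 2', 'http://example.com/2'),
    ('title', 'title 2', None),
    ('link', 'link 3', 'http://example.com/3'),
]

for title, links in group_links_by_title(sample_rows).items():
    print(title)
    for text, href in links:
        print('    {text} -> {href}'.format(text=text, href=href))
```

This relies only on the rows arriving in document order, which is why it doesn't matter that the links are not children of the title's tag.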