Python BeautifulSoup: sorting similar divs in the same container
I am able to scrape the data I want. However, since all the divs are in one container, "content-body", the results are dumped as a whole (you can test the code and see) rather than grouped as match_date, match, tourny.

    import requests
    from bs4 import BeautifulSoup, Tag
    from lxml import html
    import MySQLdb
    import urllib2
    import itertools
    import re
    import sys
    from datetime import date, timedelta as td, datetime

    urls = ("http://www.esportsheaven.net/?page=match")
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(urls, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)

    tournament = soup.findAll('div', {'class': ['content-body']})
    match_time = soup.find_all("div", style = "width:10%; float:left;")
    match = soup.find_all("div", style = "width:46%; float:left; margin-left:2%; margin-right:2%")
    tourny = soup.find_all("div", style = "width:40%; float:left; overflow:hidden;")
    for tag in tournament:
        for tag in match_time:
            print tag.text
        for tag1 in match:
            print tag1.text
        for tag2 in tourny:
            print tag2.text
        print '==============='

I have tried a few other methods, but the loop did not give the result I want. What I want is:
match_date , match , tourny
==================
and for it to loop over all of them. I need to store the data in a database.
Your parsing code is correct with respect to extracting the elements. However, the find methods for match_time, match, and tourny should be called on the variable tournament, not on soup. Searching with respect to soup searches the entire document. Searching with respect to tournament searches only the content div you are interested in.
If you look at the page's HTML, there is only one div with class content-body, so the find_all call makes no sense. Do:
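To illustrate the difference in scope, here is a minimal sketch in Python 3. The markup in SAMPLE_HTML is made up for the example (it is not the real page), and the names TIME_STYLE, whole_document, and container_only are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one "content-body" container plus a stray div
# outside it that happens to share the same inline style.
SAMPLE_HTML = """
<div style="width:10%; float:left;">sidebar clock</div>
<div class="content-body">
  <div style="width:10%; float:left;">18:00</div>
  <div style="width:10%; float:left;">20:00</div>
</div>
"""

TIME_STYLE = "width:10%; float:left;"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", class_="content-body")

# Searching from soup walks the entire document.
whole_document = [d.get_text(strip=True)
                  for d in soup.find_all("div", style=TIME_STYLE)]

# Searching from the container only walks its descendants.
container_only = [d.get_text(strip=True)
                  for d in container.find_all("div", style=TIME_STYLE)]

print(whole_document)
print(container_only)
```

In this sample the document-wide search also picks up the stray div, while the search on the container returns only the two times inside it.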
    tournament = soup.find('div', {'class': ['content-body']})

Now find the match_times, match_names, and tournys:
    match_times = tournament.find_all("div", style = "width:10%; float:left;")
    match_names = tournament.find_all("div", style = "width:46%; float:left; margin-left:2%; margin-right:2%")
    tournys = tournament.find_all("div", style = "width:40%; float:left; overflow:hidden;")

The lengths of the three lists are the same. Zip them and access them as follows:
    for element in zip(match_times, match_names, tournys):
        print element[0].text, element[1].text, element[2].text

This should give you what you are looking for.
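Since the question also mentions storing the data in a database, here is a rough sketch of that last step in Python 3. It assumes the text of each element has already been pulled out into three plain lists (the sample values are invented), and it uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical values, standing in for element.text from the three lists.
match_times = ["18:00", "20:00"]
match_names = ["Team A vs Team B", "Team C vs Team D"]
tournys = ["Spring Cup", "Winter League"]

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (match_time TEXT, match_name TEXT, tournament TEXT)"
)

# zip pairs the i-th time, name, and tournament into one row each.
rows = list(zip(match_times, match_names, tournys))
conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT match_time, match_name, tournament FROM matches ORDER BY rowid"
).fetchall()
for row in stored:
    print(", ".join(row))

conn.close()
```

With MySQLdb the overall shape would be similar, but the connection setup and the parameter placeholder style differ, so treat this only as an outline.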