Python BeautifulSoup: sorting similar divs in the same container
I am able to scrape the data I want, but since all the divs are in one container, "content-body", the results are dumped all at once (you can run the code to see this) in match_date, match and tourny.
import requests
from bs4 import BeautifulSoup, Tag
from lxml import html
import MySQLdb
import urllib2
import itertools
import re
import sys
from datetime import date, timedelta as td, datetime

urls = ("http://www.esportsheaven.net/?page=match")
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(urls, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

tournament = soup.findAll('div', {'class': ['content-body']})
match_time = soup.find_all("div", style="width:10%; float:left;")
match = soup.find_all("div", style="width:46%; float:left; margin-left:2%; margin-right:2%")
tourny = soup.find_all("div", style="width:40%; float:left; overflow:hidden;")

for tag in tournament:
    for tag in match_time:
        print tag.text
    for tag1 in match:
        print tag1.text
    for tag2 in tourny:
        print tag2.text
    print '==============='
I have tried a few other methods, but the loops did not produce what I want. What I want is:
match_date, match, tourny
==================
repeated for every match, because I need to store the data in a database.
Your parsing code is correct with respect to extracting the elements. However, the find methods for match_time, match and tourny should be called on the variable tournament, not soup. Searching from soup searches the entire document; searching from tournament searches only the content div you are interested in.
If you look at the page's HTML, there is only one div with class content-body, so the find_all call makes no sense: it returns a list when you only want that single element. Do:
tournament = soup.find('div',{'class':['content-body']})
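The reason find fits better here can be checked with a tiny Python 3 sketch (again requiring beautifulsoup4; the markup is hypothetical). find_all always returns a list-like result even for one match, so you would have to index or loop over it, while find returns the first matching Tag directly, which you can search again. If nothing matches, find returns None, so a change in the page layout would surface as an AttributeError on the next find_all call.

```python
# Python 3 sketch; requires the beautifulsoup4 package.
from bs4 import BeautifulSoup, Tag

# Hypothetical markup with a single content-body container.
HTML = '<div class="content-body"><div>18:00</div></div>'
soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a list-like ResultSet, even when only one div matches.
many = soup.find_all("div", {"class": ["content-body"]})

# find returns the first matching Tag itself, ready for further searching.
one = soup.find("div", {"class": ["content-body"]})

# find returns None when nothing matches.
missing = soup.find("div", {"class": "missing"})

print(isinstance(many, list), len(many))  # True 1
print(isinstance(one, Tag))               # True
print(missing)                            # None
```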
Now find the match_times, match_names and tournys inside it:
match_times = tournament.find_all("div", style="width:10%; float:left;")
match_names = tournament.find_all("div", style="width:46%; float:left; margin-left:2%; margin-right:2%")
tournys = tournament.find_all("div", style="width:40%; float:left; overflow:hidden;")
The three lists have the same length, so zip them and access the elements as follows:
for match_time, match_name, tourny in zip(match_times, match_names, tournys):
    print match_time.text, match_name.text, tourny.text
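One caveat with zip: it stops at the shortest input, so if a single div were missing on the page, the remaining rows would be misaligned or dropped without any error. Below is a minimal Python 3 sketch of the same pairing with an explicit length check. Plain strings stand in for the .text of the scraped Tags, and the helper name rows is hypothetical.

```python
# Plain strings stand in for the .text of the scraped Tags (hypothetical values).
match_times = ["18:00", "20:30"]
match_names = ["Team A vs Team B", "Team C vs Team D"]
tournys = ["Cup 1", "Cup 2"]


def rows(times, names, tourns):
    """Pair the i-th time, name and tournament into one tuple per match."""
    # zip() silently stops at the shortest list, so verify the lengths first.
    if not (len(times) == len(names) == len(tourns)):
        raise ValueError("column lengths differ: %d, %d, %d"
                         % (len(times), len(names), len(tourns)))
    return list(zip(times, names, tourns))


for match_time, match_name, tourny in rows(match_times, match_names, tournys):
    print(match_time, match_name, tourny)
    print("===============")
```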
This should give you what you are looking for.
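The question also mentions storing each match in a database. The original code imports MySQLdb, but to keep this sketch self-contained it uses Python 3's built-in sqlite3 as a stand-in; the table name matches and its columns are hypothetical. The idea carries over: insert one row per zipped tuple using parameter placeholders so the driver quotes the scraped text. Note that MySQLdb uses %s placeholders instead of sqlite3's ?.

```python
import sqlite3


def store_matches(conn, rows):
    """Insert (match_time, match_name, tourny) tuples into a hypothetical matches table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches ("
        " match_time TEXT, match_name TEXT, tourny TEXT)"
    )
    # Placeholders let the driver escape the scraped text safely.
    conn.executemany(
        "INSERT INTO matches (match_time, match_name, tourny) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_matches(conn, [("18:00", "Team A vs Team B", "Cup 1")])
print(conn.execute("SELECT * FROM matches").fetchall())
# [('18:00', 'Team A vs Team B', 'Cup 1')]
```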