asp.net - Python: Submitting queries in ASPX, and scraping results from aspx pages
I want to scrape information about people from "http://www.ratsit.se/bc/searchperson.aspx", and I have written the following code to do it:
    # -*- coding: utf-8 -*-
    import urllib
    from bs4 import BeautifulSoup

    # NOTE: this dict is defined but never passed to the opener below
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.ratsit.se',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.ratsit.se/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.FancyURLopener):
        # overrides the User-Agent string sent by the opener
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

    myopener = MyOpener()
    url = 'http://www.ratsit.se/bc/searchperson.aspx'

    # first HTTP request, without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f)

    # parse the page and retrieve two vital form values
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    #eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formdata = (
        ('__LASTFOCUS', ''),
        ('__EVENTTARGET', ''),
        ('__EVENTARGUMENT', ''),
        #('__EVENTVALIDATION', eventvalidation),
        ('__VIEWSTATE', viewstate),
        ('ctl00$cphmain$txtfirstname', 'name'),
        ('ctl00$cphmain$txtlastname', ''),
        ('ctl00$cphmain$txtbirthdate', ''),  # etc. (not all listed)
        ('ctl00$cphmain$txtaddress', ''),
        ('ctl00$cphmain$txtzipcode', ''),
        ('ctl00$cphmain$txtcity', ''),
        ('ctl00$cphmain$txtkommun', ''),
        #('btnsearchajax', 'sök'),
    )
    encodedfields = urllib.urlencode(formdata)

    # second HTTP request, with form data
    f = myopener.open(url, encodedfields)

    # We'd better use BeautifulSoup once again to retrieve the results
    # (instead of writing out the whole HTML file). Besides, since the
    # result is split into multiple pages, we need to send more HTTP requests.
    try:
        fout = open('tmp.html', 'w')
    except IOError:
        print('Could not open output file\n')
    else:
        # only write when the file was actually opened
        fout.writelines(f.readlines())
        fout.close()
I am getting a response from the server saying that my IP is blocked, but that is not true, because the same search works when I do it from a browser. Any suggestions as to what is going wrong?
Thanks
Your code does not work as posted.

      File "/users/florianoswald/git/webscraper/scrape2.py", line 16
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
              ^
    IndentationError: expected an indented block

Is this supposed to be a class definition? Why do you need the MyOpener class anyway? This works just as well:

    myopener = urllib.FancyURLopener()
    myopener.open("http://www.google.com")
    <addinfourl at 4411860752 whose fp = <socket._fileobject object at 0x106ed1c50>>
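One thing worth noting about the code in the question: the headers dict is built but never handed to the opener, so none of those headers are sent. Whether that has anything to do with the "IP blocked" response is not established in this thread. Below is a minimal sketch of the same two-step idea (read the hidden form fields from the first response, then post them back together with the search fields and explicit headers). It assumes Python 3 and the standard library only, whereas the question uses Python 2 and BeautifulSoup. The names HiddenInputParser, build_search_request and SAMPLE_HTML are made up for illustration, and the sample HTML is a stand-in, not the real page. The request is only built, not sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_search_request(url, page_html, extra_fields, headers):
    """Build (but do not send) a POST that echoes the page's hidden fields."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    form = dict(parser.fields)   # e.g. __VIEWSTATE from the first response
    form.update(extra_fields)    # the search fields we want to submit
    body = urlencode(form).encode("ascii")
    # Passing data makes this a POST; headers are attached explicitly.
    return Request(url, data=body, headers=headers)


# Made-up stand-in for the HTML returned by the first GET request.
SAMPLE_HTML = """
<form method="post" action="searchperson.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4" />
  <input type="text" name="ctl00$cphmain$txtfirstname" />
</form>
"""

request = build_search_request(
    "http://www.ratsit.se/bc/searchperson.aspx",
    SAMPLE_HTML,
    {"ctl00$cphmain$txtfirstname": "name"},
    {"User-Agent": "Mozilla/5.0", "Referer": "http://www.ratsit.se/"},
)
print(request.get_method())
print(request.header_items())
print(request.data.decode("ascii"))
```

To actually send it, the request object would be passed to urllib.request.urlopen. Collecting every hidden input, rather than naming __VIEWSTATE alone, means fields such as __EVENTVALIDATION are echoed back automatically if the page happens to include them.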