asp.net - Python: Submitting queries in ASPX, and scraping results from ASPX pages


I want to scrape information about people from "http://www.ratsit.se/bc/searchperson.aspx", and I have written the following code:

    import urllib
    from bs4 import BeautifulSoup

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.ratsit.se',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.ratsit.se/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)     Chrome/24.0.1312.57 Safari/537.17'

    myopener = MyOpener()
    url = 'http://www.ratsit.se/bc/searchperson.aspx'
    # first HTTP request, without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f)
    # parse the page and retrieve the two vital form values
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    #eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formdata = (
        ('__LASTFOCUS', ''),
        ('__EVENTTARGET', ''),
        ('__EVENTARGUMENT', ''),
        #('__EVENTVALIDATION', eventvalidation),
        ('__VIEWSTATE', viewstate),
        ('ctl00$cphmain$txtfirstname', 'name'),
        ('ctl00$cphmain$txtlastname', ''),
        ('ctl00$cphmain$txtbirthdate', ''),  # etc. (not listed)
        ('ctl00$cphmain$txtaddress', ''),
        ('ctl00$cphmain$txtzipcode', ''),
        ('ctl00$cphmain$txtcity', ''),
        ('ctl00$cphmain$txtkommun', ''),
        #('btnsearchajax', 'sök'),
    )

    encodedfields = urllib.urlencode(formdata)

    # second HTTP request, with form data
    f = myopener.open(url, encodedfields)

    try:
        # we'd better use BeautifulSoup once again to
        # retrieve the results (instead of writing out the whole HTML file);
        # besides, since the result is split across multiple pages,
        # we need to send more HTTP requests
        fout = open('tmp.html', 'w')
    except:
        print('could not open output file\n')

    fout.writelines(f.readlines())
    fout.close()

I am getting a response from the server saying that my IP is blocked, which is not true, because the same search works when I do it in a browser. Any suggestions as to where I am going wrong?

Thanks

Your code does not work.

      File "/users/florianoswald/git/webscraper/scrape2.py", line 16
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)     Chrome/24.0.1312.57 Safari/537.17'
              ^
    IndentationError: expected an indented block

Is that supposed to be a class definition? Why do you need the MyOpener class anyway? This works just as well:

    >>> myopener = urllib.FancyURLopener()
    >>> myopener.open("http://www.google.com")
    <addinfourl at 4411860752 whose fp = <socket._fileobject object at 0x106ed1c50>>
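Beyond the indentation error, note that the headers dictionary in the question is defined but never passed to the opener. Below is a minimal sketch of the two-step postback (read the hidden fields, then POST them back together with the search fields and the headers). It makes several assumptions: it is written for Python 3 (`urllib.request` and `urllib.parse` rather than the Python 2 `urllib` used in the question), it swaps BeautifulSoup for the standard-library `html.parser` so it has no dependencies, and `SAMPLE_PAGE`, `HiddenFieldParser`, `extract_hidden_fields` and `build_postback` are hypothetical names invented for illustration. The sample markup is not the real page. Whether sending these headers changes the "IP blocked" response is not established in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return all hidden form fields (e.g. __VIEWSTATE) found in the page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback(url, hidden, user_fields, headers):
    """Build (but do not send) the POST request for the second step."""
    data = dict(hidden)        # echo back every hidden field unchanged
    data.update(user_fields)   # then add the search fields
    body = urlencode(data).encode("utf-8")
    return Request(url, data=body, headers=headers)


# Hypothetical fragment standing in for the response to the first GET request.
SAMPLE_PAGE = """
<form method="post" action="searchperson.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$cphmain$txtfirstname" />
</form>
"""

hidden = extract_hidden_fields(SAMPLE_PAGE)
request = build_postback(
    "http://www.ratsit.se/bc/searchperson.aspx",
    hidden,
    {"ctl00$cphmain$txtfirstname": "name"},
    {"User-Agent": "Mozilla/5.0", "Referer": "http://www.ratsit.se/"},
)
print(request.get_method())
print(sorted(hidden))
```

To actually submit the form, the request object would be passed to `urllib.request.urlopen(request)`, and the HTML for the first step would come from a GET request to the same URL instead of `SAMPLE_PAGE`. Collecting every hidden input, rather than only `__VIEWSTATE`, avoids having to guess which of them the server validates.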
