Extracting content from HTML tags using Java

- July 15, 2013

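I extracted data from an HTML page and parsed the tags that contain other tags. I then tried different approaches, such as extracting substrings, to pull out only the title and href values from those tags, but it's not working. Can anyone help me? Below is a small snippet of my output.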

My code:

    doc = Jsoup.connect("myurl").get();
    Elements link = doc.select("a[href]");
    String stringLink = null;
    for (int i = 0; i < link.size(); i++)
    {
        stringLink = link.toString();
        System.out.println(stringLink);
    }

Output:

    <a class="link" title="waf ad" href="https://www.facebook.com/waf.ad.54" data-jsid="anchor" target="_blank"><img class="_s0 _rw img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash1/t5/186729_100007938933785_508764241_q.jpg" alt="waf ad" data-jsid="img" /></a>
    <a class="link" title="ana ga" href="https://www.facebook.com/ata.ga.31392410" data-jsid="anchor" target="_blank"><img class="_s0 _rw img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash1/t5/186901_100002334679352_162381693_q.jpg" alt="ana ga" data-jsid="img" /></a>

You can use the attr() method of the Element class to extract the value of an attribute.

For example:

    String href = link.attr("href");
    String title = link.attr("title");

See this page for more: Extract attributes, text, and HTML from elements
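To read the title and href of every link rather than printing the whole collection on each pass, one option is to iterate over the selected elements and call attr() on each one. The following is a minimal sketch, assuming jsoup is on the classpath; the class name LinkAttributes and the helper extractLinks are hypothetical, and the HTML is parsed from a string (a shortened version of the output above) instead of being fetched from a URL.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkAttributes {

    // Hypothetical helper (not part of jsoup): returns one "title -> href"
    // string for every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a[href]");

        List<String> result = new ArrayList<>();
        for (Element link : links) {
            // attr() returns an empty string when the attribute is absent.
            String title = link.attr("title");
            String href = link.attr("href");
            result.add(title + " -> " + href);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample markup based on the output shown in the question.
        String html =
            "<a class=\"link\" title=\"waf ad\" href=\"https://www.facebook.com/waf.ad.54\">"
          + "<img alt=\"waf ad\" /></a>"
          + "<a class=\"link\" title=\"ana ga\" href=\"https://www.facebook.com/ata.ga.31392410\">"
          + "<img alt=\"ana ga\" /></a>";

        for (String line : extractLinks(html)) {
            System.out.println(line);
        }
    }
}
```

Calling attr() on each Element inside the loop yields that element's own values, whereas calling toString() on the whole Elements collection, as in the question's loop, returns the markup of every matched link each time.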
