Extracting content from HTML tags using Java
I extracted data from an HTML page and parsed out the <a> tags. I tried different ways, such as extracting substrings, to get the title and href from those tags, but it's not working. Can anyone help me? Here is a small snippet of the output.
My code:
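    // fetch the page and select every <a> element that has an href attribute
    Document doc = Jsoup.connect("myurl").get();
    Elements link = doc.select("a[href]");
    String stringLink = null;
    for (int i = 0; i < link.size(); i++) {
        // prints the whole HTML of each anchor, not its attribute values
        stringLink = link.get(i).toString();
        System.out.println(stringLink);
    }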
Output:
    <a class="link" title="waf ad" href="https://www.facebook.com/waf.ad.54" data-jsid="anchor" target="_blank"><img class="_s0 _rw img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash1/t5/186729_100007938933785_508764241_q.jpg" alt="waf ad" data-jsid="img" /></a>
    <a class="link" title="ana ga" href="https://www.facebook.com/ata.ga.31392410" data-jsid="anchor" target="_blank"><img class="_s0 _rw img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash1/t5/186901_100002334679352_162381693_q.jpg" alt="ana ga" data-jsid="img" /></a>
You can use the attr() method of the Element class to extract the value of an attribute, instead of working on the string returned by toString(). For example:
    // inside the loop; link.get(i) is a single Element
    String href = link.get(i).attr("href");
    String title = link.get(i).attr("title");
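Putting that together, here is a minimal, self-contained sketch of the approach, assuming jsoup is on the classpath. To avoid a network call it parses a shortened copy of the markup from the question with Jsoup.parse instead of fetching "myurl". The class name ExtractLinks and the method titlesAndHrefs are hypothetical, chosen only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ExtractLinks {

    // Hypothetical helper: returns "title | href" for every <a> that has an href.
    public static List<String> titlesAndHrefs(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a[href]");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            // attr() returns an empty string (not null) if the attribute is missing
            String title = link.attr("title");
            String href = link.attr("href");
            result.add(title + " | " + href);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened copy of the anchors shown in the question's output
        String html =
            "<a class=\"link\" title=\"waf ad\" href=\"https://www.facebook.com/waf.ad.54\" target=\"_blank\"><img alt=\"waf ad\" /></a>"
          + "<a class=\"link\" title=\"ana ga\" href=\"https://www.facebook.com/ata.ga.31392410\" target=\"_blank\"><img alt=\"ana ga\" /></a>";
        for (String line : titlesAndHrefs(html)) {
            System.out.println(line);
        }
    }
}
```

Running main prints "waf ad | https://www.facebook.com/waf.ad.54" followed by "ana ga | https://www.facebook.com/ata.ga.31392410". The loop calls attr() on each Element rather than on the Elements collection, because Elements.attr() only returns the value from the first matched element that has the attribute.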
See this page for more: Extract attributes, text, and HTML from elements