asp.net - Need to remove HTML tag from string in C#
I have this (sample) HTML stored in the database as a string:

    <div>
       test
    </div>
    <ul>
       <li>
          link1
       </li>
    </ul>

Now, it may also contain <link rel="canonical" href="http://sample.com/somelink">. I need to check whether the string contains a link rel tag. If it does, I want to replace its href. If it doesn't, I want to add a new one.
Also, when I load the string in the CMS, I want to check whether the tag exists. If it does, I want to extract the href string and display it as a separate string somewhere on the page.
Please help. I have googled but did not find a helpful solution, so there is no code in the question. I am not familiar with regex.
Note: sorry, I forgot to mention that I cannot add an external library to the project because of PCI implications.
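The no-external-library note leaves the base class library. The sketch below uses System.Text.RegularExpressions to handle both tasks: reading the canonical href, and replacing the tag or adding one when it is missing. It rests on loud assumptions. The stored HTML has at most one canonical tag, and its attribute values are quoted. Regex is also not a general HTML parser, so unusual markup can slip past it. CanonicalLink, GetCanonicalHref and SetCanonicalHref are hypothetical names chosen for illustration.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: read, replace, or add a canonical <link> tag using only the BCL.
// Assumes quoted attribute values and at most one canonical tag; not a general HTML parser.
public static class CanonicalLink
{
    // A <link ...> tag whose rel attribute is "canonical", in any attribute order and any case.
    private static readonly Regex CanonicalTag = new Regex(
        @"<link\b[^>]*\brel\s*=\s*[""']canonical[""'][^>]*>",
        RegexOptions.IgnoreCase);

    // The href attribute inside one tag; the closing quote must match the opening one.
    private static readonly Regex HrefAttr = new Regex(
        @"\bhref\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Returns the canonical URL, or null if there is no canonical tag (or it has no href).
    public static string GetCanonicalHref(string html)
    {
        Match tag = CanonicalTag.Match(html);
        if (!tag.Success) return null;

        Match href = HrefAttr.Match(tag.Value);
        return href.Success ? WebUtility.HtmlDecode(href.Groups["url"].Value) : null;
    }

    // Replaces an existing canonical tag, or appends a new one when none is present.
    public static string SetCanonicalHref(string html, string newHref)
    {
        string newTag = "<link rel=\"canonical\" href=\"" + WebUtility.HtmlEncode(newHref) + "\">";

        Match tag = CanonicalTag.Match(html);
        if (!tag.Success)
            return html + newTag; // no canonical tag yet: add one at the end

        // Swap the whole old tag for the new one; other attributes on the old tag are not kept.
        return html.Substring(0, tag.Index) + newTag + html.Substring(tag.Index + tag.Length);
    }
}
```

Replacing the whole tag, rather than editing only its href, keeps the code short. The cost is that any other attributes on the old tag are dropped. Where the new tag should go (end of the string, inside a head element, and so on) depends on how the CMS assembles the page, so appending is only a placeholder choice.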
You should use the HTML Agility Pack, http://htmlagilitypack.codeplex.com, in combination with XPath selection of elements and attributes:
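
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlString);

    // SelectNodes returns null (not an empty collection) when nothing matches
    var links = doc.DocumentNode.SelectNodes("//link[@href and @rel]");
    if (links != null)
    {
        foreach (HtmlAgilityPack.HtmlNode link in links)
        {
            HtmlAgilityPack.HtmlAttribute att = link.Attributes["href"];
            att.Value = FixLink(att.Value); // FixLink: your own method that returns the new URL
        }
    }
    htmlString = doc.DocumentNode.OuterHtml; // serialize the modified document back to a string

Explanation of the XPath:

- //link means select all link elements anywhere in the document
- [@href and @rel] means both attributes must be present on the element

You can refine the pattern further with //link[@href and @rel='canonical'].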
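You can see what these predicates select without adding any package by running the same XPath 1.0 expressions against the BCL's System.Xml.XmlDocument. This only demonstrates the XPath semantics, and it depends on one big assumption. XmlDocument requires well-formed XML, so the sample is written XHTML-style, with self-closing tags and a single root element. Stored HTML in general would not load this way. XPathPredicateDemo and its methods are hypothetical names.

```csharp
using System.Xml;

// Hypothetical demo: evaluate the XPath expressions from the answer with System.Xml.
// XmlDocument only accepts well-formed XML, so this is not a replacement for an HTML parser.
public static class XPathPredicateDemo
{
    // Number of nodes the expression selects in the given well-formed fragment.
    public static int Count(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNodeList nodes = doc.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    // Reads the canonical href through an attribute step (/@href), or null if absent.
    public static string GetCanonicalHref(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNode href = doc.SelectSingleNode("//link[@href and @rel='canonical']/@href");
        return href == null ? null : href.Value;
    }
}
```

The sample fragment <root><link rel="canonical" href="http://sample.com/somelink"/><link rel="stylesheet" href="site.css"/><link href="print.css"/></root> contains three link elements. //link selects all 3, //link[@href and @rel] selects 2, and //link[@href and @rel='canonical'] selects 1.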