ASP.NET - need to remove HTML tag from string in C#
I have this (sample) HTML stored in a database string:
<div> test </div> <ul> <li> link1 </li> </ul>
Now, it may or may not also contain:
<link rel="canonical" href="http://sample.com/somelink">
I need to check whether the string contains a link rel="canonical" tag. If it does, replace its href; if it does not, add a new one.
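A minimal sketch of the check-then-replace-or-add step, using only System.Text.RegularExpressions and System.Net from the .NET base class library, so no external package is needed. The class and method names (CanonicalLink, SetCanonical) are hypothetical, and regex matching only suits simple, predictable markup: it assumes quoted attribute values that never contain '>'. Appending a missing tag to the end of the string is also an assumption; move it wherever your template expects it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; regex-based, so it only suits simple, predictable markup.
public static class CanonicalLink
{
    // A <link> tag whose rel attribute is exactly "canonical" (single or double quotes).
    // Assumes attribute values are quoted and never contain '>'.
    private static readonly Regex CanonicalTag = new Regex(
        @"<link\b[^>]*\srel\s*=\s*[""']canonical[""'][^>]*>",
        RegexOptions.IgnoreCase);

    // The href attribute inside one tag: group 1 is the quote, group 2 the value.
    private static readonly Regex HrefAttr = new Regex(
        @"(?<=\s)href\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase);

    // Replaces the href of the first canonical link, or appends a new tag when none exists.
    public static string SetCanonical(string html, string newHref)
    {
        string encoded = WebUtility.HtmlEncode(newHref);
        Match tag = CanonicalTag.Match(html);
        if (!tag.Success)
        {
            // No canonical link yet: append one at the end of the string.
            return html + "<link rel=\"canonical\" href=\"" + encoded + "\">";
        }

        string updatedTag = HrefAttr.IsMatch(tag.Value)
            // Keep the original quote character and swap only the value.
            ? HrefAttr.Replace(tag.Value,
                m => "href=" + m.Groups[1].Value + encoded + m.Groups[1].Value, 1)
            // The tag has rel="canonical" but no href: insert one right after "<link".
            : tag.Value.Insert("<link".Length, " href=\"" + encoded + "\"");

        return html.Substring(0, tag.Index) + updatedTag
            + html.Substring(tag.Index + tag.Length);
    }
}
```

For example, `CanonicalLink.SetCanonical(htmlFromDb, "http://sample.com/newlink")` rewrites the sample link's href and leaves the rest of the string untouched (`htmlFromDb` stands for the string loaded from your database). The lambda (a MatchEvaluator) is used instead of a replacement string so that a `$` in the new URL is not treated as a group reference.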
Also, when I load the string in the CMS, I want to see whether that tag exists and, if so, extract the href string and display it somewhere on the page as a separate string.
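Reading the value back is the same match without the rewrite. This BCL-only sketch, with hypothetical names (CanonicalReader, GetCanonicalHref), returns the href of the first rel="canonical" link, or null when the string has none. It assumes simple markup with quoted attribute values that never contain '>'.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for reading the canonical URL when the CMS loads the string.
public static class CanonicalReader
{
    // A <link> tag whose rel attribute is exactly "canonical"; assumes quoted values without '>'.
    private static readonly Regex CanonicalTag = new Regex(
        @"<link\b[^>]*\srel\s*=\s*[""']canonical[""'][^>]*>",
        RegexOptions.IgnoreCase);

    // The href attribute inside one tag: group 1 is the quote, group 2 the value.
    private static readonly Regex HrefAttr = new Regex(
        @"(?<=\s)href\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase);

    // Returns the decoded href of the first canonical link, or null if there is none.
    public static string GetCanonicalHref(string html)
    {
        Match tag = CanonicalTag.Match(html);
        if (!tag.Success)
            return null;

        Match href = HrefAttr.Match(tag.Value);
        return href.Success ? WebUtility.HtmlDecode(href.Groups[2].Value) : null;
    }
}
```

The value comes back HTML-decoded, so encode it again (for example with HttpUtility.HtmlEncode) before writing it into the page.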
Please help. I have googled but did not find a helpful solution, hence no code in the question. I am not familiar with regex.
Note: sorry, I forgot to mention that I cannot add an external library to the project because of PCI implications.
You should use the HTML Agility Pack, http://htmlagilitypack.codeplex.com, in combination with XPath selection of elements and attributes:
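using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href and @rel]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink: your own method that returns the new URL
    }
}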
Explanation of the XPath:
//a
means select all a elements in the document.
[@href and @rel]
means both attributes need to be present for an element to be selected.
You can refine the pattern by using //a[@href and @rel='canonical'].
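The HTML Agility Pack accepts standard XPath 1.0 expressions, so the two predicates above can be tried with System.Xml from the base class library. This is a sketch under an explicit assumption: the input must be well-formed XML, which real-world HTML often is not (that is the reason to use an HTML parser in the first place). SelectHrefs is a hypothetical helper name.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical helper: evaluates an XPath expression and returns the href of each matched element.
    // Assumes well-formed XML; XmlDocument.LoadXml throws XmlException otherwise.
    public static string[] SelectHrefs(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList nodes = doc.SelectNodes(xpath);
        var hrefs = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            // GetAttribute returns "" when the attribute is missing.
            hrefs[i] = ((XmlElement)nodes[i]).GetAttribute("href");
        }
        return hrefs;
    }
}
```

With three a elements, one without rel, one with rel="nofollow" and one with rel="canonical", //a[@href and @rel] selects the last two, while //a[@href and @rel='canonical'] selects only the canonical one.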