asp.net - Need to remove HTML tag from string in C#


I have this (sample) HTML stored in a database as a string:

<div>    test </div> <ul>    <li>       link1    </li> </ul> 

Now, it can also contain:

<link rel="canonical" href="http://sample.com/somelink"> 

I need to check whether the string contains a link rel tag and, if it does, replace its href with something else. If it does not have a link rel tag, I need to add a new one.

Also, when I load the string in the CMS, I need to see if the tag exists, extract the href string, and display it somewhere on the page as a separate string.

Please help. I have googled but did not find any helpful solution, hence no code in the question. I am not familiar with regex.

Note: Sorry, I forgot to mention that I cannot add an external library to the project because of PCI implications.

You should use the HTML Agility Pack, http://htmlagilitypack.codeplex.com, in combination with XPath for selection of elements and attributes:

using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);

// Visit every <a> element that has both an href and a rel attribute.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href and @rel]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att);
}

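Explanation of the XPath:

  • //a means select all a elements in the code
  • [@href and @rel] means both attributes need to be present in the selection

You can refine the pattern by using //a[@href and @rel='canonical']. For the <link> element from the question, the equivalent selector would be //link[@rel='canonical'].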
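Since the note in the question rules out external libraries, here is a minimal sketch that uses only System.Text.RegularExpressions and System.Net from the .NET base class library. The class name CanonicalLink and the methods GetHref and SetHref are hypothetical names chosen for illustration. Appending a new tag to the end of the string is also an assumption, because the question does not say where the tag should go. Regex-based HTML handling is fragile, so this assumes a well-formed link tag with quoted rel and href values.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CanonicalLink
{
    // Matches a whole <link ...> tag whose rel attribute is "canonical"
    // (single or double quotes, attributes in any order).
    private static readonly Regex TagPattern = new Regex(
        @"<link\b(?=[^>]*\brel\s*=\s*[""']canonical[""'])[^>]*>",
        RegexOptions.IgnoreCase);

    // Captures the quoted href value inside a single tag.
    private static readonly Regex HrefPattern = new Regex(
        @"\bhref\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase);

    // Returns the canonical href, or null when no canonical tag is present.
    public static string GetHref(string html)
    {
        Match tag = TagPattern.Match(html);
        if (!tag.Success) return null;

        Match href = HrefPattern.Match(tag.Value);
        return href.Success ? WebUtility.HtmlDecode(href.Groups[2].Value) : null;
    }

    // Replaces the first canonical tag with one pointing at url.
    // When none exists, appends a new tag to the end (assumed placement).
    public static string SetHref(string html, string url)
    {
        string newTag = "<link rel=\"canonical\" href=\""
                        + WebUtility.HtmlEncode(url) + "\">";

        if (TagPattern.IsMatch(html))
        {
            // A MatchEvaluator is used so "$" in the URL is not treated
            // as a regex substitution token.
            return TagPattern.Replace(html, m => newTag, 1);
        }
        return html + newTag;
    }
}
```

When saving, you would call CanonicalLink.SetHref(htmlString, newUrl) and store the result. When loading in the CMS, CanonicalLink.GetHref(htmlString) gives you the href to display separately, or null if the tag is missing.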

