Groovy - Encoding issue parsing Korean in XML
I have an XML file that is UTF-8 encoded (both XMLSpy and Notepad++ show this encoding). The file contains Korean strings that display fine in both editors.
    <Table>
      <Column ss:StyleID="s63" ss:AutoFitWidth="0" ss:Width="290.25"/>
      <Row ss:AutoFitHeight="0">
        <Cell> <Data ss:Type="String">왕복</Data> </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell> <Data ss:Type="String">..에서</Data> </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell> <Data ss:Type="String">편도</Data> </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell> <Data ss:Type="String">기내</Data> </Cell>
      </Row>
    </Table>
I'm using Groovy to parse the XML file and write the contents out to a new XML file.
    XmlParser parser = new XmlParser();
    def inputSource = new InputSource(new FileReader(file));
    inputSource.setEncoding('utf-8');
    def workbook = parser.parse(inputSource);
I then write out a new XML file, specifying UTF-8, though I don't think that should be needed.
    def finalFileWriter = new FileWriter(new File(file.getName()+"_clean.xml").asWritable('utf-8'));
    def printer = new XmlNodePrinter(new PrintWriter(finalFileWriter));
    printer.preserveWhitespace = true;
    printer.print(workbook);
The resulting XML file, according to XMLSpy, contains characters that should not be present in a file using UTF-8, and they are therefore replaced with rubbish. The file is also displayed incorrectly in Notepad++. Both editors report that the new file is UTF-8 encoded.
The above code works fine when operating on 3 other files of identical structure, containing Simplified Chinese, Traditional Chinese, and Japanese. Any guidance would be great.
Thanks
This seems to work for me if I put your input XML in /tmp/input.xml:
    def workbook = new XmlParser(false, false).parse('/tmp/input.xml')
    new File('/tmp/test.xml').withWriter('utf-8') { w ->
        new XmlNodePrinter(new PrintWriter(w)).with { p ->
            preserveWhitespace = true
            p.print(workbook)
        }
    }
I had to tell the parser to ignore namespaces, as you don't specify what the ss: namespace is.
But the output in /tmp/test.xml seems right?
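One plausible reason the second version behaves differently (an assumption, not something confirmed in this thread): FileReader and FileWriter decode and encode with the platform default charset, and InputSource.setEncoding has no effect once a character stream has been supplied. The sketch below uses only JDK classes and a hypothetical temp file, with ISO-8859-1 standing in for a non-UTF-8 platform default, to compare the two ways of reading the same UTF-8 bytes.

```groovy
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource

// The Korean sample text, written with Unicode escapes so the
// encoding of this script file itself does not matter.
def korean = '\uC655\uBCF5'

// Hypothetical sample file holding UTF-8 bytes, standing in for the real input.
def sample = File.createTempFile('korean', '.xml')
sample.deleteOnExit()
sample.bytes = ('<data>' + korean + '</data>').getBytes(StandardCharsets.UTF_8)

// Parse from a Reader and return the text content of the root element.
def rootText = { Reader reader ->
    def source = new InputSource(reader)
    source.setEncoding('UTF-8') // no effect: a character stream is already decoded
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.parse(source).documentElement.textContent
}

// Explicit charset: the bytes are decoded as UTF-8 on any platform.
def good = sample.withReader('UTF-8') { r -> rootText(r) }

// ISO-8859-1 is an assumed stand-in for whatever non-UTF-8 default
// a FileReader might pick up on the machine in question.
def bad = sample.withReader('ISO-8859-1') { r -> rootText(r) }

println "UTF-8 reader preserved the text: ${good == korean}"
println "ISO-8859-1 reader preserved the text: ${bad == korean}"
```

If this is what is happening, passing the charset explicitly on both sides (as withWriter('utf-8') does for the output above) would remove the dependence on the platform default.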