Groovy - Encoding issue parsing Korean in XML


I have an XML file that is UTF-8 encoded (both XMLSpy and Notepad++ show this encoding). The file contains Korean strings, which display fine in both editors.

    <table>
        <column ss:styleid="s63" ss:autofitwidth="0" ss:width="290.25"/>
        <row ss:autofitheight="0">
            <cell>
                <data ss:type="string">왕복</data>
            </cell>
        </row>
        <row ss:autofitheight="0">
            <cell>
                <data ss:type="string">..에서</data>
            </cell>
        </row>
        <row ss:autofitheight="0">
            <cell>
                <data ss:type="string">편도</data>
            </cell>
        </row>
        <row ss:autofitheight="0">
            <cell>
                <data ss:type="string">기내</data>
            </cell>
        </row>
    </table>

I'm using Groovy to parse the XML file and write its contents out to a new XML file.

    XmlParser parser = new XmlParser();
    def inputSource = new InputSource(new FileReader(file));
    inputSource.setEncoding('utf-8');

    def workbook = parser.parse(inputSource);

I then write out the new XML file, specifying UTF-8 even though I don't think that should be needed.

    def finalFileWriter = new FileWriter(new File(file.getName()+"_clean.xml").asWritable('utf-8'));
    def printer = new XmlNodePrinter(new PrintWriter(finalFileWriter));
    printer.preserveWhitespace = true;
    printer.print(workbook);

The resulting XML file, according to XMLSpy, contains characters that should not be present in a file using UTF-8, and they are therefore replaced with rubbish. It is also displayed incorrectly in Notepad++. Both editors report the new file as UTF-8 encoded.

The above code works fine when operating on 3 other files of identical structure, in Simplified Chinese, Traditional Chinese and Japanese. Any guidance at all would be great.

Thanks

This seems to work for me if I put the input XML in /tmp/input.xml:

    def workbook = new XmlParser(false, false).parse( '/tmp/input.xml' )

    new File( '/tmp/test.xml' ).withWriter( 'utf-8' ) { w ->
        new XmlNodePrinter( new PrintWriter( w ) ).with { p ->
            preserveWhitespace = true
            p.print( workbook )
        }
    }

I have to tell the parser to ignore namespaces, as you don't specify what the ss: namespace is.
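To illustrate why the two `false` arguments matter, here is a minimal sketch using a hypothetical one-row fragment in which the `ss:` prefix is never declared, as in the posted snippet. The `groovy.xml` import assumes Groovy 3 or later; on older versions `XmlParser` lives in `groovy.util` and needs no import.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXException

// Hypothetical fragment: the ss: prefix is used but never bound to a namespace
def xml = '<row ss:autofitheight="0"><cell><data ss:type="string">text</data></cell></row>'

// The no-arg constructor is namespace-aware, so the unbound prefix is a fatal error
def failed = false
try {
    new XmlParser().parseText(xml)
} catch (SAXException e) {
    failed = true
}

// XmlParser(validating, namespaceAware): with both off, 'ss:type' is a plain attribute name
def row = new XmlParser(false, false).parseText(xml)
def type = row.cell[0].data[0].attribute('ss:type')

println "namespace-aware parse failed: ${failed}, ss:type = ${type}"
```

If the real file declares the `ss:` namespace on its root element, the namespace-aware default should parse it without this workaround.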

But the output in /tmp/test.xml seems right?
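A possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: `FileReader` and `FileWriter` convert using the JVM's platform default charset, and `InputSource.setEncoding` has no effect once a character stream has been supplied, so the question's code only round-trips correctly when the default charset happens to suit the data. The sketch below names the charset explicitly on both the read and the write side. The temp files and the sample element are hypothetical, the Korean text is written with Unicode escapes so the script does not depend on its own file encoding, and the `groovy.xml` imports assume Groovy 3 or later (on older versions these classes are in `groovy.util` and need no import).

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlNodePrinter

// '\uc655\ubcf5' is the first Korean string from the question's sample data
def korean = '\uc655\ubcf5'

// Hypothetical input: a small file written explicitly as UTF-8
def input = File.createTempFile('korean', '.xml')
input.setText("<table><row><cell><data>${korean}</data></cell></row></table>", 'UTF-8')

// Read through a Reader that decodes as UTF-8 instead of the platform default
def workbook = input.withReader('UTF-8') { reader ->
    new XmlParser(false, false).parse(reader)
}

// Write through a Writer that encodes as UTF-8 instead of the platform default
def output = File.createTempFile('korean', '_clean.xml')
output.withWriter('UTF-8') { w ->
    def printer = new XmlNodePrinter(new PrintWriter(w))
    printer.preserveWhitespace = true
    printer.print(workbook)
}

println output.getText('UTF-8')
```

Passing a `File`, a URI string or an `InputStream` to `parse` is another option, since the XML parser then detects the encoding from the bytes and the XML declaration itself.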

