Display width of Unicode strings in Python



How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()?

Motivating example: printing a table of strings to the console. Some of the strings contain non-ASCII characters.

    >>> for title in d.keys():
    ...     print("{:<20} | {}".format(title, d[title]))

        zootehni-           | zooteh.
        zootekni-           | zootek.
        zoothèque          | zooth.
        zooveterinar-       | zoovet.
        zoovetinstitut-     | zoovetinst.
        母                   | 母母

    >>> import unicodedata
    >>> s = 'è'
    >>> len(s)
        2
    >>> [ord(c) for c in s]
        [101, 768]
    >>> unicodedata.name(s[1])
        'COMBINING GRAVE ACCENT'
    >>> s2 = '母'
    >>> len(s2)
        1

As can be seen, str.format() takes the number of code points in the string (len(s)) as its width, leading to skewed columns in the output. Searching through the unicodedata module, I have not found anything suggesting a solution.

Unicode normalization can fix the problem for è, but not for Asian characters, which often have a larger display width. Similarly, zero-width Unicode characters exist (e.g. the zero-width space, which allows line breaks within words). You can't work around these issues with normalization, so please do not suggest "normalize your strings".
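The three cases above can be checked directly. This is a minimal sketch using only the standard unicodedata module; the variable names are illustrative and not from the original post. It shows that len() stops matching the visible width once wide or zero-width characters are involved, whatever normalization does.

```python
import unicodedata

# 'e' followed by U+0300 COMBINING GRAVE ACCENT: two code points, one glyph.
decomposed = "e\u0300"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# A CJK ideograph is one code point but is classified as East Asian Wide.
wide = "母"
print(len(wide), unicodedata.east_asian_width(wide))  # 1 W

# ZERO WIDTH SPACE is one code point in the format (Cf) category.
zwsp = "\u200b"
print(len(zwsp), unicodedata.category(zwsp))  # 1 Cf
```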

Edit: Added info about normalization.

Edit 2: In my original dataset there are also some European combining characters that don't result in a single code point even after normalization:

    zwemwater     | zwemw.
    zwia̢z-       | zw.

    >>> s3 = 'a\u0322'   # 'a' + 'combining retroflex hook below'
    >>> len(unicodedata.normalize('NFC', s3))
        2

You have several options:

  1. Some consoles support escape sequences for pixel-exact positioning of the cursor. This might cause some overprinting, though.

    Historical note: This approach was used in the Amiga terminal to display images in a console window by printing a line of text and then advancing the cursor down by one pixel. The leftover pixels of each text line slowly built up an image.

  2. Create a table in your code which contains the real (pixel) widths of all Unicode characters in the font that is used in the console / terminal window. Use a UI framework and a small Python script to generate this table.

    Then add code which calculates the real width of the text using this table. The result might not be a multiple of the character width in the console, though. Together with pixel-exact cursor movement, this might solve your issue.

    Note: You'll have to add special handling for ligatures (fi, fl) and composites. Alternatively, you can load a UI framework without opening a window and use its graphics primitives to calculate the string widths.

  3. Use the tab character (\t) to indent. But that will only help if your shell actually uses the real text width to place the cursor. Many terminals will simply count characters.

  4. Create an HTML file with a table and look at it in a browser.
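
A rough variant of option 2 works without a font if you assume the terminal draws on a monospaced grid, where East Asian wide characters take two cells and combining marks take none. The sketch below swaps the pixel-width table for the character properties in the standard unicodedata module. `display_width` and `pad` are hypothetical helper names, and the result is only an approximation that holds as far as the terminal agrees with those properties.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal cells `text` occupies (assumption:
    wide/fullwidth characters use 2 cells, marks and format characters 0)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters add no cell
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad(text, columns):
    """Left-align `text` in a field of `columns` cells by appending spaces."""
    return text + " " * max(0, columns - display_width(text))

# Sample rows taken from the question; the dict name is illustrative.
rows = {"zoothe\u0300que": "zooth.", "zwia\u0322z-": "zw.", "母": "母母"}
for title, abbr in rows.items():
    print("{} | {}".format(pad(title, 20), abbr))
```

Padding by hand instead of using `{:<20}` is the point of the design: str.format() counts code points, so the field width has to be computed before the string reaches it.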

