Display width of Unicode strings in Python
This question already has an answer here:
- Normalizing Unicode (2 answers)
How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align such strings with str.format()?
Motivating example: printing a table of strings to the console. Some of the strings contain non-ASCII characters.
    >>> import unicodedata
    >>> for title in d.keys():
    ...     print("{:<20} | {}".format(title, d[title]))
    ...
    zootehni-            | zooteh.
    zootekni-            | zootek.
    zoothèque           | zooth.
    zooveterinar-        | zoovet.
    zoovetinstitut-      | zoovetinst.
    母                    | 母母
    >>> s = 'è'  # 'e' followed by U+0300 (decomposed form)
    >>> len(s)
    2
    >>> [ord(c) for c in s]
    [101, 768]
    >>> unicodedata.name(s[1])
    'COMBINING GRAVE ACCENT'
    >>> s2 = '母'
    >>> len(s2)
    1
As can be seen, str.format() takes the number of code points in the string (len(s)) as its width, leading to skewed columns in the output. Searching through the unicodedata module, I have not found anything suggesting a solution.
Unicode normalization can fix the problem for è, but not for Asian characters, which often have a larger display width. Similarly, zero-width Unicode characters exist (e.g. the zero-width space, which allows line breaks within words). You can't work around these issues with normalization, so please do not suggest "normalize your strings".
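To make the limits of normalization concrete, here is a small sketch using only the standard unicodedata module. The sample strings are illustrative and not taken from the original dataset.

```python
import unicodedata

# 'e' + COMBINING GRAVE ACCENT: NFC composes the pair into one code point.
decomposed = "e\u0300"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))

# A CJK ideograph is already a single code point, so normalization changes
# nothing, yet most terminals draw it two cells wide.
wide = "\u6bcd"
print(len(unicodedata.normalize("NFC", wide)))

# ZERO WIDTH SPACE survives normalization and still counts towards len(),
# although it takes up no space on screen.
with_zwsp = "zoo\u200bthek"
print(len(unicodedata.normalize("NFC", with_zwsp)))
```

In every case len() reports code points, which only happens to match the display width for simple text.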
Edit: Added info about normalization.
Edit 2: In my original dataset I also have some European combining characters that don't result in a single code point even after normalization:
    zwemwater            | zwemw.
    zwia̢z-             | zw.
    >>> s3 = 'a\u0322'  # 'a' + COMBINING RETROFLEX HOOK BELOW
    >>> len(unicodedata.normalize('NFC', s3))
    2
You have several options:
Some consoles support escape sequences for pixel-exact positioning of the cursor. This might cause some overprinting, though.
Historical note: this approach was used in the Amiga terminal to display images in a console window by printing a line of text and then advancing the cursor down by one pixel. The leftover pixels of each text line built up the image.
Create a table in your code that contains the real (pixel) widths of all Unicode characters in the font used in the console / terminal window. Use a UI framework and a small Python script to generate this table.
Then add code that calculates the real width of the text using this table. The result might not be a multiple of the character width in the console, though. Together with pixel-exact cursor movement, this might solve your issue.
Note: you'll have to add special handling for ligatures (fi, fl) and composites. Alternatively, you can load a UI framework without opening a window and use its graphics primitives to calculate the string widths.
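A rough sketch of the "look up each character's width, then sum" idea follows. Note the substitution: instead of a pixel table measured from the actual font, it derives widths in terminal cells from Unicode character properties in the standard unicodedata module. It assumes a monospaced terminal that draws East Asian wide characters in two cells and combining marks in zero; real terminals and fonts can deviate from that. The names cell_width, display_width and pad_left_aligned are made up for this example.

```python
import unicodedata

def cell_width(ch):
    """Approximate number of terminal cells used by one code point."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the previous character
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # other non-spacing marks and format characters (e.g. U+200B)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1

def display_width(text):
    """Sum of the per-character cell widths (an approximation)."""
    return sum(cell_width(ch) for ch in text)

def pad_left_aligned(text, width):
    """Pad with spaces according to display width rather than len()."""
    return text + " " * max(0, width - display_width(text))

rows = [
    ("zootehni-", "zooteh."),
    ("zoothe\u0300que", "zooth."),   # contains a combining grave accent
    ("zwia\u0322z-", "zw."),         # contains a combining retroflex hook
    ("\u6bcd", "\u6bcd\u6bcd"),      # CJK ideograph, two cells wide
]
for title, abbrev in rows:
    print("{} | {}".format(pad_left_aligned(title, 20), abbrev))
```

Padding by hand sidesteps str.format(), which only knows about code points. Ligatures and font-specific quirks are not handled here.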
Use the tab character (\t) to indent. This only helps if your shell uses the real text width to place the cursor; many terminals simply count characters.
Create an HTML file with a table and look at it in a browser.
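The HTML route could look roughly like the sketch below, which leaves all width calculation to the browser. The function name rows_to_html and the sample rows are invented for illustration; only the standard html module is used.

```python
import html

def rows_to_html(rows):
    """Return a minimal UTF-8 HTML page with one table row per (title, abbrev) pair."""
    body = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(a), html.escape(b))
        for a, b in rows
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head>\n'
        "<body><table>\n{}\n</table></body></html>\n".format(body)
    )

rows = [("zoothe\u0300que", "zooth."), ("\u6bcd", "\u6bcd\u6bcd")]
page = rows_to_html(rows)
print(page)
```

Writing page to a file opened with encoding="utf-8" and opening that file in a browser should then show aligned columns, since the browser lays out each cell from the rendered glyphs.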