Each Unicode character is identified by a unique codepoint. You
can find information on character codepoints on official Unicode Web sites, but
a quick way to see what characters actually look like is to generate an HTML page
that charts them. The script below does this:
mk_unicode_chart.py
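```python
# Create an HTML chart of Unicode characters by codepoint
import html
import sys

head = ('<html><head><title>Unicode Code Points</title>\n'
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        '</head><body>\n<h1>Unicode Code Points</h1>\n')
foot = '</body></html>\n'

sys.stdout.reconfigure(encoding='utf-8')  # emit UTF-8 whatever the console default
fp = sys.stdout
fp.write(head)
num_blocks = 32   # Up to 256 in theory, but IE5.5 is flaky
for block in range(0, 256*num_blocks, 256):
    fp.write('\n\n<h2>Range %5d-%5d</h2>\n' % (block, block+256))
    # Column headings: offsets 0..15 within each 16-character row
    fp.write('<pre>     ')
    for col in range(16):
        fp.write(str(col).ljust(3))
    fp.write('</pre>')
    for offset in range(0, 256, 16):
        fp.write('\n<pre>')
        fp.write('+' + str(offset).rjust(3) + ' ')
        # Escape &, <, > so characters such as U+003C show up instead of being parsed as markup
        line = '  '.join(html.escape(chr(n+block+offset)) for n in range(16))
        fp.write(line)
        fp.write('</pre>')
fp.write(foot)
```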
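Each row of the chart follows one piece of arithmetic: the character in column `col` of the row labeled `+offset`, inside the block starting at `block`, is codepoint `block + offset + col`. The sketch below pulls a single row out into a function so that mapping can be checked on its own. The name `chart_row` is a hypothetical helper, not part of the script above, and the sketch assumes Python 3, where `chr()` takes over the role of Python 2's `unichr()`.

```python
import html


def chart_row(block, offset):
    """Return one <pre> row of the chart: a '+offset' label and 16 characters.

    The character in column col is chr(block + offset + col).
    """
    chars = (html.escape(chr(block + offset + col)) for col in range(16))
    return '<pre>+' + str(offset).rjust(3) + ' ' + '  '.join(chars) + '</pre>'


if __name__ == '__main__':
    # Codepoints 64..79 are the ASCII characters '@' through 'O'
    print(chart_row(0, 64))
```

For instance, `chart_row(1280, 208)` starts at 1280 + 208 = 1488 (U+05D0), so that row holds the first sixteen Hebrew letters.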
Exactly what you see when looking at the generated HTML page
depends on the Web browser and OS platform the page is viewed on, as well
as on installed fonts and other factors. Generally, any character that cannot be
rendered in the current browser will appear as some sort of square, dot, or
question mark; characters that are rendered are generally shown accurately.
Once you have identified a character visually, the unicodedata
module can tell you more about it:
```python
>>> import unicodedata
>>> unicodedata.name(chr(1488))
'HEBREW LETTER ALEF'
>>> unicodedata.category(chr(1488))
'Lo'
>>> unicodedata.bidirectional(chr(1488))
'R'
```
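One caveat when applying these calls to more than a handful of characters: `unicodedata.name()` raises `ValueError` for codepoints that have no name, such as control characters like U+0007, unless a default is passed as the second argument. A minimal sketch that bundles the three lookups; `describe` is a hypothetical helper name, and Python 3's `chr()` is assumed.

```python
import unicodedata


def describe(codepoint):
    """Return (hex codepoint, name, general category, bidi class) for one character."""
    ch = chr(codepoint)
    return ('U+%04X' % codepoint,
            unicodedata.name(ch, '<unnamed>'),  # control chars etc. have no name
            unicodedata.category(ch),
            unicodedata.bidirectional(ch))


if __name__ == '__main__':
    for cp in (7, 65, 1488):
        print(describe(cp))
```

Called on 1488, this returns `('U+05D0', 'HEBREW LETTER ALEF', 'Lo', 'R')`, matching the interactive session.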
A variant here would be to include the information provided by
unicodedata within a generated HTML
chart, although such a listing would be far more verbose than the example
above.
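One way such a variant might look, offered only as a sketch: each character becomes a table row showing its codepoint, glyph, name, general category, and bidirectional class. The function name `unicode_table` and the table layout are illustrative choices, not part of the original script, and codepoints that `unicodedata` cannot name (controls, unassigned positions) are skipped to keep the listing shorter.

```python
import html
import unicodedata


def unicode_table(start, stop):
    """Return an HTML page listing codepoints start..stop-1 with unicodedata info.

    Codepoints that unicodedata cannot name (controls, unassigned) are skipped.
    """
    rows = []
    for cp in range(start, stop):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:
            continue
        rows.append('<tr><td>U+%04X</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>'
                    % (cp, html.escape(ch), name,
                       unicodedata.category(ch), unicodedata.bidirectional(ch)))
    header = ('<html><head><meta charset="UTF-8">'
              '<title>Unicode Code Points %d-%d</title></head><body>\n'
              '<table border="1">\n'
              '<tr><th>Codepoint</th><th>Char</th><th>Name</th>'
              '<th>Category</th><th>Bidi</th></tr>\n') % (start, stop)
    footer = '\n</table></body></html>\n'
    return header + '\n'.join(rows) + footer
```

For example, `unicode_table(0x0590, 0x0600)` lists the Hebrew block; write the result to a file opened with `encoding='utf-8'` so the page's declared charset matches its bytes.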