
kme : decode   3

Unicode Utilities
This package has an open Debian bug requesting its removal because it hard-codes the Unicode 5.1 standard in the binary (Unicode is in the 12s now, so this is probably pre-emoji). Proposed alternative from the bug report (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930315): http://kassiopeia.juls.savba.sk/~garabik/software/unicode/
uniname defaults to printing the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. Command line options allow undesired information to be suppressed and the Unicode range to be added. Other options permit a specified number of bytes or characters to be skipped. For example, the default output for this text:

unidesc reports the character ranges to which different portions of the text belong. It can also be used to identify Unicode encodings (e.g. UTF-16be) flagged by magic numbers. Here is the output when given the above Japanese text as input:

ExplicateUTF8 is intended for debugging or for learning about Unicode. It determines and explains the validity of a sequence of bytes as a UTF8 encoding. Here is the output when given the above Japanese text as input:

Utf8lookup is a shell script which invokes uniname to provide an easy way to look up the character name corresponding to a codepoint from the command line. In addition to uniname it requires the utility Ascii2binary.

Unireverse is a filter that reverses UTF-8 strings character-by-character (as opposed to byte-by-byte). This is useful when dealing with text that is not encoded in the order in which you want to display or analyze it. For example, if you want to display Arabic in a terminal window that does not support bidi text, Unireverse will put it into the normal display order.
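To see why character-by-character matters, here is a minimal Python sketch (not the tool's actual code; the function name `reverse_utf8` is made up for illustration). It reverses by code point, so combining sequences would still get scrambled, and whether the real tool handles those is not something this note covers.

```python
def reverse_utf8(data: bytes) -> bytes:
    """Reverse UTF-8 text code point by code point (hypothetical helper)."""
    # Decode first so multi-byte sequences stay intact, then reverse.
    return data.decode("utf-8")[::-1].encode("utf-8")


sample = "añb".encode("utf-8")  # bytes: 61 C3 B1 62

# Character-wise reversal keeps the two-byte "ñ" together.
print(reverse_utf8(sample).decode("utf-8"))

# Byte-wise reversal puts the continuation byte B1 before its lead byte C3,
# which is no longer well-formed UTF-8.
try:
    sample[::-1].decode("utf-8")
except UnicodeDecodeError as exc:
    print("byte-wise reversal is not valid UTF-8:", exc.reason)
```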

Unifuzz generates test input for programs that expect Unicode. It can generate a random string of characters, tokens of various potentially problematic characters and sequences, very long lines, strings with embedded nulls, and ill-formed UTF-8. Use it to find out whether your program reacts gracefully when given unexpected or ill-formed input.
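A rough Python sketch of the same idea, for when the package isn't available. This is not how Unifuzz itself is implemented; `random_text`, `ILL_FORMED`, and `fuzz_cases` are names invented here, and the case list is only a small sample of the categories mentioned above.

```python
import random


def random_text(rng: random.Random, length: int) -> bytes:
    """Random code points (surrogates skipped), encoded as well-formed UTF-8."""
    chars = []
    while len(chars) < length:
        cp = rng.randrange(0x110000)
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


# A few byte sequences that a strict UTF-8 decoder should reject.
ILL_FORMED = [
    b"\x80",              # lone continuation byte
    b"\xc3",              # truncated two-byte sequence
    b"\xc0\xaf",          # overlong encoding of "/"
    b"\xed\xa0\x80",      # encoded surrogate U+D800
    b"\xf4\x90\x80\x80",  # code point beyond U+10FFFF
]


def fuzz_cases(seed: int = 0):
    """Yield test inputs as bytes; the seed makes a run reproducible."""
    rng = random.Random(seed)
    yield random_text(rng, 32)   # random string of characters
    yield b"a" * 100_000         # very long line
    yield b"abc\x00def"          # embedded null
    yield from ILL_FORMED        # ill-formed UTF-8


if __name__ == "__main__":
    for case in fuzz_cases():
        print(len(case), case[:16])
```

Feed each case to the program under test on stdin or as a file and check that it fails cleanly instead of crashing or hanging.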
unix  linux  unicode  textprocessing  decode  utility  sourcecode  commandline  fuckina  solution  revealcodes  nonprintingcharacters 
7 weeks ago by kme
Ubuntu Manpage: uniname - Name the characters in a Unicode text file
From the 'uniutils' package, apparently (via: https://unix.stackexchange.com/a/34278/278323).
uniname names the characters in a Unicode text file. For each character, uniname defaults to printing the character offset, the byte offset, the hexadecimal UTF-32 character code, the encoding as a sequence of hex byte values, the glyph, and the character's Unicode name. Command line flags allow undesired information to be suppressed. Glyphs that do not display nicely, such as control characters and spaces, are not displayed. For the Latin-1 control characters, whose official Unicode name is "control", the real name is given. Character and byte offsets both start from 0.
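For machines without uniutils, the fields described above can be approximated with Python's standard `unicodedata` module. This is a sketch, not a reimplementation: the function name `describe` and the tuple layout are assumptions, the names come from whatever Unicode version the Python build ships, and unnamed characters (including the Latin-1 controls) fall back to a placeholder rather than the "real name" that uniname gives.

```python
import unicodedata


def describe(text: str):
    """Return (char offset, byte offset, UTF-32 hex, UTF-8 bytes, glyph, name) rows."""
    rows = []
    byte_offset = 0  # both offsets start from 0, as in the description above
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        hex_bytes = " ".join(f"{b:02X}" for b in encoded)
        # Suppress glyphs that do not display nicely (controls, spaces).
        glyph = ch if ch.isprintable() and not ch.isspace() else ""
        name = unicodedata.name(ch, "<no name>")
        rows.append((char_offset, byte_offset, f"{ord(ch):06X}", hex_bytes, glyph, name))
        byte_offset += len(encoded)
    return rows


if __name__ == "__main__":
    for row in describe("aé"):
        print(*row, sep="\t")
```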
linux  unicode  decode  textprocessing  fuckina  solution 
7 weeks ago by kme
