
kme : utf8   25

python - UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) - Stack Overflow
So, even if your format string is Unicode, you *still* need to encode it?


<code class="language-python">print(u"{}\u00a0{}\u00a0{}\u00a0".format('non', 'breaking', 'spaces').encode('utf-8'))</code>
This is a classic python unicode pain point! Consider the following:
a = u'bats\u00E0'
print a
=> batsà

All good so far, but if we call str(a), let's see what happens:
str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

Oh dip, that's not gonna do anyone any good! To fix the error, encode the string to bytes explicitly with .encode and tell python what codec to use:
a.encode('utf-8')
=> 'bats\xc3\xa0'
print a.encode('utf-8')
=> batsà
python  unicode  stringconcatenation  encoding  utf8  solution  reference 
11 weeks ago by kme
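The session quoted above is Python 2, where str is a byte string. Below is a minimal sketch of the same behavior, assuming Python 3, where str is Unicode text and .encode() returns bytes. Encoding with UTF-8 always succeeds. The ASCII codec rejects any character above U+007F, such as the à here or the no-break space U+00A0 from the question title. Whether you need .encode() at all depends on whether the destination wants text or bytes.

```python
# Python 3: str holds Unicode text, bytes holds encoded data.
a = "bats\u00e0"

# Encoding explicitly with UTF-8 works for any str.
encoded = a.encode("utf-8")
print(encoded)  # b'bats\xc3\xa0'

# Encoding with ASCII reproduces the error from the bookmark's title.
try:
    a.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\xe0' in position 4: ordinal not in range(128)

# The format-string case: the result is already text (str); encode it only
# where bytes are actually required (binary files, sockets, and so on).
spaced = "{}\u00a0{}\u00a0{}\u00a0".format("non", "breaking", "spaces")
print(spaced.encode("utf-8"))  # b'non\xc2\xa0breaking\xc2\xa0spaces\xc2\xa0'
```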
windows - How to fix PuTTY showing garbled characters? - Server Fault
The analogous fix on CentOS (7) is to use 'localectl', maybe, except that requires DBus to be running, which it isn't for a fresh-out-of-the-box Docker container. ¯\_(ツ)_/¯
If the locale command reports something like POSIX, issue
<code class="language-bash">update-locale LANG=en_US.utf8</code>
at the command line - see – koppor Dec 19 '15 at 11:05
docker  utf8  characterencoding  locale  mojibake  terminal  unix  shell  ubuntu  solution  centos  sortof 
june 2019 by kme
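This sketch checks whether a Python process, for example one inside a Docker container, picked up a UTF-8 locale. It assumes Python 3. locale.getpreferredencoding(False) reports the encoding from the current locale. codecs.lookup() normalizes aliases such as UTF8 or ANSI_X3.4-1968 (what the POSIX/C locale typically reports) to a canonical codec name. The helper is_utf8 is hypothetical.

Python 3.7+ may coerce the C locale to a UTF-8 one (PEP 538) or run in UTF-8 mode (PEP 540). In that case the result can say UTF-8 even where the shell's locale output says POSIX.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name: str) -> bool:
    """True if the (possibly aliased) codec name refers to UTF-8."""
    return codecs.lookup(encoding_name).name == "utf-8"


if __name__ == "__main__":
    preferred = locale.getpreferredencoding(False)
    print("locale encoding:", preferred, "(UTF-8)" if is_utf8(preferred) else "(not UTF-8)")
    print("stdout encoding:", sys.stdout.encoding)
```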
php - PCRE is compiled without UTF support - Stack Overflow
I didn't have this problem at all, but this SO thread has the best google juice for "grep: this version of PCRE is compiled without UTF support".

How I worked around it was to unset the LANG environment variable (mine was "de_DE.UTF-8").

How I worked around it the second time I ran into this was to remove the copy of the 'libpcre' library that had been compiled without UTF-8 support from the lab drive, because we use LD_LIBRARY_PATH in the login scripts (still, for the moment).
shellscripting  pcre  grep  errormessage  utf8  unicode  workaround  solution 
january 2018 by kme
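The first workaround, unsetting LANG, can be scoped to a single grep call instead of the whole login shell. This is a sketch assuming Python 3.7+ and a grep on the PATH that supports -P. Both helper names are hypothetical.

env_without_locale also drops every LC_* variable, a slight generalization of the note above, since LC_ALL or LC_CTYPE could otherwise still select a UTF-8 locale. Without those variables grep should fall back to the C locale and not request UTF mode from PCRE.

```python
import os
import subprocess
from typing import Dict, Optional


def env_without_locale(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with LANG and all LC_* variables removed."""
    source = os.environ if base is None else base
    return {k: v for k, v in source.items() if k != "LANG" and not k.startswith("LC_")}


def run_grep_c_locale(pattern: str, data: bytes) -> subprocess.CompletedProcess:
    """Run `grep -P pattern` over `data` with the locale variables stripped."""
    return subprocess.run(
        ["grep", "-P", pattern],
        input=data,
        env=env_without_locale(),
        capture_output=True,
    )
```

For example, run_grep_c_locale(r"\d+", b"abc 123\n").stdout should be b"abc 123\n" on a system whose grep supports -P.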
Decoding UTF-8 charset in an .EML file
I ran into this problem saving an email to the .eml format with Thunderbird. on the Mac is an okay viewer for this kind of file, but what I really wanted was the UTF-8 source text to paste into LibreOffice.
email  encoding  utf8  importexport  conversion  maybesolution  sarcasm 
september 2016 by kme
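Python's standard-library email package can undo the quoted-printable or base64 transfer encoding of an .eml body and return plain text ready for pasting elsewhere. This is a minimal sketch assuming Python 3.6+ and a message with a text/plain (or, failing that, text/html) part. The function name eml_body_text and the sample message are made up for illustration.

```python
import email
from email import policy


def eml_body_text(raw: bytes) -> str:
    """Return the decoded text of the preferred body part of a raw .eml message."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        raise ValueError("no text/plain or text/html part found")
    # get_content() reverses the Content-Transfer-Encoding and applies the charset.
    return body.get_content()


# Hypothetical sample: "café" sent as quoted-printable UTF-8.
sample = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=C3=A9\r\n"
)
text = eml_body_text(sample)
```

For a real file, read it in binary mode, e.g. eml_body_text(open("message.eml", "rb").read()). Write the result with encoding="utf-8" so it stays UTF-8 on disk.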
python - UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1 - Stack Overflow
If you are working on a remote host, look at /etc/ssh/ssh_config on your local PC.

When this file contains a line:

SendEnv LANG LC_*

comment it out by adding # at the start of the line. It might help.

With this line, ssh sends your PC's language-related environment variables (LANG, LC_*) to the remote host, which causes problems when the remote host doesn't have matching locales installed.

No problems with my terminal. The above answers helped me look in the right direction, but nothing worked for me until I added 'ignore':

fix_encoding = lambda s: s.decode('utf8', 'ignore')

As indicated in the comment below, this may lead to undesired results. OTOH it also may just do the trick well enough to get things working.
locale  utf8  annoyance  headache  encoding  ssh  python  webdevel  fuckina  solution 
june 2016 by kme
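Here is the difference between strict decoding and the 'ignore' handler above, as a Python 3 sketch. In Python 3, .decode() lives on bytes, not str. The sample bytes are made up: valid UTF-8 for "café " followed by a stray 0xFF byte, which never occurs in valid UTF-8. 'replace' is shown too because it leaves a visible marker where data was lost, whereas 'ignore' drops it silently.

```python
def fix_encoding(raw: bytes, errors: str = "ignore") -> str:
    """Decode UTF-8 bytes, handling invalid sequences with the given error handler."""
    return raw.decode("utf-8", errors)


raw = "caf\u00e9 ".encode("utf-8") + b"\xff!"  # b'caf\xc3\xa9 \xff!'

# ASCII cannot decode any byte >= 0x80, which is the error in the bookmark's title.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Strict UTF-8 decoding fails on the stray 0xFF byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte

ignored = fix_encoding(raw)              # 0xFF silently dropped
replaced = fix_encoding(raw, "replace")  # 0xFF becomes U+FFFD
```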
inputenc Error: Unicode char u8: not set up for use with LaTeX - TeX - LaTeX Stack Exchange
In my case, it was an em-dash, and I received the error as Doxygen was generating LaTeX output from a .md file. This advice worked.
Copying the character after the hyphen and searching for it should help - unless your command window or editor "helps" by converting it to the more common equivalent or replacing unicode with blanks - in that case copy it from your .log and search all the input files.
latex  errormessage  encoding  utf8  solution 
june 2016 by kme
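Copying the character and searching for it fails when an editor or terminal "helps" by converting it. An alternative is to scan the input files for every non-ASCII character and report its line, column, code point and Unicode name, so an em-dash shows up as U+2014 EM DASH. This sketch assumes Python 3 and UTF-8 input files. find_non_ascii is a hypothetical helper.

```python
import sys
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, 'U+XXXX', name) for each non-ASCII character, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


if __name__ == "__main__":
    # Usage: python find_non_ascii.py README.md other.tex ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, col, cp, name in find_non_ascii(f.read()):
                print(f"{path}:{lineno}:{col}: {cp} {name}")
```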
java - Reading File from Windows and Linux yields different results (character encoding?) - Stack Overflow
� is encoded as a sequence of three bytes, 0xEF 0xBF 0xBD, which is the UTF-8 representation of the Unicode code point U+FFFD. That code point is the replacement character, used in place of illegal UTF-8 sequences.
unicode  encoding  illegalcharacters  utf8  linux  woes  reference 
july 2015 by kme
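You can confirm the byte sequence quoted above, and see how a decoder produces U+FFFD, with a short Python 3 check. The sample bytes are made up for illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# U+FFFD encodes to the three bytes EF BF BD in UTF-8.
assert REPLACEMENT.encode("utf-8") == b"\xef\xbf\xbd"

# Decoding an illegal UTF-8 sequence with errors="replace" yields U+FFFD.
decoded = b"abc\xffdef".decode("utf-8", errors="replace")
print(ascii(decoded))  # 'abc\ufffddef'

# The original byte is gone: re-encoding gives EF BF BD, not FF.
roundtrip = decoded.encode("utf-8")
print(roundtrip)  # b'abc\xef\xbf\xbddef'
```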
java - How to replace � in a string - Stack Overflow
Based on other comments, it is most likely that the character that you are looking for is '�', that is the Unicode replacement character. This is the character that is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".
unicode  errormessage  encoding  utf8  woes  reference 
july 2015 by kme
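The linked question is about Java, but the idea carries over: match the single code point U+FFFD by its escape rather than pasting the glyph, which an editor may mangle. This is a Python 3 sketch; strip_replacement and the sample string are hypothetical. It only hides the symptom, because the original characters were already lost when the bytes were decoded with the wrong charset.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def strip_replacement(text: str, substitute: str = "") -> str:
    """Replace every U+FFFD in `text` with `substitute` (default: remove it)."""
    return text.replace(REPLACEMENT, substitute)


damaged = "na\ufffdve caf\ufffd"
print(strip_replacement(damaged))       # nave caf
print(strip_replacement(damaged, "?"))  # na?ve caf?
```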
ls - Strange case: Text file that exist and doesn't exist - Unix & Linux Stack Exchange
Explanation: why does the high bit signify something gone wrong? The filename ‘Clon1918K_PCC1.gff’ appears to be 100% 7-bit US ASCII. Putting it through hexdump -C produces this:

Run LC_ALL=C ls -l --quoting-style=c *.gff to see a non-ambiguous representation of the file name.

Run mv -i *.gff Clon1918K_PCC1.gff to rename the file to a known name.
international  encoding  utf8  linux  woes  illegalcharacters  tipsandtricks 
july 2015 by kme
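The ls --quoting-style=c trick above has a rough Python counterpart. Listing a directory with a bytes path returns the raw file-name bytes. Any byte above 0x7F means the name is not 7-bit ASCII, even if it looks like it on screen. This sketch assumes Python 3 on Linux; non_ascii_names is a hypothetical helper.

```python
import os
import sys
from typing import List


def non_ascii_names(directory: str = ".") -> List[bytes]:
    """Return raw names in `directory` that contain any byte outside 7-bit ASCII."""
    names = os.listdir(os.fsencode(directory))  # bytes argument -> bytes names
    return sorted(name for name in names if any(b > 0x7F for b in name))


if __name__ == "__main__":
    for name in non_ascii_names(sys.argv[1] if len(sys.argv) > 1 else "."):
        # repr() shows the escapes, e.g. a hidden no-break space appears as \xc2\xa0
        print(repr(name))
```

Once the odd name is visible, the mv -i *.gff rename above still works as the fix.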
Linux Filesystem Support for Unicode Filenames
Didn't fix my problem, but was a cogent discussion of the issues at play with international characters in filenames on Linux.
linux  utf8  encoding  international 
july 2015 by kme
