
kme : collation   9

perl - Why CONCAT() does not default to default charset in MySQL? - Stack Overflow
This did NOT work (for me):
It is probably a DBD::mysql issue/peculiarity. Try enabling UTF-8 on the database handle as described in the POD for DBD::mysql (the mysql_enable_utf8 part).
perl  dbi  mysql  dba  database  query  collation  errormessage  maybesolution 
8 weeks ago by kme
maint: skip a check when en_US.UTF-8 collation rules are broken
On OS X, *.UTF-8 locales use ASCII collating rules(!?):
<code class="language-bash">readlink /usr/share/locale/*.UTF-8/LC_COLLATE|sort -u
# result: ../la_LN.US-ASCII/LC_COLLATE</code>

This means that sort, and any other program that relies on strcoll,
cannot be expected to work consistently on OS X in any UTF-8 locale.
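
The readlink check above can also be scripted. This is a minimal Python sketch, assuming the locale definitions live under /usr/share/locale as on OS X and FreeBSD; the function name `collate_link_targets` is made up for illustration.

```python
import glob
import os


def collate_link_targets(locale_root="/usr/share/locale"):
    """Return the distinct symlink targets of LC_COLLATE in every *.UTF-8 locale."""
    targets = set()
    pattern = os.path.join(locale_root, "*.UTF-8", "LC_COLLATE")
    for path in glob.glob(pattern):
        # Only symlinks are of interest; a regular file means real collation data.
        if os.path.islink(path):
            targets.add(os.readlink(path))
    return targets


if __name__ == "__main__":
    for target in sorted(collate_link_targets()):
        print(target)
```

If every UTF-8 locale resolves to the same ASCII target, locale-aware sorting is unlikely to differ from plain byte order on that machine.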
sorting  collation  i18n  locale  lc_all  macos  elcapitan  brokenness 
12 weeks ago by kme
Issue 23195: Sorting with locale (strxfrm) does not work properly with Python3 on BSD or OS X - Python tracker
What is 'la_LN' anyway?
The initial difference appears to be a long-standing BSD (including OS X) versus GNU/Linux platform difference. See, for example:

Why there is no difference between en and fr UTF-8 is obvious when you look under the covers at the system locale definitions. This is on FreeBSD 10, OS X 10.10 is the same:

$ cd /usr/share/locale/fr_FR.UTF-8/
$ ls -l
total 8
lrwxr-xr-x 1 root wheel 28 Jan 16 2014 LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
lrwxr-xr-x 1 root wheel 17 Jan 16 2014 LC_CTYPE -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 30 Jan 16 2014 LC_MESSAGES -> ../fr_FR.ISO8859-1/LC_MESSAGES
-r--r--r-- 1 root wheel 36 Jan 16 2014 LC_MONETARY
lrwxr-xr-x 1 root wheel 29 Jan 16 2014 LC_NUMERIC -> ../fr_FR.ISO8859-1/LC_NUMERIC
-r--r--r-- 1 root wheel 364 Jan 16 2014 LC_TIME

For some reason US-ASCII is used for UTF-8 collation; this is also true for en_US.UTF-8 and de_DE.UTF-8, the only other ones I checked.

The PostgreSQL discussion and some earlier Python issues suggest using ICU to properly implement Unicode functions like collation across all platforms. But that has never been implemented in Python. Nosying Marc-Andre.
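
A rough way to see the effect from Python itself is to sort with `locale.strxfrm` and compare against plain code-point order. This is only a sketch: the helper names are made up, the locale name is an assumption that may not be installed on a given system, and an identical result on a small sample is a hint, not proof, that the locale falls back to ASCII rules.

```python
import locale


def locale_sorted(words, locale_name="fr_FR.UTF-8"):
    """Sort words with the collation rules of locale_name (raises locale.Error if unavailable)."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        # Always restore the collation locale we started with.
        locale.setlocale(locale.LC_COLLATE, previous)


def looks_like_codepoint_order(words, locale_name="fr_FR.UTF-8"):
    """True if the locale sorts this sample exactly like plain code-point order."""
    return locale_sorted(words, locale_name) == sorted(words)
```

On a platform with working collation data, a sample that mixes accented and unaccented words should make `looks_like_codepoint_order` return False for a French or German locale.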
python  sorting  locale  collation  strings  macos  elcapitan  brokenness 
12 weeks ago by kme
Problems with sort order (UTF8 locales don't work) · Issue #216 · PostgresApp/PostgresApp
You can see the reason for this with ls -l /usr/share/locale/de_DE.UTF-8: LC_COLLATE is only a symlink to la_LN.US-ASCII. You get the same result if you sort something in the shell, so it's an OS-specific, not PG-specific, problem. AFAIR this affects all BSD OSes.

Some ML posts of Tom Lane in that topic:

It seems we will have to wait for , use a different OS, or use a function for sorting that can internally use ICU from pl/perl, the unaccent contrib, or your own implementation.
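
As an illustration of the "your own implementation" route, here is a small Python sketch of an accent-folding sort key. It is an assumption-laden stand-in, not what PostgreSQL, ICU, or the unaccent contrib do: it only strips combining marks and case, which roughly approximates dictionary order for many European languages and ignores language-specific rules.

```python
import unicodedata


def fold_key(text):
    """Sort key that ignores accents and case, with the original string as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents) left over after decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), text)


words = ["zebra", "Émile", "eclair", "apple"]
print(sorted(words, key=fold_key))
```

The tie-breaker keeps the ordering deterministic when two strings differ only by accents or case.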

To me this is the biggest drawback of using PostgresApp (PostgreSQL on OSX in general) and something that should be clearly highlighted in the documentation.

@macarthy in principle, yes. You'd just need to create a new database with the collation. However, as a word of caution, UTF-8 locales seem to be fundamentally broken on OSX. Postgres uses the strcoll API, which unfortunately does not support multibyte encodings on OSX.
macos  collation  unicode  sorting  brokenness  postgresql  dba 
12 weeks ago by kme
Sorting strings properly is stupidly hard – Daniel Lemire's blog
However, I tried to test out the sorting on fr_ca locale and got the incorrect answer, which I found out was due to incorrect locale settings on Mac OS X/BSD. On my machine, fr_FR.UTF-8 collation is linked to la_LN.US-ASCII.
sorting  ishard  collation  strings  devel  javascript  python  pitfalls  macos  elcapitan  brokenness 
12 weeks ago by kme
macos - Add a locale in Mac OSX - Stack Overflow
Looking into this, I found that, as of Mac OS X 10.10.3, collation is still broken for Spanish and most European languages. Collation definitions for these locales are linked to an ASCII definition. This ends up breaking things such as ORDER BY clauses on PostgreSQL.

Also, WTF does 'la_LN' mean anyway?
macos  elcapitan  annoyance  sorting  brokenness  collation  lc_all  dateandtime  unix  maybesolution 
12 weeks ago by kme
MySQL :: MySQL 5.7 Reference Manual :: 10.3.1 Collation Naming Conventions
<code>Table 10.1 Collation Case Sensitivity Suffixes
Suffix  Meaning
_ai     Accent insensitive
_as     Accent sensitive
_ci     Case insensitive
_cs     Case sensitive
_bin    Binary</code>
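
The suffix table can be turned into a small lookup. This is a sketch in Python; the helper `describe_collation` is hypothetical and simply reads known suffixes off the end of a collation name, without consulting the server.

```python
SUFFIX_MEANINGS = {
    "ai": "accent insensitive",
    "as": "accent sensitive",
    "ci": "case insensitive",
    "cs": "case sensitive",
    "bin": "binary",
}


def describe_collation(name):
    """Return the meanings of the sensitivity suffixes at the end of a collation name."""
    parts = name.lower().split("_")
    meanings = []
    # Walk backwards from the end, skipping the leading charset part,
    # and stop at the first part that is not a known suffix.
    for part in reversed(parts[1:]):
        if part not in SUFFIX_MEANINGS:
            break
        meanings.append(SUFFIX_MEANINGS[part])
    return list(reversed(meanings))


print(describe_collation("utf8_general_ci"))
```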
mysql  dba  collation  casesensitivity  reference  solution 
february 2018 by kme
