recentpopularlog in

kme : sorting   38

maint: skip a check when en_US.UTF-8 collation rules are broken
On OS X, *.UTF-8 locales use ASCII collating rules(!?):
<code class="language-bash">readlink /usr/share/locale/*.UTF-8/LC_COLLATE|sort -u
# result: ../la_LN.US-ASCII/LC_COLLATE</code>

This means that sort, and any other program that relies on strcoll,
cannot be expected to work consistently on OS X in any UTF-8 locale.
sorting  collation  i18n  locale  lc_all  macos  elcapitan  brokenness 
12 weeks ago by kme
Issue 23195: Sorting with locale (strxfrm) does not work properly with Python3 on BSD or OS X - Python tracker
What is 'ln_LA' anyway?
The initial difference appears to be a long-standing BSD (including OS X) versus GNU/Linux platform difference. See, for example:
http://www.postgresql.org/message-id/18C8A481-33A6-4483-8C24-B8CE70DB7F27@eggerapps.at

Why there is no difference between en and fr UTF-8 is obvious when you look under the covers at the system locale definitions. This is on FreeBSD 10, OS X 10.10 is the same:

$ cd /usr/share/locale/fr_FR.UTF-8/
$ ls -l
total 8
lrwxr-xr-x 1 root wheel 28 Jan 16 2014 LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
lrwxr-xr-x 1 root wheel 17 Jan 16 2014 LC_CTYPE -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 30 Jan 16 2014 LC_MESSAGES -> ../fr_FR.ISO8859-1/LC_MESSAGES
-r--r--r-- 1 root wheel 36 Jan 16 2014 LC_MONETARY
lrwxr-xr-x 1 root wheel 29 Jan 16 2014 LC_NUMERIC -> ../fr_FR.ISO8859-1/LC_NUMERIC
-r--r--r-- 1 root wheel 364 Jan 16 2014 LC_TIME

For some reason US-ASCII is used for UTF-8 collation; this is also true for en_US.UTF-8 and de_DE.UTF-8, the only other ones I checked.

The postresq discussion and some earlier Python issues suggest using ICU to properly implement Unicode functions like collation across all platforms. But that has never been implemented in Python. Nosing Marc-Andre.
python  sorting  locale  collation  strings  macos  elcapitan  brokenness 
12 weeks ago by kme
Problems with sort order (UTF8 locales don't work) · Issue #216 · PostgresApp/PostgresApp
you can see the reason for this with ls -l /usr/share/locale/de_DE.UTF-8 you see that LC_COLLATE only symlinks to la_LN.US-ASCII. You get the same if you sort something on the shell, so it's a OS specific, not PG specific problem. AFAIR this affects all BSD OS.

Some ML posts of Tom Lane in that topic:

http://www.postgresql.org/message-id/16510.1263450305@sss.pgh.pa.us
http://www.postgresql.org/message-id/22721.1264203310@sss.pgh.pa.us
http://www.postgresql.org/message-id/23053.1337036410@sss.pgh.pa.us

It seems we will have to wait for http://wiki.postgresql.org/wiki/Todo:ICU , use a different OS or use a function for sorting, that can internally use ICU form pl/perl, unaccent contrib, or your own implementation.

To me this is the biggest drawback of using PostgresApp (PostgreSQL on OSX in general) and something that should be clearly highlighted in the documentation.

@macarthy in principle, yes. You'd just need to create a new database with the collation. However, as a word of caution, UTF-8 locales seem to be fundamentally broken on OSX. Postgres uses the strcoll API, which unfortunately does not support multibyte encodings on OSX.
macos  collation  unicode  sorting  brokenness  postgresql  dba 
12 weeks ago by kme
Unicode collation table for swedish do no… - Apple Community
The problem is clearly in Mac OS X since other systems (Linux) do get the correct output. I've been told that the problem seems to be that sv_SE.UTF-8 locale's collation table is a symlink to "la
LN.US-ASCII/LCCOLLATE".
macos  elcapitan  sorting  colation  brokenness  sigh 
12 weeks ago by kme
Sorting strings properly is stupidly hard – Daniel Lemire's blog
However, I tried to test out the sorting on fr_ca locale and got the incorrect answer, which I found out was due to incorrect locale settings on Max OS X/BSD. On my machine, fr_FR.UTF-8 collation is linked to la_LN.US-ASCII
sorting  ishard  collation  strings  devel  javascript  python  pitfalls  macos  elcapitan  brokenness 
12 weeks ago by kme
macos - Add a locale in Mac OSX - Stack Overflow
Looking into this found that, as of Mac OS X 10.10.3, collation is still broken for Spanish and most European languages. Collation definitions for these locales are linked to an ASCII definition. This ends up breaking things such as ORDER BY clauses on PostgreSQL.


Also, WTF is does 'la_LN' mean anyway?
macos  elcapitan  annoyance  sorting  brokenness  collation  lc_all  dateandtime  unix  maybesolution 
12 weeks ago by kme
macos - Case-insensitive ls sorting in Mac OSX - Ask Different | https://apple.stackexchange.com/
I needed sort's '-V' (version string sort) option, and here's how I got that:
Install the GNU Coreutils package:

sudo port install coreutils
sorting  textprocessing  bsd  mac  osx  shellscripting  workaround  solution 
november 2018 by kme
sorting - Sort a Python dictionary by value - Stack Overflow
It is not possible to sort a dict, only to get a representation of a dict that is sorted. Dicts are inherently orderless, but other types, such as lists and tuples, are not. So you need a sorted representation, which will be a list—probably a list of tuples.

For instance,

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))
python  dict  dictionary  sorting  newbie  solution 
may 2017 by kme
HowTo/Sorting - Python Wiki
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
arrays  python  sorting  solution 
february 2017 by kme
Trying to sort on two fields, second then first - Unix & Linux Stack Exchange


A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.

To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.

sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run

sort -k2,2 -k1,1 <people.txt

or equivalently sort -k2,2 -k1 <people.txt since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.


The '-s' (stable sort) option for GNU sort is interesting; maybe someday I'll understand what it's for.
@manatwork That should not be necessary; if all the specified fields compare equal, sort will compare the entire line. Or with GNU sort you can use -s for stable sort. – augurar Mar 2 '15 at 19:08
unix  shell  bash  sorting  textprocessing  dammitbrain  fuckina  explained  solution 
december 2016 by kme
linux - Sort a tab delimited file based on column sort command bash - Stack Overflow
To sort on the fourth column use just the -k 4,4 selector.

sort -t $'\t' -k 4,4 <filename>

You might also want -V which sorts numbers more naturally. For example, yielding 1 2 10 rather than 1 10 2 (lexicographic order).

sort -t $'\t' -k 4,4 -V <filename>

If you're getting errors about the $'\t' then make sure your shell is bash. Perhaps you're missing #!/bin/bash at the top of your script?
bash  essential  sorting  textprocessing  dammitbrain  reference 
december 2016 by kme
how do i sort hash by key numerically [http://www.perlmonks.org/]
#!/usr/bin/perl -w
%hash=( 89=>3, 45=>2, 1 =>5, 40=>3);
foreach (sort { $a <=> $b } keys(%hash) )
{
print "key: $_ value: $hash{$_}\n"
}
perl  sorting  arrays  hashes  solution 
november 2016 by kme
Gnome Sort - The Simplest Sort Algorithm
void gnomesort(int n, int ar[]) {
int i = 0;

while (i < n) {
if (i == 0 || ar[i-1] <= ar[i]) i++;
else {int tmp = ar[i]; ar[i] = ar[i-1]; ar[--i] = tmp;}
}
}
algorithms  sorting  codegolf 
april 2014 by kme
c# - How to elegantly check if a number is within a range? - Stack Overflow
There are a lot of options:

int x = 30;
if (Enumerable.Range(1,100).Contains(x))
//true

if (x >= 1 && x <= 100)
//true



if(number > 1 && number < 100)

or

bool TestRange (int numberToCheck, int bottom, int top)
{
return (numberToCheck > bottom && numberToCheck < top);
}



As others said, use a simple if.

You should think about the ordering.

e.g

1 <= x && x <= 100

is easier to read than

x => 1 && x <= 100
sorting  csharp  programming  tipsandtricks  style 
january 2013 by kme

Copy this bookmark:





to read