Welcome!

Blog Feed Post

[eim] Alphabetical order explained in a mere 27,817 words

This is one of the most amazing examples I’ve seen of the complexity of even simple organizational schemes. “Unicode Collation Algorithm (Unicode Technical Standard #10)” spells out in precise detail how to sort strings in what we might colloquially call “alphabetical order.” But it’s way, way, way more complex than that.

Unicode is an international standard for how strings of characters get represented within computing systems. For example, in the familiar ASCII encoding, the letter “A” is represented in computers by the number 65. But ASCII is too limited to encode the world’s alphabets. Unicode does the job.

As the paper says, “Collation is the general term for the process and function of determining the sorting order of strings of characters” so that, for example, users can look them up on a list. Alphabetical order is a simple form of collation.

Sorting inconsistent alphabets is, well, a problem. But let Technical Standard #10 explain the problem:

It is important to ensure that collation meets user expectations as fully as possible. For example, in the majority of Latin languages, ø sorts as an accented variant of o, meaning that most users would expect ø alongside o. However, a few languages, such as Norwegian and Danish, sort ø as a unique element after z. Sorting “Søren” after “Sylt” in a long list, as would be expected in Norwegian or Danish, will cause problems if the user expects ø as a variant of o. A user will look for “Søren” between “Sorem” and “Soret”, not see it in the selection, and assume the string is missing, confused because it was sorted in a completely different location.

Heck, some French dictionaries even sort their accents in reverse order. (See Section 1.3.)

But that’s nothing. Here’s a fairly random paragraph from further into this magnificent document (section 7.2):

In the DUCET, characters are given tertiary weights according to Table 17. The Decomposition Type is from the Unicode Character Database [UAX44]. The Case or Kana Subtype entry refers either to a case distinction or to a specific list of characters. The weights are from MIN = 2 to MAX = 1F16, excluding 7, which is not used for historical reasons.

Or from section 8.2:

Users often find asymmetric searching to be a useful option. When doing an asymmetric search, a character (or grapheme cluster) in the query that is unmarked at the secondary and/or tertiary levels will match a character in the target that is either marked or unmarked at the same levels, but a character in the query that is marked at the secondary and/or tertiary levels will only match a character in the target that is marked in the same way.

You may think I’m being snarky. I’m not at all. This document dives resolutely into the brambles and does not give up. It incidentally exposes just how complicated even the simplest of sorting tasks is when looked at in their full context, where that context is history, language, culture, and the ambiguity in which they thrive.

Read the original blog entry...

More Stories By David Weinberger

David is the author of JOHO the blog (www.hyperorg.com/blogger). He is an independent marketing consultant and a frequent speaker at various conferences. "All I can promise is that I will be honest with you and never write something I don't believe in because someone is paying me as part of a relationship you don't know about. Put differently: All I'll hide are the irrelevancies."

Latest Stories
"We focus on SAP workloads because they are among the most powerful but somewhat challenging workloads out there to take into public cloud," explained Swen Conrad, CEO of Ocean9, Inc., in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"Outscale was founded in 2010, is based in France, is a strategic partner to Dassault Systémes and has done quite a bit of work with divisions of Dassault," explained Jackie Funk, Digital Marketing exec at Outscale, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are still a relatively small software house and we are focusing on certain industries like FinTech, med tech, energy and utilities. We help our customers with their digital transformation," noted Piotr Stawinski, Founder and CEO of EARP Integration, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"I think DevOps is now a rambunctious teenager – it’s starting to get a mind of its own, wanting to get its own things but it still needs some adult supervision," explained Thomas Hooker, VP of marketing at CollabNet, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We've been engaging with a lot of customers including Panasonic, we've been involved with Cisco and now we're working with the U.S. government - the Department of Homeland Security," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We're here to tell the world about our cloud-scale infrastructure that we have at Juniper combined with the world-class security that we put into the cloud," explained Lisa Guess, VP of Systems Engineering at Juniper Networks, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
There is a huge demand for responsive, real-time mobile and web experiences, but current architectural patterns do not easily accommodate applications that respond to events in real time. Common solutions using message queues or HTTP long-polling quickly lead to resiliency, scalability and development velocity challenges. In his session at 21st Cloud Expo, Ryland Degnan, a Senior Software Engineer on the Netflix Edge Platform team, will discuss how by leveraging a reactive stream-based protocol,...
"With Digital Experience Monitoring what used to be a simple visit to a web page has exploded into app on phones, data from social media feeds, competitive benchmarking - these are all components that are only available because of some type of digital asset," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, provided a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with...
"I'm here to leverage my secret sauce, which is using outsourced development and the company that I utilize is delaPlex Software and they've basically allowed me to win Fortune 500 companies," noted Justin Witz, CTO of FRA and PlanTools, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We want to show that our solution is far less expensive with a much better total cost of ownership so we announced several key features. One is called geo-distributed erasure coding, another is support for KVM and we introduced a new capability called Multi-Part," explained Tim Desai, Senior Product Marketing Manager at Hitachi Data Systems, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We provide IoT solutions. We provide the most compatible solutions for many applications. Our solutions are industry agnostic and also protocol agnostic," explained Richard Han, Head of Sales and Marketing and Engineering at Systena America, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"As we've gone out into the public cloud we've seen that over time we may have lost a few things - we've lost control, we've given up cost to a certain extent, and then security, flexibility," explained Steve Conner, VP of Sales at Cloudistics,in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are an IT services solution provider and we sell software to support those solutions. Our focus and key areas are around security, enterprise monitoring, and continuous delivery optimization," noted John Balsavage, President of A&I Solutions, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Calligo, an innovative cloud service provider offering mid-sized companies the highest levels of data privacy and security, has been named "Bronze Sponsor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalised support service from its globally located cloud plat...