Collations in SQL Server determine the rules for matching and sorting character data. Normally, you would choose a collation first based on the comparison semantics and sorting order the consumers of the data require.

Humans generally do not find that binary collations produce the sorting and comparison behaviours they expect. So, although these offer the best performance (especially the pure code-point BIN2 versions), most implementations do not use them.

Next in raw performance terms (but only for non-Unicode strings) are the backward-compatibility SQL collations. When working with Unicode data, these collations use a Windows collation instead, with the same performance characteristics. There are subtle traps here, so you need to have good reasons to choose a SQL collation these days (unless you are working on a US system, where it is still the default).

Windows collations are the slowest, in general, because of the complex Unicode comparison and sorting rules. Nevertheless, they offer complete compatibility with Windows in SQL Server, and are regularly maintained to keep up with changes in the Unicode standard. For modern use that includes Unicode data, a Windows collation is generally recommended.

If all you want is case-sensitive comparison and sorting semantics, you should choose the _CS_ (for Case Sensitive) variation of whichever base collation provides the expected behaviour for your users' language and culture. For example, both of these are case-sensitive collations:

- Latin1-General, case-sensitive, accent-sensitive
- Latin1-General, case-sensitive, accent-sensitive for Unicode data; SQL Server Sort Order 51 on Code Page 1252 for non-Unicode data

You can see these definitions using sys.fn_helpcollations.

Examples

Four tables that are exactly the same except for the collation: one binary, one case-sensitive, one case-insensitive, and one SQL case-sensitive: CREATE TABLE #Example_BIN ...

Same sample data for each table: INSERT #Example_BIN ...

Now we want to find strings greater than 'a': SELECT EB.string AS BIN ...

Note though, if we use a Unicode literal with the SQL collation, the implicit conversion rules result in a Windows collation comparison: SELECT ES.string AS SQL ... and the SQL collation results change.

Given that this is an existing database that already has tables defined in it, there are some very serious implications to the action of changing the database collation, beyond the potential performance impact to DML operations (which actually was already there). There is very real impact to performance and functionality, and this change not only did not achieve the intended goal (at least not consistently), but just as likely altered the behavior (or will alter the behavior when new tables are created) in terms of how data is ordered and equated.

Paul already provided good explanation and examples of the differences in performance and behavior between the different types of collations in his answer, so I won't repeat that here. However, a few points need some additional detail, and there are several other points to add in relation to the current scenario of changing the collation of an existing DB, as opposed to setting the collation of a new DB.

Binary collations are more than just case-sensitive: they are everything sensitive! So, by using a binary collation (ending in _BIN or _BIN2), your comparisons are now also accent sensitive, kana sensitive, width sensitive, and potentially gluten sensitive (at least that seems to be the trend these days ;-) ). Was this the desired effect of making this change? Are end-users expecting this change of behavior?

Collations affect not just comparisons, but also sorting. A binary collation will sort based on the ASCII or Unicode byte value (depending on VARCHAR or NVARCHAR, respectively) of each byte. Hence, by choosing a binary collation you are giving up the language- / culture-specific weighting rules that order each character (even characters in some languages, such as Hungarian, that are comprised of 2 letters) according to that culture's alphabet. So, if "ch" should naturally come after "k", well, that ain't gonna happen using a binary collation. Again, was this the desired effect of making this change? Are end-users expecting this change of behavior?

The BIN2 collations were introduced in SQL Server 2005, and according to the MSDN page Guidelines for Using BIN and BIN2 Collations: unless you have specific backwards-compatibility requirements for your application, you should be using the BIN2 instead of BIN collations, assuming, of course, that you are wanting a binary collation in the first place.
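The "everything sensitive" behavior of binary collations described above can be illustrated with a short T-SQL sketch. The collation names are real SQL Server collations; the sample strings are purely illustrative:

```sql
-- Compare the same pair of strings under three collations.
-- CI_AI ignores case and accents; CS_AS respects both;
-- BIN2 only ever matches identical code points.
SELECT
    CASE WHEN 'resume' = N'Résumé' COLLATE Latin1_General_CI_AI
         THEN 'equal' ELSE 'not equal' END AS case_accent_insensitive,  -- equal
    CASE WHEN 'resume' = N'Résumé' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'not equal' END AS case_accent_sensitive,    -- not equal
    CASE WHEN 'resume' = 'Resume' COLLATE Latin1_General_BIN2
         THEN 'equal' ELSE 'not equal' END AS binary_codepoint;         -- not equal
```

Running this against any SQL Server instance shows why switching an existing database to a binary collation silently changes which rows existing equality predicates match.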
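The CREATE/INSERT/SELECT statements in the examples above were truncated in this copy. A minimal reconstruction, with table and column names inferred from the surviving fragments (#Example_BIN, string, the EB/ES aliases) and collation choices assumed, might look like:

```sql
-- Four tables identical except for collation (names assumed from the fragments)
CREATE TABLE #Example_BIN (string varchar(10) COLLATE Latin1_General_BIN2);
CREATE TABLE #Example_CS  (string varchar(10) COLLATE Latin1_General_CS_AS);
CREATE TABLE #Example_CI  (string varchar(10) COLLATE Latin1_General_CI_AS);
CREATE TABLE #Example_SQL (string varchar(10) COLLATE SQL_Latin1_General_CP1_CS_AS);

-- Same sample data for each table
INSERT #Example_BIN (string) VALUES ('A'), ('B'), ('a'), ('b');
INSERT #Example_CS  (string) VALUES ('A'), ('B'), ('a'), ('b');
INSERT #Example_CI  (string) VALUES ('A'), ('B'), ('a'), ('b');
INSERT #Example_SQL (string) VALUES ('A'), ('B'), ('a'), ('b');

-- Strings greater than 'a': each collation applies its own ordering rules
SELECT EB.string AS BIN   FROM #Example_BIN AS EB WHERE EB.string > 'a';
SELECT ES.string AS [SQL] FROM #Example_SQL AS ES WHERE ES.string > 'a';

-- A Unicode literal shifts the comparison to the paired Windows collation,
-- so the SQL-collation table can return a different result set
SELECT ES.string AS [SQL] FROM #Example_SQL AS ES WHERE ES.string > N'a';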
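The point that binary collations rarely sort the way humans expect can be demonstrated directly. This sketch contrasts pure code-point order with linguistic (dictionary) order; the collation names are standard, and the comments reflect the typical results for one-byte varchar data:

```sql
-- Binary order follows the code points: all uppercase before all lowercase
SELECT v AS binary_order
FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(v)
ORDER BY v COLLATE Latin1_General_BIN2;   -- A, B, a, b

-- A Windows collation interleaves case in dictionary order
SELECT v AS linguistic_order
FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(v)
ORDER BY v COLLATE Latin1_General_CS_AS;  -- a, A, b, B
```

This is the trade-off described above: the binary collation is faster precisely because it skips the culture-specific weighting that produces the second ordering.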