Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 my own custom collation

Author  Topic 

dlh
Starting Member

28 Posts

Posted - 2010-04-23 : 16:21:50
Is it possible to invent my own collation algorithm?

The basic challenge is to sort strings which contain numbers intelligently. When comparing two strings, if the next character is a numeral, then the algorithm should treat it and all consecutive numerals together as an integer for purposes of order-comparison and equivalence. For example, I'd like the following sort order:

'abc - 1'
'abc - 2'
'abc - 10'

and not the standard binary sort order:

'abc - 1'
'abc - 10'
'abc - 2'

Leading zeroes would become unnecessary ('1' = '01').

Is there an intelligent or easy way of doing this? Would it be a big efficiency drain? I'm hoping to ask this be grafted onto an existing application and I'd like to know how feasible the request would be.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-04-24 : 00:48:18
yup do like

ORDER BY PARSENAME(REPLACE(field,' - ','.'),2),PARSENAME(REPLACE(field,' - ','.'),1)*1


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

ms65g
Constraint Violating Yak Guru

497 Posts

Posted - 2010-04-24 : 00:56:39
Or


ORDER BY SUBSTRING(col, 1, CHARINDEX(' - ', col)) ASC,
SUBSTRING(col, CHARINDEX(' - ', col) + 1, LEN(col)) + 0
Go to Top of Page

dlh
Starting Member

28 Posts

Posted - 2010-04-24 : 14:04:50
Thank you both for the valuable hints. Unfortunately, the space-hyphen-space delimiter I used was merely an example. The actual data may have any number of formats for embedding numbers within strings, some cleanly delimited and some not. There may be more than one numeral substring within a string and there may be more than one "string schema" in use in the same column. So I'm looking for something much more general.

'abc123'
'def - 123'
'abc - def - 123'
'ghi.123.jkl.456'
etc.

I know exactly how I'd do it if I were implementing my own sort in a procedural language, but is it possible to use SQL so as not to reinvent everything else?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-04-25 : 02:18:43
in above case how will the sequence be according to your rule? can you post that?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

dlh
Starting Member

28 Posts

Posted - 2010-04-25 : 14:25:54
More detailed explanation and an example:

The sorting rule is: every sequence of consecutive numerals should be treated as a single character with value equal to that substring converted to an integer.

So for example a string 'abc123xyz456' would be converted to 'abc#xyz#' The # doesn't represent the actual '#' character -- the first # represents the integer 123 and the second # represents 456. The string would then be sorted normally, treating these integers as single characters. (It doesn't really matter where in the alphabet sequence the integers belong, as long as it's consistent.)

An example:

'abcd123' --> 'abcd# (#=123)
'abc123' --> 'abc#' (#=123)
'def123' --> 'def#' (#=123)
'def23' --> 'def#' (#=23)
'123abc' --> '#abc' (#=123)

Would be sorted (assuming integers come before 'a'):

'123abc'
'abc123'
'abcd123'
'def23'
'def123'

Thanks very much for reading.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-04-25 : 14:50:40
how would 23 come in between 123 data?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2010-04-25 : 15:57:49
How would you sort the following:

'abc123ab12'
'abc12ab123'

It seems to me that you might have an issue with how the data is actually stored. If you need to do this, that tells me each part of the string has different meanings. Is that what you are trying to overcome?
Go to Top of Page

dlh
Starting Member

28 Posts

Posted - 2010-04-26 : 00:05:41
quote:
how would 23 come in between 123 data?


'def23' > 'abcd123' because, looking at the first characters of each, 'd' > 'a'

And 'def23' < 'def123' because the first three characters of each are equal but, in the fourth slot, the integer 23 is less than the integer 123.

quote:
How would you sort the following:
'abc123ab12'
'abc12ab123'


'abc12ab123' would come first because 'abc' = 'abc' and 12 < 123.

quote:
It seems to me that you might have an issue with how the data is actually stored. If you need to do this, that tells me each part of the string has different meanings. Is that what you are trying to overcome?


Exactly. The strings contain multiple kinds of information which, more properly, ought to be stored in separate columns. I wish I had access to the individual fields, but the application I have to use only exposes these strings.

These multi-field strings have been in use for several years, during which time their format has changed several times (sometimes numbers had leading zeroes and sometimes they didn't, for example). And the strings retain the format of the time they were created. So now I've got a column of these strings that's very difficult to browse and is impossible to parse as a set.
Go to Top of Page

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2010-04-26 : 22:25:05
Ouch - I don't envy you one bit, been there before and it is a nightmare. You could try using a numbers (tally) table to parse the string. The examples in this article (http://www.sqlservercentral.com/articles/T-SQL/62867/) use a delimiter to split the data, but I think you can modify the examples to split when you have a change from alpha to numeric.
Go to Top of Page
   

- Advertisement -