| Author |
Topic |
|
dlh
Starting Member
28 Posts |
Posted - 2010-04-23 : 16:21:50
|
| Is it possible to invent my own collation algorithm?The basic challenge is to sort strings which contain numbers intelligently. When comparing two strings, if the next character is a numeral, then the algorithm should treat it and all consecutive numerals together as an integer for purposes of order-comparison and equivalence. For example, I'd like the following sort order:'abc - 1''abc - 2''abc - 10'and not the standard binary sort order:'abc - 1''abc - 10''abc - 2'Leading zeroes would become unnecessary ('1' = '01').Is there an intelligent or easy way of doing this? Would it be a big efficiency drain? I'm hoping to ask this be grafted onto an existing application and I'd like to know how feasible the request would be. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-04-24 : 00:48:18
|
yup do likeORDER BY PARSENAME(REPLACE(field,' - ','.'),2),PARSENAME(REPLACE(field,' - ','.'),1)*1 ------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 2010-04-24 : 00:56:39
|
OrORDER BY SUBSTRING(col, 1, CHARINDEX(' - ', col)) ASC,SUBSTRING(col, CHARINDEX(' - ', col) + 1, LEN(col)) + 0 |
 |
|
|
dlh
Starting Member
28 Posts |
Posted - 2010-04-24 : 14:04:50
|
| Thank you both for the valuable hints. Unfortunately, the space-hyphen-space delimiter I used was merely an example. The actual data may have any number of formats for embedding numbers within strings, some cleanly delimited and some not. There may be more than one numeral substring within a string and there may be more than one "string schema" in use in the same column. So I'm looking for something much more general.'abc123''def - 123''abc - def - 123''ghi.123.jkl.456'etc.I know exactly how I'd do it if I were implementing my own sort in a procedural language, but is it possible to use SQL so as not to reinvent everything else? |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-04-25 : 02:18:43
|
| in above case how will the sequence be according to your rule? can you post that?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
dlh
Starting Member
28 Posts |
Posted - 2010-04-25 : 14:25:54
|
| More detailed explanation and an example:The sorting rule is: every sequence of consecutive numerals should be treated as a single character with value equal to that substring converted to an integer.So for example a string 'abc123xyz456' would be converted to 'abc#xyz#' The # doesn't represent the actual '#' character -- the first # represents the integer 123 and the second # represents 456. The string would then be sorted normally, treating these integers as single characters. (It doesn't really matter where in the alphabet sequence the integers belong, as long as it's consistent.)An example:'abcd123' --> 'abcd# (#=123)'abc123' --> 'abc#' (#=123)'def123' --> 'def#' (#=123)'def23' --> 'def#' (#=23)'123abc' --> '#abc' (#=123)Would be sorted (assuming integers come before 'a'):'123abc''abc123''abcd123''def23''def123'Thanks very much for reading. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2010-04-25 : 14:50:40
|
| how would 23 come in between 123 data?------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
jeffw8713
Aged Yak Warrior
819 Posts |
Posted - 2010-04-25 : 15:57:49
|
| How would you sort the following:'abc123ab12''abc12ab123'It seems to me that you might have an issue with how the data is actually stored. If you need to do this, that tells me each part of the string has different meanings. Is that what you are trying to overcome? |
 |
|
|
dlh
Starting Member
28 Posts |
Posted - 2010-04-26 : 00:05:41
|
quote: how would 23 come in between 123 data?
'def23' > 'abcd123' because, looking at the first characters of each, 'd' > 'a'And 'def23' < 'def123' because the first three characters of each are equal but, in the fourth slot, the integer 23 is less than the integer 123.quote: How would you sort the following:'abc123ab12''abc12ab123'
'abc12ab123' would come first because 'abc' = 'abc' and 12 < 123.quote: It seems to me that you might have an issue with how the data is actually stored. If you need to do this, that tells me each part of the string has different meanings. Is that what you are trying to overcome?
Exactly. The strings contain multiple kinds of information which, more properly, ought to be stored in separate columns. I wish I had access to the individual fields, but the application I have to use only exposes these strings.These multi-field strings have been in use for several years, during which time their format has changed several times (sometimes numbers had leading zeroes and sometimes they didn't, for example). And the strings retain the format of the time they were created. So now I've got a column of these strings that's very difficult to browse and is impossible to parse as a set. |
 |
|
|
jeffw8713
Aged Yak Warrior
819 Posts |
Posted - 2010-04-26 : 22:25:05
|
| Ouch - I don't envy you one bit, been there before and it is a nightmare. You could try using a numbers (tally) table to parse the string. The examples in this article (http://www.sqlservercentral.com/articles/T-SQL/62867/) use a delimiter to split the data, but I think you can modify the examples to split when you have a change from alpha to numeric. |
 |
|
|
|