Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Interesting Problem

Author  Topic 

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-03-08 : 18:39:11
Setup:
CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))
INSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599)
I'm looking for a query that gives the following results based on that source data:
Low	High	Number
10 19 10
20 49 20
20 49 30
20 49 40
100 199 100
200 219 200
200 219 210
320 359 320
320 359 330
320 359 340
320 359 350
400 599 400
400 599 500
This picture illustrates how the ranges need to be interpolated:

Note: the "span" value has to count the lower and upper bounds, not just their difference.

I'm having a brain fart on how to describe it in words. Basically I need to generate values in between Low and High IF the span has the same number of digits (same FLOOR(LOG10)) AND is a multiple of that FLOOR(LOG10). I also have to generate in-between values if the span does NOT have the same number of digits. I know this is vague and I apologize, hopefully the picture and the comments will suffice.

One additional challenge is to accomplish this without using string manipulation. I posted this on Twitter and will be directing some tweeps to this post. I've already gotten some possible solutions using PATINDEX/CHARINDEX/SUBSTRING. I don't mind using that but I'd prefer a pure arithmetic solution if possible.

I have a working solution but it doesn't manage the highlighted cases correctly (gray, blue, green and red in the image), it generates too many rows (intervals of 1 instead of 10, 10 instead of 100). I don't want to post it because it's even more convoluted than what I've already written, which is why I'm looking for something easier.

Thanks.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-03-08 : 19:43:26
I must admit that I did not quite understand the requirement to "generate values in between Low and High IF the span has the same number of digits (same FLOOR(LOG10)) AND is a multiple of that FLOOR(LOG10)". So I am going by the sample results:
;WITH cte AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN,
1 AS lvl,
*,
LOW AS Number,
POWER(10,FLOOR(LOG10(span))) AS interval
FROM
#dcc
UNION ALL
SELECT
RN,
lvl+1,
c.low,
c.high,
c.span,
Number + Interval,
Interval
FROM
cte c
WHERE
number + Interval < HIGH
)
SELECT LOW, HIGH, number FROM cte ORDER BY RN,lvl;
Go to Top of Page

HugoKornelis
SQL Server MVP

5 Posts

Posted - 2012-03-08 : 19:50:33
Hi Rob,

It took me a few tries to get it right, but I think this should work:

WITH Numbers
AS (SELECT 1 AS n
UNION ALL
SELECT n+1
FROM Numbers
WHERE n < 100),
DataWithSpanAndInterval
AS (SELECT Low,
High,
(High - Low + 1) AS Span,
POWER(10, CEILING(LOG10(High - Low + 2)) - 1)
AS Interval
FROM #dcc)
SELECT Low,
High,
Low + (n - 1) * Interval
AS Number
FROM DataWithSpanAndInterval
INNER JOIN Numbers
ON Low + (n - 1) * Interval <= High
ORDER BY Low, Number;


Notes:
1. If you have a table of numbers, use that instead of the CTE for added performance (and a shorter query).
2. The query assumes that you want intervals of 1,000 if the span exceeds 1,000, etc. I made this assumption because you say to be looking for a mathematical solution. If you only need a few defined intervals (as might be concluded from the remark "span > 100, interval = 100), you might be better off using a CASE expression instead of mathematics to determine the interval.

Hugo Kornelis, SQL Server MVP
Go to Top of Page

HugoKornelis
SQL Server MVP

5 Posts

Posted - 2012-03-08 : 19:55:19
Hmm, I see that Sunitabeck posted an even more elegant solution while I was still testing and typing. Well done!

--
Hugo Kornelis, SQL Server MVP
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-03-09 : 12:22:56
Thank you both! These are MUCH better than what I had originally.

I did however find some values in the data I'm working with that befuddle these queries somewhat. I can definitely work around it, but if you're up for an additional challenge.

Setup is the same as before with 2 new rows:
CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))
INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)
The results from your queries compared with what I need are shown below:



I didn't realize that the earlier sample data didn't include some of these weird ranges, I apologize. Basically the interval can change depending on the least significant digits of the span or the lower bound (see italic bold). The idea is to generate the fewest numbers needed to cover the span, while each step covers the widest interval it can and remain contiguous. (Sorry again if this isn't clear, my command of English falls apart on this problem).

The only problem with the original queries is that the intervals can cover too wide a range (10000,10100,10200, next value in sequence is 10300, which exceeds the upper bound). The software that is expecting this output will essentially "trim" any trailing zeros on the number. In other words, if it sees 1234, it assumes the range covers 12340 to 12349.

Thanks again for your help. If you figure this out make sure to hit me up for free drinks next time we meet. I'll be in Lisbon next week for SQL Saturday, then Dublin and London the weeks after.
Go to Top of Page

HugoKornelis
SQL Server MVP

5 Posts

Posted - 2012-03-09 : 14:09:48
Okay, before I dive into this - what is the output you want from this input:

INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);


--
Hugo Kornelis, SQL Server MVP
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-03-09 : 19:05:38
Rob, I didn't realize it was you who posted the question. If I had noticed, I would not have dared to pretend to answer!!

This is my attempt at your revised data. But, I cheated (sort of) - I used patindex to find the interval to be used at the start and at the end. While driving from work I was thinking about how to do that without using patindex and everything that came to mind was convoluted or used recursive cte's.

Also, I don't know what this should/would do with Hugo's examples. I think it would use the best possible interval. Anyway....
;WITH cteRawIntervals AS -- Candidate intervals for the starting and ending
(
SELECT
LOW,HIGH, span,
POWER(10,FLOOR(LOG10(span))) AS MaxInterval,
POWER(10,PATINDEX('%[^0]%', REVERSE(CAST(LOW AS VARCHAR(32))))-1) AS StartInterval,
POWER(10,PATINDEX('%[^0]%', REVERSE(CAST(HIGH+1 AS VARCHAR(32))))-1) AS EndInterval
FROM #dcc
),
cteIntervals AS -- Actual intervals at the start and end
(
SELECT
LOW,HIGH,span,MaxInterval,
CASE WHEN StartInterval > MaxInterval THEN MaxInterval ELSE StartInterval END AS StartInterval,
CASE WHEN EndInterval > MaxInterval THEN MaxInterval ELSE EndInterval END AS EndInterval
FROM cteRawIntervals
),
cteSpans AS
(
SELECT
1 AS lvl,
LOW, HIGH, StartInterval,EndInterval,MaxInterval,
LOW AS Number,
StartInterval AS CurrentInterval -- start with the StartInterval
FROM
cteIntervals
UNION ALL
SELECT
lvl+1,
c.low, c.high,c.StartInterval,c.EndInterval,c.MaxInterval,
Number + CurrentInterval,
CASE
WHEN CurrentInterval > EndInterval AND HIGH-(Number+CurrentInterval) < CurrentInterval
THEN CurrentInterval/10 -- scale down
WHEN CurrentInterval < MaxInterval AND (Number +CurrentInterval)%(CurrentInterval*10) = 0
AND HIGH < (Number+10*CurrentInterval)THEN CurrentInterval -- leave it alone
WHEN CurrentInterval < MaxInterval AND (Number +CurrentInterval)%(CurrentInterval*10) = 0
THEN CurrentInterval*10 -- scale up
ELSE CurrentInterval -- leave it alone
End
FROM
cteSpans c
WHERE
number + CurrentInterval < HIGH
)
SELECT LOW,HIGH,Number FROM cteSpans ORDER BY LOW,lvl;
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-03-09 : 22:36:26
quote:
Originally posted by HugoKornelis

Okay, before I dive into this - what is the output you want from this input:

INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);


This would be ideal:

12340
12350
12360
12370
12380
12390
12400 -- interval 100
12500 -- interval 100
12600 -- interval 100
12700 -- interval 100
12800 -- interval 100
12810
12820
12830
12840
12850
12860

14660
14670
14680
14690
14700 -- interval 100
14800 -- interval 100
14900 -- interval 100
15000 -- interval 100
15010
15020

However, using intervals of 10 for the entire sequence would be fine. These numbers are being transmitted to credit card terminals which have limited memory. For short spans of 500 or less that's fine, but some spans are 4000 to 6000 and would have to use intervals of 100 or higher wherever possible.
quote:
Originally posted by sunitabeck
Rob, I didn't realize it was you who posted the question. If I had noticed, I would not have dared to pretend to answer!!
I don't know why you'd say that, your answer was fine. Not your fault I've got flukey data that doesn't want to behave.

I don't know if this helps, but in case it does: these ranges are the leading digits of a fixed-length number, where the lower bound is padded with zeros on the right, and the upper bound it padded with nines, for example (12 digits):

12340,12869 -> 123400000000,128699999999

The terminal software will accept 1234 as a range and assume the remaining digits are zero for the lower, and nines for the upper:

123400000000,123499999999

The remaining interpolated ranges would follow the same pattern, and must prevent gaps or overlaps throughout the entire span.

Thanks again for your help and patience!
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-03-10 : 10:30:35
quote:
I don't know why you'd say that, your answer was fine. Not your fault I've got flukey data that doesn't want to behave.

Usually I see you answering questions that most others do not, and you answer them very well and precisely. So I felt honored and privileged to answer something you asked. That earlier comment I made was my weird and roundabout way of expressing that sentiment. Silly of me, I know!!
quote:
This would be ideal:

12340
12350
12360
12370
12380
12390
12400 -- interval 100
12500 -- interval 100
12600 -- interval 100
12700 -- interval 100
12800 -- interval 100
12810
12820
12830
12840
12850
12860

14660
14670
14680
14690
14700 -- interval 100
14800 -- interval 100
14900 -- interval 100
15000 -- interval 100
15010
15020
Yes! The code I posted yesterday gives exactly that sequence!!

Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-03-11 : 10:54:47
quote:
Originally posted by robvolk

Thank you both! These are MUCH better than what I had originally.

I did however find some values in the data I'm working with that befuddle these queries somewhat. I can definitely work around it, but if you're up for an additional challenge.

Setup is the same as before with 2 new rows:
CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))
INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)
The results from your queries compared with what I need are shown below:






Shouldnt the number increase in steps of 10 after 583900 till 583990 ??

After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-03-11 : 11:11:14
My attempt at this one..


DROP TABLE #dcc

GO

CREATE TABLE #dcc
(
low INT NOT NULL PRIMARY KEY CHECK(low>=0),
high INT UNIQUE,
span AS high - low + 1,
CHECK (high>low)
)

INSERT #dcc
(low,
high)
VALUES(10000,
10279),
(583040,
583999);

WITH cte
AS (SELECT *,
span / Power(10, Len(10) - 1) RANGE,
Power(10, Len(10) - 1)INTERVAL
FROM #dcc),
cte1
AS (SELECT cte.low Low,
cte.high High,
( ( INTERVAL * NUMBER ) + cte.low - INTERVAL )Number,
( ( INTERVAL * NUMBER ) + cte.low - INTERVAL )%100 / 10 MOD
FROM cte
INNER JOIN MASTER..spt_values val
ON val.NUMBER <= range
WHERE TYPE = 'P'),
cte2
AS (SELECT low,
MIN(NUMBER)MN,
MAX(NUMBER)MX
FROM cte1
WHERE mod = 0
GROUP BY low)

SELECT c1.Low,
c1.High,
c1.Number
FROM cte1 c1
INNER JOIN cte2 c2
ON c1.Low = c2.Low
WHERE ( NUMBER NOT BETWEEN MN AND MX
OR MOD = 0 )
AND c2.low <= NUMBER


After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

HugoKornelis
SQL Server MVP

5 Posts

Posted - 2012-03-11 : 14:05:57
Rob,

It took me much longer than I expected - both in "CPU time" (time I spent thinking and trying) because I had several issues, and some approaches failed big time; and in "elapsed time" (clock time) because I had a weekend visit, cleaning, and garden work to do.

But here is a purely mathematical solution to the problem. I'm not sure how it scales (probably pretty bad!) It assumes that you have a table dbo.Numbers, with sufficient rows in it.

Note that I added rows to the #dcc table to test situations where the mathematical approach calls for steps of 1 or 1000, even though these are not in your sample data. Larger steps should be possible as well (but I didn't test them).

DROP TABLE #dcc;
DROP TABLE #powers;
go
CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))
INSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599)
INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)
INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);
INSERT #dcc(Low,High) VALUES(334000,334999);
INSERT #dcc(Low,High) VALUES(1172, 1340);
go
CREATE TABLE #powers
(p int NOT NULL PRIMARY KEY,
pNext int NOT NULL);

INSERT INTO #powers
SELECT POWER(10, n-1) AS p, POWER(10, n) AS pNext
FROM dbo.Numbers
WHERE n >= 1
AND n <= CEILING(LOG10((SELECT MAX(Span+1) FROM #dcc)));

SELECT (d.Low / p.p + n - 1) * p.p AS Low,
(d.Low / p.p + n) * p.p - 1 AS High,
p.p AS Number,
d.Low, d.High, d.Span
FROM #dcc AS d
INNER JOIN #powers AS p
ON p.p <= Span
INNER JOIN Numbers AS n
ON n.n >= 1
AND (d.Low / p.p + n - 1) * p.p >= d.Low
AND (d.Low / p.p + n) * p.p - 1 <= d.High
AND(((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext < d.Low
OR ((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext + pNext - 1 > d.High)
ORDER BY 1, 2;
go


The solution posted by Sachin looks nice, but I think it needs some work. It returns incorrect results on the sample data he used, and the extra sample data I used gave me even more wrong results. But maybe some changes can make it work - I'd try if I understood what he is doing, but after studying his query for a few minutes, I gave up.

--
Hugo Kornelis, SQL Server MVP
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-03-11 : 23:59:48
@Hugo,

The extra result which you are talking about returned by my query actually is needed.If you look again at the screenshot posted by Rob of the expected result you can see that value of number for the last row in the resultset is 583900 and the high value is 583999.

So ideally the o/p for number should not stop at 583900(as seen in screenshot) but should increase in steps of 10 till 583990 which my query takes care of.

Here is the o/p of my query used with 2 sample data.

INSERT #dcc (low, high) VALUES(10000,10279),(583040, 583999);

Low High Number
10000 10279 10000
10000 10279 10100
10000 10279 10200
10000 10279 10210
10000 10279 10220
10000 10279 10230
10000 10279 10240
10000 10279 10250
10000 10279 10260
10000 10279 10270
583040 583999 583040
583040 583999 583050
583040 583999 583060
583040 583999 583070
583040 583999 583080
583040 583999 583090
583040 583999 583100
583040 583999 583200
583040 583999 583300
583040 583999 583400
583040 583999 583500
583040 583999 583600
583040 583999 583700
583040 583999 583800
583040 583999 583900
583040 583999 583910
583040 583999 583920
583040 583999 583930
583040 583999 583940
583040 583999 583950
583040 583999 583960
583040 583999 583970
583040 583999 583980
583040 583999 583990



INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);

Low High Number
12340 12869 12340
12340 12869 12350
12340 12869 12360
12340 12869 12370
12340 12869 12380
12340 12869 12390
12340 12869 12400
12340 12869 12500
12340 12869 12600
12340 12869 12700
12340 12869 12800
12340 12869 12810
12340 12869 12820
12340 12869 12830
12340 12869 12840
12340 12869 12850
12340 12869 12860
14660 15029 14660
14660 15029 14670
14660 15029 14680
14660 15029 14690
14660 15029 14700
14660 15029 14800
14660 15029 14900
14660 15029 15000
14660 15029 15010
14660 15029 15020


After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-03-12 : 00:16:01
@Hugo

I ran my query with your sample data.Below is the resultset.Could you please point out where the o/p went wrong.Probably I can fix it.But I think it is those numbers in multiples of 2's

INSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599)
INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)
INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);
INSERT #dcc(Low,High) VALUES(334000,334999);
INSERT #dcc(Low,High) VALUES(1172, 1340);
Low	High	Number
10 19 10
100 199 100
100 199 110
100 199 120
100 199 130
100 199 140
100 199 150
100 199 160
100 199 170
100 199 180
100 199 190
200 219 200
200 219 210
400 599 400
400 599 500
400 599 510
400 599 520
400 599 530
400 599 540
400 599 550
400 599 560
400 599 570
400 599 580
400 599 590
1172 1340 1172
1172 1340 1182
1172 1340 1192
1172 1340 1202
1172 1340 1302
1172 1340 1312
1172 1340 1322
10000 10279 10000
10000 10279 10100
10000 10279 10200
10000 10279 10210
10000 10279 10220
10000 10279 10230
10000 10279 10240
10000 10279 10250
10000 10279 10260
10000 10279 10270
12340 12869 12340
12340 12869 12350
12340 12869 12360
12340 12869 12370
12340 12869 12380
12340 12869 12390
12340 12869 12400
12340 12869 12500
12340 12869 12600
12340 12869 12700
12340 12869 12800
12340 12869 12810
12340 12869 12820
12340 12869 12830
12340 12869 12840
12340 12869 12850
12340 12869 12860
14660 15029 14660
14660 15029 14670
14660 15029 14680
14660 15029 14690
14660 15029 14700
14660 15029 14800
14660 15029 14900
14660 15029 15000
14660 15029 15010
14660 15029 15020
334000 334999 334000
334000 334999 334100
334000 334999 334200
334000 334999 334300
334000 334999 334400
334000 334999 334500
334000 334999 334600
334000 334999 334700
334000 334999 334800
334000 334999 334900
334000 334999 334910
334000 334999 334920
334000 334999 334930
334000 334999 334940
334000 334999 334950
334000 334999 334960
334000 334999 334970
334000 334999 334980
334000 334999 334990
583040 583999 583040
583040 583999 583050
583040 583999 583060
583040 583999 583070
583040 583999 583080
583040 583999 583090
583040 583999 583100
583040 583999 583200
583040 583999 583300
583040 583999 583400
583040 583999 583500
583040 583999 583600
583040 583999 583700
583040 583999 583800
583040 583999 583900
583040 583999 583910
583040 583999 583920
583040 583999 583930
583040 583999 583940
583040 583999 583950
583040 583999 583960
583040 583999 583970
583040 583999 583980
583040 583999 583990


After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

HugoKornelis
SQL Server MVP

5 Posts

Posted - 2012-03-12 : 03:51:56
Hi Sachin and Rob,

Sachin, before I address your questions, a generic remark: I notice that I put some data in the wrong columns. I'm sure Rob would be able to work it out and correct it himself, but just for the sake of correctness, here is what my query should have looked like:

SELECT     d.Low, d.High,
(d.Low / p.p + n - 1) * p.p AS Number
FROM #dcc AS d
INNER JOIN #powers AS p
ON p.p <= Span
INNER JOIN Numbers AS n
ON n.n >= 1
AND (d.Low / p.p + n - 1) * p.p >= d.Low
AND (d.Low / p.p + n) * p.p - 1 <= d.High
AND(((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext < d.Low
OR ((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext + pNext - 1 > d.High)
ORDER BY 1, 2, 3;


Sachin: The differences in your results and mine are mainly due to a different interpretation. This already becomes clear in your first post in the topic, where you ask:
quote:
Shouldnt the number increase in steps of 10 after 583900 till 583990 ??

I believe the answer to that question is "no". The last range, that starts at 583900, ends at 583999. That's a range of exactly 100 numbers (the number 583900 itself is included!), and it's last number is the upper bound of the range.
But I may be wrong. We'll have to wait until Rob returns to the topic to see whose interpretation best matches his intentions.

The other difference is in handling stuff that was not explicitly mentioned by Rob - cases where (Low) and (High + 1) are not exact multiples of 10, or where the difference is large enough to allow for a step size of 1000, or even 10,000. Maybe that SHOULD not happen - but since Rob asked for an arithmetic solution, and I think that would automatically include getting steps of other powers of 10 if the input calls for them. And that's also why I added the two extra ranges: 1172-1340 (starts with a bunch of 1-step increments, and has one extra 1-step increment at the end), and 334000-334999 (one single 1000-step increment).
And for this, again, we'll have to wait until Rob returns to find out what the "proper" handling of these cases should be. (My prediction is that he'll say that he could not care less, because those cases never occur in his data anyway. )

Cheers,
Hugo

--
Hugo Kornelis, SQL Server MVP


EDIT: I forgot to include this: You asked me to point out where your results are "wrong". I won't do that, because only Rob knows if they are "wrong" or "right". Like I said, it's all a matter of interpretation of the requirements.
But I can tell you that it's easy to find where your output does not match my interpretation: simply run my query and your query, and then compare the results.
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2012-03-12 : 04:19:34
Thanks Hugo for such a concise and clear explanation.

Will work on my query to fit your queries o/p.

After Monday and Tuesday even the calendar says W T F ....
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-03-12 : 09:46:50
Yes Hugo, you have the correct interpretation. I appreciate the effort for covering all the intervals, right now the data I have will probably never have intervals smaller than 10, and will always be multiples. (Famous last words)

Sachin, there's nothing wrong with interpolating extra ranges, it adds precision. For me though, the focus is to generate the fewest necessary ranges due to memory limits on credit card readers. Some of these devices have only 1K of RAM.

I verified Hugo's and Sunita's queries and they generate the proper intervals for all my data! Thank you again for your help, I really appreciate it.

And sorry for the delayed replies, I was in Tampa for SQL Saturday #110 this weekend.
Go to Top of Page
   

- Advertisement -