Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-03-08 : 18:39:11
|
Setup:CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))INSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599) I'm looking for a query that gives the following results based on that source data:Low High Number10 19 1020 49 2020 49 3020 49 40100 199 100200 219 200200 219 210320 359 320320 359 330320 359 340320 359 350400 599 400400 599 500 This picture illustrates how the ranges need to be interpolated: Note: the "span" value has to count the lower and upper bounds, not just their difference.I'm having a brain fart on how to describe it in words. Basically I need to generate values in between Low and High IF the span has the same number of digits (same FLOOR(LOG10)) AND is a multiple of that FLOOR(LOG10). I also have to generate in-between values if the span does NOT have the same number of digits. I know this is vague and I apologize, hopefully the picture and the comments will suffice.One additional challenge is to accomplish this without using string manipulation. I posted this on Twitter and will be directing some tweeps to this post. I've already gotten some possible solutions using PATINDEX/CHARINDEX/SUBSTRING. I don't mind using that but I'd prefer a pure arithmetic solution if possible.I have a working solution but it doesn't manage the highlighted cases correctly (gray, blue, green and red in the image), it generates too many rows (intervals of 1 instead of 10, 10 instead of 100). I don't want to post it because it's even more convoluted than what I've already written, which is why I'm looking for something easier.Thanks. |
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-03-08 : 19:43:26
|
I must admit that I did not quite understand the requirement to "generate values in between Low and High IF the span has the same number of digits (same FLOOR(LOG10)) AND is a multiple of that FLOOR(LOG10)". So I am going by the sample results:;WITH cte AS( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN, 1 AS lvl, *, LOW AS Number, POWER(10,FLOOR(LOG10(span))) AS interval FROM #dcc UNION ALL SELECT RN, lvl+1, c.low, c.high, c.span, Number + Interval, Interval FROM cte c WHERE number + Interval < HIGH)SELECT LOW, HIGH, number FROM cte ORDER BY RN,lvl; |
 |
|
HugoKornelis
SQL Server MVP
5 Posts |
Posted - 2012-03-08 : 19:50:33
|
Hi Rob,It took me a few tries to get it right, but I think this should work:WITH NumbersAS (SELECT 1 AS n UNION ALL SELECT n+1 FROM Numbers WHERE n < 100),DataWithSpanAndIntervalAS (SELECT Low, High, (High - Low + 1) AS Span, POWER(10, CEILING(LOG10(High - Low + 2)) - 1) AS Interval FROM #dcc)SELECT Low, High, Low + (n - 1) * Interval AS NumberFROM DataWithSpanAndIntervalINNER JOIN Numbers ON Low + (n - 1) * Interval <= HighORDER BY Low, Number; Notes:1. If you have a table of numbers, use that instead of the CTE for added performance (and a shorter query).2. The query assumes that you want intervals of 1,000 if the span exceeds 1,000, etc. I made this assumption because you say to be looking for a mathematical solution. If you only need a few defined intervals (as might be concluded from the remark "span > 100, interval = 100), you might be better off using a CASE expression instead of mathematics to determine the interval.Hugo Kornelis, SQL Server MVP |
 |
|
HugoKornelis
SQL Server MVP
5 Posts |
Posted - 2012-03-08 : 19:55:19
|
Hmm, I see that Sunitabeck posted an even more elegant solution while I was still testing and typing. Well done!-- Hugo Kornelis, SQL Server MVP |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-03-09 : 12:22:56
|
Thank you both! These are MUCH better than what I had originally.I did however find some values in the data I'm working with that befuddle these queries somewhat. I can definitely work around it, but if you're up for an additional challenge. Setup is the same as before with 2 new rows:CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999) The results from your queries compared with what I need are shown below: I didn't realize that the earlier sample data didn't include some of these weird ranges, I apologize. Basically the interval can change depending on the least significant digits of the span or the lower bound (see italic bold). The idea is to generate the fewest numbers needed to cover the span, while each step covers the widest interval it can and remain contiguous. (Sorry again if this isn't clear, my command of English falls apart on this problem). The only problem with the original queries is that the intervals can cover too wide a range (10000,10100,10200, next value in sequence is 10300, which exceeds the upper bound). The software that is expecting this output will essentially "trim" any trailing zeros on the number. In other words, if it sees 1234, it assumes the range covers 12340 to 12349.Thanks again for your help. If you figure this out make sure to hit me up for free drinks next time we meet. I'll be in Lisbon next week for SQL Saturday, then Dublin and London the weeks after. |
 |
|
HugoKornelis
SQL Server MVP
5 Posts |
Posted - 2012-03-09 : 14:09:48
|
Okay, before I dive into this - what is the output you want from this input:INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029); -- Hugo Kornelis, SQL Server MVP |
 |
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-03-09 : 19:05:38
|
Rob, I didn't realize it was you who posted the question. If I had noticed, I would not have dared to pretend to answer!! This is my attempt at your revised data. But, I cheated (sort of) - I used patindex to find the interval to be used at the start and at the end. While driving from work I was thinking about how to do that without using patindex and everything that came to mind was convoluted or used recursive cte's.Also, I don't know what this should/would do with Hugo's examples. I think it would use the best possible interval. Anyway....;WITH cteRawIntervals AS -- Candidate intervals for the starting and ending( SELECT LOW,HIGH, span, POWER(10,FLOOR(LOG10(span))) AS MaxInterval, POWER(10,PATINDEX('%[^0]%', REVERSE(CAST(LOW AS VARCHAR(32))))-1) AS StartInterval, POWER(10,PATINDEX('%[^0]%', REVERSE(CAST(HIGH+1 AS VARCHAR(32))))-1) AS EndInterval FROM #dcc),cteIntervals AS -- Actual intervals at the start and end( SELECT LOW,HIGH,span,MaxInterval, CASE WHEN StartInterval > MaxInterval THEN MaxInterval ELSE StartInterval END AS StartInterval, CASE WHEN EndInterval > MaxInterval THEN MaxInterval ELSE EndInterval END AS EndInterval FROM cteRawIntervals),cteSpans AS( SELECT 1 AS lvl, LOW, HIGH, StartInterval,EndInterval,MaxInterval, LOW AS Number, StartInterval AS CurrentInterval -- start with the StartInterval FROM cteIntervals UNION ALL SELECT lvl+1, c.low, c.high,c.StartInterval,c.EndInterval,c.MaxInterval, Number + CurrentInterval, CASE WHEN CurrentInterval > EndInterval AND HIGH-(Number+CurrentInterval) < CurrentInterval THEN CurrentInterval/10 -- scale down WHEN CurrentInterval < MaxInterval AND (Number +CurrentInterval)%(CurrentInterval*10) = 0 AND HIGH < (Number+10*CurrentInterval)THEN CurrentInterval -- leave it alone WHEN CurrentInterval < MaxInterval AND (Number +CurrentInterval)%(CurrentInterval*10) = 0 THEN CurrentInterval*10 -- scale up ELSE CurrentInterval -- leave it alone End FROM cteSpans c WHERE number + CurrentInterval < HIGH)SELECT LOW,HIGH,Number FROM cteSpans ORDER BY LOW,lvl; |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-03-09 : 22:36:26
|
quote: Originally posted by HugoKornelis Okay, before I dive into this - what is the output you want from this input:INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);
This would be ideal:12340123501236012370123801239012400 -- interval 10012500 -- interval 10012600 -- interval 10012700 -- interval 10012800 -- interval 1001281012820128301284012850128601466014670146801469014700 -- interval 10014800 -- interval 10014900 -- interval 10015000 -- interval 1001501015020However, using intervals of 10 for the entire sequence would be fine. These numbers are being transmitted to credit card terminals which have limited memory. For short spans of 500 or less that's fine, but some spans are 4000 to 6000 and would have to use intervals of 100 or higher wherever possible.quote: Originally posted by sunitabeckRob, I didn't realize it was you who posted the question. If I had noticed, I would not have dared to pretend to answer!!
I don't know why you'd say that, your answer was fine. Not your fault I've got flukey data that doesn't want to behave. I don't know if this helps, but in case it does: these ranges are the leading digits of a fixed-length number, where the lower bound is padded with zeros on the right, and the upper bound it padded with nines, for example (12 digits):12340,12869 -> 123400000000,128699999999The terminal software will accept 1234 as a range and assume the remaining digits are zero for the lower, and nines for the upper:123400000000,123499999999The remaining interpolated ranges would follow the same pattern, and must prevent gaps or overlaps throughout the entire span.Thanks again for your help and patience! |
 |
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-03-10 : 10:30:35
|
quote: I don't know why you'd say that, your answer was fine. Not your fault I've got flukey data that doesn't want to behave.
Usually I see you answering questions that most others do not, and you answer them very well and precisely. So I felt honored and privileged to answer something you asked. That earlier comment I made was my weird and roundabout way of expressing that sentiment. Silly of me, I know!! quote: This would be ideal:12340123501236012370123801239012400 -- interval 10012500 -- interval 10012600 -- interval 10012700 -- interval 10012800 -- interval 1001281012820128301284012850128601466014670146801469014700 -- interval 10014800 -- interval 10014900 -- interval 10015000 -- interval 1001501015020
Yes! The code I posted yesterday gives exactly that sequence!! |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-03-11 : 10:54:47
|
quote: Originally posted by robvolk Thank you both! These are MUCH better than what I had originally.I did however find some values in the data I'm working with that befuddle these queries somewhat. I can definitely work around it, but if you're up for an additional challenge. Setup is the same as before with 2 new rows:CREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999) The results from your queries compared with what I need are shown below:
Shouldnt the number increase in steps of 10 after 583900 till 583990 ??After Monday and Tuesday even the calendar says W T F .... |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-03-11 : 11:11:14
|
My attempt at this one..DROP TABLE #dcc GO CREATE TABLE #dcc ( low INT NOT NULL PRIMARY KEY CHECK(low>=0), high INT UNIQUE, span AS high - low + 1, CHECK (high>low) ) INSERT #dcc (low, high) VALUES(10000, 10279), (583040, 583999); WITH cte AS (SELECT *, span / Power(10, Len(10) - 1) RANGE, Power(10, Len(10) - 1)INTERVAL FROM #dcc), cte1 AS (SELECT cte.low Low, cte.high High, ( ( INTERVAL * NUMBER ) + cte.low - INTERVAL )Number, ( ( INTERVAL * NUMBER ) + cte.low - INTERVAL )%100 / 10 MOD FROM cte INNER JOIN MASTER..spt_values val ON val.NUMBER <= range WHERE TYPE = 'P'), cte2 AS (SELECT low, MIN(NUMBER)MN, MAX(NUMBER)MX FROM cte1 WHERE mod = 0 GROUP BY low) SELECT c1.Low, c1.High, c1.NumberFROM cte1 c1 INNER JOIN cte2 c2 ON c1.Low = c2.Low WHERE ( NUMBER NOT BETWEEN MN AND MX OR MOD = 0 ) AND c2.low <= NUMBER After Monday and Tuesday even the calendar says W T F .... |
 |
|
HugoKornelis
SQL Server MVP
5 Posts |
Posted - 2012-03-11 : 14:05:57
|
Rob,It took me much longer than I expected - both in "CPU time" (time I spent thinking and trying) because I had several issues, and some approaches failed big time; and in "elapsed time" (clock time) because I had a weekend visit, cleaning, and garden work to do.But here is a purely mathematical solution to the problem. I'm not sure how it scales (probably pretty bad!) It assumes that you have a table dbo.Numbers, with sufficient rows in it.Note that I added rows to the #dcc table to test situations where the mathematical approach calls for steps of 1 or 1000, even though these are not in your sample data. Larger steps should be possible as well (but I didn't test them).DROP TABLE #dcc;DROP TABLE #powers;goCREATE TABLE #dcc(Low INT NOT NULL PRIMARY KEY CHECK(Low>=0), High INT UNIQUE, Span AS High-Low+1, CHECK (High>Low))INSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599)INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);INSERT #dcc(Low,High) VALUES(334000,334999); INSERT #dcc(Low,High) VALUES(1172, 1340);goCREATE TABLE #powers (p int NOT NULL PRIMARY KEY, pNext int NOT NULL);INSERT INTO #powersSELECT POWER(10, n-1) AS p, POWER(10, n) AS pNextFROM dbo.NumbersWHERE n >= 1AND n <= CEILING(LOG10((SELECT MAX(Span+1) FROM #dcc)));SELECT (d.Low / p.p + n - 1) * p.p AS Low, (d.Low / p.p + n) * p.p - 1 AS High, p.p AS Number, d.Low, d.High, d.SpanFROM #dcc AS dINNER JOIN #powers AS p ON p.p <= SpanINNER JOIN Numbers AS n ON n.n >= 1 AND (d.Low / p.p + n - 1) * p.p >= d.Low AND (d.Low / p.p + n) * p.p - 1 <= d.High AND(((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext < d.Low OR ((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext + pNext - 1 > d.High)ORDER BY 1, 2;go The solution posted by Sachin looks nice, but I think it needs some work. It returns incorrect results on the sample data he used, and the extra sample data I used gave me even more wrong results. But maybe some changes can make it work - I'd try if I understood what he is doing, but after studying his query for a few minutes, I gave up. -- Hugo Kornelis, SQL Server MVP |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-03-11 : 23:59:48
|
@Hugo,The extra result which you are talking about returned by my query actually is needed.If you look again at the screenshot posted by Rob of the expected result you can see that value of number for the last row in the resultset is 583900 and the high value is 583999.So ideally the o/p for number should not stop at 583900(as seen in screenshot) but should increase in steps of 10 till 583990 which my query takes care of.Here is the o/p of my query used with 2 sample data.INSERT #dcc (low, high) VALUES(10000,10279),(583040, 583999); Low High Number10000 10279 1000010000 10279 1010010000 10279 1020010000 10279 1021010000 10279 1022010000 10279 1023010000 10279 1024010000 10279 1025010000 10279 1026010000 10279 10270583040 583999 583040583040 583999 583050583040 583999 583060583040 583999 583070583040 583999 583080583040 583999 583090583040 583999 583100583040 583999 583200583040 583999 583300583040 583999 583400583040 583999 583500583040 583999 583600583040 583999 583700583040 583999 583800583040 583999 583900583040 583999 583910583040 583999 583920583040 583999 583930583040 583999 583940583040 583999 583950583040 583999 583960583040 583999 583970583040 583999 583980583040 583999 583990 INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);Low High Number12340 12869 1234012340 12869 1235012340 12869 1236012340 12869 1237012340 12869 1238012340 12869 1239012340 12869 1240012340 12869 1250012340 12869 1260012340 12869 1270012340 12869 1280012340 12869 1281012340 12869 1282012340 12869 1283012340 12869 1284012340 12869 1285012340 12869 1286014660 15029 1466014660 15029 1467014660 15029 1468014660 15029 1469014660 15029 1470014660 15029 1480014660 15029 1490014660 15029 1500014660 15029 1501014660 15029 15020 After Monday and Tuesday even the calendar says W T F .... |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-03-12 : 00:16:01
|
@HugoI ran my query with your sample data.Below is the resultset.Could you please point out where the o/p went wrong.Probably I can fix it.But I think it is those numbers in multiples of 2'sINSERT #dcc(Low,High) VALUES(10,19),(20,49),(100,199),(200,219),(320,359),(400,599)INSERT #dcc(Low,High) VALUES(10000,10279),(583040,583999)INSERT #dcc(Low,High) VALUES(12340,12869),(14660,15029);INSERT #dcc(Low,High) VALUES(334000,334999); INSERT #dcc(Low,High) VALUES(1172, 1340);Low High Number10 19 10100 199 100100 199 110100 199 120100 199 130100 199 140100 199 150100 199 160100 199 170100 199 180100 199 190200 219 200200 219 210400 599 400400 599 500400 599 510400 599 520400 599 530400 599 540400 599 550400 599 560400 599 570400 599 580400 599 5901172 1340 11721172 1340 11821172 1340 11921172 1340 12021172 1340 13021172 1340 13121172 1340 132210000 10279 1000010000 10279 1010010000 10279 1020010000 10279 1021010000 10279 1022010000 10279 1023010000 10279 1024010000 10279 1025010000 10279 1026010000 10279 1027012340 12869 1234012340 12869 1235012340 12869 1236012340 12869 1237012340 12869 1238012340 12869 1239012340 12869 1240012340 12869 1250012340 12869 1260012340 12869 1270012340 12869 1280012340 12869 1281012340 12869 1282012340 12869 1283012340 12869 1284012340 12869 1285012340 12869 1286014660 15029 1466014660 15029 1467014660 15029 1468014660 15029 1469014660 15029 1470014660 15029 1480014660 15029 1490014660 15029 1500014660 15029 1501014660 15029 15020334000 334999 334000334000 334999 334100334000 334999 334200334000 334999 334300334000 334999 334400334000 334999 334500334000 334999 334600334000 334999 334700334000 334999 334800334000 334999 334900334000 334999 334910334000 334999 334920334000 334999 334930334000 334999 334940334000 334999 334950334000 334999 334960334000 334999 334970334000 334999 334980334000 334999 334990583040 583999 583040583040 583999 583050583040 583999 583060583040 583999 583070583040 583999 583080583040 583999 583090583040 583999 583100583040 583999 583200583040 583999 583300583040 583999 583400583040 583999 583500583040 583999 583600583040 583999 583700583040 583999 583800583040 583999 583900583040 583999 583910583040 583999 583920583040 583999 583930583040 583999 583940583040 583999 583950583040 583999 583960583040 583999 583970583040 583999 583980583040 583999 583990 After Monday and Tuesday even the calendar says W T F .... |
 |
|
HugoKornelis
SQL Server MVP
5 Posts |
Posted - 2012-03-12 : 03:51:56
|
Hi Sachin and Rob,Sachin, before I address your questions, a generic remark: I notice that I put some data in the wrong columns. I'm sure Rob would be able to work it out and correct it himself, but just for the sake of correctness, here is what my query should have looked like:SELECT d.Low, d.High, (d.Low / p.p + n - 1) * p.p AS NumberFROM #dcc AS dINNER JOIN #powers AS p ON p.p <= SpanINNER JOIN Numbers AS n ON n.n >= 1 AND (d.Low / p.p + n - 1) * p.p >= d.Low AND (d.Low / p.p + n) * p.p - 1 <= d.High AND(((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext < d.Low OR ((d.Low / p.p + n - 1) * p.p) / p.pNext * p.pNext + pNext - 1 > d.High)ORDER BY 1, 2, 3; Sachin: The differences in your results and mine are mainly due to a different interpretation. This already becomes clear in your first post in the topic, where you ask: quote: Shouldnt the number increase in steps of 10 after 583900 till 583990 ??
I believe the answer to that question is "no". The last range, that starts at 583900, ends at 583999. That's a range of exactly 100 numbers (the number 583900 itself is included!), and it's last number is the upper bound of the range.But I may be wrong. We'll have to wait until Rob returns to the topic to see whose interpretation best matches his intentions.The other difference is in handling stuff that was not explicitly mentioned by Rob - cases where (Low) and (High + 1) are not exact multiples of 10, or where the difference is large enough to allow for a step size of 1000, or even 10,000. Maybe that SHOULD not happen - but since Rob asked for an arithmetic solution, and I think that would automatically include getting steps of other powers of 10 if the input calls for them. And that's also why I added the two extra ranges: 1172-1340 (starts with a bunch of 1-step increments, and has one extra 1-step increment at the end), and 334000-334999 (one single 1000-step increment).And for this, again, we'll have to wait until Rob returns to find out what the "proper" handling of these cases should be. (My prediction is that he'll say that he could not care less, because those cases never occur in his data anyway. )Cheers,Hugo-- Hugo Kornelis, SQL Server MVPEDIT: I forgot to include this: You asked me to point out where your results are "wrong". I won't do that, because only Rob knows if they are "wrong" or "right". Like I said, it's all a matter of interpretation of the requirements.But I can tell you that it's easy to find where your output does not match my interpretation: simply run my query and your query, and then compare the results. |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2012-03-12 : 04:19:34
|
Thanks Hugo for such a concise and clear explanation.Will work on my query to fit your queries o/p.After Monday and Tuesday even the calendar says W T F .... |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-03-12 : 09:46:50
|
Yes Hugo, you have the correct interpretation. I appreciate the effort for covering all the intervals, right now the data I have will probably never have intervals smaller than 10, and will always be multiples. (Famous last words) Sachin, there's nothing wrong with interpolating extra ranges, it adds precision. For me though, the focus is to generate the fewest necessary ranges due to memory limits on credit card readers. Some of these devices have only 1K of RAM.I verified Hugo's and Sunita's queries and they generate the proper intervals for all my data! Thank you again for your help, I really appreciate it.And sorry for the delayed replies, I was in Tampa for SQL Saturday #110 this weekend. |
 |
|
|
|
|
|
|