Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-25 : 09:50:28
|
Alright... puzzle time, and this is actually a useful puzzle .I have dealt with this before, and I would like to see what others here come up with.Assume you have a database that contains SSNs (9-digit numbers, usually store as characters though). There are more than one method in which SSNs may get input into your system. One may be data entry, one may be file import, one may be online form.Now, becuase there can be entry error in one or more of your input paths, you need to have some method to 'QC' your data.So now you want to find for a particular list of SSNs potential matches in another list.requirements:- Find any matches up to 3 digits off (ie. 123456789 will match 133446799)
- Find matches that may have been shifted (ie. 123456789 will match 123345678)
- As fast as possible.
To let you know where I got to: I inherited a cursor based 'QC' procedure 2 years ago (it ran for 60hours, comparing 300k SSNs to 100k SSNs). It now is set-based, runs in 10 minutes and actually processes about twice the data.Here are some random SSNs to work with. Feel free to add more if you would like:Drop Table #List1Drop Table #List2Create Table #List1 (SSN varchar(9))Create Table #List2 (SSN varchar(9))Insert Into #List1Select Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumnsUnion AllSelect Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumnsInsert Into #List2Select Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumnsUnion AllSelect Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumns Cheers All Corey |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-25 : 20:54:38
|
Consider;select top 9 id = identity(int, 1,1) into nine from sysobjects s1, sysobjects s2select l1.SSN, l2.SSN, count(1)from #List1 l1, #List2 l2, ninewhere substring(l1.SSN,id,1) = substring(l2.SSN,id,1) group by l1.SSN, l2.SSNhaving count(1) >=3order by 3 descI know that can't work for 300K rows and does not solve shifts but it brings a few questions. To you need to know what SSN's matched to what other SSN's or only that a match was made. Do you want to know all of the matches, or only the 'closest' match. For the 'shifted' SSN's do want no more than 1 shifted digit?--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-26 : 01:09:59
|
Show all matches, but order by best matches first. Shifted 1 digit would be fine...Corey |
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2004-09-26 : 09:25:09
|
a question:don't USA SSN's have some kind of mathematical check that must be implemented in code?i know that our version of SSN has a strict math check that must be performed when inserts are done. and that is in the app not on the server. if the check fails the SSN can't be inserted.i would think that would be logical....just to clear it up for me: SSN stands for social security number, right?Go with the flow & have fun! Else fight the flow |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-26 : 10:34:48
|
SSN does stand for Social Security NumberUnfortunately, to my knowledge, there is no mathematical check to verify a SSN. They can be sequential, but the do not cover the entire range from 000000000-999999999. I believe they stop somewhere in the 600000000s.Corey |
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2004-09-26 : 10:52:28
|
so what's the point of the puzzle? i mean what practical purpose does it have?error checking it this way seems a bit strange to me... but hey that's just me... Go with the flow & have fun! Else fight the flow |
 |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-26 : 15:46:01
|
You're the math major so correct me if I'm wrong, but even with random SSN's, there is a 1 in 10 chance of any specific digit matching, so about 1 in 3 SSN's will match. That's 9,000,000 matches without the shifting. What use is a match of 3 out of 9 digits?--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-26 : 20:39:04
|
Well 1 in 3 will match, but the question is which ones match of the lists you are working with. If you are working with lists that are only a couple hundred thousand rows, then the number of matches reduce significantly.Practical use: Once we have determined 'matching' SSNs have a user interface that displays the information we have on both of the SSNs (first & last name, address,... etc) and a person will approve the match or not. We do some basic name filtering, but the unobvious (code wise) matches are verified by a user. We actually correct hundreds of miskeyed forms every week or two... so it seems to work. With the sample lists generated, there should be a relative small number of matches. Our typical comparison looks at only 2 digits off, but if someone is looking for a particular person, they may increase it to 3 digits off.Corey |
 |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-26 : 20:40:34
|
3 digits off is not the same as 3 digits match (which is 6 digits off) as you described in the original problem. Am I still confused?--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-26 : 20:42:20
|
Oh wait!!!I see part of the confusion... up to 3 digits off.At least 6 of the 9 digits should matchDoes that make it seem more reasonable??Corey |
 |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-26 : 20:44:15
|
Yeah that makes a lot more sense :)--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-26 : 20:50:48
|
Sorry, I guess i rushed too much when typing this up. I put it up, and then walked out the door to go to Duke for the weekend (visiting sister). Checked a couple of times, but I didn't see why people were confused. Now I get it Corey |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-09-27 : 04:50:05
|
This only does substitutions:SELECT DISTINCT L1.ssn, L2.ssn, CASE WHEN L1.c1 = L2.c1 THEN 0 ELSE 1 END + CASE WHEN L1.c2 = L2.c2 THEN 0 ELSE 1 END + CASE WHEN L1.c3 = L2.c3 THEN 0 ELSE 1 END AS distanceFROM ( SELECT ssn, STUFF(STUFF(STUFF(ssn, n1, 1, 'x'), n2, 1, 'x'), n3, 1, 'x') AS ssne, SUBSTRING(ssn, n1, 1) AS c1, SUBSTRING(ssn, n2, 1) AS c2, SUBSTRING(ssn, n3, 1) AS c3 FROM ( SELECT N1.n AS n1, N2.n AS n2, N3.n AS n3 FROM Numbers AS N1, Numbers AS N2, Numbers AS N3 WHERE N1.n BETWEEN 1 AND 9 AND N2.n BETWEEN N1.n + 1 AND 9 AND N3.n BETWEEN N2.n + 1 AND 9 ) AS N CROSS JOIN #List1 ) AS L1INNER HASH JOIN ( SELECT ssn, STUFF(STUFF(STUFF(ssn, n1, 1, 'x'), n2, 1, 'x'), n3, 1, 'x') AS ssne, SUBSTRING(ssn, n1, 1) AS c1, SUBSTRING(ssn, n2, 1) AS c2, SUBSTRING(ssn, n3, 1) AS c3 FROM ( SELECT N1.n AS n1, N2.n AS n2, N3.n AS n3 FROM Numbers AS N1, Numbers AS N2, Numbers AS N3 WHERE N1.n BETWEEN 1 AND 9 AND N2.n BETWEEN N1.n + 1 AND 9 AND N3.n BETWEEN N2.n + 1 AND 9 ) AS N CROSS JOIN #List2 ) AS L2ON L1.ssne = L2.ssneORDER BY L1.ssn, distance, L2.ssn |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-09-28 : 11:05:23
|
Did I win? I tested this on the sizes Corey mentioned (one table with 100k rows, the other with 300k). Assuming the shifting is a separate criterion, it took my home machine about 10 minutes, with >90% of that finding the matches with substitutions.For the shifting I used this:SELECT DISTINCT L1.ssn, L2.ssn, n, 1FROM #List1 AS L1CROSS JOIN Numbers AS NINNER MERGE JOIN #List2 AS L2 ON STUFF(L1.ssn, n, 1, '') = LEFT(L2.ssn, 8)WHERE N.n BETWEEN 1 AND 8UNION ALLSELECT DISTINCT L1.ssn, L2.ssn, n, -1FROM #List2 AS L2CROSS JOIN Numbers AS NINNER MERGE JOIN #List1 AS L1 ON STUFF(L2.ssn, n, 1, '') = LEFT(L1.ssn, 8)WHERE N.n BETWEEN 1 AND 8ORDER BY L1.ssn, L2.ssn |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-28 : 11:10:22
|
I definitely like your solution... I have yet to experiment with our production size data, but it looked great to me!I hadn't ever used stuff before, so that was great to learn...My original solution is too long to post here which is why I asked the question. I 'prepped' the columns by removing characters (similar to what you did with STUFF). I then indexed the prepped table, and join on the modified data. It runs in about 10 minutes as well, but is a lot of work to write or modify.So I say: You WinAnd I apologize to all who didn't enjoy this puzzle... Corey |
 |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-28 : 11:17:24
|
I had tried this (strangely familiar) idea and abandonded it;select top 9 id = identity(int, 1,1) into nine from sysobjects s1, sysobjects s2select distinct x.ssn,y.ssn from (select ssn, stuff(stuff(stuff(l1.SSN,i0,1,'x'),i1,1,'x'),i2,1,'x') SSNcfrom #List1 l1, (SELECT n0.id i0, n1.id i1, n2.id i2 from nine n0, nine n1, nine n2where n1.id > n0.id and n2.id > n1.id) z) x,(select ssn, stuff(stuff(stuff(l2.SSN,i0,1,'x'),i1,1,'x'),i2,1,'x') SSNcfrom #List2 l2, (SELECT n0.id i0, n1.id i1, n2.id i2 from nine n0, nine n1, nine n2where n1.id > n0.id and n2.id > n1.id) t) ywhere x.SSNc = y.SSNcI cancled it after 20 minutes for 3000/1000 rows. Adding INNER HASH JOIN reduced that to 3 seconds. That's not something I ever would have thought to try. I guess I'll go read up on JOIN HINTS :)--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-09-28 : 11:29:34
|
BTW: I think it was a good puzzle because it is a good example of real-world data comparison techniques. I was looking forward to seeing the best answer because I can actually use this :)--KenI want to die in my sleep like my grandfather, not screaming in terror like his passengers. |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-09-28 : 12:20:19
|
Corey, I suspect using an intermediate table rather than trying to do it all in one go would give more predictable performance. As Ken pointed out, without the join hint the performance is pretty hopeless.The reason for this is quite straightforward: the estimate of the cardinality for joining the two stuffed SSNs is roughly O(n*m), so it will default to a loop join over table scans. However, the actual result is closer to O(m+n) -- the vast majority of the comparisons will fail to yield a row.With an intermediate table it will get a better estimate, so shouldn't need any join hints. With indexes on the stuffed SSNs, it will probably use a many-to-many merge join, which might prove a little better than the hash join. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-28 : 15:01:22
|
Ok....Using the sample data (right at 10k rows for each list), this did just as well as the Join Arnold provided.The query cost for the two compared at 4% to 96% (HardCoded/Arnold), so I think that you are right about a hardcoded version being more stable. It seems that it would become faster in relation to the first attempt as the number of records increaed.And yes I generated a good portion of this  Drop Table #List1Drop Table #List2Drop table #prepped1Drop table #prepped2Drop table #matches1Create Table #List1 (SSN varchar(9) Primary Key (SSN))Create Table #List2 (SSN varchar(9) Primary Key (SSN))Insert Into #List1Select Distinct SSNFrom(Select SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumnsUnion AllSelect SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumns) AInsert Into #List2Select Distinct SSNFrom(Select SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumnsUnion AllSelect SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)from master.dbo.syscolumns) ACreate Index ix_SSN1 on #List1 (SSN)Create Index ix_SSN2 on #List2 (SSN)Create Table #prepped1 (SSN varchar(9), A char(1), B char(1), C char(1), D char(1), E char(1), F char(1), G char(1), H char(1), I char(1) Primary Key (SSN))Create Table #prepped2 (SSN varchar(9), A char(1), B char(1), C char(1), D char(1), E char(1), F char(1), G char(1), H char(1), I char(1) Primary Key (SSN))Insert Into #prepped1Select SSN, A = substring(SSN,1,1), B = substring(SSN,2,1), C = substring(SSN,3,1), D = substring(SSN,4,1), E = substring(SSN,5,1), F = substring(SSN,6,1), G = substring(SSN,7,1), H = substring(SSN,8,1), I = substring(SSN,9,1)From #list1Create index ix_a1 on #prepped1 (A)Create index ix_b1 on #prepped1 (B)Create index ix_c1 on #prepped1 (C)Create index ix_d1 on #prepped1 (D)Create index ix_e1 on #prepped1 (E)Create index ix_f1 on #prepped1 (F)Create index ix_g1 on #prepped1 (G)Create index ix_h1 on #prepped1 (H)Create index ix_i1 on #prepped1 (I)Insert Into #prepped2Select SSN, A = substring(SSN,1,1), B = substring(SSN,2,1), C = substring(SSN,3,1), D = substring(SSN,4,1), E = substring(SSN,5,1), F = substring(SSN,6,1), G = substring(SSN,7,1), H = substring(SSN,8,1), I = substring(SSN,9,1)From #list2Create index ix_a2 on #prepped2 (A)Create index ix_b2 on #prepped2 (B)Create index ix_c2 on #prepped2 (C)Create index ix_d2 on #prepped2 (D)Create index ix_e2 on #prepped2 (E)Create index ix_f2 on #prepped2 (F)Create index ix_g2 on #prepped2 (G)Create index ix_h2 on #prepped2 (H)Create index ix_i2 on #prepped2 (I)Drop Table #matches1Select SSNA, SSNB, DigitsOff = 4-count(*)Into #matches1From ( Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.g = B.g and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.h = B.h and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.i = B.i Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g Union All Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f ) ZGroup By SSNA, SSNBSelect * From #matches1 Corey |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-09-28 : 17:21:10
|
Did I really suggest creating a temporary for both tables with 84 times the number of rows in each. Erk! Bad move.I was probably wrong about the many-to-many merges too: hash joins seem about twice as quick here. Ho hum.On the other hand, I realized that building one enormous hash table when I already know that there are 84 matching parts is a bit silly. So yes, I've been writing stuff that generates enormous SELECTs with the 84 parts hard coded and strung together with UNION ALL. Although mine was just working on the tables as-is, using the stuffed join (oddly, I'm getting situations where it can't build a query plan if the tables are temporary, but can if they aren't...)Anyway, I was getting about 2 minutes for the larger pair of tables with such a monstrosity... half of it seemed to be the grouping at the end. Then I tried the original 9.5 minute one again and it had gone down to 4 minutes, so I expect SQL Server has just realized that there's 1GB RAM on this machine... sigh. Enough!BTW, Corey I think the outside of your query's slightly wrong, it's getting some negative DigitsOff values! And I'm not convinced that any of those indexes are getting used. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-09-28 : 17:21:37
|
Actually... it seems it is better to insert the results of each join from the big union into a temp table one query at a time... only a few seconds faster for this set, but it actually made a 6 minute difference on my production set of data!!My results...-------------------------------------09/28/2004 17:04:22 - Start09/28/2004 17:05:10 - Created Expanded #NoCYPay09/28/2004 17:05:52 - Indexed Expanded #NoCYPay (755137 records)09/28/2004 17:06:10 - Created Expanded NoTCQ09/28/2004 17:06:28 - Indexed Expanded NoTCQ (163449 records)09/28/2004 17:06:28 - with assumption of max 3 digits off09/28/2004 17:10:37 - Processing Complete (807044 possible matches)09/28/2004 17:10:52 - Uploaded09/28/2004 17:12:57 - Filter with names (3837 probable matches)------------------------------------ Will test the single query tomorrow... Corey |
 |
|
Next Page
|
|
|
|
|