Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Site Related Forums
 The Yak Corral
 Reader Challenge - Similar SSNs

Author  Topic 

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-25 : 09:50:28
Alright... puzzle time, and this is actually a useful puzzle .

I have dealt with this before, and I would like to see what others here come up with.

Assume you have a database that contains SSNs (9-digit numbers, usually store as characters though). There are more than one method in which SSNs may get input into your system. One may be data entry, one may be file import, one may be online form.

Now, becuase there can be entry error in one or more of your input paths, you need to have some method to 'QC' your data.

So now you want to find for a particular list of SSNs potential matches in another list.

requirements:

  • Find any matches up to 3 digits off (ie. 123456789 will match 133446799)

  • Find matches that may have been shifted (ie. 123456789 will match 123345678)

  • As fast as possible.



To let you know where I got to: I inherited a cursor based 'QC' procedure 2 years ago (it ran for 60hours, comparing 300k SSNs to 100k SSNs). It now is set-based, runs in 10 minutes and actually processes about twice the data.

Here are some random SSNs to work with. Feel free to add more if you would like:


Drop Table #List1
Drop Table #List2

Create Table #List1 (SSN varchar(9))
Create Table #List2 (SSN varchar(9))

Insert Into #List1
Select
Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
Union All
Select
Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns

Insert Into #List2
Select
Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
Union All
Select
Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns



Cheers All

Corey

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-25 : 20:54:38
Consider;

select top 9 id = identity(int, 1,1) into nine from sysobjects s1, sysobjects s2

select l1.SSN, l2.SSN, count(1)
from #List1 l1, #List2 l2, nine
where substring(l1.SSN,id,1) = substring(l2.SSN,id,1)
group by l1.SSN, l2.SSN
having count(1) >=3
order by 3 desc

I know that can't work for 300K rows and does not solve shifts but it brings a few questions. To you need to know what SSN's matched to what other SSN's or only that a match was made. Do you want to know all of the matches, or only the 'closest' match. For the 'shifted' SSN's do want no more than 1 shifted digit?


--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-26 : 01:09:59
Show all matches, but order by best matches first. Shifted 1 digit would be fine...

Corey
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-09-26 : 09:25:09
a question:
don't USA SSN's have some kind of mathematical check that must be implemented in code?
i know that our version of SSN has a strict math check that must be performed when inserts are done. and that is in the app not on the server. if the check fails the SSN can't be inserted.
i would think that would be logical....

just to clear it up for me: SSN stands for social security number, right?

Go with the flow & have fun! Else fight the flow
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-26 : 10:34:48
SSN does stand for Social Security Number

Unfortunately, to my knowledge, there is no mathematical check to verify a SSN. They can be sequential, but the do not cover the entire range from 000000000-999999999. I believe they stop somewhere in the 600000000s.

Corey
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-09-26 : 10:52:28
so what's the point of the puzzle? i mean what practical purpose does it have?
error checking it this way seems a bit strange to me... but hey that's just me...

Go with the flow & have fun! Else fight the flow
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-26 : 15:46:01
You're the math major so correct me if I'm wrong, but even with random SSN's, there is a 1 in 10 chance of any specific digit matching, so about 1 in 3 SSN's will match. That's 9,000,000 matches without the shifting. What use is a match of 3 out of 9 digits?

--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-26 : 20:39:04
Well 1 in 3 will match, but the question is which ones match of the lists you are working with. If you are working with lists that are only a couple hundred thousand rows, then the number of matches reduce significantly.

Practical use: Once we have determined 'matching' SSNs have a user interface that displays the information we have on both of the SSNs (first & last name, address,... etc) and a person will approve the match or not. We do some basic name filtering, but the unobvious (code wise) matches are verified by a user. We actually correct hundreds of miskeyed forms every week or two... so it seems to work.


With the sample lists generated, there should be a relative small number of matches. Our typical comparison looks at only 2 digits off, but if someone is looking for a particular person, they may increase it to 3 digits off.

Corey
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-26 : 20:40:34
3 digits off is not the same as 3 digits match (which is 6 digits off) as you described in the original problem. Am I still confused?

--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-26 : 20:42:20
Oh wait!!!

I see part of the confusion... up to 3 digits off.

At least 6 of the 9 digits should match

Does that make it seem more reasonable??

Corey
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-26 : 20:44:15
Yeah that makes a lot more sense :)

--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-26 : 20:50:48
Sorry, I guess i rushed too much when typing this up. I put it up, and then walked out the door to go to Duke for the weekend (visiting sister). Checked a couple of times, but I didn't see why people were confused. Now I get it

Corey
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-09-27 : 04:50:05
This only does substitutions:

SELECT DISTINCT L1.ssn, L2.ssn,
CASE WHEN L1.c1 = L2.c1 THEN 0 ELSE 1 END +
CASE WHEN L1.c2 = L2.c2 THEN 0 ELSE 1 END +
CASE WHEN L1.c3 = L2.c3 THEN 0 ELSE 1 END AS distance
FROM (
SELECT ssn, STUFF(STUFF(STUFF(ssn, n1, 1, 'x'), n2, 1, 'x'), n3, 1, 'x') AS ssne,
SUBSTRING(ssn, n1, 1) AS c1,
SUBSTRING(ssn, n2, 1) AS c2,
SUBSTRING(ssn, n3, 1) AS c3
FROM (
SELECT N1.n AS n1, N2.n AS n2, N3.n AS n3
FROM Numbers AS N1, Numbers AS N2, Numbers AS N3
WHERE N1.n BETWEEN 1 AND 9
AND N2.n BETWEEN N1.n + 1 AND 9
AND N3.n BETWEEN N2.n + 1 AND 9
) AS N
CROSS JOIN #List1
) AS L1
INNER HASH JOIN (
SELECT ssn, STUFF(STUFF(STUFF(ssn, n1, 1, 'x'), n2, 1, 'x'), n3, 1, 'x') AS ssne,
SUBSTRING(ssn, n1, 1) AS c1,
SUBSTRING(ssn, n2, 1) AS c2,
SUBSTRING(ssn, n3, 1) AS c3
FROM (
SELECT N1.n AS n1, N2.n AS n2, N3.n AS n3
FROM Numbers AS N1, Numbers AS N2, Numbers AS N3
WHERE N1.n BETWEEN 1 AND 9
AND N2.n BETWEEN N1.n + 1 AND 9
AND N3.n BETWEEN N2.n + 1 AND 9
) AS N
CROSS JOIN #List2
) AS L2
ON L1.ssne = L2.ssne
ORDER BY L1.ssn, distance, L2.ssn

Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-09-28 : 11:05:23
Did I win?
I tested this on the sizes Corey mentioned (one table with 100k rows, the other with 300k). Assuming the shifting is a separate criterion, it took my home machine about 10 minutes, with >90% of that finding the matches with substitutions.

For the shifting I used this:

SELECT DISTINCT L1.ssn, L2.ssn, n, 1
FROM #List1 AS L1
CROSS JOIN Numbers AS N
INNER MERGE JOIN #List2 AS L2
ON STUFF(L1.ssn, n, 1, '') = LEFT(L2.ssn, 8)
WHERE N.n BETWEEN 1 AND 8
UNION ALL
SELECT DISTINCT L1.ssn, L2.ssn, n, -1
FROM #List2 AS L2
CROSS JOIN Numbers AS N
INNER MERGE JOIN #List1 AS L1
ON STUFF(L2.ssn, n, 1, '') = LEFT(L1.ssn, 8)
WHERE N.n BETWEEN 1 AND 8
ORDER BY L1.ssn, L2.ssn
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-28 : 11:10:22
I definitely like your solution... I have yet to experiment with our production size data, but it looked great to me!

I hadn't ever used stuff before, so that was great to learn...

My original solution is too long to post here which is why I asked the question. I 'prepped' the columns by removing characters (similar to what you did with STUFF). I then indexed the prepped table, and join on the modified data. It runs in about 10 minutes as well, but is a lot of work to write or modify.

So I say: You Win



And I apologize to all who didn't enjoy this puzzle...

Corey
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-28 : 11:17:24
I had tried this (strangely familiar) idea and abandonded it;

select top 9 id = identity(int, 1,1) into nine from sysobjects s1, sysobjects s2
select distinct x.ssn,y.ssn from
(
select ssn, stuff(stuff(stuff(l1.SSN,i0,1,'x'),i1,1,'x'),i2,1,'x') SSNc
from #List1 l1, (SELECT n0.id i0, n1.id i1, n2.id i2 from nine n0, nine n1, nine n2
where n1.id > n0.id and n2.id > n1.id) z
) x,
(
select ssn, stuff(stuff(stuff(l2.SSN,i0,1,'x'),i1,1,'x'),i2,1,'x') SSNc
from #List2 l2, (SELECT n0.id i0, n1.id i1, n2.id i2 from nine n0, nine n1, nine n2
where n1.id > n0.id and n2.id > n1.id) t
) y
where x.SSNc = y.SSNc

I cancled it after 20 minutes for 3000/1000 rows. Adding INNER HASH JOIN reduced that to 3 seconds.
That's not something I ever would have thought to try. I guess I'll go read up on JOIN HINTS :)


--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-09-28 : 11:29:34
BTW: I think it was a good puzzle because it is a good example of real-world data comparison techniques. I was looking forward to seeing the best answer because I can actually use this :)

--Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-09-28 : 12:20:19
Corey, I suspect using an intermediate table rather than trying to do it all in one go would give more predictable performance. As Ken pointed out, without the join hint the performance is pretty hopeless.
The reason for this is quite straightforward: the estimate of the cardinality for joining the two stuffed SSNs is roughly O(n*m), so it will default to a loop join over table scans. However, the actual result is closer to O(m+n) -- the vast majority of the comparisons will fail to yield a row.
With an intermediate table it will get a better estimate, so shouldn't need any join hints. With indexes on the stuffed SSNs, it will probably use a many-to-many merge join, which might prove a little better than the hash join.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-28 : 15:01:22
Ok....

Using the sample data (right at 10k rows for each list), this did just as well as the Join Arnold provided.

The query cost for the two compared at 4% to 96% (HardCoded/Arnold), so I think that you are right about a hardcoded version being more stable. It seems that it would become faster in relation to the first attempt as the number of records increaed.


And yes I generated a good portion of this

Drop Table #List1
Drop Table #List2

Drop table #prepped1
Drop table #prepped2
Drop table #matches1

Create Table #List1 (SSN varchar(9) Primary Key (SSN))
Create Table #List2 (SSN varchar(9) Primary Key (SSN))

Insert Into #List1
Select Distinct SSN
From
(
Select
SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
Union All
Select
SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
) A

Insert Into #List2
Select Distinct SSN
From
(
Select
SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
Union All
Select
SSN = Left(Replace(Replace(Replace(Replace(Replace(Replace(Replace(newid(),'A',''),'B',''),'C',''),'D',''),'E',''),'F',''),'-',''),9)
from master.dbo.syscolumns
) A


Create Index ix_SSN1 on #List1 (SSN)
Create Index ix_SSN2 on #List2 (SSN)

Create Table #prepped1 (SSN varchar(9), A char(1), B char(1), C char(1), D char(1), E char(1), F char(1), G char(1), H char(1), I char(1) Primary Key (SSN))
Create Table #prepped2 (SSN varchar(9), A char(1), B char(1), C char(1), D char(1), E char(1), F char(1), G char(1), H char(1), I char(1) Primary Key (SSN))


Insert Into #prepped1
Select
SSN,
A = substring(SSN,1,1),
B = substring(SSN,2,1),
C = substring(SSN,3,1),
D = substring(SSN,4,1),
E = substring(SSN,5,1),
F = substring(SSN,6,1),
G = substring(SSN,7,1),
H = substring(SSN,8,1),
I = substring(SSN,9,1)
From #list1

Create index ix_a1 on #prepped1 (A)
Create index ix_b1 on #prepped1 (B)
Create index ix_c1 on #prepped1 (C)
Create index ix_d1 on #prepped1 (D)
Create index ix_e1 on #prepped1 (E)
Create index ix_f1 on #prepped1 (F)
Create index ix_g1 on #prepped1 (G)
Create index ix_h1 on #prepped1 (H)
Create index ix_i1 on #prepped1 (I)

Insert Into #prepped2
Select
SSN,
A = substring(SSN,1,1),
B = substring(SSN,2,1),
C = substring(SSN,3,1),
D = substring(SSN,4,1),
E = substring(SSN,5,1),
F = substring(SSN,6,1),
G = substring(SSN,7,1),
H = substring(SSN,8,1),
I = substring(SSN,9,1)
From #list2

Create index ix_a2 on #prepped2 (A)
Create index ix_b2 on #prepped2 (B)
Create index ix_c2 on #prepped2 (C)
Create index ix_d2 on #prepped2 (D)
Create index ix_e2 on #prepped2 (E)
Create index ix_f2 on #prepped2 (F)
Create index ix_g2 on #prepped2 (G)
Create index ix_h2 on #prepped2 (H)
Create index ix_i2 on #prepped2 (I)


Drop Table #matches1
Select
SSNA,
SSNB,
DigitsOff = 4-count(*)
Into #matches1
From
(
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.f = B.f and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.e = B.e and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.d = B.d and A.e = B.e and A.f = B.f and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.g = B.g and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.f = B.f and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.e = B.e and A.f = B.f and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.h = B.h and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.g = B.g and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.f = B.f and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.i = B.i Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.h = B.h Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.g = B.g Union All
Select SSNA = A.SSN, SSNB = B.SSN From #prepped1 A Inner Join #prepped2 B On A.a = B.a and A.b = B.b and A.c = B.c and A.d = B.d and A.e = B.e and A.f = B.f
) Z
Group By SSNA, SSNB

Select * From #matches1


Corey
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-09-28 : 17:21:10
Did I really suggest creating a temporary for both tables with 84 times the number of rows in each. Erk! Bad move.
I was probably wrong about the many-to-many merges too: hash joins seem about twice as quick here. Ho hum.

On the other hand, I realized that building one enormous hash table when I already know that there are 84 matching parts is a bit silly. So yes, I've been writing stuff that generates enormous SELECTs with the 84 parts hard coded and strung together with UNION ALL. Although mine was just working on the tables as-is, using the stuffed join (oddly, I'm getting situations where it can't build a query plan if the tables are temporary, but can if they aren't...)
Anyway, I was getting about 2 minutes for the larger pair of tables with such a monstrosity... half of it seemed to be the grouping at the end. Then I tried the original 9.5 minute one again and it had gone down to 4 minutes, so I expect SQL Server has just realized that there's 1GB RAM on this machine... sigh. Enough!

BTW, Corey I think the outside of your query's slightly wrong, it's getting some negative DigitsOff values! And I'm not convinced that any of those indexes are getting used.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-09-28 : 17:21:37
Actually... it seems it is better to insert the results of each join from the big union into a temp table one query at a time... only a few seconds faster for this set, but it actually made a 6 minute difference on my production set of data!!

My results...

-------------------------------------
09/28/2004 17:04:22 - Start
09/28/2004 17:05:10 - Created Expanded #NoCYPay
09/28/2004 17:05:52 - Indexed Expanded #NoCYPay (755137 records)
09/28/2004 17:06:10 - Created Expanded NoTCQ
09/28/2004 17:06:28 - Indexed Expanded NoTCQ (163449 records)
09/28/2004 17:06:28 - with assumption of max 3 digits off
09/28/2004 17:10:37 - Processing Complete (807044 possible matches)
09/28/2004 17:10:52 - Uploaded
09/28/2004 17:12:57 - Filter with names (3837 probable matches)
------------------------------------


Will test the single query tomorrow...

Corey
Go to Top of Page
    Next Page

- Advertisement -