Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Fuzzy match

Author  Topic 

petek
Posting Yak Master

192 Posts

Posted - 2008-07-28 : 11:15:20
Hi all,

Thanks for looking, i was wondering if there was a fuzzy search option in SQL 2000 i have seen a SSIS package in SQL 2005 http://www.sqlteam.com/article/using-fuzzy-lookup-transformations-in-sql-server-integration-services could this be replicated in SQL 2000 using T-SQL if so how?

Kind Regards

Pete.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-28 : 11:16:36
You can try using DIFFERENCE & SOUNDEX functions:-

http://doc.ddart.net/mssql/sql70/setu-sus_8.htm

http://doc.ddart.net/mssql/sql70/de-dz_7.htm
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-07-28 : 11:57:19
Hi visakh16,

Thanks for the prompt reply....I should have explained further.

I have a table which holds customer information. at the moment we are going through a process whereby we are sent information from companies about employees basically if the sit a course or have any training we need to update this new information and if there are new employees that we don’t currently see on our database need to be inserted. this information then has to be checked against the customer table. the process being any no match’s are inserted into the table as new entries any full match’s (people already in the database) if they exist then we update the records with any additional info the rest are the fuzzy searches where they might be in the database but need checking.




Kind Regards

Pete.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-07-28 : 12:02:48
SOUNDEX is for phonetic matching. It is inappropriate for text matching.
Use this instead: http://sqlblindman.googlepages.com/fuzzysearchalgorithm
It was designed to do exactly what you describe.

e4 d5 xd5 Nf6
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-07-29 : 03:59:17
you guys are the best thanks for your support i will run away now and try and link it to my two tables, this might not be the last time you might hear from me.

Again thank you very much.

Kind Regards

Pete.
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-07-29 : 09:32:45
hello all me again,

blindman i am having problems joing the two tables together to run the comparetext function using

select dbo.CompareText (a.Forenames, b.Forenames)
from temp1 a,
temp2 b


i have 1000 rows returned but the tables only have 100 rows in them each.



Kind Regards

Pete.
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-07-29 : 09:50:00
Hi sorted the probelm i wasnt narrowing the search
select dbo.CompareText (a.Forenames, b.Forenames)
from temp1 a,
temp2 b
where a. NINumber = b. NINumber
and a. Dob = b. Dob


blindman how do i use the data from the temp column for example if i wanted to manipulate it further for example

If "DataReturned" >90 insert into another table and mark source table as exported

IF "DataReturned" <90 insert into another table for manual input

Kind Regards

Pete.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-07-29 : 11:11:24
This is a basic compare statement:

select dbo.CompareText (a.Forenames, b.Forenames) as MatchValue,
a.Forenames,
b.Fornames
from temp1 a,
temp2 b
where dbo.CompareText (a.Forenames, b.Forenames) > 90
order by a.Forenames,
dbo.CompareText (a.Forenames, b.Forenames) desc

In my experience, any match over 90 is virtually certain to be the same person.
Between 80 and 90 is likely a match.
Between 60 and 80 needs human confirmation.

Note that this compares every record in the first table with every record in the second table.
This can take a while.
If you can join the records on something like zip-code (or DOB, as you have done), you will speed up the query and increase the accuracy of your results.

What is NINUmber?


e4 d5 xd5 Nf6
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2008-07-30 : 03:29:51
"NINUmber"...National Insurance number.....equivalent of the US Social Security number.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-07-30 : 09:48:58
quote:
Originally posted by AndrewMurphy

"NINUmber"...National Insurance number.....equivalent of the US Social Security number.

It is also secretly implanted in a chip in their brains?

e4 d5 xd5 Nf6
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-08-21 : 05:14:22
Hi i am running the following script:
select dbo.CompareText (a.Forenames, b.Forenames) as MatchValue,
a.Forenames,
b.Fornames
from temp1 a,
temp2 b
where dbo.CompareText (a.Forenames, b.Forenames) > 90
order by a.Forenames,
dbo.CompareText (a.Forenames, b.Forenames) desc

the hits that are greater than 90 are ok as they are unique, however the results less than 90 or between 60 and 90 return dups
how can i return the closest match only for example i have 4 hits for the same name and with hits ranging from 62 to 70 how do i return the highest match?

Kind Regards

Pete.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-21 : 05:23:00
If you want max value apply grouping using GROUP BY and take MAX(MatchValue)
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-08-21 : 14:03:01
You can do this by using a subquery and only pulling back the top value, or by using one of the new ranking functions in 2005, or by selecting the data into a new table and deleting the associations you do not want.
A lot of this depends upon whether this is going to be an automated process, or a one-time manual process, and how many errors you can tolerate (incorrect matches, or missed matches).
It is a "fuzzy search" algorithm, so by definition it is not going to be 100% accurate. In my experience, any values over 80 or 90 are almost certain to be good matches, but below that a person should verify the results.

Boycott Beijing Olympics 2008
Go to Top of Page

petek
Posting Yak Master

192 Posts

Posted - 2008-08-22 : 04:55:14
it will be an automated process.

So will i need to use create table for the filterd results?

Kind Regards

Pete.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-08-22 : 09:58:44
I'd run it in a procedure.
First store the comparetext results in a temporary table.
Then, delete all but the highest matches from the temporary table.
Then, use the remaining records in the temporary table to effect your data changes.

Or...generate your comparetext results in a Common Table Expression and use one of SQL Server 2005's ranking functions to grab the top matches.

Boycott Beijing Olympics 2008
Go to Top of Page
   

- Advertisement -