How to find duplicated records regardless of the v

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

How to find duplicated records regardless of the v

Author

Topic

pras2007
Posting Yak Master

216 Posts

Posted - 2010-06-12 : 13:54:09

Hello All,

I have a table similar to the one shown below. The goal is to find duplicated values based on the entire records regardless what field the values are stored, excluding the ID field.

Ex:

Based on the table record 1 and 4 are duplicated, because they have the same values regardless of what field they are stored in.

ID Price1 Price2 Price3 Price4
1 $2.00 $3.00 $6.00 $8.00
2 $5.00 $7.00 $3.00 $7.00
3 $3.00 $6.00 $4.00 $1.00
4 $3.00 $8.00 $2.00 $6.00

Does anyone know how to write the query for this task?

Please advice.

Thanks

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-06-12 : 15:22:30

What datatype are the columns?

N 56°04'39.26"
E 12°55'05.63"

pras2007
Posting Yak Master

216 Posts

Posted - 2010-06-12 : 15:31:57

The datatype of the colums are char.

Thanks

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-06-12 : 15:48:58

Oh, why..?

DECLARE	@Sample TABLE
	(
		ID INT,
		Price1 VARCHAR(20),
		Price2 VARCHAR(20),
		Price3 VARCHAR(20),
		Price4 VARCHAR(20)
	)

INSERT	@Sample
	(
		ID,
		Price1,
		Price2,
		Price3,
		Price4
	)
SELECT	1, '$2.00', '$3.00', '$6.00', '$8.00' UNION ALL
SELECT	2, '$5.00', '$7.00', '$3.00', '$7.00' UNION ALL
SELECT	3, '$3.00', '$6.00', '$4.00', '$1.00' UNION ALL
SELECT	4, '$3.00', '$8.00', '$2.00', '$6.00'

-- Peso 1
;WITH cteSource(ID, thePrice, recID)
AS (
	SELECT	u.ID,
		u.thePrice,
		ROW_NUMBER() OVER (PARTITION BY u.ID ORDER BY u.thePrice) AS recID
	FROM	@Sample AS s
	UNPIVOT	(
			thePrice
			FOR theCol IN (s.Price1, s.Price2, s.Price3, s.Price4)
		) AS u
)
SELECT		ID,
		CHECKSUM_AGG(CHECKSUM(*))
FROM		cteSource
GROUP BY	ID

-- Peso 2
;WITH cteSource(ID, thePrice, recID)
AS (
	SELECT	u.ID,
		u.thePrice,
		ROW_NUMBER() OVER (PARTITION BY u.ID ORDER BY u.thePrice) AS recID
	FROM	@Sample AS s
	UNPIVOT	(
			thePrice
			FOR theCol IN (s.Price1, s.Price2, s.Price3, s.Price4)
		) AS u
)
SELECT		s1.ID
FROM		cteSource AS s1
LEFT JOIN	cteSource AS s2 ON s2.recID = s1.recID
			AND s2.thePrice = s1.thePrice
			AND s2.ID <> s1.ID
GROUP BY	s1.ID
HAVING		COUNT(s1.recID) = COUNT(s2.recID)

N 56°04'39.26"
E 12°55'05.63"

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-06-12 : 15:52:27

[code]
declare @test table(id int, P1 char(5), P2 char(5), P3 char(5), P4 char(5))
insert @test
select 1, '$2.00', '$3.00', '$6.00', '$8.00' union all
select 2, '$5.00', '$7.00', '$3.00', '$7.00' union all
select 3, '$3.00', '$6.00', '$4.00', '$1.00' union all
select 4, '$3.00', '$8.00', '$2.00', '$6.00'

select * from @test t1
where exists(
select * from @test t2
where (t1.P1=t2.P1 or t1.P1=t2.P2 or t1.P1=t2.P3 or t1.P1=t2.P4)
and (t1.P2=t2.P1 or t1.P2=t2.P2 or t1.P2=t2.P3 or t1.P2=t2.P4)
and (t1.P3=t2.P1 or t1.P3=t2.P2 or t1.P3=t2.P3 or t1.P3=t2.P4)
and (t1.P4=t2.P1 or t1.P4=t2.P2 or t1.P4=t2.P3 or t1.P4=t2.P4)
and t1.id <> t2.id
)
[/code]

No, you're never too old to Yak'n'Roll if you're too young to die.

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-06-12 : 16:18:38

Both solutions will give a wrong output if we add
SELECT 5, '$3.00', '$8.00', '$3.00', '$6.00'
to the test data.

Peter I think your approach with checksum can solve it.

No, you're never too old to Yak'n'Roll if you're too young to die.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-06-12 : 16:41:52

A small rewrite of Peso 2

-- Peso 2
;WITH cteSource(ID, thePrice, recID)
AS (
	SELECT	u.ID,
		u.thePrice,
		ROW_NUMBER() OVER (PARTITION BY u.ID ORDER BY u.thePrice) AS recID
	FROM	@Sample AS s
	UNPIVOT	(
			thePrice
			FOR theCol IN (s.Price1, s.Price2, s.Price3, s.Price4)
		) AS u
)
SELECT		s1.ID
FROM		cteSource AS s1
LEFT JOIN	cteSource AS s2 ON s2.recID = s1.recID
			AND s2.thePrice = s1.thePrice
			AND s2.ID <> s1.ID
GROUP BY	s1.ID,
		s2.ID
HAVING		COUNT(s2.ID) = 4

N 56°04'39.26"
E 12°55'05.63"

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources