join to a table via an attribute-value index

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

join to a table via an attribute-value index

Author

Topic

dlh
Starting Member

28 Posts

Posted - 2010-05-24 : 14:51:50

I'd appreciate any recommendations for how to tackle this problem.

I have the following tables:

TableA
contains Field1, Field2, Field3, ...

TableB
contains RecordID, ...

Index_TableB
contains RecordID (foreign key), IndexType, IndexValue

A_B_JoinCriteria
contains TableA_FieldName (quasi-foreign key from metadata), TableB_IndexType (foreign key)

A row in TableA corresponds to a row in TableB if and only if for all rows in A_B_JoinCriteria, there is a match between a field-value pair in TableA and an IndexType-IndexValue pair in Index_TableB.

The problem: for each record in TableB, count the number of matching records in TableA.

Complications: there may be zero to many rows in Index_TableB with any given IndexType, so we're not guaranteed existence or uniqueness of indexes. And I'm not allowed to store anything in permanent tables in the database; everything must be done on the fly.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-05-24 : 15:28:12

We are really going to need sample data and expected output to help you with this. I'm very unclear how TableA relates to table A_B_JoinCritera..

maybe this'll help:
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

dlh
Starting Member

28 Posts

Posted - 2010-05-24 : 17:01:51

I was worried my description wouldn't be clear enough... I'll try to be more explicit.

My question: for each row in TableB, I want to count the number of "matching" rows in TableA, where "match" is defined by all the rows in my join-criteria table.

Pseudocode for creating the 4 tables:

CREATE TABLE TableA
(
    RecordID INT PRIMARY KEY IDENTITY(1,1),
    Field1 varchar(200),
    Field2 varchar(200),
    Field3 varchar(200),
    ... 
)

CREATE TABLE TableB
(
    RecordID INT PRIMARY KEY IDENTITY(1,1),
    ...
)

CREATE TABLE Index_TableB
(
    RecordID INT REFERENCES TableB(RecordID),
    IndexType varchar(50),
    IndexValue varchar(200)
)

CREATE TABLE A_B_JoinCriteria
(
    RecordID INT PRIMARY KEY IDENTITY(1,1),
    TableA_FieldName varchar(50),
    TableB_IndexType varchar(50) REFERENCES Index_TableB(IndexType)
)

The one constraint on the join-criteria table that cannot be explicitly specified is TableA_FieldName is restricted to column names from TableA, e.g. "Field1" or "Field2", etc.

Here's a simple illustration of data in each of the tables. For this example I'm going to rename Field1 -> Color, Field2 -> Size, Field3 -> Shape for extra clarity; note there could be any number of these fields.

INSERT INTO TableA (Color, Size, Shape)
SELECT 'Red', 'Medium', 'Circle' UNION ALL
SELECT 'Blue', 'Medium', 'Square' UNION ALL
SELECT 'Blue', 'Medium', 'Circle'

INSERT INTO TableB (...)
SELECT ... UNION ALL
SELECT ... UNION ALL
SELECT ...
--the contents of TableB aren't important, just the primary key identity.

INSERT INTO Index_TableB (RecordID, IndexType, IndexValue)
SELECT 1, 'Color', 'Red' UNION ALL
SELECT 1, 'Color', 'Blue' UNION ALL
SELECT 1, 'Size', 'Medium' UNION ALL
SELECT 2, 'Color', 'Blue' UNION ALL
SELECT 2, 'Size', 'Small' UNION ALL
SELECT 2, 'Shape', 'Circle' UNION ALL
SELECT 3, 'Color', 'Blue' UNION ALL
SELECT 3, 'Size', 'Medium' UNION ALL
SELECT 3, 'Shape', 'Square'

INSERT INTO A_B_JoinCriteria (TableA_FieldName, TableB_IndexType)
SELECT 'Color', 'Color' UNION ALL
SELECT 'Size', 'Size'

So to join rows in TableA with rows in TableB, this example requires a 'Color' match and a 'Size' match. In particular, look at row 1 in TableB which is 'Red', 'Blue' and 'Medium'. This matches all three records in TableA. So the count for that row would be 3. And look at row 2, which is 'Blue' and 'Small'. This matches 0 rows in TableA.

For now I'm assuming there will always be rows in Index_TableB for each IndexType in A_B_JoinCriteria.

********************

REVISION: I've solved the problem. Thanks for reading.

Julien.Crawford
Starting Member

21 Posts

Posted - 2010-05-27 : 21:40:02

Well done - I've been monitoring this thread hoping to see the solution.
I was going to have a shot at this, but its been high pressure.
We use attribute tables and I'm keen to see your solution.

Any chance of posting?

Julien

dlh
Starting Member

28 Posts

Posted - 2010-05-28 : 19:17:57

Sure. I'm away from my workstation so I haven't checked the following for typos.

First imagine a static A_B_JoinCriteria, i.e. join conditions that are known in advance and can be hard-coded. Say all three possible fields in the above example ('Color', 'Shape', and 'Size') are required for the TableA-to-TableB match.

Then the following query would give the counts of TableA matches for all rows in TableB:

SELECT
    TableB.RecordID,
    COUNT(TableA.RecordID)
FROM
    TableA
    INNER JOIN Index_TableB AS b1
        ON b1.IndexValue = TableA.Color
        AND b1.RecordID = b1.RecordID
    INNER JOIN Index_TableB AS b2
        ON b2.IndexValue = TableA.Size
        AND b2.RecordID = b1.RecordID
    INNER JOIN Index_TableB AS b3
        ON b3.IndexValue = TableA.Shape
        AND b3.RecordID = b1.RecordID
WHERE
    1 = 1
    AND b1.IndexType = 'Color'
    AND b2.IndexType = 'Size'
    AND b3.IndexType = 'Shape'
GROUP BY
    TableB.RecordID

Notice that there are a couple redundancies in the above to emphasize symmetry. So the above text needs to be generated dynamically based on the contents of A_B_JoinCriteria.

DECLARE @SQL VARCHAR(MAX)

SELECT @SQL = 'SELECT
    TableB.RecordID,
    COUNT(TableA.RecordID)
FROM
    TableA'

SELECT @SQL = @SQL + '
    INNER JOIN Index_TableB AS b' + CAST(jc.RecordID AS VARCHAR) + '
        ON b' + CAST(jc.RecordID AS VARCHAR) + '.IndexValue = TableA.' + jc.TableA_FieldName + '
        AND b' + CAST(jc.RecordID AS VARCHAR) + '.RecordID = b1.RecordID'
FROM A_B_JoinCriteria AS jc

SELECT @SQL = @SQL + '
WHERE
    1 = 1'

SELECT @SQL = @SQL + '
    AND b' + CAST(jc.RecordID AS VARCHAR) + '.IndexType = ''' + jc.TableA_FieldName + ''''
FROM A_B_JoinCriteria AS jc

SELECT @SQL = @SQL + '
GROUP BY
    TableB.RecordID'

EXEC(@SQL)

An added problem: I realized that the runtime can vary greatly based on the order of rows in A_B_JoinCriteria. With my actual data, it was 10 seconds at best and a half hour at worst. For greatest efficiency, A_B_JoinCriteria needs to be sorted so that the first row has the IndexType with the smallest average number of records in TableB sharing the same IndexValue.

I hope this helps and I hope I haven't made any mistakes.

Julien.Crawford
Starting Member

21 Posts

Posted - 2010-05-31 : 19:04:56

Ah, Dynamic SQL.
I thought of that also.

I was thinking you would need to do some magic with to see if there was any non matching rows.
Or do a count(*) on the number of matching rows.
I'm not being very clear here - sorry.

I would love to have a go at this, I just don't have the time :-(

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources