![]() |
SQL for Threaded Discussion ForumsBy Bill Graziano on 4 February 2001 | Tags: Application Design One of questions we get on a regular basis involves threaded discussion forums. Everyone wants to know an easy way to do this. Everyone is also trying to do recursive SQL or self joins to make this work. That's hard. Here's an easy way to do this. (UPDATE: Fixed a problem and thought I'd repost for everyone to see the changes)
UPDATE (2/4/01): Carlos discovered a little error with the stored procedure that would allow posts to get out of order. I added the two lines in red to the stored procedure below.
When you first look at this problem your first thought IS to use some type of self join or a recursive stored procedure into a temp table. Those are hard to write and hard to test. As you should know by now, I'm lazy and I wanted an easier way. One of my fundamental principles of database design is this: If you can't get data in or out easily, you're not storing it right. Let start with a table that looks like this: CREATE TABLE [dbo].[Posts] ( [PostID] [int] IDENTITY (1, 1) NOT NULL , [ThreadID] [int] NOT NULL , [ParentID] [int] NOT NULL , [PostLevel] [smallint] NOT NULL , [SortOrder] [int] NOT NULL , [Subject] [char] (20) NOT NULL , [PostDate] [datetime] NOT NULL ) ON [PRIMARY] GO PostID is my single field Primary Key for this table. I'm a huge fan of a single field primary key. It makes joins and subqueries so much easier. ThreadID is the field I'm going to use to keep all the posts in a thread together. The ThreadID is going to be the PostID of the first post in a thread (which is a post with a PostLevel = 1 ). ParentID is the PostID of the parent of this post. For the first post in a thread, ParentID will be set to the PostID. The first post in a thread will have the PostLevel equal 1. It's child posts are PostLevel = 2 and so on. Subject is the user entered subject and PostDate is the date and time the post was put into the database. I don't have a field for the actual post in this example. I wanted to keep this simple. You can easily add a varchar or text field to hold the users post. SortOrder is the key to this whole crazy scheme. That's the field I'm going to use to order the posts. My trick is to always keep the data sorted just like I want it in the table. That means my queries need to do as little work as possible. Let's take a look at a SELECT statment to display a threaded discussion: SELECT Subject = convert(varchar, SPACE(2 * (PostLevel - 1) ) + Subject), PostID, ThreadID, ParentID, PostLevel, SortOrder, PostDate = convert(varchar(19), PostDate, 120) FROM Posts ORDER BY ThreadID, SortOrder and the output it generates: Subject PostID ThreadID ParentID PostLevel SortOrder PostDate ------------------------------ ----------- ----------- ----------- --------- ----------- ------------------- First Post 1 1 1 1 1 2000-11-05 13:23:46 First Reply 10 1 1 2 2 2000-11-14 18:42:14 First Sub Reply 11 1 10 3 3 2000-11-14 18:42:33 Reply to #10 13 1 10 3 4 2000-11-14 18:47:18 Reply to #13 14 1 13 4 5 2000-11-14 18:48:00 Reply to #10 15 1 10 3 6 2000-11-14 18:48:12 Second Reply 12 1 1 2 7 2000-11-14 18:43:17 Reply to #12 17 1 12 3 8 2000-11-14 18:49:02 Reply to #1 16 1 1 2 9 2000-11-14 18:48:53 Top Level 18 18 18 1 1 2000-11-14 18:55:28 Next 20 18 18 2 2 2000-11-14 18:55:56 nextdown 22 18 20 3 3 2000-11-14 18:56:15 farther 23 18 22 4 4 2000-11-14 18:56:31 farther 24 18 23 5 5 2000-11-14 18:56:35 farther 25 18 24 6 6 2000-11-14 18:56:43 farther 26 18 25 7 7 2000-11-14 18:56:46 farther 27 18 26 8 8 2000-11-14 18:56:49 farther 28 18 27 9 9 2000-11-14 18:56:53 three 21 18 18 2 10 2000-11-14 18:56:03 HERE 29 18 18 2 11 2000-11-14 18:57:31 Top Level 19 19 19 1 1 2000-11-14 18:55:47 As you can see my SELECT statement is pretty darn simple. Add an index on ThreadID and SortOrder and you'll get great performance. This is very important in this type of application. Most of the transactions against this table are going to be queries so this needs to be optimized to maximize query response time. Looking at the simplicity of this query I think you'll agree we've done that. In this example I've used the SPACE function to generate my indents. In your application you'll probably do that in ASP, VB or whatever you're writing in. The key to this approach is getting the records in the database in the right order. All the work is done in the stored procedure that puts the records into the table. Which looks like this: CREATE PROCEDURE spPost (@ReplyToID int, @Subject char(20) ) AS DECLARE @MaxSortOrder int, @ParentLevel int, @ThreadID int DECLARE @ParentSortOrder int, @NextSortOrder int, @NewPostID int BEGIN TRAN IF @ReplyToID = 0 -- New Post BEGIN INSERT Posts ( ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate ) VALUES (0, 0, 1, 1, @Subject, getdate()) SELECT @NewPostID = @@IDENTITY UPDATE Posts SET ThreadID = @NewPostID, ParentID = @NewPostID WHERE PostID = @NewPostID END ELSE -- @ReplyToID <> 0 means reply to an existing post BEGIN -- Get Post Information for what we are replying to SELECT @ParentLevel = PostLevel, @ThreadID = ThreadID, @ParentSortOrder = SortOrder FROM Posts WHERE PostID = @ReplyToID -- Is there another post at the same level or higher IF EXISTS (SELECT * FROM Posts WHERE PostLevel <= @ParentLevel AND SortOrder > @ParentSortOrder AND ThreadID = @ThreadID ) BEGIN -- Find the next post at the same level or higher SELECT @NextSortOrder = Min(SortOrder) FROM Posts WHERE PostLevel <= @ParentLevel AND SortOrder > @ParentSortOrder AND ThreadID = @ThreadID -- Move the existing posts down UPDATE Posts SET SortOrder = SortOrder + 1 WHERE ThreadID = @ThreadID AND SortOrder >= @NextSortOrder -- And put this one into place INSERT Posts (ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate ) VALUES (@ThreadID, @ReplyToID, @ParentLevel + 1, @NextSortOrder, @Subject, getdate() ) END ELSE -- There are no posts at this level or above BEGIN -- Find the highest sort order for this parent SELECT @MaxSortOrder = MAX(SortOrder) FROM Posts WHERE ThreadID = @ThreadID INSERT Posts (ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate ) VALUES (@ThreadID, @ReplyToID, @ParentLevel + 1, @MaxSortOrder + 1, @Subject, getdate() ) END END COMMIT TRAN GO So let's break down this procedure. The first main section handles a new post at the top level. That's pretty easy. I insert the record and update the record using the identity value generated. I'm sure there are faster ways to do that but I'm all about simple. The next case is replying to a post. The two options are a reply in the middle of a thread or appending to the end. Appending to the end is pretty easy. It's just like adding a new record. Putting a record in the middle is difficult. That is where the SortOrder field comes into play. We always keep this table sorted just like we want it to display. I have to "move down" the existing posts to insert a new one. I've also thought about writing a stored procedure to generate this type of table given a parent child relationship in a table. All you really need to do is run through the original table and call this stored procedure for each record. This will put them in the proper order. Hope this helps. Let me know how this works for you or if there's anything you'd like me to change. I'd suggest you test this strongly. I haven't had as much time to test it as I'd like. UPDATE (2/4/01): And it looks like that last sentence proved prophetic. -graz
|
- Advertisement - |