Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 SSIS and Import/Export (2005)
 !!! ETL issue !!!

Author  Topic 

frank.svs
Constraint Violating Yak Guru

368 Posts

Posted - 2009-03-23 : 09:43:53

Dear All,

Need help regarding MAX_WORKER_THREADS option.

We have few ETL's running on Testing Server.

We have different cycles involved in the ETL developement.

Basically we have 3 databases, 1 for staging , 1 for ODS and for actual warehouse which contains the
Aggregated tables.

We are following the below architecture,

From front-end we are uploading an Excel file and which inturn calls a stored procedure which eventually
loads the data from Excel into 1st DB(i.e staging DB which contains all varchar datatypes).

From there we perform one more ETL to intermediate database i.e ODS. The tables in DB1(staging) and
ODS database are same except the datatypes. and again from there we do the ETL which contains the logic for Aggregations and load into 3rd DB(i.e datawarehouse containing aggregated tables)

Now the problem is ,

Year wide we have 6 cycles to load the data into datawarehouse.

For Each Cycle , we have set of SSIS packages which does the ETL.

For all Cycles, we have few common tables getting involved and If we run all the cycles at a single GO, then we are getting locking issues. But if we manually run Cycle by Cycle, then it would not be a problem for us.

Once Staging Database is Ready and if we are trying to invoke all Cycles (ETLS), we are running into Locking issues.

To avoid that , what we had done is SET the MAX_WORKER_THREADS = 1 , ensuring that only 1 thread( i.e ETL) is running at any momment of time.

select * from msdb..syssubsystems where subsystem = 'CmdExec'

UPDATE msdb..syssubsystems
SET max_worker_threads = 1
WHERE subsystem = 'CmdExec'

We implemented on TEST environment and tried to call the cycles simultaneously. It worked fine.

But now, we are running into problems with this options

Now i have doubts here ,

1. Do we need to set / turn on any option at OS level eventhough ,
we have set the MAX_WORKER_THREADS for 'CmdExec' subsystem (i.e executes at DOS level). Please
correct me if am wrong.

2. Is there any Algorithm or way u can suggest so that we can execute the ETL's in FIFO order eventhough
we call all the ETL's simutaneously.

3. Or can we implement a Queing System by maintaining any set of tables(MetaData tables and SQL Server Job Agent Tables)
to execute them 1 by 1.

Looking forward for u r suggestions and would be greatly appreciated.

/* Sample Code for invoking the SSIS package from the stored procedure */


CREATE procedure [dbo].[usp_StartDataLoads]
as
Begin
declare @jid uniqueidentifier
declare @cmd varchar(4000)

SET @cmd = '"C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTExec.exe" /F "E:\ETL\Load_Data001.dtsx" '

print @cmd

declare @jname varchar(128)
set @jname = cast(newid() as char(36))

-- Create job
exec msdb.dbo.sp_add_job
@job_name = @jname,
@enabled = 1,
@category_name = '<<JobCategoryName>>',
@delete_level = 1,
@job_id = @jid OUTPUT

exec msdb.dbo.sp_add_jobserver
@job_id = @jid,
@server_name = '<<ServerName/InstanceNmae>>'

exec msdb.dbo.sp_add_jobstep
@job_id = @jid,
@step_name = 'ExecutePackage0001',
@subsystem = 'CMDEXEC', /* Make a note of Subsystem */
@proxy_name = '<<ProxyName>>',
@command = @cmd

-- Start job
exec msdb.dbo.sp_start_job
@job_id = @jid

End



Thanks in Advance!



   

- Advertisement -