
Thursday, 9 June 2011

Capture Changed Data (in Teradata)

Preserving history in a fact table is a common ETL process.
If one or more columns have changed, you add a new record for the changed row.

What if you need to know which field/column has changed?

You can write a construction that compares every field.
However, executing such a query is a performance killer.
Besides that, if the table has, say, 50 columns, it's a hell
of a job to write.

Here you can find a small subset of an invoice fact table
(in reality there are a lot more fields). Amount, status and country
have changed.
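For illustration, a hypothetical excerpt (invoicenumber is the NUPI; UPDATE_DT marks the load moment):

invoicenumber | amount | status | country | UPDATE_DT
4711          | 100.00 | OPEN   | NL      | 2011-06-01 10:00:00
4711          | 120.00 | PAID   | BE      | 2011-06-08 10:00:00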



This could be the result of the analysis script:
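Based on the hypothetical invoice rows above:

databasename | tablename | obj_id | field   | item_old | item_new | date_original       | date_change
DEVK425WGE   | invoice   | 4711   | amount  | 100.00   | 120.00   | 2011-06-01 10:00:00 | 2011-06-08 10:00:00
DEVK425WGE   | invoice   | 4711   | status  | OPEN     | PAID     | 2011-06-01 10:00:00 | 2011-06-08 10:00:00
DEVK425WGE   | invoice   | 4711   | country | NL       | BE       | 2011-06-01 10:00:00 | 2011-06-08 10:00:00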



Here you can find the script that compares every column
per non-unique primary index (e.g. invoicenumber).
The assumption is that the primary index is a NUPI. If you designed it
otherwise, you have to adjust the hard-coded parts of the script.

CREATE MULTISET TABLE DEVK425WGE.DataChangeCapture ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
databasename VARCHAR(50),
tablename VARCHAR(100),
obj_id DECIMAL(15,0),
field VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
item_old VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
item_new VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
date_original TIMESTAMP(6),
date_change TIMESTAMP(6))
PRIMARY INDEX ( databasename, tablename,field,obj_id );

--


create procedure pDataChangeCapture
(
IN OBJECTID VARCHAR(50)
, IN DATABASENAME VARCHAR(50)
, IN TABLENAME VARCHAR(100)
, IN FIELD VARCHAR(100)
)

BEGIN



CALL DBC.SysExecSQL('

insert into DataChangeCapture
(
databasename
, tablename
, obj_id
, field
, item_old
, item_new
, date_original
, date_change
)

sel
'||'27'xc||:DATABASENAME||'27'xc||'
, '||'27'xc||:TABLENAME||'27'xc||'
, a.'||:OBJECTID||' as obj_id
, '||'27'xc||:FIELD||'27'xc||' as field
, a.'||:FIELD||' as oldvalue
, b.'||:FIELD||' as newvalue
, b.preceding_date as date_original
, b.UPDATE_DT as date_change
from '||:DATABASENAME||'.'||:TABLENAME||' a,
(sel
'||:OBJECTID||' as objectid
, '||:FIELD||'
, UPDATE_DT
, max(UPDATE_DT) over (partition by '||:OBJECTID||' order by '||:OBJECTID||', UPDATE_DT rows between 1 preceding and 1 preceding) as preceding_date
from '||:DATABASENAME||'.'||:TABLENAME||'
) b
where
a.'||:OBJECTID||' = b.objectid
and a.UPDATE_DT = b.preceding_date
and (a.'||:FIELD||' <> b.'||:FIELD||'
or (a.'||:FIELD||' is NULL
and b.'||:FIELD||' is not null)
or (b.'||:FIELD||' is NULL
and a.'||:FIELD||' is not null) )'
)
;



END;

--

create procedure pDataChangeCaptureStart (databasename varchar(50), tablename varchar(100))

begin

declare objectid varchar(50);

sel columnname into :objectid from dbc.indices
where databasename = :databasename
and tablename = :tablename
and indextype = 'P';

create volatile table tDataChangeCapture (rownum int, FIELDNAME varchar(100)) ON COMMIT PRESERVE ROWS;

ins into tDataChangeCapture
sel csum(1,1), columnname from dbc.columns
where databasename = :databasename
and tablename = :tablename
and columnname <> :objectid
and columnname <> 'UPDATE_DT';

delete DataChangeCapture;

looper:begin

declare maxfields int;
declare x int default 0;
declare field varchar(100);

sel count(*) into :maxfields from tDataChangeCapture;

loopit: while x < maxfields

do

set x = x+1;

sel fieldname into :field from tDataChangeCapture where rownum = :x;

call pDataChangeCapture(:objectid, :databasename, :tablename, :field);

end while loopit;

end looper;

drop table tDataChangeCapture;

end;

Kick off the procedure with:

call pDataChangeCaptureStart ('Databasename','Tablename')
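Afterwards the capture table shows one row per changed column. For example (a minimal sketch, using the hypothetical invoice data above):

sel field, item_old, item_new, date_original, date_change
from DataChangeCapture
where tablename = 'invoice'
and obj_id = 4711
order by field;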

Tuesday, 30 November 2010

Ad hoc Mini Mart

If you need to improve performance for complex reports, consider implementing a mini mart on top of your data mart.

One of my customers was specifically interested in analysing linehauls with a certain origin and a certain destination.

I created a stored procedure that can be kicked off via a Cognos report.

CREATE procedure [dbo].[pLoadVolumeAnalysisMiniMart]
@origin varchar(6)
, @destination varchar(6)
as
begin
declare @loaddate datetime
set @loaddate = getdate()

truncate table VolumeAnalysisMiniMart

insert into VolumeAnalysisMiniMart
select
Sector
,Sectortype
,Week
,Weight
,Pieces
,Sender
,Receiver
,Origin
,Destination
,Service
,Departuretime
,Arrivaltime
,@loaddate
,Runid
,Rundate
from VolumeAnalysisDataMart
where Origin = @origin
and Destination = @destination

select 'Mini Mart has been loaded' as Result

end

This procedure loads a table for a specific origin and destination (in this case the number of records decreased from 21 million to several thousand). The data load takes about 1-2 seconds.

In Framework Manager I created a star schema with the mini mart as fact table and aliases of the (original) dimension tables.

I also added the stored procedure as a query object:

call pLoadVolumeAnalysisMiniMart;1(
#prompt('origin')#,
#prompt('destination')#
);



In Report Studio I created a query with data item 'Result' and prompts 'Origin' and 'Destination'.
As soon as the parameters are filled out, the procedure is kicked off and the data is loaded.

I also created hyperlink buttons on the report that link the user to several reports which retrieve the data from the mini mart.

If you do not have schemas per user, you can create separate tables or add a column 'user' filled with #account. You can pass this variable to the procedure, so every user has its own set of data, as sketched below.
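A minimal sketch of that variant, with a hypothetical [User] column on the mini mart and an extra @user parameter (filled by Cognos with #account):

CREATE procedure [dbo].[pLoadVolumeAnalysisMiniMartPerUser]
@origin varchar(6)
, @destination varchar(6)
, @user varchar(50) -- hypothetical parameter, filled with #account
as
begin
declare @loaddate datetime
set @loaddate = getdate()

-- delete only this user's slice instead of truncating the whole table
delete from VolumeAnalysisMiniMart where [User] = @user

insert into VolumeAnalysisMiniMart
select
Sector
,Sectortype
,Week
,Weight
,Pieces
,Sender
,Receiver
,Origin
,Destination
,Service
,Departuretime
,Arrivaltime
,@loaddate
,Runid
,Rundate
,@user
from VolumeAnalysisDataMart
where Origin = @origin
and Destination = @destination

select 'Mini Mart has been loaded' as Result

end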

See also:

http://www.sqlserverclub.com/pdfs/minimarts-organizational-segmentation-in-datawarehousing.pdf :

Many decisions makers not only need to analyze data from within the business unit,
but also across business units or the many segments within a business unit. Think of
it as market segmentation – cubes for more a selective audience.
Your customers can be grouped by segments of business units or smaller functional
groups with unique needs for reporting. Most segments will contain unique, pertinent
subsets of data along with business unit-wide data for decision making. Therefore,
cubes can be made from subsets of cubes in what I would call “minimarts.” These
are datasets that can be made “precise” for more targeted queries thereby
increasing performance, especially across a network, intranet, extranet (or even the
general public internet).
The pros are pretty clear by targeting your audience. Performance will be greater
than querying vast cubes of data that can go unused for most members of your
customer base.

Monday, 9 November 2009

Automatically drop indexes

I'm developing several star schemas for several data marts / data warehouses.
Over and over again, I have to key in all the drop-index scripts for the fact tables
(drop indexes --> load data --> create indexes).

I created a script to automate this (kick it off with
EXECUTE pDropIndexes '[TableName]'):


CREATE PROCEDURE pdropindexes @tablename varchar(255)
AS
BEGIN
DECLARE @indexName NVARCHAR(128)
, @dropIndexSql NVARCHAR(4000)
DECLARE tableIndexes CURSOR FOR
SELECT name
FROM sys.indexes
WHERE object_id = OBJECT_ID(@tablename)
AND type = 2 -- nonclustered indexes only
AND is_primary_key = 0 -- do not drop the primary key
AND is_unique_constraint = 0 -- nor unique constraints
OPEN tableIndexes
FETCH NEXT FROM tableIndexes INTO @indexName
WHILE @@fetch_status = 0
BEGIN
SET @dropIndexSql = N'DROP INDEX ' + @tablename + '.' + @indexName
EXEC sp_executesql @dropIndexSql
FETCH NEXT FROM tableIndexes INTO @indexName
END
CLOSE tableIndexes
DEALLOCATE tableIndexes
END
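A typical fact table load then becomes (table and index names are hypothetical examples):

EXECUTE pdropindexes 'FactSales'

-- load the data
INSERT INTO FactSales
SELECT * FROM stg.FactSales

-- recreate the indexes
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey ON FactSales (DateKey)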

Sunday, 26 April 2009

Public holiday indicator in your Date Dimension

For a lot of reasons we want to determine whether a day is a working day or not.
For example: productivity figures are only valid if we exclude bank / public holidays.
Trend analysis reports only show representative charts if we exclude weeks with significantly low values (caused by a public holiday in that week).

No RDBMS supports public holidays out of the box: not the static ones (like New Year's Day), nor the movable ones (like the Christian holidays, which depend on the date of Easter Day).

Whit Sunday is seven weeks (49 days) after Easter Day, Whit Monday is the day after, and Ascension Day is ten days before Whit Sunday.
So we only have to calculate the date of Easter Day. This is a complex calculation, as it originates from an approximation of the Hebrew calendar. Hebrew months start at New Moon, so the 14th or 15th day of a month falls at Full Moon. That is why it was decided that Easter Day is the Sunday after the first Full Moon after the vernal equinox in spring (the time of the crucifixion of Jesus), with the vernal equinox fixed at March 21.
The Full Moon before Easter Day is called the 'Paschal Full Moon'. Its date can conflict with the real (astronomical) Full Moon, but because the Paschal Full Moon follows a fixed table rather than observation, Easter Day can be calculated (and hence all other Christian holidays).


I used the following calculation in a stored procedure to calculate the Public Holiday Indicator in a date dimension:

Easter calculation
Source: Claus Tøndering, Frequently Asked Questions about Calendars, 25 September 2000
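As a worked check of the algorithm used in the load script below (all divisions are integer divisions), take the year 2011:

g = 2011 mod 19 = 16
c = 2011 / 100 = 20
h = (20 - 20/4 - (8*20+13)/25 + 19*16 + 15) mod 30 = (20 - 5 - 6 + 304 + 15) mod 30 = 28
i = 28 - (28/28) * (1 - 29/(28+1)) * ((21-16)/11) = 28 - 1*0*0 = 28
j = (2011 + 2011/4 + 28 + 2 - 20 + 20/4) mod 7 = 2528 mod 7 = 1
l = 28 - 1 = 27
Easter month = 3 + (27+40)/44 = 4
Easter day = 27 + 28 - 31*(4/4) = 24

Easter Day 2011 indeed fell on 24 April.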


Prerequisites:

CREATE SCHEMA BSM

CREATE TABLE [bsm].[DIM_date]
( [date_id] [int] NOT NULL
, [date] [datetime] NOT NULL
, [day_nr] [int] NOT NULL
, [month_nr] [int] NOT NULL
, [month] [varchar](10) NOT NULL
, [month_key] [int] NOT NULL
, [quarter_nr] [int] NOT NULL
, [quarter] [varchar](2) NOT NULL
, [quarter_key] [int] NOT NULL
, [year] [int] NOT NULL
, [week_nr] [int] NOT NULL
, [week] [varchar](7) NOT NULL
, [week_key] [int] NOT NULL
, [weekday_nr] [int] NOT NULL
, [day] [varchar](9) NOT NULL
, [public_holiday] [varchar](25) NOT NULL
, [weekend] [varchar](1) NOT NULL
, [working_day] [varchar](1) NOT NULL
, [nr_working_days] [int] NOT NULL
, [nr_days] [int] NOT NULL
, CONSTRAINT [PK_DIM_date_id] PRIMARY KEY CLUSTERED
( [date_id] ASC)WITH (IGNORE_DUP_KEY = OFF)
ON [PRIMARY]) ON [PRIMARY]


The function below is needed because SQL Server 2005 doesn't support ISO 8601 week numbers (only United States week numbers). You do not need it in SQL Server 2008, and Oracle 9i and higher support ISO week numbers via the 'IW' format mask.
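For reference, on SQL Server 2008 and later the built-in equivalent is:

SELECT DATEPART(iso_week, GETDATE()) AS iso_week_nr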

CREATE FUNCTION [bsm].[ISO8601WeekNr] (@Date datetime)
RETURNS INT
AS
BEGIN
DECLARE @ISO8601Week int

DECLARE @day int
DECLARE @month int
DECLARE @year int
DECLARE @a int
DECLARE @y int
DECLARE @m int
DECLARE @JD int -- Julian day number
DECLARE @d4 int
DECLARE @L int
DECLARE @d1 int

SET @year = YEAR(@Date)

SET @month = MONTH(@Date)
SET @day = DAY(@Date)
SET @a = (14-@month)/12
SET @y = @year +4800 -@a
SET @m = @month + (12*@a) - 3
SET @JD = @day + ( ((153*@m)+2)/5 ) + (365*@y) + (@y/4) - (@y/100) + (@Y/400) - 32045
SET @d4 = (@JD + 31741 - (@JD % 7)) % 146097 % 36524 % 1461
SET @L = @d4/1460
SET @d1 = ((@d4-@L)%365)+@L
SET @ISO8601Week = (@d1/7) +1
RETURN @ISO8601Week

END
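A quick sanity check of the function (1 January 2009 was a Thursday and therefore falls in ISO week 1; 1 January 2010 was a Friday and still belongs to ISO week 53 of 2009):

SELECT bsm.ISO8601WeekNr('20090101') -- returns 1
SELECT bsm.ISO8601WeekNr('20100101') -- returns 53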

The load script:

CREATE PROCEDURE [bsm].[pLoadDIM_Date] AS
BEGIN
DECLARE @d int -- day counter
DECLARE @date datetime
DECLARE @day_nr int
DECLARE @month varchar(25)
DECLARE @year int
DECLARE @yearvarchar varchar(4)
DECLARE @quarter_nr int
DECLARE @month_nr int
DECLARE @day varchar(9)
DECLARE @g int -- Golden number (-1): Years with the same Golden number have 'New Moon' on the same date

DECLARE @c int -- year/4
DECLARE @h int -- number of days since 'New Moon'
DECLARE @i int -- number of days between March, 21 and 'Paschal Full Moon'
DECLARE @j int -- 'Paschal Full Moon day'
DECLARE @l int -- number of days between March, 21 and Sunday on or before 'Paschal Full Moon' (between -6 and 28)
DECLARE @Easter_month int
DECLARE @Easter int
DECLARE @Easter_date datetime
DECLARE @public_holiday int
DECLARE @x int -- working days counter
DECLARE @y int -- indicator in case week 53 from year x is in year x+1 (100=yes;0=no)

DECLARE @z int -- indicator in case week 1 from year x is in year x-1 (100=yes;0=no)

DECLARE @weekday_nr int

truncate table bsm.dim_date

SET DATEFIRST 1 -- First day of week is Monday (Default is Sunday)
SET @d = -5475 -- First date is current date minus 5475 days
SET @x = 0

WHILE @d <5475

BEGIN

SET @date= DATEADD(dd, @d, GETDATE())
SET @year =YEAR(@date)
SET @yearvarchar = CONVERT(VARCHAR,DATEPART(yy,DATEADD(dd, @d, GETDATE())))

SET @quarter_nr = DATEPART(qq,DATEADD(dd, @d, GETDATE()))
SET @month_nr = DATEPART(mm, DATEADD(dd, @d, GETDATE()))
SET @weekday_nr = DATEPART(dw, DATEADD(dd, @d, GETDATE()))
SET @day_nr = DATEPART(dd, DATEADD(dd, @d, GETDATE()))
SET @day = DATENAME(dw,DATEADD(dd, @d, GETDATE()))
SET @month = DATENAME(mm,DATEADD(dd, @d, GETDATE()))
SET @g = @year % 19
SET @c = @year / 100
SET @h = (@c - (@c/4) - ((8*@c+13)/25) + (19*@g) + 15) % 30
SET @i = @h - ((@h/28)*(1-(29/(@h+1)))*((21-@g)/11))
SET @j = (@year + (@year/4) + @i + 2 - @c + (@c/4)) % 7
SET @l = @i - @j
SET @Easter_month = 3 + ((@l+40)/44)
SET @Easter = @l + 28 - 31*(@Easter_month/4)
SET @Easter_date = CONVERT(CHAR(10),(CONVERT(DATETIME,CAST(@Easter_month AS CHAR)+'/'+CAST(@Easter AS CHAR)+'/'+CAST(DATEPART(yy, DATEADD(dd,@d, GETDATE())) AS CHAR))),110)

IF @weekday_nr = 6 OR @weekday_nr = 7 -- weekend
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) -- Easter Day
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) -- Easter Monday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) -- Whit Sunday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) -- Whit Monday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) -- Ascension Day
OR (@day_nr = 1 AND @month_nr = 1) -- New Year's Day
OR (@day_nr = 30 AND @month_nr = 4) -- Dutch Queen's Day
OR (@day_nr = 25 AND @month_nr = 12) -- Christmas Day
OR (@day_nr = 26 AND @month_nr = 12) -- Boxing Day
SET @public_holiday = 1
ELSE
SET @public_holiday = 0

IF @public_holiday = 1 OR @weekday_nr IN (6,7)
SET @x = @x
ELSE
SET @x = @x + 1


IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) > 51 AND @month_nr = 1
SET @y = 100 ELSE SET @y = 0 -- if a week belongs partly to the next year (week 53 of 2004 belongs partly to 2005), set the indicator to 100: subtracting 100 turns week key 200553 back into 200453

IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) = 1 AND @month_nr = 12
SET @z = 100 ELSE SET @z = 0 -- if a week belongs partly to the previous year (week 1 of 2004 belongs partly to 2003), set the indicator to 100: adding 100 turns week key 200301 into 200401

INSERT INTO bsm.dim_date
SELECT ((DATEPART(yy, DATEADD(dd, @d, GETDATE())) * 100) + DATEPART(mm,DATEADD(dd, @d, GETDATE()))) * 100 + @day_nr AS date_id
, CAST(CONVERT(VARCHAR, @date, 111) AS DATETIME) as date
, @day_nr AS day_nr
, DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_nr
, @month AS month
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_key
, DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_nr
, 'Q'+CONVERT(VARCHAR,DATEPART(qq,DATEADD(dd, @d, GETDATE()))) as quarter
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_key
, @year AS year_nr
, bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS week_nr
, 'Week '+CONVERT(VARCHAR,bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101))) AS week
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+CAST (bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS VARCHAR)-@y+@z AS week_key
, @weekday_nr AS weekday_nr
, @day AS day
, CASE WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) THEN 'Easter Day'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) THEN 'Easter Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) THEN 'Whit Sunday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) THEN 'Whit Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) THEN 'Ascension Day'
WHEN @day_nr = 1 AND @month_nr = 1 THEN 'New Year''s Day'
WHEN @day_nr = 30 AND @month_nr = 4 THEN 'Dutch Queen''s Day'
WHEN @day_nr = 25 AND @month_nr = 12 THEN 'Christmas Day'
WHEN @day_nr = 26 AND @month_nr = 12 THEN 'Boxing Day'
ELSE '-'
END AS public_holiday
END AS public_holiday
, CASE WHEN @weekday_nr in(6,7) THEN 'Y' ELSE 'N' END AS weekend
, CASE WHEN @public_holiday = 1 OR @weekday_nr in(6,7) THEN 'N' ELSE 'Y' END AS working_day
, @x AS nr_working_days
, @d+5476 AS nr_days

SET @d = @d + 1

END

END
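Load and verify the dimension with, for example:

EXEC bsm.pLoadDIM_Date

SELECT [date], week_key, public_holiday, working_day
FROM bsm.DIM_date
WHERE public_holiday <> '-'
ORDER BY [date]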


Friday, 24 April 2009

Monitor / Audit your Data Warehouse load

One of the most important means of monitoring your Data Warehouse load is getting insight into the status of running jobs, the (historical) duration of the jobs, and which jobs don't seem to end. Executing a simple update script on an admin table before the start of a job and after the end of a job can result in a table like:


RunID | Process | Action | Object | New | Updated | Deleted | Load_start | Load_end
1 | EDW - HRM | Publish HRM_cube | HRM_Montly.mdc | | | | 4/15/09 8:09:38 AM | 4/15/09 8:10:02 AM
1 | EDW - HRM | FTP HRM cube | HRM_Montly.mdc | | | | 4/15/09 8:09:22 AM | 4/15/09 8:09:38 AM
1 | EDW - HRM | Create HRM cube | HRM_Montly.mdc | | | | 4/15/09 8:09:21 AM | 4/15/09 8:09:22 AM
1 | EDW - HRM | Load new facts into aggregate table in Business Model | BSM.FCT_Hour_Monthly | 11408 | 0 | 0 | 4/15/09 8:09:21 AM | 4/15/09 8:09:21 AM
1 | EDW - HRM | Delete existing facts from aggregate table in Business Model | BSM.FCT_Hour_Monthly | 0 | 0 | 344 | 4/15/09 8:09:03 AM | 4/15/09 8:09:21 AM
1 | EDW - HRM | Load Dimension Supplier into BSM (Refresh) | BSM.DIM_Supplier | 3 | 0 | 0 | 4/15/09 8:09:02 AM | 4/15/09 8:09:03 AM
1 | EDW - HRM | Load Dimension Organisational Unit into BSM (Refresh) | BSM.DIM_OrganisationalUnit | 23 | 0 | 0 | 4/15/09 8:09:00 AM | 4/15/09 8:09:02 AM
1 | EDW - HRM | Load Dimension Employee into BSM (Refresh) | BSM.DIM_Employee | 45632 | 0 | 0 | 4/15/09 7:59:20 AM | 4/15/09 8:09:00 AM
1 | EDW - HRM | Load Date Dimension into Business Model (Refresh) | BSM.DIM_Date | 365 | 0 | 0 | 4/15/09 7:59:18 AM | 4/15/09 7:59:20 AM
1 | EDW - HRM | Load new and changed hour facts into ODS | ODS.FCT_Hour | 45632 | 0 | 0 | 4/15/09 7:59:16 AM | 4/15/09 7:59:18 AM
1 | EDW - HRM | Load new and changed targets into ODS | ODS.FCT_Target | 12 | 0 | 0 | 4/15/09 7:59:15 AM | 4/15/09 7:59:16 AM
1 | EDW - HRM | Load new and changed suppliers into ODS | ODS.DIM_Supplier | 0 | 0 | 0 | 4/15/09 7:59:08 AM | 4/15/09 7:59:15 AM
1 | EDW - HRM | Load new and changed organisational units into ODS | ODS.DIM_OrganisationalUnit | 0 | 1 | 0 | 4/15/09 7:58:58 AM | 4/15/09 7:59:08 AM
1 | EDW - HRM | Load new and changed employees into ODS | ODS.DIM_Employee | 1 | 2 | 0 | 4/15/09 7:58:55 AM | 4/15/09 7:58:58 AM
1 | EDW - HRM | Transformation step: create hierarchy for organisational unit | STG.OrganisationalUnit | 0 | 23 | 0 | 4/15/09 7:58:52 AM | 4/15/09 7:58:53 AM
1 | EDW - HRM | Transformation step: calculate productivity | STG.Hour | 0 | 45632 | 0 | 4/15/09 7:58:49 AM | 4/15/09 7:58:52 AM
1 | EDW - HRM | Lookup surrogate key supplier | STG.Supplier | 0 | 3 | 0 | 4/15/09 7:58:23 AM | 4/15/09 7:58:49 AM
1 | EDW - HRM | Lookup surrogate key organisational unit | STG.OrganisationalUnit | 0 | 23 | 0 | 4/15/09 7:51:56 AM | 4/15/09 7:58:04 AM
1 | EDW - HRM | Lookup surrogate key employee | STG.employee | 0 | 2456 | 0 | 4/15/09 7:51:28 AM | 4/15/09 7:51:56 AM
1 | EDW - HRM | Detect corrected hours | STG.Hour | 0 | 35 | 0 | 4/15/09 7:51:24 AM | 4/15/09 7:51:28 AM
1 | EDW - HRM | Detect changed supplier attributes | STG.Hour | 0 | 0 | 0 | 4/15/09 7:51:22 AM | 4/15/09 7:51:24 AM
1 | EDW - HRM | Detect changed organisational unit attributes | STG.OrganisationalUnit | 0 | 1 | 0 | 4/15/09 7:51:18 AM | 4/15/09 7:51:22 AM
1 | EDW - HRM | Detect changed employee attributes | STG.Employee | 2 | 0 | 0 | 4/15/09 7:51:15 AM | 4/15/09 7:51:18 AM
1 | EDW - HRM | Extract Monthly Targets | STG.Target | 12 | 0 | 0 | 4/15/09 7:50:54 AM | 4/15/09 7:51:15 AM
1 | EDW - HRM | Extract Fact Hours | STG.Hour | 45632 | 0 | 0 | 4/15/09 7:50:49 AM | 4/15/09 7:50:52 AM
1 | EDW - HRM | Extract Dimension Supplier | STG.Supplier | 3 | 0 | 0 | 4/15/09 7:50:17 AM | 4/15/09 7:50:49 AM
1 | EDW - HRM | Extract Dimension Organisational Unit | STG.OrganisationalUnit | 23 | 0 | 0 | 4/15/09 7:49:21 AM | 4/15/09 7:50:17 AM
1 | EDW - HRM | Extract Dimension Employee | STG.Employee | 2456 | 0 | 0 | 4/15/09 7:49:20 AM | 4/15/09 7:49:21 AM



Firstly, create an admin table:

-- Create MetaData layer

create schema mda

-- Create Admin table

CREATE TABLE [mda].[admin_run](
[Runid] [int] NULL,
[Process] [varchar](50) ,
[Action] [varchar](50) ,
[Object] [varchar](50) ,
[Inserted] [int] NULL,
[Updated] [int] NULL,
[Deleted] [int] NULL,
[LoadStart] [datetime] NULL,
[LoadEnd] [datetime] NULL
)
Secondly, create stored procedures to update the table:

CREATE PROCEDURE [mda].[PUpdate_Admin_Run_Start]

(
@Runid int
, @Process varchar(255)
, @Action varchar(255)
, @Object varchar(255)
)

AS

BEGIN

insert into mda.admin_run
(
RunID
, Process
, Action
, Object
, Inserted
, Updated
, Deleted
, LoadStart
)
values
(
@RunID
, @Process
, @Action
, @Object
, NULL
, NULL
, NULL
, getdate()
)

END
CREATE PROCEDURE [mda].[PUpdate_Admin_Run_End]
( @RunId int
, @Action varchar(50)
, @Records_New int = 0
, @Records_Updated int = 0
, @Records_Deleted int = 0 )
AS
BEGIN
update mda.admin_run
set LoadEnd = getdate()
,Inserted = @Records_new
,Updated = @Records_updated
,Deleted = @Records_deleted
where RunId = @RunId
and LoadEnd is null
and Action = @Action
END
Below you find an example script which updates the admin table:
begin
declare @Records_inserted int
declare @Records_updated int
declare @Records_deleted int
declare @Runid int
declare @Process varchar(50)
declare @Action varchar(50)
declare @Object varchar(50)
declare @rows int

select @rows = count(*) from mda.admin_run
if @rows = 0 set @Runid = 1 else select @Runid = max(runid) + 1 from mda.admin_run -- start a new run

set @Process = 'EDW - HRM'
set @Action = 'Extract Dimension Employee'
set @Object = 'STG.Employee'
exec mda.PUpdate_Admin_Run_Start @RunID,@Process,@Action,@Object
--load script goes here; dummy statement for the example
select getdate()

set @Records_Inserted = @@rowcount
set @Records_updated = 0
set @Records_deleted = 0
exec mda.PUpdate_Admin_Run_End @RunID,@Action,@Records_Inserted, @Records_updated, @Records_deleted
end
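With the admin table in place, the monitoring questions from the introduction become simple queries, for example:

-- jobs that don't seem to end
SELECT Runid, Process, Action, Object, LoadStart
FROM mda.admin_run
WHERE LoadEnd IS NULL

-- historical duration per action, longest first
SELECT Runid, Action, DATEDIFF(ss, LoadStart, LoadEnd) AS duration_sec
FROM mda.admin_run
WHERE LoadEnd IS NOT NULL
ORDER BY duration_sec DESC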

Wednesday, 4 March 2009

Adding a trend line to your report

Trend lines are used in business analytics to show changes in data over time. They are often used to argue that a particular action or event (such as the first major press releases about the current financial crisis) caused observed changes at a point in time.

This is a simple technique and does not require sophisticated analysis. However, most reporting tools do not have a feature to add trend lines to charts (like MS Excel has).

You could add the linear regression formula to your reporting tool as a calculated field, but you shouldn't: the formula would be recalculated every time the report is executed, and it is a performance killer. Instead, compute the y-values of the trend line in your ETL process and store them in your Data Warehouse.

The formula used is y' = mx + b, where m is the slope of the line and b is the y-intercept: the y-coordinate of the point where the line crosses the y-axis. This can be seen by letting x = 0, which immediately gives y = b. For a least-squares trend line, m = (total xy - average y * total x) / (total x*x - average x * total x) and b = average y - m * average x.

To clarify this formula, take a look at the following example:

x y
1 20883
2 24260
3 21678
4 27094

gives:
x = {1,2,3,4}
y = {20883,24260,21678,27094}
average y = (20883+24260+21678+27094)/4 = 23478.75
average x = (1+2+3+4)/4 = 2.5
total xy = (20883*1)+(24260*2)+(21678*3)+(27094*4) = 242813
total x = 1+2+3+4 = 10
total x*x = (1*1)+(2*2)+(3*3)+(4*4) = 30
This results in a slope m = (242813 - 23478.75*10) / (30 - 2.5*10) = 8025.5/5 = 1605.1 and a y-intercept b = 23478.75 - 1605.1*2.5 = 19466, so the y'-coordinates will be:

x y'
1 21071.1
2 22676.2
3 24281.3
4 25886.4

If the above example is the revenue for a certain salesperson and you want to visualize a list of salespersons starting with the one with the most declining trend in revenue, you use the slope to set the order property.

The slope describes the steepness, incline, gradient, or grade of the trend line; a higher slope value indicates a steeper incline. It is defined as the ratio of the "rise" divided by the "run" between two points on the line, in other words the ratio of the altitude change to the horizontal distance between any two points on the line:

slope = rise / run = (y2 - y1) / (x2 - x1)
Here you find the SQL script (based on SQL Server) to load the trend data:

CREATE TABLE
[dbo].[BM_Trend]
(
[Salesperson] [nvarchar](25) NULL,
[Week] [tinyint] NULL,
[y] [numeric](18, 0) NULL,
[x] [numeric](6, 0) NULL,
[x2] [int] NULL,
[xy] [int] NULL,
[totalx] [int] NULL,
[totaly] [numeric](18, 0) NULL,
[totalx2] [bigint] NULL,
[totalxy] [numeric](18, 0) NULL,
[avgx] [float] NULL,
[avgy] [float] NULL,
[maxx] [int] NULL,
[slope] [float] NULL,
[intercept] [float] NULL,
[ytrend] [float] NULL)

insert into bm_trend
(
salesperson
, week
, y
, x
, x2
, xy
)
select
fs.salesperson
, fs.week
, sum(fs.value) y
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson ORDER BY fs.week) x
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson ORDER BY fs.week) *
ROW_NUMBER() OVER(PARTITION BY fs.salesperson ORDER BY fs.week) x2
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson ORDER BY fs.week) *
sum(fs.value) xy
from BM_SalesFacts FS
group by fs.salesperson, fs.week

update bm_trend
set totalx = a.totalx
, totaly = a.totaly
, totalx2 = a.totalx2
, totalxy = a.totalxy
, avgx = a.avgx
, avgy = a.avgy
, maxx = a.maxx
from
(select
salesperson
, sum(x) totalx
, sum(y) totaly
, sum(x2) totalx2
, sum(xy) totalxy
, avg(x) avgx
, avg(y) avgy
, max(x) maxx
from
bm_trend
group by salesperson)a
where a.salesperson = bm_trend.salesperson

update bm_trend
set slope = a.slope
, intercept = a.intercept
, ytrend = a.ytrend
from
(select
salesperson
, x
, case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end slope
, avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx intercept
, (case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end)*x+
avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx Ytrend
from
bm_trend
)a
where a.salesperson = bm_trend.salesperson
and a.x = bm_trend.x


Now you can use the column Ytrend for the trend line in the chart, and the column slope to order the salespersons in a list.
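For example, to list the salespersons starting with the most declining revenue trend (the slope is constant per salesperson, so MAX just picks that value):

SELECT salesperson, MAX(slope) AS slope
FROM bm_trend
GROUP BY salesperson
ORDER BY slope ASC -- most negative slope first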