Monday 9 November 2009

Automatically drop indexes

I'm developing several star schemas for various data marts / data warehouses.
Over and over again, I have to key in all the drop-index scripts for the fact tables
(drop indexes --> load data --> recreate indexes).

I created a stored procedure to automate this (kick it off with
EXECUTE pDropIndexes '[TableName]'):


CREATE PROCEDURE pDropIndexes @tablename varchar(255)
AS
BEGIN
DECLARE @indexName NVARCHAR(128)
, @dropIndexSql NVARCHAR(4000)
DECLARE tableIndexes CURSOR FOR
SELECT name
FROM sys.indexes
WHERE object_id = OBJECT_ID(@tablename)
AND type = 2 -- nonclustered indexes only, so the (clustered) primary key is left alone
AND is_primary_key = 0 -- also skip constraint-backed indexes,
AND is_unique_constraint = 0 -- which DROP INDEX cannot remove
OPEN tableIndexes
FETCH NEXT FROM tableIndexes INTO @indexName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @dropIndexSql = N'DROP INDEX ' + QUOTENAME(@indexName) + N' ON ' + @tablename
EXEC sp_executesql @dropIndexSql
FETCH NEXT FROM tableIndexes INTO @indexName
END
CLOSE tableIndexes
DEALLOCATE tableIndexes
END
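
A call in a nightly load job might then look like the sketch below (FCT_Sales, STG_Sales and the index name are hypothetical, just to illustrate the drop --> load --> recreate pattern):

EXECUTE pDropIndexes 'FCT_Sales'

-- the bulk load now runs without index maintenance overhead
INSERT INTO FCT_Sales (date_id, amount)
SELECT date_id, amount FROM STG_Sales

-- recreate the indexes afterwards
CREATE INDEX ix_FCT_Sales_date_id ON FCT_Sales (date_id)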

Monday 27 April 2009

Multiple years in a Cognos ReportNet Graph

If you visualize data in a line chart that contains 'zero values' (in my case the values do exist, but they are excluded because they belong to weeks with a public holiday) and you need to distinguish between the years, there will be gaps in the line. See the graph below:

[Image: sales20082009]

I needed to create a graph starting in week 44 (the start of the impact of the financial crisis) without any gaps. The two periods (200844-200943 and 200744-200843) had to have different colours. If you create a chart without series, there is no longer any distinction between
the years.

[Image: graph_normal]

Query:

[Image: query1]

Tabular data (the data item 'Serie' is defined in Report Studio):

[Image: salestable]

To place all historical data on one line, I added a week_key column, which is the concatenation of the year and the week (e.g. year 2008, week 44 gives 200844). I added the 'Serie' column as the series (instead of the year). In the properties of the level item of week_key I set 'Sort' to 'Ascending':

[Image: week_key_no_key]

[Image: chart_adapted]

It didn't work immediately, as the 'key property' of the level item week_key had to be set to 'Yes':

[Image: week_key_key]

After this, the result was:

[Image: sales200744-]

Another use of the same technique:

[Image: 20090428salesquarter]

Query:

[Image: 20090428 - Query SalesQuarter]

Sunday 26 April 2009

Public holiday indicator in your Date Dimension

For many reasons we want to determine whether a day is a working day or not.
For example: productivity figures are only valid if we exclude bank/public holidays,
and trend analysis charts are only representative if we exclude weeks with significantly low values (because a public holiday falls in that week).

No RDBMS supports public holidays out of the box: neither the static ones (like New Year's Day) nor the movable ones (like the Christian holidays, which depend on the date of Easter Day).

Whit Sunday is seven weeks after Easter Day (Whit Monday is the day after), and Ascension Day is ten days before Whit Sunday.
So we only have to calculate the date of Easter Day. This is a complex calculation, as it originates in an inaccurate version of the Hebrew calendar. The underlying rule: Jewish months start at New Moon, so the 14th or 15th day of the month falls immediately after Full Moon. That is why Easter Day was defined as the Sunday after the first Full Moon after the vernal equinox (the time of the crucifixion of Jesus), with the vernal equinox fixed at 21 March.
The Full Moon before Easter Day is called the 'Paschal Full Moon'. Its date can conflict with the real (astronomical) Full Moon, but because the Paschal Full Moon is not variable, Easter Day can be calculated (and hence all the other Christian holidays).


I used the following calculation in a stored procedure to calculate the Public Holiday Indicator in a date dimension:

Easter calculation
Source: Claus Tøndering, Frequently Asked Questions about Calendars, 25 September 2000
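
To sanity-check the formula, here is the arithmetic for a single year, written as a standalone T-SQL snippet (2009, when Easter Day fell on 12 April). The statements mirror the load procedure below:

DECLARE @year int, @g int, @c int, @h int, @i int, @j int, @l int
SET @year = 2009
SET @g = @year % 19                                           -- Golden number - 1      => 14
SET @c = @year / 100                                          -- century                => 20
SET @h = (@c - (@c/4) - ((8*@c+13)/25) + (19*@g) + 15) % 30   -- days since New Moon    => 20
SET @i = @h - ((@h/28) * (1 - (29/(@h+1))) * ((21-@g)/11))    -- 21 March to Full Moon  => 20
SET @j = (@year + (@year/4) + @i + 2 - @c + (@c/4)) % 7       -- weekday of Full Moon   => 5
SET @l = @i - @j                                              --                        => 15
SELECT 3 + ((@l+40)/44) AS Easter_month                       -- => 4 (April)
, @l + 28 - 31*((3 + ((@l+40)/44))/4) AS Easter_day           -- => 12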


Prerequisites:

CREATE SCHEMA BSM
GO

CREATE TABLE [bsm].[DIM_date]
( [date_id] [int] NOT NULL
, [date] [datetime] NOT NULL
, [day_nr] [int] NOT NULL
, [month_nr] [int] NOT NULL
, [month] [varchar](10) NOT NULL
, [month_key] [int] NOT NULL
, [quarter_nr] [int] NOT NULL
, [quarter] [varchar](2) NOT NULL
, [quarter_key] [int] NOT NULL
, [year] [int] NOT NULL
, [week_nr] [int] NOT NULL
, [week] [varchar](7) NOT NULL
, [week_key] [int] NOT NULL
, [weekday_nr] [int] NOT NULL
, [day] [varchar](9) NOT NULL
, [public_holiday] [varchar](25) NOT NULL
, [weekend] [varchar](1) NOT NULL
, [working_day] [varchar](1) NOT NULL
, [nr_working_days] [int] NOT NULL
, [nr_days] [int] NOT NULL
, CONSTRAINT [PK_DIM_date_id] PRIMARY KEY CLUSTERED
( [date_id] ASC) WITH (IGNORE_DUP_KEY = OFF)
ON [PRIMARY]) ON [PRIMARY]


The function below is needed because SQL Server 2005 does not support ISO 8601 week numbers (only United States week numbering). You do not need it in SQL Server 2008, and Oracle 9i and higher support it through the 'IW' format mask.
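
For reference, the SQL Server 2008 and Oracle equivalents look like this (the '2005-01-01' test date is just an example):

-- SQL Server 2008 and later: built-in ISO week number
SELECT DATEPART(ISO_WEEK, '2005-01-01')   -- returns 53: 1 January 2005 belongs to ISO week 53 of 2004

-- Oracle 9i and later: 'IW' format mask
-- SELECT TO_CHAR(DATE '2005-01-01', 'IW') FROM dual   -- returns '53'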

CREATE FUNCTION [bsm].[ISO8601WeekNr] (@Date datetime)
RETURNS INT
AS
BEGIN
DECLARE @ISO8601Week int

DECLARE @day int
DECLARE @month int
DECLARE @year int
DECLARE @a int
DECLARE @y int
DECLARE @m int
DECLARE @JD int -- Julian day number
DECLARE @d4 int
DECLARE @L int
DECLARE @d1 int

SET @year = YEAR(@Date)

SET @month = MONTH(@Date)
SET @day = DAY(@Date)
SET @a = (14-@month)/12
SET @y = @year +4800 -@a
SET @m = @month + (12*@a) - 3
SET @JD = @day + ( ((153*@m)+2)/5 ) + (365*@y) + (@y/4) - (@y/100) + (@y/400) - 32045
SET @d4 = (@JD + 31741 - (@JD % 7)) % 146097 % 36524 % 1461
SET @L = @d4/1460
SET @d1 = ((@d4-@L)%365)+@L
SET @ISO8601Week = (@d1/7) +1
RETURN @ISO8601Week

END
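
A quick sanity check of the function (the expected values follow from the ISO 8601 definition: week 1 is the week containing the first Thursday of the year):

SELECT bsm.ISO8601WeekNr('2005-01-01')   -- Saturday 1 January 2005  => 53 (last week of 2004)
SELECT bsm.ISO8601WeekNr('2009-01-01')   -- Thursday 1 January 2009  => 1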

The load script:

CREATE PROCEDURE [bsm].[pLoadDIM_Date] AS
BEGIN
DECLARE @d int -- day counter
DECLARE @date datetime
DECLARE @day_nr int
DECLARE @month varchar(25)
DECLARE @year int
DECLARE @yearvarchar varchar(4)
DECLARE @quarter_nr int
DECLARE @month_nr int
DECLARE @day varchar(9)
DECLARE @g int -- Golden number (-1): Years with the same Golden number have 'New Moon' on the same date

DECLARE @c int -- year/100 (century)
DECLARE @h int -- number of days since 'New Moon'
DECLARE @i int -- number of days between 21 March and the 'Paschal Full Moon'
DECLARE @j int -- weekday of the 'Paschal Full Moon'
DECLARE @l int -- number of days between 21 March and the Sunday on or before the 'Paschal Full Moon' (between -6 and 28)
DECLARE @Easter_month int
DECLARE @Easter int
DECLARE @Easter_date datetime
DECLARE @public_holiday int
DECLARE @x int -- working days counter
DECLARE @y int -- indicator in case week 53 from year x is in year x+1 (100=yes;0=no)

DECLARE @z int -- indicator in case week 1 from year x is in year x-1 (100=yes;0=no)

DECLARE @weekday_nr int

truncate table bsm.dim_date

SET DATEFIRST 1 -- First day of week is Monday (Default is Sunday)
SET @d = -5475 -- First date is current date minus 5475 days
SET @x = 0

WHILE @d <5475

BEGIN

SET @date= DATEADD(dd, @d, GETDATE())
SET @year =YEAR(@date)
SET @yearvarchar = CONVERT(VARCHAR,DATEPART(yy,DATEADD(dd, @d, GETDATE())))

SET @quarter_nr = DATEPART(qq,DATEADD(dd, @d, GETDATE()))
SET @month_nr = DATEPART(mm, DATEADD(dd, @d, GETDATE()))
SET @weekday_nr = DATEPART(dw, DATEADD(dd, @d, GETDATE()))
SET @day_nr = DATEPART(dd, DATEADD(dd, @d, GETDATE()))
SET @day = DATENAME(dw,DATEADD(dd, @d, GETDATE()))
SET @month = DATENAME(mm,DATEADD(dd, @d, GETDATE()))
SET @g = @year % 19
SET @c = @year / 100
SET @h = (@c-(@c/4)-((8*@c+13)/25)+(19*@g)+15)%30
SET @i = @h-((@h/28)*(1-(29/(@h+1)))*((21-@g)/11))
SET @j = (@year+(@year/4)+@i+2-@c+(@c/4))%7
SET @l = @i-@j
SET @Easter_month = 3+((@l+40)/44)
SET @Easter = @l+28-31*(@Easter_month/4)
SET @Easter_date = CONVERT(CHAR(10),(CONVERT(DATETIME,CAST(@Easter_month AS CHAR)+'/'+CAST(@Easter AS CHAR)+'/'+CAST(DATEPART(yy, DATEADD(dd,@d, GETDATE())) AS CHAR))),110)

IF @weekday_nr = 6 OR @weekday_nr = 7 -- weekend
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) -- Easter Day
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) -- Easter Monday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) -- Whit Sunday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) -- Whit Monday
OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) -- Ascension Day
OR (@day_nr = 1 AND @month_nr = 1) -- New Year's Day
OR (@day_nr = 30 AND @month_nr = 4) -- Queen's Day
OR (@day_nr = 25 AND @month_nr = 12) -- Christmas Day
OR (@day_nr = 26 AND @month_nr = 12) -- Boxing Day
SET @public_holiday = 1
ELSE
SET @public_holiday = 0

IF @public_holiday = 1 OR @weekday_nr IN (6,7)
SET @x = @x
ELSE
SET @x = @x + 1


IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) > 51 AND @month_nr = 1
SET @y = 100 ELSE SET @y = 0 -- if a week belongs partly to the next year (week 53 of 2004 runs into January 2005), set the indicator to 100: subtracting 100 from 200553 makes it week 200453 again

IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) = 1 AND @month_nr = 12
SET @z = 100 ELSE SET @z = 0 -- if a week belongs partly to the previous year (week 1 of 2004 starts in December 2003), set the indicator to 100: adding 100 to 200301 makes it week 200401

INSERT INTO bsm.dim_date
SELECT ((DATEPART(yy, DATEADD(dd, @d, GETDATE())) * 100) + DATEPART(mm,DATEADD(dd, @d, GETDATE()))) * 100 + @day_nr AS date_id
, CAST(CONVERT(VARCHAR, @date, 111) AS DATETIME) as date
, @day_nr AS day_nr
, DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_nr
, @month AS month
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_key
, DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_nr
, 'Q'+CONVERT(VARCHAR,DATEPART(qq,DATEADD(dd, @d, GETDATE()))) as quarter
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_key
, @year AS year_nr
, bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS week_nr
, 'Week '+CONVERT(VARCHAR,bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101))) AS week
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+CAST (bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS VARCHAR)-@y+@z AS week_key
, @weekday_nr AS weekday_nr
, @day AS day
, CASE WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) THEN 'Easter Day'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) THEN 'Easter Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) THEN 'Whit Sunday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) THEN 'Whit Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) THEN 'Ascension day'
WHEN @day_nr = 1 AND @month_nr = 1 THEN 'New Year''s Day'
WHEN @day_nr = 30 AND @month_nr = 4 THEN 'Dutch Queen''s Day'
WHEN @day_nr = 25 AND @month_nr = 12 THEN 'Christmas Day'
WHEN @day_nr = 26 AND @month_nr = 12 THEN 'Boxing Day' ELSE '-'
END AS public_holiday
, CASE WHEN @weekday_nr in(6,7) THEN 'Y' ELSE 'N' END AS weekend
, CASE WHEN @public_holiday = 1 OR @weekday_nr in(6,7) THEN 'N' ELSE 'Y' END AS working_day
, @x AS nr_working_days
, @d+5476 AS nr_days

SET @d = @d + 1

END

END
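
Loading and checking is then a matter of (the spot-check query simply lists the holidays the procedure flagged for one year):

EXECUTE bsm.pLoadDIM_Date

SELECT [date], [day], [public_holiday]
FROM bsm.DIM_date
WHERE [year] = 2009
AND [public_holiday] <> '-'
ORDER BY [date]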


Friday 24 April 2009

Monitor / Audit your Data Warehouse load

One of the most important means of monitoring the run of your Data Warehouse load is getting insight into the status of running jobs, the (historical) duration of the jobs, and which jobs don't seem to end. Executing a simple update script on an admin table before the start of a job and after the end of a job can result in a table like:

RunID | Process | Action | Object | New | Updated | Deleted | Load_start | Load_end
1 | EDW - HRM | Publish HRM_cube | HRM_Montly.mdc |  |  |  | 4/15/09 8:09:38 AM | 4/15/09 8:10:02 AM
1 | EDW - HRM | FTP HRM cube | HRM_Montly.mdc |  |  |  | 4/15/09 8:09:22 AM | 4/15/09 8:09:38 AM
1 | EDW - HRM | Create HRM cube | HRM_Montly.mdc |  |  |  | 4/15/09 8:09:21 AM | 4/15/09 8:09:22 AM
1 | EDW - HRM | Load new facts into aggregate table in Business Model | BSM.FCT_Hour_Monthly | 11408 | 0 | 0 | 4/15/09 8:09:21 AM | 4/15/09 8:09:21 AM
1 | EDW - HRM | Delete existing facts from aggregate table in Business Model | BSM.FCT_Hour_Monthly | 0 | 0 | 344 | 4/15/09 8:09:03 AM | 4/15/09 8:09:21 AM
1 | EDW - HRM | Load Dimension Supplier into BSM (Refresh) | BSM.DIM_Supplier | 3 | 0 | 0 | 4/15/09 8:09:02 AM | 4/15/09 8:09:03 AM
1 | EDW - HRM | Load Dimension Organisational Unit into BSM (Refresh) | BSM.DIM_OrganisationalUnit | 23 | 0 | 0 | 4/15/09 8:09:00 AM | 4/15/09 8:09:02 AM
1 | EDW - HRM | Load Dimension Employee into BSM (Refresh) | BSM.DIM_Employee | 45632 | 0 | 0 | 4/15/09 7:59:20 AM | 4/15/09 8:09:00 AM
1 | EDW - HRM | Load Date Dimension into Business Model (Refresh) | BSM.DIM_Date | 365 | 0 | 0 | 4/15/09 7:59:18 AM | 4/15/09 7:59:20 AM
1 | EDW - HRM | Load new and changed hour facts into ODS | ODS.FCT_Hour | 45632 | 0 | 0 | 4/15/09 7:59:16 AM | 4/15/09 7:59:18 AM
1 | EDW - HRM | Load new and changed targets into ODS | ODS.FCT_Target | 12 | 0 | 0 | 4/15/09 7:59:15 AM | 4/15/09 7:59:16 AM
1 | EDW - HRM | Load new and changed suppliers into ODS | ODS.DIM_Supplier | 0 | 0 | 0 | 4/15/09 7:59:08 AM | 4/15/09 7:59:15 AM
1 | EDW - HRM | Load new and changed organisational units into ODS | ODS.DIM_OrganisationalUnit | 0 | 1 | 0 | 4/15/09 7:58:58 AM | 4/15/09 7:59:08 AM
1 | EDW - HRM | Load new and changed employees into ODS | ODS.DIM_Employee | 1 | 2 | 0 | 4/15/09 7:58:55 AM | 4/15/09 7:58:58 AM
1 | EDW - HRM | Transformation step: create hierarchy for organisational unit | STG.OrganisationalUnit | 0 | 23 | 0 | 4/15/09 7:58:52 AM | 4/15/09 7:58:53 AM
1 | EDW - HRM | Transformation step: calculate productivity | STG.Hour | 0 | 45632 | 0 | 4/15/09 7:58:49 AM | 4/15/09 7:58:52 AM
1 | EDW - HRM | Lookup surrogate key supplier | STG.Supplier | 0 | 3 | 0 | 4/15/09 7:58:23 AM | 4/15/09 7:58:49 AM
1 | EDW - HRM | Lookup surrogate key organisational unit | STG.OrganisationalUnit | 0 | 23 | 0 | 4/15/09 7:51:56 AM | 4/15/09 7:58:04 AM
1 | EDW - HRM | Lookup surrogate key employee | STG.employee | 0 | 2456 | 0 | 4/15/09 7:51:28 AM | 4/15/09 7:51:56 AM
1 | EDW - HRM | Detect corrected hours | STG.Hour | 0 | 35 | 0 | 4/15/09 7:51:24 AM | 4/15/09 7:51:28 AM
1 | EDW - HRM | Detect changed supplier attributes | STG.Hour | 0 | 0 | 0 | 4/15/09 7:51:22 AM | 4/15/09 7:51:24 AM
1 | EDW - HRM | Detect changed organisational unit attributes | STG.OrganisationalUnit | 0 | 1 | 0 | 4/15/09 7:51:18 AM | 4/15/09 7:51:22 AM
1 | EDW - HRM | Detect changed employee attributes | STG.Employee | 2 | 0 | 0 | 4/15/09 7:51:15 AM | 4/15/09 7:51:18 AM
1 | EDW - HRM | Extract Monthly Targets | STG.Target | 12 | 0 | 0 | 4/15/09 7:50:54 AM | 4/15/09 7:51:15 AM
1 | EDW - HRM | Extract Fact Hours | STG.Hour | 45632 | 0 | 0 | 4/15/09 7:50:49 AM | 4/15/09 7:50:52 AM
1 | EDW - HRM | Extract Dimension Supplier | STG.Supplier | 3 | 0 | 0 | 4/15/09 7:50:17 AM | 4/15/09 7:50:49 AM
1 | EDW - HRM | Extract Dimension Organisational Unit | STG.OrganisationalUnit | 23 | 0 | 0 | 4/15/09 7:49:21 AM | 4/15/09 7:50:17 AM
1 | EDW - HRM | Extract Dimension Employee | STG.Employee | 2456 | 0 | 0 | 4/15/09 7:49:20 AM | 4/15/09 7:49:21 AM



Firstly, create an admin table:

-- Create MetaData layer

create schema mda
GO

-- Create Admin table

CREATE TABLE [mda].[admin_run](
[Runid] [int] NULL,
[Process] [varchar](50) ,
[Action] [varchar](50) ,
[Object] [varchar](50) ,
[Inserted] [int] NULL,
[Updated] [int] NULL,
[Deleted] [int] NULL,
[LoadStart] [datetime] NULL,
[LoadEnd] [datetime] NULL
)
Secondly, create stored procedures to update the table:

CREATE PROCEDURE [mda].[PUpdate_Admin_Run_Start]

(
@Runid int
, @Process varchar(50) -- sizes match the admin_run columns
, @Action varchar(50)
, @Object varchar(50)
)

AS

BEGIN

insert into mda.admin_run
(
RunID
, Process
, Action
, Object
, Inserted
, Updated
, Deleted
, LoadStart
)
values
(
@RunID
, @Process
, @Action
, @Object
, NULL -- counts are unknown at the start; filled in by PUpdate_Admin_Run_End
, NULL
, NULL
, getdate()
)

END
GO

CREATE PROCEDURE [mda].[PUpdate_Admin_Run_End]
( @RunId int
, @Action varchar(50)
, @Records_New int = 0
, @Records_Updated int = 0
, @Records_Deleted int = 0 )
AS
BEGIN
update mda.admin_run
set LoadEnd = getdate()
,Inserted = @Records_new
,Updated = @Records_updated
,Deleted = @Records_deleted
where RunId = @RunId
and LoadEnd is null
and Action = @Action
END
Below is an example script that updates the admin table:
begin
declare @Records_inserted int
declare @Records_updated int
declare @Records_deleted int
declare @Runid int
declare @Process varchar(50)
declare @Action varchar(50)
declare @Object varchar(50)
declare @rows int

select @rows = count(*) from mda.admin_run
if @rows = 0 set @Runid = 1 else select @Runid = max(runid) from mda.admin_run

set @Process = 'EDW - HRM'
set @Action = 'Extract Dimension Employee'
set @Object = 'STG.Employee'
exec mda.PUpdate_Admin_Run_Start @RunID,@Process,@Action,@Object
--load script
select getdate()

set @Records_Inserted = @@rowcount
exec mda.PUpdate_Admin_Run_End @RunID,@Action,@Records_Inserted, @Records_updated, @Records_deleted
end
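
Two queries I find useful against this admin table (column names as defined above): the first lists jobs that started but never finished, the second gives the duration per action for trend analysis.

-- jobs still running (or hanging)
SELECT RunID, Process, Action, Object, LoadStart
FROM mda.admin_run
WHERE LoadEnd IS NULL

-- duration per action, in seconds
SELECT RunID, Action, Object, DATEDIFF(ss, LoadStart, LoadEnd) AS duration_sec
FROM mda.admin_run
WHERE LoadEnd IS NOT NULL
ORDER BY LoadStart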

Tuesday 7 April 2009

Controlled update in SQL Server

Whenever you experience slow updates or timeout errors when updating columns, you can update them incrementally using the script below.
The WAITFOR DELAY is added to give other processes (including a transaction log backup) room to run. The script only works if the source column used for the update cannot contain NULL values, because progress is measured with 'UPDATE_COLUMN1 IS NULL'. If that's not the case, add a tracking column (e.g. 'updated'), set it to a marker value (e.g. 1) in the update, use it to determine @Rownum ('where updated <> 1') and add it as a filter to the WHERE clause of the update query ('and updated <> 1'); see the sketch after the script.

SQL 2005 Syntax:

DECLARE @Batch int
DECLARE @Rownum int

DECLARE @PrevRownum int

SET @Batch = 10000
SET @Rownum = 1

WHILE (@Rownum > 0)

BEGIN

UPDATE TOP(@Batch) UPDATE_TABLE
SET
UPDATE_COLUMN1 = S.SOURCE_COLUMN1
, UPDATE_COLUMN2 = S.SOURCE_COLUMN2
FROM SOURCE_TABLE S
WHERE UPDATE_TABLE.ID = S.ID
AND UPDATE_COLUMN1 IS NULL


SET @Prevrownum = @Rownum

SELECT @Rownum = COUNT(*) FROM UPDATE_TABLE
WHERE UPDATE_COLUMN1 IS NULL


IF @Prevrownum = @Rownum
set @ROWNUM = 0

WAITFOR DELAY '000:00:05'

END
RETURN
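
For completeness, a sketch of the variant with the tracking column described above ('updated' is a hypothetical column, assumed to default to 0):

DECLARE @Batch int
DECLARE @Rownum int
DECLARE @PrevRownum int

SET @Batch = 10000
SET @Rownum = 1

WHILE (@Rownum > 0)

BEGIN

UPDATE TOP(@Batch) UPDATE_TABLE
SET
UPDATE_COLUMN1 = S.SOURCE_COLUMN1
, updated = 1 -- mark the row as processed
FROM SOURCE_TABLE S
WHERE UPDATE_TABLE.ID = S.ID
AND UPDATE_TABLE.updated <> 1

SET @PrevRownum = @Rownum

SELECT @Rownum = COUNT(*) FROM UPDATE_TABLE
WHERE updated <> 1

IF @PrevRownum = @Rownum -- no more progress: stop
SET @Rownum = 0

WAITFOR DELAY '000:00:05'

END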



Sunday 29 March 2009

Automated Cognos Report Documentation

If you need a list of all reports (Cognos ReportNet or Cognos 8), including the location of each report, you can query the content store. It's easy to retrieve a list like:


Location | Name | Created | Modified | Type | Contact | Description
GO Data Warehouse/Employee Profile | Employee Profile | 2005-09-07 18:06:45.790 | 2007-01-04 20:43:34.617 | powerplay cube | Stewart James | Human resources cube showing data for each employee.
GO Data Warehouse/Sales Target by Region | Sales Target by Region | 2005-07-17 20:04:09.890 | 2007-01-04 20:44:12.587 | powerplay report | Stewart James | Report showing sales target by region using a prompt to provide values for the report.
GO Data Warehouse/Tool Tips | Tool Tips | 2005-08-23 19:26:18.127 | 2007-01-04 20:44:30.040 | report | Stewart James | Report that shows tool tips and baselines in chart.
GO Sales and Retailers/Actual Sales against Target Sales | Actual Sales against Target Sales | 2005-05-27 20:46:43.640 | 2006-12-14 15:41:06.260 | report | Melvin John | The report shows a simple list with conditional formatting that drills through to the Sales Representative Contact List report.
GO Sales and Retailers/Actual Sales against Target Sales-burst | Actual Sales against Target Sales-burst | 2005-05-27 20:50:06.910 | 2006-12-14 15:41:15.463 | report | Melvin John | Report that is set up for bursting to email addresses. The email addresses point to sales representatives in the Sales reps table in the GOSALES database.
GO Sales and Retailers/Banded Report | Banded Report | 2005-05-27 20:57:01.480 | 2006-12-14 15:41:26.433 | report | Melvin John | Banded report that shows Product name Quantity and Revenue with sales opportunities for each Product line category.
GO Sales and Retailers/Business Details Drill Through | Business Details Drill Through | 2005-05-27 19:12:28.627 | 2006-12-14 15:42:07.150 | report | Melvin John | Report that shows product details in charts. This report is a drill-through target report for the GO Business View report. You can also run it as a product-prompted report.
GO Sales and Retailers/Consumer Trends | Consumer Trends | 2007-01-05 15:35:24.003 | 2007-01-05 16:24:21.317 | report | Melvin John | This complex report shows a list chart bar chart and product images to illustrate revenue by product type.
GO Sales/Conditional Display | Conditional Display | 2006-05-30 18:42:25.157 | 2007-01-04 20:45:26.793 | report | Dropster Tina | The report uses a prompt page to query a chart and a list report.
GO Sales/Custom Legend | Custom Legend | 2005-05-19 18:57:04.317 | 2007-01-04 20:45:34.840 | report | Dropster Tina | The report shows that the legend can be customized in a similar way to the titles.
GO Sales/Orders Report | Orders Report | 2005-06-13 12:50:18.680 | 2007-01-04 20:45:52.907 | report view | Dropster Tina | This report shows the default style applied to List Column Title and the List Column Body in a single step.
GO Sales/Product Report | Product Report | 2005-05-19 18:56:43.683 | 2007-01-04 20:47:20.690 | report | Dropster Tina | The report displays combination charts with drill-through options.
GO Sales/Retailer Report (Multiple Prompt Values) | Retailer Report (Multiple Prompt Values) | 2005-06-16 16:50:21.937 | 2007-01-04 20:56:16.270 | report | Dropster Tina | This list report accepts multiple prompt values.
GO Sales/Returns by Order Method -Prompted Chart | Returns by Order Method -Prompted Chart | 2005-08-05 17:00:31.907 | 2007-01-04 20:56:56.177 | report | Dropster Tina | This report uses a bar chart and a list to show the reasons why products are returned. It uses a prompt to filter on the return description. It is illustrated in the Getting Started tutorial.
GO Sales/Revenue by Sales Territory | Revenue by Sales Territory | 2005-06-14 18:19:12.653 | 2007-01-04 21:00:04.703 | report | Dropster Tina | The report shows a chart and a crosstab display with conditional formatting that drills through to the Orders Report report.
GO Sales/Top 5 Sales Staff | Top 5 Sales Staff | 2005-07-22 23:44:55.907 | 2007-01-04 21:00:13.737 | report | Dropster Tina | This list report embeds a bar chart that shows the top five sales representatives by sales targets and revenue.
GO Sales/Top Revenue (Conditional) | Top Revenue (Conditional) | 2005-06-16 19:22:10.160 | 2007-01-04 21:00:27.347 | report | Dropster Tina | A list report that conditionally hides the Product Line group footers for the revenue lower than the specified value.



The only hard part is building the location path: you have to implement a user-defined function that recursively joins the parent id (pcmid) to the cmid. A stored procedure fills a table with all the info you need. I wrote a script that works for both ReportNet and Cognos 8.

First, create the UDF and the stored procedure:

CREATE FUNCTION [dbo].[ufn_GetParentPath] ( @pCurrentNodeID INT )
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @vCurrentNodeName NVARCHAR(256) -- matches the name column of the reports table
DECLARE @vParentID INT
IF @pCurrentNodeID IS NULL OR @pCurrentNodeID = 0
RETURN NULL
SELECT @vCurrentNodeName = [name], @vParentID = [pcmid],@pCurrentNodeID =[cmid]
FROM [dbo].[reports]
WHERE [cmid] = @pCurrentNodeID
RETURN ISNULL([dbo].[ufn_GetParentPath] ( @vParentID ) + '/', '') + @vCurrentNodeName
END


CREATE procedure [dbo].[pLoadReportData]
as
begin
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[reports]') and OBJECTPROPERTY(id, N'IsTable') = 1)
drop table reports
CREATE TABLE [dbo].[reports](
[cmid] [int] NOT NULL,
[localeid] [smallint] NOT NULL,
[mapdlocaleid] [smallint] NULL,
[isdefault] [bit] NOT NULL,
[name] [nvarchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[pcmid] [int] NOT NULL,
[version] [int] NOT NULL,
[created] [datetime] NOT NULL,
[modified] [datetime] NOT NULL,
[disabled] [tinyint] NULL,
[class] [nvarchar](50) NOT NULL,
[dispseq] [int] NOT NULL,
[contactemail] [varchar] (128) NULL,
[contact] [varchar] (128) NULL,
[description] [nvarchar] (3772) NULL,
[owner] [int] NULL,
[path] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[spec] XML NULL
)


create index ix_cmid on reports (cmid)


insert into reports
(
cmid
, localeid
, mapdlocaleid
, isdefault
, name
, pcmid
, version
, created
, modified
, disabled
, class
, dispseq
, owner
, path
)
select
n.cmid
, n.localeid
, n.mapdlocaleid
, n.isdefault
, n.name
, o.pcmid
, o.version
, o.created
, o.modified
, o.disabled
, c.name
, o.dispseq
, o.owner
, '' path
from cmobjnames n join cmobjects o
on n.cmid = o.cmid and o.classid in (0,1,10,18,19,33,81,82)
join cmclasses c on o.classid = c.classid and n.isdefault = 1

update reports
set contact = c.contact
, contactemail = c.contactemail
from cmobjprops10 c
join reports on c.cmid = reports.cmid

update reports
set description = d.value
from cmobjprops3 d
join reports on d.cmid = reports.cmid
and d.isdefault = 1

update reports
set path = x.path
from
(SELECT [dbo].[ufn_GetParentPath] ( [cmid] ) AS [Path], [name],[cmid]
FROM [dbo].[reports]) x
where reports.cmid = x.cmid

end

Run (and schedule) the stored procedure.
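
For example:

EXECUTE dbo.pLoadReportData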

Create a view to exclude the folder, package, root and content items:


create view [dbo].[v_reports]
as
select
path
, name
, created
, modified
, class
, description
, contact
, contactemail
from reports
where class in ('report','reportview','powerplaycube','powerplay report')

The query

select * from v_reports order by path

does the final part of the job.

I implemented this for a customer with more than 37,000 (!) reports, report views, cubes and cube reports. In that case the script runs in about 10-15 seconds.

Thursday 19 March 2009

Date range filter in Cognos8 ReportStudio based on dynamic date/fixed time.

Today I needed to report data in Cognos 8.2 Report Studio with a date range filter based on the current day and yesterday (dynamic) and a designated, fixed time.

Values had to be extracted with a DateTime between yesterday 9 AM and today 9 AM. As we use an Oracle database, it would be as easy as:

[DateTime] >= TO_TIMESTAMP (to_char({sysdate}-1) || ' 09:00:00.000000', 'DD-Mon-RR HH24:MI:SS.FF')
AND
[DateTime] < TO_TIMESTAMP (to_char({sysdate}) || ' 09:00:00.000000', 'DD-Mon-RR HH24:MI:SS.FF')

However, on this particular reporting environment we are not permitted to use native database functions. Hence, I wrote it in Cognos 8 syntax (in the Detail Filter expression):


[DateTime]
-
_make_timestamp
( extract(year, (_add_days(current_date,-1)))
, extract(month, (_add_days(current_date,-1)))
, extract(day, (_add_days(current_date,-1)))
)
>= 9 hour

AND

[DateTime]
-
_make_timestamp
( extract(year, current_date)
, extract(month, current_date)
, extract(day, current_date)
)
< 9 hour
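
(For comparison, and outside the report itself: the same window is easy to express in SQL Server too. SOME_TABLE is a hypothetical table with a [DateTime] column; the DATEADD/DATEDIFF pair truncates the current date to midnight before the nine hours are added.)

DECLARE @today_9am datetime
SET @today_9am = DATEADD(hh, 9, DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0))

SELECT *
FROM SOME_TABLE
WHERE [DateTime] >= DATEADD(dd, -1, @today_9am)
AND [DateTime] < @today_9am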

Tuesday 10 March 2009

The common process of a BI - request

This post is the third article in the series titled

'Requirements and procedures for the successful deployment and maintenance of a Data Warehouse and Business Intelligence environment'.

The articles describe the requirements for developing, deploying and maintaining a Data Warehouse environment, including Business Intelligence solutions, within these contexts:

- Change/Problem Management (the process of managing changes and problems: how does a customer and/or user submit a change/problem request, and how is it prioritized, tracked and implemented?)
- Configuration Management (the versioning, labeling and tracking of code and content components during the development life cycle)
- Release Management (designating and scheduling software and hardware release components so they migrate to the live environment as a set)


We distinguish different types of Business Intelligence requests:

Incidents (IN)
General Questions (GQ)
Problem requests (PR)
(Ad hoc) reports (AQ)
Change requests (CR)
Data Manipulations (DM)
Emergency requests (ER)

Requests originate with the user. Initially they always refer to the user layer (reports/dashboards/reporting tool) and not to the architecture that creates the product for the customer. A request can also arise within the Data Warehouse team, but it can always be allocated to one of the above request types and must always be treated as if it came from a user.

If the problem/change management tool cannot store sources, requesters' attachments must be filed in a content management store so developers can access them.

Non-CRs are raised as an issue/ticket in a problem/change management tool with a certain priority. Change requests can be raised as a project or as an issue (depending on the size of the effort). Incidents, problem requests, ad hoc queries and general questions can be raised as a single ticket or can be assigned to a project (related to a CR). Priorities and response times are governed by the SLA. High-priority tickets must not only be entered in the tool but also be communicated directly to the assignee, to guarantee they are picked up immediately.

User Support must be equipped with an intake form to register the call as completely as possible, to avoid misunderstandings. User Support must be trained to distinguish the different types of requests.

INCIDENTS

Incidents refer to an offline or error mode of the user layer. These are technical questions, not related to data.

Offline mode
In this case the product is not present:
The (web) portal is not accessible
The report is not refreshed
The report is not visible (on the web or on the network)

Error mode
In this case the product crashes:
The user cannot log in (security problem)
A canned (OLAP) report cannot be opened
A non-canned report won't run or is killed
An error message appears, but the result seems to be fine
An error appears when a power user creates a new report

User Support reproduces the incident (with or without terminal services) and creates a solution if the user performed incorrect actions (or forgot an authorization code). User Support must possess a user configuration and a power user configuration to be able to reproduce incidents. If the user acted correctly, a high-priority ticket must be raised; for error type 4 a low priority and for error type 5 a medium priority is sufficient. If the problem concerns all users (e.g. the web portal is offline), this must be communicated to the users; in other cases User Support should report the incident to the owner of the report (the key user). For error type 5, User Support is the problem owner if it concerns an error related to the tool itself, its configuration, or a database connection problem. Development issues of power users should only be managed if the metadata repository (e.g. a Cognos catalog or Business Objects universe) is maintained by the data warehouse team and the user cannot create new definitions; otherwise such reports cannot be supported and maintained.

All users (and members of the data warehouse team) are allowed to report an incident.

[Image: incident]

In most cases the tool administrator handles it. If a cube/report is not refreshed, processes other than the front-end tools may be involved (network problems, database problems); in that case the tool administrator needs the help of the system administrator or database administrator. To maintain the knowledge base, the problem and solution must be documented.

GENERAL QUESTIONS

General questions concern:

- (power) user training
- request for authorization
- end-user-tool functionality explanation
- report functionality explanation
- definition explanation (e.g. 'How is revenue calculated?', 'What's a work breakdown structure?')
User Support should answer all these questions and should have documentation to deal with these kinds of requests. If it concerns a definition explanation, they can refer to the key user, who should be equipped with the data warehouse definition handbook. If such a handbook does not exist, or the definition is not in the book, User Support can raise a ticket and assign it as a low-priority ticket to administration. Training is planned by the team lead of User Support (or, in case of no competence, the team lead of administration).

Only key users may raise general questions.

[Image: general question]

To maintain the Frequently Asked Questions (FAQ) the question and answer must be documented.

PROBLEM REQUESTS

A problem request arises when, according to the user, a report does not correspond with the original specifications: other data is reported than expected. If the data warehouse team is organized such that sources in production are maintained by administration, a ticket will be assigned to the administration manager. If development also works on production sources, the ticket will be assigned to the development manager or directly to a certain developer. If the data warehouse team works with substantial functional and technical documents and with configuration and release management procedures, it is possible to hand over development knowledge to an administration team. If not, production development should stay with the original developers (although, in practice, developers leave and roles don't). The administration team can also report a problem request (e.g. reconciliation outcomes). The owner of the report (the key user) needs to be informed.

[Image: problem request 1]

To equip the assignee with all information, and for maintenance purposes, the problem information (e.g. attachments with error messages) must be saved in a content management system (and, if possible, in the problem/change management tool).

In case of a problem report or a change request, the developer:

- contacts the customer to make an inventory of the specifications;
- writes a functional design;
- makes a planning and gives feedback to the requester;
- delivers the functional design to the requester;
- chases the requester to approve the functional design (redesign can be the result);
- adapts the design in case of disapproval;
- executes a feasibility study (if not attainable, the requester should get a well-founded reply);
- writes the technical design in case of approval;
- has a colleague judge the technical design (redesign can be the result);
- checks out all necessary sources;
- adds the new sources to the source id document;
- develops on the client (with a connection to the development database/DWH);
- creates test specifications;
- checks in all sources;
- writes the deliver-to-administration document so administration can publish all sources;
- promotes the sources to the test landscape so testers can perform technical tests;
- writes a migration request and acceptance test criteria;
- requests administration to promote (after approval) the sources to acceptance and asks test management for an acceptance test;
- creates/maintains technical documentation;
- hands over input to User Support to create/maintain user documentation/training material;
- requests administration to promote (after approval) the source(s) to production and communicates to the requester that the problem is solved.

[Image: problem request 2]

AD HOC REPORTS

Sometimes users wish to have once-only information that is not provided by the existing reports. We distinguish two kinds of ad hoc reports:

1. The user needs real-time, detailed information.
2. The user needs detailed/aggregated information (with a currency equal to the frequency of the data warehouse refresh, or less).

Ad 1. In this case a ticket must be assigned to a (SQL) developer in the development team. If a report that meets the requester's need already exists in the application of the source, User Support must refer to it. The (SQL) developer must be able to connect to the original source, because a data warehouse cannot provide real-time information. If User Support or development finds that the query is similar to an existing request (e.g. with another filter), User Support can advise the user to have it published as a structural report; in that case a change request has to be raised.
Generally, these kinds of ad hoc reports have a high-priority character because the need is real-time.
The developer saves the query, the result of the query and the metadata, so User Support has access to them and can reuse them.

Ad 2. If the report already exists, User Support must refer to it. If not, User Support can analyse whether the report can be deployed on an existing OLAP cube. If so, User Support creates a ticket and assigns it to themselves, while keeping in mind that this should be a task of the user as much as possible. If the report cannot be deployed on an existing OLAP cube, User Support assigns a ticket to a data access developer. The user must confirm that the request is a once-only request; otherwise a change request has to be raised.

[Image: ad hoc report 1]

If the ticket is assigned to development, a data access developer builds a query on the source system (in case of a real-time need) or on the DWH (not real-time).

The developer

- contacts customer to make an inventory of the specifications;
- adds the new sources to the source id document;
- develops the query on development;
- asks requester to evaluate the draft;
- runs the query on production;
- saves result and query in content management system;
- maintains definition repository.

[Image: ad hoc report 2]

A migration-to-production process through change management won't be used, because of the high-priority character of an ad hoc report request. In fact, nothing is published; the result is sent to the requester (e.g. in xls format). If a report on a cube is published, the cube model itself has not changed: users themselves can create reports on cubes and publish them without using the acceptance environment.

CHANGE REQUESTS

Change requests cannot be raised by all users; this is only permitted to the owner of a (set of) reports (a key user), or to the lead of the User Group if the request concerns multiple key users.
A change request is a request for new information that is not provided by the existing reports/cubes. If a change request exceeds a certain cost (e.g. above € 10.000) it will be classed and managed as a project; if less, it can be raised as an individual ticket assigned to development.

[Image: change request 1]

In case of a problem report or a change request, the developer:

- contacts the customer to make an inventory of the specifications;
- writes a functional design;
- makes a planning and gives feedback to the requester;
- delivers the functional design to the requester;
- chases the requester to approve the functional design (redesign can be the result);
- adapts the design in case of disapproval;
- writes the technical design in case of approval;
- has a colleague judge the technical design (redesign can be the result);
- checks out all necessary sources;
- adds the new sources to the source id document;
- develops on the client (with a connection to the development database/DWH);
- creates test specifications;
- checks in all sources;
- writes the deliver-to-administration document so administration can publish all sources;
- promotes the sources to the test landscape so testers can perform technical tests;
- writes a migration request and acceptance test criteria;
- requests administration to promote (after approval) the sources to acceptance and asks test management for an acceptance test;
- creates/maintains technical documentation;
- hands over input to User Support to create/maintain user documentation/training material;
- requests administration to promote (after approval) the source(s) to production and communicates to the requester that the problem is solved.

[Image: change request 2]

DATA MANIPULATIONS

A Data Manipulation is a change of data in the Data Warehouse. Data Manipulation requests cannot be raised by all users; this is only permitted to the owner of a (set of) reports (a key user), or to the lead of the User Group if the request concerns multiple key users. In most cases definitions are dynamic, but in some cases a definition is determined by hard-coded data. For instance, the rule that determines whether a cost object is billable or not cannot be automated; the cost objects must be updated regularly in a specific table. Another example is the definition of the bridging day: there is no formula to calculate it, and the policy can change every year. If possible, the requester should have the means to do this himself (e.g. via ASP).
Changes must be maintained in the Data Warehouse Definition Handbook.

[Image: Data Manipulation]

EMERGENCY REQUESTS

In case of an emergency it can be necessary to publish a change in the production environment as soon as possible. In this case the data warehouse team manager has to approve. These requests originate in the development team; it is an accelerated migration-to-production procedure. If necessary, technical documentation can be written afterwards. To check whether everything went OK, a (short) evaluation with the DWH manager is needed.

[Image: emergency request]

[Image: emergency request 2]

Furthermore, an emergency request can arise if a user wants the result of an ad hoc report immediately. The team lead for data access has to approve.

[Image: emergency request 3]

Required landscape in a DWH environment

This post is the second article in the series titled

'Requirements and procedures for the successful deployment and maintenance of a Data Warehouse and Business Intelligence environment'.

The articles describe the requirements for developing, deploying and maintaining a Data Warehouse environment, including Business Intelligence solutions, within these contexts:

- Change/Problem Management (the process of managing changes and problems: how does a customer and/or user submit a change/problem request, and how is it prioritized, tracked and implemented?)
- Configuration Management (the versioning, labeling and tracking of code and content components during the development life cycle)
- Release Management (designating and scheduling software and hardware release components so they migrate to the live environment as a set)



One of the most important aspects of building Business Intelligence solutions is the requirement to build and support multiple environments, which allow developers and testers to:

• Make changes to the (etl)-scripts/reports
• Add new subject areas
• Add or update dimensions
• Verify and test the accuracy of the data
• Experiment with joining/conforming subject areas

without affecting each other and/or the production reports/data.

At least four environments are required:

• A development environment where data analysts, ETL specialists and report writers can add new or enhance existing components of the warehouse and test their changes.
This can also be used as a system test environment. If needed, a separate system test (T) environment can be set up.
• An acceptance test environment, used to perform the final verification of data and reports prior to deploying into production.
• A production mirror environment. This environment mirrors production reports, web pages and data and is used as fallback.
• A production environment

To maintain stability, development only has access to the development/test environment, and administration (and no one else) to acceptance and production. In case of an emergency request or an ad hoc report this can differ.

Depending on the change frequency, the acceptance environment is refreshed at the same frequency as the production environment, or every time a change is published.

For training purposes a separate environment (R) can be set up. Usually acceptance cannot be used for this purpose (unless the performance is OK).

Below is a graphical presentation of an advisable environment (fallback machines are not shown; every server is intended to have fallback mechanisms).
Depending on the intensity of use, a smaller server configuration may be sufficient. A single-server configuration can only be advised for a development environment.

[Image: landscape]

About tooling: to minimize administration problems, a single-product configuration is advised. Nowadays Business Intelligence products cover the whole range of Data Warehouse requirements: Cognos has its own ETL tool, Informatica delivers high-end reporting products, Oracle and Microsoft are doing well in Business Intelligence, and SAP BI brings together a powerful business intelligence platform and data warehousing capabilities.

Required roles and responsibilities in a DWH environment

This post is the first article in the series titled

'Requirements and procedures for the successful deployment and maintenance of a Data Warehouse and Business Intelligence environment'.

The articles describe the requirements for developing, deploying and maintaining a Data Warehouse environment, including Business Intelligence solutions, within these contexts:

- Change/Problem Management (the process of managing changes and problems: how does a customer and/or user submit a change/problem request, and how is it prioritized, tracked and implemented?)
- Configuration Management (the versioning, labeling and tracking of code and content components during the development life cycle)
- Release Management (designating and scheduling software and hardware release components so they migrate to the live environment as a set)


The following users and roles are in general related to Business Intelligence activities (some roles may be combined: one person may fill more than one role, or a role may be filled by more than one person).

BUSINESS

Key User
Key users are the owners of a (subset of) reports. More importantly, they can be the owners of the definitions of specific performance indicators. They have privileges to submit change requests and problem requests with respect to their own reports and cubes. They are not allowed to request changes to conformed dimensions and measures (dimensions and measures that are used by multiple key users).

User Group
Members of the User Group are key users who share conformed dimensions and measures. They authorize the head of the User Group to request changes. Usually the User Group convenes to discuss changes (and the practical value of the management information produced by the Data Warehouse team). A representative of the Data Warehouse team can participate.

User
Users do not have the privilege to request changes. They may report problem requests.
Users can have different roles:

Recipient
Members can only retrieve content (for example by email)

Consumer
Members can read (and execute) (public) content such as Powerplay/Impromptu reports.

Query User
Members can execute queries on the Data Warehouse (with TOAD, Cognos Impromptu, SQL*Plus)

Power User
Members have the same access permissions as Query Users. They can use Impromptu or PowerPlay to create and publish reports for public or personal use.

Control Advisor
A control advisor judges whether it is allowed to deliver information/data (e.g. privacy aspects). Data privacy must be kept in mind: very often there are legal restrictions on the correct use of (personal) data. The head of the User Group can perform this role and delegate it to, for example, HRM (privacy data) or Sales (customer data).

Business Analyst
The role of the business analyst is to perform research and possess knowledge of existing business applications and processes to assist in identification of potential data sources, business rules being applied to data as it is captured by and moved through the transaction processing applications, etc. Whenever possible, this role should be filled by someone who has extensive prior experience with a broad range of the organization’s business applications. Another alternative is using business analysts from the various application areas as sources when those areas are added to the data warehouse. These analysts may only be associated with the DW team for relatively short periods when the applications they are responsible for are addressed in the data warehouse project. In many cases, the business analyst will work with end users and management to develop business process models that may be used in designing the data warehouse. A business analyst can be a key user.
Business Advisor
The Business Advisor is a business analyst whose scope crosses the separate processes. He owns shared, conformed dimensions and measures.

SOURCE SYSTEM

Functional Application Administration (FAA)

Members of Functional Application Administration support key users. They are responsible for the migration of changes in the source system (not the Data Warehouse environment). Functional administrators have functional knowledge of the source system and are capable of assisting the Data Warehouse team: they can map business questions to database tables/columns.

Technical Application Administration (TAA)

Technical administrators implement changes and solve problems. They can assist the Data Warehouse team in creating SQL scripts and can map the more complex business questions.

Database administrators (DBA)
A database administrator maintains the database of the source system, with its schemas, tables and so on. It is important that the Data Warehouse team cooperates with the DBAs: database maintenance is critical for the organization, and a breakdown or failure in database traffic can have huge consequences.

DATA WAREHOUSE TEAM

Data Warehouse Manager
The data warehouse manager has overall responsibility. The manager defines, plans, schedules, and controls. The DWH-plan must include tasks, deliverables and resources – the people who will perform the tasks. The manager will monitor and coordinate the activities of the team, and will review their deliverables. If contractors and consultants are used, the project manager assigns the tasks, monitors activities and deliverables and assures that knowledge transfer is indeed taking place.
The manager will estimate the cost of the project and monitor conformance to the cost estimates. The manager will also project the benefits, measure the effectiveness of the data warehouse and report on the benefits and costs.
The most important task for the manager is recruiting the right people; people with the requisite skills and people who work well with the users and each other. Getting the right people will require an understanding from management on the importance of the project and their cooperation in the recruiting process. The problem is that the good people are already taken and a smart manager outside of the data warehouse area is loath to give up an outstanding performer. Hiring people from the outside has its own risks and time delays. An outside hire will need some time to become familiar with the organization, to learn how things are done, to understand the minefields, and to learn about the source data.

User Support
User support is the first line of defense when the user has problems. These people must be customer/client-oriented and know that the calls from the users are not just annoyances but their reason for being. They must be knowledgeable about the tools, must understand the database structure, must know the data and must be familiar with frequently executed queries and reports. They must know the users’ concerns and problems and know the frequently asked questions and the answers to them. They must have a profile of the users, know the power users and the more casual users. They must be patient and responsive. If time elapses between hearing about and resolving the problem, the users must be informed on how the resolution is progressing. Just as in operational systems, an incident tracking system should be in place.
User support will create and maintain frequently asked questions (FAQs) that could be accessible through the Internet. User support should make extensive use of metadata to train, inform and answer user questions. A large percentage of the questions that come to help desks involve users not understanding the meaning, timeliness or source of the data. Most of these questions can be answered with metadata.

Intake assistant
Raised requests are entered in a problem/change management system. The intake assistant judges the correctness of the reported type of request, assigns the issue to the relevant person and gives the requester feedback about the planning. If possible, the assistant creates a solution himself.
The duty includes the setup of user IDs and system access roles for each person or group that is given access to the data warehouse or data mart.

Report builder
Sometimes User Support creates some of the canned queries and reports. It should be a task of the (power) user, but organizations can experience a passive attitude here, in spite of knowledge transfer. This is not unusual; reporting is not the core activity of users.
User Support shouldn't run queries with a structural character. In practice, however, it can happen that a change request to publish the query as a standard report has not yet finished; in the meantime User Support can run the query and distribute the result.

Job Monitor
Some of the problems that are seen by user support relate to the update/load (refresh) process, such as whether jobs are completed on time and whether they ran successfully. These processes must be monitored, and any variance in availability or timeliness of the data should be communicated to the users. Other problems relate to performance. User support must have a feel for performance, monitor performance and report performance degradation to database administration.

Trainer
The responsibilities can include training end users to use data query/analysis/ reporting tools, assisting end users in development of queries and reports, assisting in development of production queries/reports and assisting with evaluation and selection of end-user data access tools. This role is more likely to be placed within the IT data warehouse team. The trainer(s) can then work with end users in many departments which might be utilizing the query/reporting/ analysis tools selected for use by the company’s personnel.

Data Access Analyst
This role performs a variety of tasks, which help provide end users with the ability to access and utilize the data stored in the data warehouse or its dependent data marts. This includes evaluating and selecting business intelligence, OLAP and other query/reporting tools, identifying and prioritizing data access projects, analyzing data access requests to determine if and how the request can be met, educating end users on data warehouse capabilities and data availability, establishing data access standards, etc. This role can sometimes be filled by a power user, but most often will be filled by a member of User Support familiar with the application data who has been involved with the data warehouse design and definition process.

DATA ACQUISITION

Data Modeler
The person(s) in this role prepares data models for the source systems based on information provided by the business and/or data analysts. Additionally, the data modeler may assist with the development of the data model(s) for the DWH or a data mart guided by the data warehouse architect.

Data Warehouse Adviser
This role requires some level of familiarity with the source systems, the extract processes placed within the transaction processing systems, the content and design of the data warehouse, the load processes and business rules used for data integration and summarization, and the use of the data warehouse by end users or data mart build processes which may have been developed in conjunction with the data warehouse. This person (or persons) may also be required to have a fundamental knowledge of the data access tools and work with end users to design and develop data access functions for use within the departments or by the company as a whole. This role is fulfilled when a senior Business Intelligence / Data Warehousing consultant is needed for advice.

Data Warehouse Architect
These job responsibilities encompass the definition of overall data warehouse architectures and standards, the definition of data models for the data warehouse and all data marts, the evaluation and selection of infrastructure components including hardware, DBMS, networking facilities and ETL (extract, transform and load) software, and performing application design and related tasks.

Data Warehouse Developer
The person(s) in this role develops various functional components of the data warehouse applications including extract programs on source systems, ETL applications, data cleansing functions, system management functions including load automation, data acquisition functions and others. The transformation, data cleansing and load functions will usually be developed by team members who are trained in the use of specific ETL and data cleansing tools.
The back-end ETL application developer is responsible for the acquisition process that constitutes the major effort for any data warehouse – the extract/transform/load process. It has been estimated that these tasks are 60 to 80 percent of the total time and effort to implement a data warehouse. Extract/transform/load will be performed either with an ETL tool, with conventional programming or with a combination of both. The back-end ETL application developers will be responsible for the process regardless of which approach is used. Do not assume a tool will eliminate the work or effort involved with this process. The required analysis is still time-consuming and describing the complicated transformation logic to the ETL tool can be difficult. The back-end ETL application developers are responsible for tie-outs, which are the controls of numbers of records extracted matched to the number of records loaded, controls on totals, on errors and on the types of errors. Data cleansing, whether it is done with a cleansing tool, with an ETL product or with conventional coding, is also the responsibility of the back-end ETL application developers. Neither the data warehouse nor the back-end ETL application developers are responsible for fixing the operational systems that feed the data warehouse.
The ETL process is always time- and resource-consuming, even with a data warehouse of moderate size. Application developers must be aware of the performance implications of their architecture, design and coding.

DATA ACCESS

The second part of the DW project team is a group whose purpose is to get data, or rather, information out of the data warehouse or a data mart.

Data Access Developer
The person in this position develops data access queries/programs using BI tools such as Cognos Impromptu, PowerPlay and Cognos Query to run in production on a regular basis, and assists users with the development of complex ad hoc queries and analyses. This role may be filled by a power user trained in the use of the query/reporting/analysis tool, or by a member of User Support. The front-end application developers play a critical role in delivering the data from the data warehouse and are usually heavily involved with the users. One of the selling points of the data warehouse is to empower users to write their own queries; without Data Access Developers, users are left to write incredibly complex queries themselves, often to the detriment of performance. Given a clean and correct data warehouse with documented logical and technical metadata, a power user may be able to write his own queries and publish reports. However, many users are not capable of writing all the queries they need; in these cases, the front-end application developers will write the queries for them. Developers must be aware of the importance of writing functional and technical documents.

ADMINISTRATION AND MAINTENANCE

Data Warehouse Infrastructure Manager
If a project is large enough to require dedicated resources for system administration and database administrators (DBAs), you may want a person who provides leadership and direction for these efforts. This would be someone who is familiar with the hardware and software likely to be used, experienced in the administration of these areas, and able to direct tuning and optimization efforts as warehouse development and use move forward in the organization. Including the infrastructure team within the larger data warehousing group helps ensure that the needed resources are available as needed, so that the project stays on track and within budget.

System Administrator
This position is responsible for maintaining hardware reliability, system level security, system level performance monitoring and tuning, and automation of production activities, including extract and load functions, repetitively produced queries/reports, etc. In many cases, the system administrator is responsible for ensuring that appropriate disaster recovery functions such as system level backups are performed correctly and on an accepted schedule. This person is also responsible for all data acquisition and data access software updates and migrations. Some of the problems relate to the update/load (refresh) process, such as whether it completed on time and whether it ran successfully. These processes must be monitored, and any variance in the availability or timeliness of the data should be communicated to the User Group.
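
Part of this monitoring can be automated. A minimal sketch, assuming the nightly loads run as SQL Server Agent jobs, lists the recent job outcomes that did not succeed from the msdb job history:

SELECT j.name AS job_name
, h.run_date
, h.run_time
, h.run_status -- 0 = failed, 1 = succeeded, 2 = retry, 3 = canceled
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobhistory h
ON h.job_id = j.job_id
WHERE h.step_id = 0 -- job outcome rows, not individual steps
AND h.run_status <> 1 -- everything that did not succeed
AND h.run_date >= CONVERT(INT, CONVERT(CHAR(8), GETDATE() - 1, 112))
ORDER BY h.run_date, h.run_time
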
The platform architecture consists of Development/Test, Acceptance and Production environments.

Database Administrator
The person in this role is involved in database design, especially the physical design used to implement the data warehouse or data marts. The DBA monitors and tunes DBMS performance, administers end-user access and security at the DBMS level, ensures that DW data is appropriately protected via database backups and related strategies, assists with load process automation, and evaluates and selects infrastructure components. The DBA should be proficient in the technology of both the DBMS and the operating system chosen for the data warehouse or data mart. Other problems relate to performance. The Database Administrator has a feel for performance: he monitors performance, reports performance degradation, spots poor query techniques that could cause bad performance and helps the users write more efficient queries and reports.
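
On SQL Server, spotting the most expensive queries can start with the dynamic management views (SQL Server 2005 and later). A minimal sketch:

-- top 10 cached query plans by average CPU time (microseconds)
SELECT TOP 10
qs.execution_count
, qs.total_worker_time / qs.execution_count AS avg_cpu_time
, SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY avg_cpu_time DESC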

Data Administrator
The primary roles of data administration are key to a successful data warehouse.
This role is responsible for identifying, documenting and administering the corporate data asset. This includes working with end users and IT staff to gain consensus on standard definitions of common data attributes, developing and maintaining the corporate data model, identifying and documenting data sources and maintaining the metadata repository. The Data Administrator constantly monitors the accuracy and completeness of the metadata.
The data administrator may not be a reporting member of the data warehousing organization but is an integral part of the data identification and analysis needed to build a data warehouse or mart.
Data administrators model the business data according to the business rules and policies governing the data relationships. They also document and maintain the mapping between the logical data model and the physical database design, and between the source data and the data warehouse target data. Data administrators understand the source files and know which are the appropriate source files for extraction. Data administration establishes and maintains the naming standards. They communicate the availability of data and, when appropriate, suggest that departments share their data. Data administration is responsible for administering the tool used in data modeling and the tool libraries, although in some organizations this is the responsibility of the application development team.
Data administration is responsible for administering the metadata repository. Metadata will come from a number of sources: the modeling tools, the extract/transform/load tool, the query tools, reengineering of the source file data definitions and direct entry into a metadata repository tool. The metadata repository could potentially be updated as well as accessed from each of these sources. Tight controls are necessary to minimize corruption of the metadata. Data administration is responsible for controlling entries, updates and deletes.
A Data Administrator administers the libraries that store scripts, queries and reports. Administration includes developing and implementing the standards for admission to the library and for maintaining the scripts. Version control software must be used for the sources and documents, to prevent the existence of different sources with the same name.

Test Manager
Changes like new queries/reports, DWH build procedures and ETL implementations must be well documented and thoroughly tested. A functional acceptance test will be done by the Business Analyst (coordinated by the Test Manager). Team members with (PL/)SQL knowledge perform the technical tests.
Data quality is everyone's job, but it is the primary focus of the Test Manager. Although information about data quality problems may originate in IT, it most often comes from the users. Reconciliation must be a regular part of the job. The responsibilities include monitoring and reporting on data quality, and searching out causes of incompatibility between the various applications that collect and utilize the company's data. The Test Manager is responsible for finding and reporting problems relating to data quality, for tracking these problems and for assuring that the resolution is assigned to a responsible party. Some of the discovered problems must be reported to data administration, where the data exceptions can be properly incorporated into the logical data model. The Test Manager can be involved in writing the programming specifications for the transformation logic that needs to occur during the ETL process.
Not all problems can be resolved, nor should they be: it might cost more to fix a problem than it's worth. The Test Manager, along with other interested parties including the business unit, should determine the criticality of the problem. Analytical requirements could determine the quality requirements. The cost to fix the problem should be estimated and the priority assigned accordingly.
The Test Manager should be proactive in trying to find problems before they surface. Data quality analysis tools or simple queries can identify many of the problems.
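
For example, a simple query can reveal fact records that reference a non-existent dimension member. A minimal sketch; the table and column names (fact_Sales, dim_Customer, customer_key) are hypothetical:

-- fact records whose customer key has no match in the dimension
SELECT COUNT(*) AS orphaned_fact_rows
FROM fact_Sales f
LEFT OUTER JOIN dim_Customer d
ON d.customer_key = f.customer_key
WHERE d.customer_key IS NULL
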
The most important role for the Test Manager will be evaluating and improving the processes in the data warehouse, specifically the ETL process. By improving the processes, the quality of data is likely to improve. The Test Manager might also seek out cleaner sources of data to replace those that have proven troublesome.

Publish Administrator
The publish administrator publishes all developed sources to the acceptance and production platforms. This applies to all DWH-related sources, like DWH build scripts, ETL scripts and reporting sources. The Publish Administrator has the responsibility to establish an environment where result sets can be distributed (on the Web), where reports can be made available (on the Web) and where queries can be launched (from the Web).
Publishing is not done by the developers themselves, because the changes from development to acceptance and production must be judged on test results and functional/technical documentation. This prevents production misbehavior and knowledge gaps: if a source must be rebuilt, the documentation has to be sufficient to do so. Besides that, the data administrator must be assured that definitions have not changed and that no new definitions are created without approval of the business. The Data Warehouse Infrastructure Manager monitors the chosen solution: is everything compatible with the data warehouse principles (for example, minor transformations in the front-end)?

Tool Administrator
The tool administrator has responsibility for dealing with the tool vendor, assuring excellent support and getting answers to questions. The tool administrator deals with problems associated with the tool including queries that produce incorrect results and queries that perform badly. The tool administrator makes sure the tool environment is properly protected. He has responsibility for new releases and migrations.
In most cases this responsibility is allocated to the involved developers/administrators (Query and OLAP tools assigned to the Data Access Developer; ETL tools to the Data Warehouse Developer and so on).

Administration versus Development
If the data warehouse team is equipped with substantial functional and technical documents, plus configuration management and release management procedures, it is possible to hand over development knowledge to an administration team. In that case administration can be concerned with problem requests and development with change requests; the data acquisition and data access roles must then also be introduced to Administration. If not, production development, including problem requests, should stay with the original developers.

CHANGE MANAGEMENT
Representatives in a 'change meeting' judge the prerequisites for a migration to acceptance and production. Administration, Development and the head of the User Group take part. Changes are judged on the presence and completeness of the required documentation and on the correctness of the followed procedures.
Below you find a graphical presentation of the different roles. Again, some roles may be combined: one person may fill more than one role, or more than one person may fill a role.

landscape2