Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Tuesday, 30 November 2010

Ad hoc Mini Mart

If you need to improve performance for complex reports, think about implementing a minimart on top of your datamart.

On of my customers was specifically interested in analysing linehauls with a certain origin and a certain destination.

I created a stored procedure that can be kicked off via a cognos report.

CREATE procedure [dbo].[pLoadVolumeAnalysisMiniMart]
@origin varchar(6)
, @destination varchar(6)
as
declare @loaddate datetime
set @loaddate = getdate()


truncate table VolumeAnalysis MiniMart

insert into VolumeAnalysisMiniMart
select
,Sector
,Sectortype
,Week
,Weight
,Pieces
,Sender
,Receiver
,Origin
,Destination
,Service
,Departuretime
,Arrivaltime
,@loaddate
,Runid
,Rundate
from VolumeAnalysisDataMart
where Origin = @origin
and Destination = @destination

select 'Mini Mart has been loaded' as Result

end

This procedure loads a table with a specific origin and a destination (in casu the number of records decreased from 21 million to several thousands). The dataload takes about 1-2 seconds.

In framework manager I created a star schema with the mini mart as a fact table and aliases of the (original) dimension tables.

I also added the stored procedure as query object:


call pLoadReportPromptSector;1(
#prompt('origin')#,
#prompt('destination')#
);



In Report Studio I created a query with data item 'Result', prompts 'Origin' and 'Destination'.
As soon as the parameters are filled out, the procedure will be kicked off and the data will be loaded.

I also created hyperlink buttons on the report that links the user to several reports which retrieve the data from the minimart.

If you do not have schema's for each user, you can create separate tables or add column 'user' with #account. You can pass this variable to the procedure, so every user has it's own set of data.

See also:

http://www.sqlserverclub.com/pdfs/minimarts-organizational-segmentation-in-datawarehousing.pdf :

Many decisions makers not only need to analyze data from within the business unit,
but also across business units or the many segments within a business unit. Think of
it as market segmentation – cubes for more a selective audience.
Your customers can be grouped by segments of business units or smaller functional
groups with unique needs for reporting. Most segments will contain unique, pertinent
subsets of data along with business unit-wide data for decision making. Therefore,
cubes can be made from subsets of cubes in what I would call “minimarts.” These
are datasets that can be made “precise” for more targeted queries thereby
increasing performance, especially across a network, intranet, extranet (or even the
general public internet).
The pros are pretty clear by targeting your audience. Performance will be greater
than querying vast cubes of data that can go unused for most members of your
customer base.

Tuesday, 16 November 2010

How to calculate running totals \ subtotals (like YTD)

If you need to calculate running totals \ subtotals (i.e. year-to-date figures) you can use
a lot of methods (like CTE's). In SQL Server 2008 you can use the GROUP BY ROLLUP function.

However, if you are on 2005 the most efficient way to do this:



is running a query like:

declare
@value float
, @prevorganisationalunitid int
, @prevperiodyear int

update FACT_MEASURE
set @value = valueYTD =
case
when
Year = @PrevPeriodYear
or OrganisationalUnitid = @prevorganisationalunitid
then @value+isnull( value ,0)
else value
end
, @PrevPeriodYear = Year
, @prevOrganisationalUnitid = OrganisationalUnitid


This script only works if the sorting is correct
(order by measure,organisationalunitid,periodid).
To ensure this,
create a primary key on periodid, organisationalunitid and measure.

Wednesday, 24 February 2010

Scheduled Cognos Reports

If you are not able to access Cognos Auditing reports
and you need to know which reports are scheduled, you
can query the Content Store. This script includes
the source of report views and the name of the package.

select
r.path+r.name report_or_report_view
, r.created
, r.modified
, r.disabled
, r.class
, r.description
, p.startdate
, p.enddate
, case when r.class = 'reportview' then s.path+s.name else 'N/A' end as source_report_view
, case when r.class = 'report' then s.name else t.name end as package
from reports r
join cmobjects j on r.cmid = j.pcmid
join cmobjprops2 p on p.cmid = j.cmid
join cmrefnoord1 n on n.cmid = r.cmid
join reports s on s.cmid = n.refcmid
left join cmrefnoord1 q on q.cmid = s.cmid
left join reports t on t.cmid = q.refcmid
where p.active=1
and (p.endtype = 0 or p.enddate >getdate())

You can fill the 'reports' table by executing the procedure
as described in my earlier post in

Content Store Reporting

Monday, 9 November 2009

Automatically drop indexes

I'm developing several star schema's for several data marts / data warehouses.
Over and over again, I have to key-in all the drop index scripts for the fact tables
(drop indexes --> load data --> create indexes).

I created a script to automate this (kick it off by
EXECUTE pDropIndexes '[TableName]':


CREATE PROCEDURE pdropindexes @tablename varchar(255)
AS
BEGIN
DECLARE @indexName NVARCHAR(128)
, @dropIndexSql NVARCHAR(4000)
DECLARE tableIndexes CURSOR FOR
SELECT name
FROM sys.indexes
WHERE object_id = OBJECT_ID(@tablename)
AND type =2 -- do not delete primary key
OPEN tableIndexes
FETCH NEXT FROM tableIndexes INTO @indexName WHILE @@fetch_status = 0
BEGIN
SET @dropIndexSql = N'DROP INDEX ' + @tablename +'.' + @indexName
EXEC sp_executesql @dropIndexSql
FETCH NEXT FROM tableIndexes INTO @indexName
END
CLOSE tableIndexes
DEALLOCATE tableIndexes
END

Monday, 27 April 2009

Multiple years in a Cognos ReportNet Graph

If you visualize data to a line chart with 'zero-values' (or in my case, they do have values but they are excluded because of the fact that they belong to weeks with a public holiday) and you need to create a distinction between the years, there will be gaps in the line. See the graph below:

sales20082009

I needed to create a graph starting in week 44 (the start of the impact of the financial crises) without any gaps. The periods (200844-200943 and 200744-200843) had to have different to colours. If you create a chart without series, there is no distinction anymore between
the years.

graph_normal

Query:

query1

Tabular data: (the data item 'Serie'' is defined in Report Studio).

salestable

To place all historical data on one line, I added a week_key column which is the concatination of the year and the week. I added the 'serie column' as a serie (instead of year). In the properties of the level item of the week_key I set 'Sort' on 'ascending':

week_key_no_key

chart_adapted

It didn't work immediately as the 'key property' of the level item 'week_key' had to be set on 'Yes':

week_key_key

After this, the result was:

sales200744-

Another usage of this:

20090428salesquarter

Query:

20090428 - Query SalesQuarter

Sunday, 26 April 2009

Public holiday indicator in your Date Dimension

For a lot of reasons we want to determine whether a day is a working day or not.
For example: productivity figures are only valid if we exclude the bank / public holidays.
Trend analysis reports show representative charts if we exclude weeks with signifcant low values (as a public holiday is in the week).

No RDBMS supports public holidays, not static ones (like New Year), neither will dynamical holidays be supported (like Christian holidays which depend on the date of Easter Day).

Whit Monday is seven weeks after Easter Day and Ascension day is 10 days before Whit Monday.
So we only have to calculate the date of Easter Day. This is a complex calculation as it is originated on a non-accurate version of the Hebrewian calendar. There is a correct formula: Jewish months start on Full Moon. The 14th or 15th day must follow immediately after Full Moon. That's why the decision has been made that Easter Day is the sunday after the first Full Moon after ‘Vernal Equinox’ in spring (the crusification of Jesus). ‘Vernal Equinox’ is on March, 21.
Full Moon before Easter Day is being called: 'Paschal Full Moon'. This conflicts with the date of the real Full Moon (astronomical). As 'Paschal Full Moon' is not variabel, Easter Day can be calculated (hence all other Christian holidays).


I used the following calculation in a stored procedure to calculate the Public Holiday Indicator in a date dimension:

Easter calculation
Source: Claus Tøndering, Frequently Asked Questions about Calendars, 25 september 2000


Prerequisites:

CREATE SCHEMA BSM

CREATE TABLE [bsm].[DIM_date]
( [date_id] [int] NOT NULL
, [date] [datetime] NOT NULL
, [day_nr] [int] NOT NULL
, [month_nr] [int] NOT NULL
, [month] [varchar](10) NOT NULL
, [month_key] [int] NOT NULL
, [quarter_nr] [int] NOT NULL
, [quarter] [varchar](2) NOT NULL
, [quarter_key] [int] NOT NULL
, [year] [int] NOT NULL
, [week_nr] [int] NOT NULL
, [week] [varchar](7) NOT NULL
, [week_key] [int] NOT NULL
, [weekday_nr] [int] NOT NULL
, [day] [varchar](9) NOT NULL
, [public_holiday] [varchar](25) NOT NULL
, [weekend] [varchar](1) NOT NULL
, [working_day] [varchar](1) NOT NULL
, [nr_working_days] [int] NOT NULL
, [nr_days] [int] NOT NULL
, CONSTRAINT [PK_DIM_date_id] PRIMARY KEY CLUSTERED
( [date_id] ASC)WITH (IGNORE_DUP_KEY = OFF)
ON [PRIMARY]) ON [PRIMARY]


The function below is needed as SQL SERVER 2005 doesn't support ISO8601 weeknumber (only United States weeknumbers). You do not need this in SQL SERVER 2008. Oracle 9i and higher support it by using the 'IW'-schema.

CREATE FUNCTION [bsm].[ISO8601WeekNr] (@Date datetime)
RETURNS INT
AS
BEGIN
DECLARE @ISO8601Week int

DECLARE @day int
DECLARE @month int
DECLARE @year int
DECLARE @a int
DECLARE @y int
DECLARE @m int
DECLARE @JD int -- Julian day number
DECLARE @d4 int
DECLARE @L int
DECLARE @d1 int

SET @year = YEAR(@Date)

SET @month = MONTH(@Date)
SET @day = DAY(@Date)
SET @a = (14-@month)/12
SET @y = @year +4800 -@a
SET @m = @month + (12*@a) - 3
SET @JD = @day + ( ((153*@m)+2)/5 ) + (365*@y) + (@y/4) - (@y/100) + (@Y/400) - 32045
SET @d4 = (@JD + 31741 - (@JD % 7)) % 146097 % 36524 %1461SET @L = @d4/1460
SET @d1 = ((@d4-@L)%365)+@L
SET @ISO8601Week = (@d1/7) +1
RETURN @ISO8601Week

END

The load script:

CREATE PROCEDURE [bsm].[pLoadDIM_Date] AS
BEGIN
DECLARE @d int -- day counter
DECLARE @date datetime
DECLARE @day_nr int
DECLARE @month varchar(25)
DECLARE @year int
DECLARE @yearvarchar varchar(4)
DECLARE @quarter_nr int
DECLARE @month_nr int
DECLARE @day varchar(9)
DECLARE @g int -- Golden number (-1): Years with the same Golden number have 'New Moon' on the same date

DECLARE @c int -- year/4
DECLARE @h int -- number of days since 'New Moon'
DECLARE @i int -- number of days since March, 21 and 'Paschal Full Maan'
DECLARE @j int -- 'Paschal Full Moon day'
DECLARE @l int -- number of days between March, 21 and Sunday on or before 'Paschal Full Moon' (between -6 and 28)
DECLARE @Easter_month int
DECLARE @Easter int
DECLARE @Easter_date datetime
DECLARE @public_holiday int
DECLARE @x int -- working days counter
DECLARE @y int -- indicator in case week 53 from year x is in year x+1 (100=yes;0=no)

DECLARE @z int -- indicator in case week 1 from year x is in year x-1 (100=yes;0=no)

DECLARE @weekday_nr int

truncate table bsm.dim_date

SET DATEFIRST 1 -- First day of week is Monday (Default is Sunday)
SET @d = -5475 -- First date is current date minus 5475 days
SET @x = 0

WHILE @d <5475

BEGIN

SET @date= DATEADD(dd, @d, GETDATE())
SET @year =YEAR(@date)
SET @yearvarchar = CONVERT(VARCHAR,DATEPART(yy,DATEADD(dd, @d, GETDATE())))

SET @quarter_nr = DATEPART(qq,DATEADD(dd, @d, GETDATE()))
SET @month_nr = DATEPART(mm, DATEADD(dd, @d, GETDATE()))
SET @weekday_nr = DATEPART(dw, DATEADD(dd, @d, GETDATE()))
SET @day_nr = DATEPART(dd, DATEADD(dd, @d, GETDATE()))
SET @day = DATENAME(dw,DATEADD(dd, @d, GETDATE()))
SET @month = DATENAME(mm,DATEADD(dd, @d, GETDATE()))
SET @g = @year%19 SET @c = @year/100SET @h = (@c-(@c/4)-((8*@c+13)/25)+(19*@g)+15)%30
SET @i = @h-((@h/28)*(1-(29/(@h+1)))*((21-@g)/11))
SET @j = (@year+(@year/4)+@i+2-@c+(@c/4))%7SET @l = @i-@j
SET @Easter_month = 3+((@l+40)/44)
SET @Easter = @l+28-31*(@Easter_month/4)
SET @Easter_date = CONVERT(CHAR(10),(CONVERT(DATETIME,CAST(@Easter_month AS CHAR)+'/'+CAST(@Easter AS CHAR)+'/'+CAST(DATEPART(yy, DATEADD(dd,@d, GETDATE())) AS CHAR))),110)

IF @weekday_nr = 6 OR @weekday_nr = 7 OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) OR CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) OR (@day_nr= 1 AND @month_nr = 1) OR (@day_nr = 30 AND @month_nr = 4) OR (@day_nr = 25 AND @month_nr = 12) OR (@day_nr = 26 AND @month_nr = 12)
SET @public_holiday = 1 ELSE select @public_holiday = 0

IF @public_holiday =1 or @weekday_nr in(6,7)
SET @x = @x ELSE
SET @x =@x+1


IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) >51 and @month_nr = 1
SET @y=100 ELSE SET @y=0 -- if a week belongs party to the next year (2004.53 belongs party to 2005) set indicator on 100 so -- if you subtract 100 of 200553, week 200553 is still week 200453.

IF bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) =1 and @month_nr = 12
SET @z=100 ELSE SET @z=0 -- if a week is in last year (first week of 2004 belongs partly to 2003) set indicator on 100 so -- if you add 100 to 200301 week 200301, week 200301 is still 200401

INSERT INTO bsm.dim_date
SELECT ((DATEPART(yy, DATEADD(dd, @d, GETDATE())) * 100) + DATEPART(mm,DATEADD(dd, @d, GETDATE()))) * 100 + @day_nr AS date_id
, CAST(CONVERT(VARCHAR, @date, 111) AS DATETIME) as date
, @day_nr AS day_nr
, DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_nr
, @month AS month
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(mm, DATEADD(dd, @d, GETDATE())) AS month_key
, DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_nr
, 'Q'+CONVERT(VARCHAR,DATEPART(qq,DATEADD(dd, @d, GETDATE()))) as quarter
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+ DATEPART(qq,DATEADD(dd, @d, GETDATE())) AS quarter_key
, @year AS year_nr
, bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS week_nr
, 'Week '+CONVERT(VARCHAR,bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101))) AS week
, (CAST(DATEPART(yy, DATEADD(dd, @d, GETDATE())) AS VARCHAR)*100)+CAST (bsm.ISO8601weekNr(CONVERT(CHAR, (DATEADD(dd, @d, GETDATE())),101)) AS VARCHAR)-@y+@z AS week_key
, @weekday_nr AS weekday_nr
, @day AS day
, CASE WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d, GETDATE()),110) THEN 'Easter Day'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-1, GETDATE()),110) THEN'Easter Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-49, GETDATE()),110) THEN 'Whit Sunday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-50, GETDATE()),110) THEN 'Whit Monday'
WHEN CONVERT(CHAR(10),@Easter_date,110) = CONVERT(CHAR(10),DATEADD(dd, @d-39, GETDATE()),110) THEN 'Ascension day'
WHEN @day_nr= 1 AND @month_nr = 1 THEN 'New Year-s Day' WHEN @day_nr = 30 AND @month_nr = 4 THEN 'Dutch Queen-s Day'
WHEN @day_nr = 25 AND @month_nr = 12 THEN 'Christmas Day' WHEN @day_nr = 26 AND @month_nr = 12 THEN 'Boxing Day' ELSE '-'
END AS public_holiday
, CASE WHEN @weekday_nr in(6,7) THEN 'Y' ELSE 'N' END AS weekend
, CASE WHEN @public_holiday = 1 OR @weekday_nr in(6,7) THEN 'N' ELSE 'Y' END AS working_day
, @x AS nr_working_days
, @d+5476 AS nr_days

SET @d = @d + 1

END

END


Friday, 24 April 2009

Monitor / Audit your Data Warehouse load

One of the most important means of monitoring the run of your Data Warehouse load, is getting insight in the status of running jobs, the duration of the jobs (in history) and which jobs doesn't seem to end. An execution of a simple update script of an admin table before the start of a job and after the end of a job can result in a table like:





































































































































































































































































































































RunIDProcessActionObjectNewUpdatedDeletedLoad_startLoad_end
1EDW - HRMPublish HRM_cubeHRM_Montly.mdc4/15/09 8:09:38 AM4/15/09 8:10:02 AM
1EDW - HRMFTP HRM cubeHRM_Montly.mdc4/15/09 8:09:22 AM4/15/09 8:09:38 AM
1EDW - HRMCreate HRM cubeHRM_Montly.mdc4/15/09 8:09:21 AM4/15/09 8:09:22 AM
1EDW - HRMLoad new facts into aggregate table in Business ModelBSM.FCT_Hour_Monthly11408004/15/09 8:09:21 AM4/15/09 8:09:21 AM
1EDW - HRMDelete existing facts from aggregate table in Business ModelBSM.FCT_Hour_Monthly003444/15/09 8:09:03 AM4/15/09 8:09:21 AM
1EDW - HRMLoad Dimension Supplier into BSM (Refresh)BSM.DIM_Supplier3004/15/09 8:09:02 AM4/15/09 8:09:03 AM
1EDW - HRMLoad Dimension Organisational Unit into BSM (Refresh)BSM.DIM_OrganisationalUnit23004/15/09 8:09:00 AM4/15/09 8:09:02 AM
1EDW - HRMLoad Dimension Employee into BSM (Refresh)BSM.DIM_Employee45632004/15/09 7:59:20 AM4/15/09 8:09:00 AM
1EDW - HRMLoad Date Dimension into Business Model (Refresh)BSM.DIM_Date365004/15/09 7:59:18 AM4/15/09 7:59:20 AM
1EDW - HRMLoad new and changed hour facts into ODSODS.FCT_Hour45632004/15/09 7:59:16 AM4/15/09 7:59:18 AM
1EDW - HRMLoad new and changed targets into ODSODS.FCT_Target12004/15/09 7:59:15 AM4/15/09 7:59:16 AM
1EDW - HRMLoad new and changed suppliers into ODSODS.DIM_Supplier0004/15/09 7:59:08 AM4/15/09 7:59:15 AM
1EDW - HRMLoad new and changed organisational units into ODSODS.DIM_OrganisationalUnit0104/15/09 7:58:58 AM4/15/09 7:59:08 AM
1EDW - HRMLoad new and changed employees into ODSODS.DIM_Employee1204/15/09 7:58:55 AM4/15/09 7:58:58 AM
1EDW - HRMTransformation step: create hierarchy for organisational unitSTG.OrganisationalUnit02304/15/09 7:58:52 AM4/15/09 7:58:53 AM
1EDW - HRMTransformation step: calculate productivitySTG.Hour04563204/15/09 7:58:49 AM4/15/09 7:58:52 AM
1EDW - HRMLookup surrogate key supplierSTG.Supplier0304/15/09 7:58:23 AM4/15/09 7:58:49 AM
1EDW - HRMLookup surrogate key organisational unitSTG.OrganisationalUnit02304/15/09 7:51:56 AM4/15/09 7:58:04 AM
1EDW - HRMLookup surrogate key employeeSTG.employee0245604/15/09 7:51:28 AM4/15/09 7:51:56 AM
1EDW - HRMDetect corrected hoursSTG.Hour03504/15/09 7:51:24 AM4/15/09 7:51:28 AM
1EDW - HRMDetect changed supplier attributesSTG.Hour0004/15/09 7:51:22 AM4/15/09 7:51:24 AM
1EDW - HRMDetect changed organisational unit attributesSTG.OrganisationalUnit0104/15/09 7:51:18 AM4/15/09 7:51:22 AM
1EDW - HRMDetect changed employee attributesSTG.Employee2004/15/09 7:51:15 AM4/15/09 7:51:18 AM
1EDW - HRMExtract Monthly TargetsSTG.Target12004/15/09 7:50:54 AM4/15/09 7:51:15 AM
1EDW - HRMExtract Fact HoursSTG.Hour45632004/15/09 7:50:49 AM4/15/09 7:50:52 AM
1EDW - HRMExtract Dimension SupplierSTG.Supplier3004/15/09 7:50:17 AM4/15/09 7:50:49 AM
1EDW - HRMExtract Dimension Organisational UnitSTG.OrganisationalUnit23004/15/09 7:49:21 AM4/15/09 7:50:17 AM
1EDW - HRMExtract Dimension EmployeeSTG.Employee2456004/15/09 7:49:20 AM4/15/09 7:49:21 AM



Firstly, create an admin table:

-- Create MetaData layer

create schema mda

-- Create Admin table

CREATE TABLE [mda].[admin_run](
[Runid] [int] NULL,
[Process] [varchar](50) ,
[Action] [varchar](50) ,
[Object] [varchar](50) ,
[Inserted] [int] NULL,
[Updated] [int] NULL,
[Deleted] [int] NULL,
[LoadStart] [datetime] NULL,
[LoadEnd] [datetime] NULL
)
Secondly, create stored procedures to update the table:

CREATE PROCEDURE [mda].[PUpdate_Admin_Run_Start]

(
@Runid int
, @Process varchar(255)
, @Action varchar(255)
, @Object varchar(255)
)

AS

BEGIN

insert into mda.admin_run
(
RunID
, Process
, Action
, Object
, Inserted
, Updated
, Deleted
, LoadStart
)
values
(
@RunID
, @Process
, @Action
, @Object
, ''
, ''
, ''
, getdate()
)

END
CREATE PROCEDURE [mda].[PUpdate_Admin_Run_End]
( @RunId int
, @Action varchar(50)
, @Records_New int = 0
, @Records_Updated int = 0
, @Records_Deleted int = 0 )
AS
BEGIN
update mda.admin_run
set LoadEnd = getdate()
,Inserted = @Records_new
,Updated = @Records_updated
,Deleted = @Records_deleted
where RunId = @RunId
and LoadEnd is null
and Action = @Action
END
Below you find an example script which updates the admin table:
begin
declare @Records_inserted int
declare @Records_updated int
declare @Records_deleted int
declare @Runid int
declare @Process varchar(50)
declare @Action varchar(50)
declare @Object varchar(50)
declare @rows int

select @rows = count(*) from mda.admin_run
if @rows = 0 set @Runid = 1 else select @runid = max(runid) from mda.admi_run

set @Process = 'EDW - HRM'
set @Action = 'Extract Dimension Employee'
set @Object = 'STG.Employee'
exec mda.PUpdate_Admin_Run_Start @RunID,@Process,@Action,@Object
--load script
select getdate()

set @Records_Inserted = @@rowcount
exec mda.PUpdate_Admin_Run_End @RunID,@Action,@Records_Inserted, @Records_updated, @Records_deleted
end

Tuesday, 7 April 2009

Controlled update in SQL Server

Whenever you experience slow updates or timeouts errors on updating columns you can update it incrementally by using the script below.
The WaitForDelay is added to permit the run of other processes (including a transaction log backup.). The script only works if one of the source columns which are used to update, cannot contain NULL values. If that's not the case you have to add a column (i.e. 'updated'). You should update this with a certain value (i.e. '1'). Use this column to determine @Count ('where updated <> 1') and to filter in the 'where clause' the update query ('and updated <> 1').

SQL 2005 Syntax:

DECLARE @Batch int
DECLARE @Rownum int

DECLARE @PrevRownum int

SET @Batch = 10000
SET @Rownum = 1

WHILE (@Rownum > 0)

BEGIN

UPDATE TOP(@Batch) UPDATE_TABLE
SET
UPDATE_COLUMN1 = S.SOURCE_COLUMN1
, UPDATE_COLUM2 = S.SOURCE_COLUMN2
FROM SOURCE_TABLE c
WHERE UPDATE_TABLE.ID = SOURCE_TABLE.ID
AND UPDATE_COLUMN1 IS NULL


SET @Prevrownum = @Rownum

SELECT @Rownum = COUNT(*) FROM UPDATE_TABLE
WHERE UPDATE_COLUMN1 IS NULL


IF @Prevrownum = @Rownum
set @ROWNUM = 0

WAITFOR DELAY '000:00:05'

END
RETURN



Friday, 6 March 2009

SQL Server 2008: Finally the ISO8601 week function

The US system has weeks from Sunday through Saturday, and partial weeks at the beginning and the end of the year. An advantage is that no separate year numbering like the ISO year is needed, while correspondence of lexicographical order and chronological order is preserved.

However, in Europe, we use ISO8601, where the week starts on Monday.This system is used (mainly) in government and business for fiscal periods.

There are mutually equivalent descriptions of week 01:

-the week with the year's first Thursday in it (the formal ISO definition),
-the week with 4 January in it,
-the first week with the majority (four or more) of its days in the starting year, and
-the week starting with the Monday in the period 29 December - 4 January.

If 1 January is on a Monday, Tuesday, Wednesday or Thursday, it is in week 01. If 1 January is on a Friday, Saturday or Sunday, it is in week 52 or 53 of the previous year. 28 December is always in the last week of its year. The week number can be described by counting the Thursdays: week 12 contains the 12th Thursday of the year.
The ISO week-numbering year starts at the first day (Monday) of week 01 and ends at the Sunday before the new ISO year (hence without overlap or gap). It consists of 52 or 53 full weeks. The ISO week-numbering year number deviates from the number of the calendar year (Gregorian year) on a Friday, Saturday, and Sunday, or a Saturday and Sunday, or just a Sunday, at the start of the calendar year (which are at the end
of the previous ISO week-numbering year) and a Monday, Tuesday and Wednesday, or a Monday and Tuesday, or just a Monday, at the end of the calendar year (which are in week 01 of the next ISO week-numbering year).

For Thursdays, the ISO week-numbering year number is always equal to the calendar year number.

In SQL Server SELECT DATEPART(wk,'2010-01-01') results in weeknummer 1 while it should be week 53.

Untill version 2008 ISO8601 is not supported.

In MS SQL Server 2008 you can use

SELECT DATEPART(isowk,'2010-01-01') which results in week 53.

There are mutually equivalent descriptions of week 01:

In older versions (SQL 2005, SQL 200) you have to create a function to use ISOweeks:

CREATE FUNCTION [dbo].[ISOweek] (@DATE datetime)
RETURNS int
AS
BEGIN
DECLARE @ISOweek int
SET @ISOweek= DATEPART(wk,@DATE)+1
-DATEPART(wk,CAST(DATEPART(yy,@DATE) as CHAR(4))+'0104')
--Special cases: Jan 1-3 may belong to the previous year
IF (@ISOweek=0)
SET @ISOweek=dbo.ISOweek(CAST(DATEPART(yy,@DATE)-1
AS CHAR(4))+'12'+ CAST(24+DATEPART(DAY,@DATE) AS CHAR(2)))+1
--Special case: Dec 29-31 may belong to the next year
IF ((DATEPART(mm,@DATE)=12) AND
((DATEPART(dd,@DATE)-DATEPART(dw,@DATE))>= 28))
SET @ISOweek=1
RETURN(@ISOweek)
END


This is an example from SQL Book Online by Microsoft.

However, the function relies on the first day of the week being a monday.

Before testing you should set the first day on monday by using:

SET DATEFIRST 1

After that you can check it by:

select
Date,
dbo.ISOWEEK(a.Date) ISOWEEK
from
(
select Date = getdate() union
select Date = convert(datetime,'2010/01/01')
) a


The function below also works independent of the DATEFIRST setting:

create function dbo.ISOWEEK2
(
@Date datetime
)
returns int
as

begin

declare @WeekOfYear int

select
-- Compute week of year as (days since start of year/7)+1
-- Division by 7 gives whole weeks since start of year.
-- Adding 1 starts week number at 1, instead of zero.
@WeekOfYear =
(datediff(dd,
-- Case finds start of year
case
when NextYrStart <= @date
then NextYrStart
when CurrYrStart <= @date
then CurrYrStart
else PriorYrStart
end,@date)/7)+1
from
(
select
-- First day of first week of prior year
PriorYrStart =
dateadd(dd,(datediff(dd,-53690,dateadd(yy,-1,aa.Jan4))/7)*7,-53690),
-- First day of first week of current year
CurrYrStart =
dateadd(dd,(datediff(dd,-53690,aa.Jan4)/7)*7,-53690),
-- First day of first week of next year
NextYrStart =
dateadd(dd,(datediff(dd,-53690,dateadd(yy,1,aa.Jan4))/7)*7,-53690)
from
(
select
--Find Jan 4 for the year of the input date
Jan4 =
dateadd(dd,3,dateadd(yy,datediff(yy,0,@date),0))
) aa
) a

return @WeekOfYear

end
go


You can check it by:

SET DATEFIRST 7

select
Date,
dbo.ISOWEEK(a.Date) ISOWEEK,
dbo.ISOWEEK2(a.Date) ISOWEEK2
from
(
select Date = getdate() union
select Date = convert(datetime,'2010/01/01')
) a


results for 2010/01/01 in:

ISOWEEK = 52
ISOWEEK2 = 53

SET DATEFIRST 1

select
Date,
dbo.ISOWEEK(a.Date) ISOWEEK,
dbo.ISOWEEK2(a.Date) ISOWEEK2
from
(
select Date = getdate() union
select Date = convert(datetime,'2010/01/01')
) a


results for 2010/01/01 in:

ISOWEEK = 53
ISOWEEK2 = 53

Wednesday, 4 March 2009

Adding a trend line to your report

Trend lines are used in business analytics to show changes in data over time. Trend lines are often used to argue that a particular action or event (such as the first press major press releases about the current financial crisis) caused observed changes at a point in time.

This is a simple technique, and does not require a sophisticated analysis technique. However, most reporting tools do not have the feature to add trend lines in charts (like MS-Excel has).

You can add the linear regression formula in your reporting tool as a calculated field. You shouldn't do this as every time the report is executed the formula has to be calculated. Besides this, the formula is a performance killer. Rather than this, define the y-values of the trendline in your ETL process to update it in your Data Warehouse.

The formula used is: y' = mx +b where m is the slope of the line and b is the y-intercept, which is the y-coordinate of the point where the line crosses the y axis. This can be seen by letting x = 0, which immediately gives y = b. In mathematic equation, y'=

To clarify this formula, take a look at the following example:

x y

1 20883

2 24260

3 21678

4 27094

gives:
x = {1,2,3,4},
y = {20883,24260,21678,27094),
average y= (20883+24260+21678+27094)/4 = 23478.75
average x=(1+2+3+4)/4 = 2.5
total xy = (20883*1)+(24260*2)+(21678*3)+(27094*4) = 242813
total x = 1+2+3+4=10
total x*x= (1*1)+(2*2)+(3*3)+(4*4) = 10
results in a slope of 1605.1 and a y-intercept of 19466, so the y'-coordinates will be:

x y'
1 21071.1

2 22676.2
3 24281.3
4 25886.4
If above example is the revenue for a certain salesperson and you want to visualize a list of salespersons starting with the one with the most declining trend in revenue, you have to use the slope to set the order property. In mathematic equation, the slope =

The slope is used to describe the steepness, incline, gradient, or grade of the trend line. A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on the line, or in other words, the ratio of the altitude change to the horizontal distance between any two points on the line:


Here you find the Sql script (based on SQL Server) to load the trend data":

CREATE TABLE
[dbo].[BM_Trend]
(
[Salesperson] [nvarchar(25)] NULL,
[Week] [tinyint] NULL,
[y] [numeric](18, 0) NULL,
[x] [numeric](6, 0) NULL,
[x2] [int] NULL,
[xy] [int] NULL,
[totalx] [int] NULL,
[totaly] [numeric](18, 0) NULL,
[totalx2] [bigint] NULL,
[totalxy] [numeric](18, 0) NULL,
[avgx] [float] NULL,
[avgy] [float] NULL,
[maxx] [int] NULL,
[slope] [float] NULL,
[intercept] [float] NULL,
[ytrend] [float] NULL)

insert into bm_trend
(
, salesperson
, week
, y
, x
, x2
, xy
)
select
Week
, sum(fs.value) y
, ROW_NUMBER() OVER(PARTITION BY fs.
salesperson
ORDER BY fs.week) x
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) *
ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) x2
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) *
sum(fs.value) xy
from BM_SalesFacts FS
group by fs.salesperson, fs.week

update bm_trend
set totalx = a.totalx
, totaly = a.totaly
, totalx2 = a.totalx2
, totalxy = a.totalxy
, avgx = a.avgx
, avgy = a.avgy
, maxx = a.maxx
from
(select
salesperson
, sum(x) totalx
, sum(y) totaly
, sum(x2) totalx2
, sum(xy) totalxy
, avg(x) avgx
, avg(y) avgy
, max(x) maxx
from
bm_trend
group by salesperson)a
where a.salesperson = bm_trend.salesperson

update bm_sectortrend
set slope = a.slope
, intercept = a.intercept
, ytrend = a.ytrend
from
(select
salesperson
, x
, case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end slope
, avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx intercept
, (case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end)*x+
avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx Ytrend
from
bm_trend
)a
where a.salesperson = bm_trend.salesperson
and a.x = bm_trend.x


Now you can use column Ytrend for the trendline in the chart and you can use column slope to order the salespersons in a list.