Wednesday 4 March 2009

Adding a trend line to your report

Trend lines are used in business analytics to show changes in data over time. Trend lines are often used to argue that a particular action or event (such as the first press major press releases about the current financial crisis) caused observed changes at a point in time.

This is a simple technique, and does not require a sophisticated analysis technique. However, most reporting tools do not have the feature to add trend lines in charts (like MS-Excel has).

You can add the linear regression formula in your reporting tool as a calculated field. You shouldn't do this as every time the report is executed the formula has to be calculated. Besides this, the formula is a performance killer. Rather than this, define the y-values of the trendline in your ETL process to update it in your Data Warehouse.

The formula used is: y' = mx +b where m is the slope of the line and b is the y-intercept, which is the y-coordinate of the point where the line crosses the y axis. This can be seen by letting x = 0, which immediately gives y = b. In mathematic equation, y'=

To clarify this formula, take a look at the following example:

x y

1 20883

2 24260

3 21678

4 27094

gives:
x = {1,2,3,4},
y = {20883,24260,21678,27094),
average y= (20883+24260+21678+27094)/4 = 23478.75
average x=(1+2+3+4)/4 = 2.5
total xy = (20883*1)+(24260*2)+(21678*3)+(27094*4) = 242813
total x = 1+2+3+4=10
total x*x= (1*1)+(2*2)+(3*3)+(4*4) = 10
results in a slope of 1605.1 and a y-intercept of 19466, so the y'-coordinates will be:

x y'
1 21071.1

2 22676.2
3 24281.3
4 25886.4
If above example is the revenue for a certain salesperson and you want to visualize a list of salespersons starting with the one with the most declining trend in revenue, you have to use the slope to set the order property. In mathematic equation, the slope =

The slope is used to describe the steepness, incline, gradient, or grade of the trend line. A higher slope value indicates a steeper incline. The slope is defined as the ratio of the "rise" divided by the "run" between two points on the line, or in other words, the ratio of the altitude change to the horizontal distance between any two points on the line:


Here you find the Sql script (based on SQL Server) to load the trend data":

CREATE TABLE
[dbo].[BM_Trend]
(
[Salesperson] [nvarchar(25)] NULL,
[Week] [tinyint] NULL,
[y] [numeric](18, 0) NULL,
[x] [numeric](6, 0) NULL,
[x2] [int] NULL,
[xy] [int] NULL,
[totalx] [int] NULL,
[totaly] [numeric](18, 0) NULL,
[totalx2] [bigint] NULL,
[totalxy] [numeric](18, 0) NULL,
[avgx] [float] NULL,
[avgy] [float] NULL,
[maxx] [int] NULL,
[slope] [float] NULL,
[intercept] [float] NULL,
[ytrend] [float] NULL)

insert into bm_trend
(
, salesperson
, week
, y
, x
, x2
, xy
)
select
Week
, sum(fs.value) y
, ROW_NUMBER() OVER(PARTITION BY fs.
salesperson
ORDER BY fs.week) x
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) *
ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) x2
, ROW_NUMBER() OVER(PARTITION BY fs.salesperson
ORDER BY fs.week) *
sum(fs.value) xy
from BM_SalesFacts FS
group by fs.salesperson, fs.week

update bm_trend
set totalx = a.totalx
, totaly = a.totaly
, totalx2 = a.totalx2
, totalxy = a.totalxy
, avgx = a.avgx
, avgy = a.avgy
, maxx = a.maxx
from
(select
salesperson
, sum(x) totalx
, sum(y) totaly
, sum(x2) totalx2
, sum(xy) totalxy
, avg(x) avgx
, avg(y) avgy
, max(x) maxx
from
bm_trend
group by salesperson)a
where a.salesperson = bm_trend.salesperson

update bm_sectortrend
set slope = a.slope
, intercept = a.intercept
, ytrend = a.ytrend
from
(select
salesperson
, x
, case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end slope
, avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx intercept
, (case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end)*x+
avgy-(case when totalx2-totalx*avgx = 0 then 0 else (totalxy-totalx*avgy)/(totalx2-totalx*avgx) end) *avgx Ytrend
from
bm_trend
)a
where a.salesperson = bm_trend.salesperson
and a.x = bm_trend.x


Now you can use column Ytrend for the trendline in the chart and you can use column slope to order the salespersons in a list.


No comments:

Post a Comment

Thanks for your comment.