This post is the first article within title
'Requirements and procedures for the successful deployment and maintenance of a Data Warehouse and Business Intelligence environment'The articles describe the requirements for developing, deploying and maintaining a Data Warehouse environment including Business Intelligence solutions within the contexts:
Change/ Problem Management (the process of managing changes and problems, in other words, how does a customer and/or a user submit a change/problem request, how is it prioritized, tracked and implemented?).
Configuration management (the versioning, labeling and tracking of code and content components during the development life cycle)
Release Management (it designates and schedules software and hardware release components to migrate it to the live environment as a set).The following users and roles are in general related to Business Intelligence activities (some roles may be combined, one person may fill more than one role or a role may be filled by more than one person). BUSINESSKey UserKey-Users are the owners of a (subset of) reports. More important, they can be the owners of the definitions of specific performance indicators. They have privileges to report change requests and problem requests with respect to their own reports and cubes. They are not allowed to request changes of conformed dimensions and measures (dimensions and measures which are used by more key-users).
User GroupMembers of the User Group are key users who share conformed dimensions and measures. They authorize the head of the User Group to request for changes. Usually the User Group is in conference to discuss changes (and practical value of the management information produced by the Data Warehouse team). A representative of the Data Warehouse Team can participate.
UserUsers do not have the privilege to request for changes. They may report problem requests.
Users can have different roles:
RecipientMembers can only retrieve content (for example by email)
Consumer Members can read (and execute) (public) content such as Powerplay/Impromptu reports.
Query UserMembers can execute queries on the Data Warehouse (with TOAD, Cognos Impromptu, SQL+)
Power UserMembers have the same access permissions as Query Users. They can use Impromptu or PowerPlay to create and publish reports for public or personal use.
Control AdvisorA control advisor judges whether it’s allowed to deliver information/data (i.e. privacy aspects). The data privacy must be kept in mind. Very often there are legal restrictions around the correct use of (personal) data. The head of the User Group can perform this role and delegate it to i.e. HRM (e.g. privacy-data) or Sales (e.g. customer-data).
Business AnalystThe role of the business analyst is to perform research and possess knowledge of existing business applications and processes to assist in identification of potential data sources, business rules being applied to data as it is captured by and moved through the transaction processing applications, etc. Whenever possible, this role should be filled by someone who has extensive prior experience with a broad range of the organization’s business applications. Another alternative is using business analysts from the various application areas as sources when those areas are added to the data warehouse. These analysts may only be associated with the DW team for relatively short periods when the applications they are responsible for are addressed in the data warehouse project. In many cases, the business analyst will work with end users and management to develop business process models that may be used in designing the data warehouse. A business analyst can be a key user.
Business Advisor
The Business Advisor is a business analyst who exceeds the separate processes. He owns shared, conformed dimensions and measures.
SOURCE SYSTEMFunctional Application Administration (FAA)Members of Functional Application Administration support key-users. They are responsible for migration of changes of the source system (not the Data Warehouse environment). Functional administrators have functional knowledge of the source system. They are capable of assist the Data Warehouse team. They can map business questions to database tables/columns.
Technical Application Administration (TAA)Technical administrators implement changes and solve problems. They can assist the Data Warehouse Team in creating sql-scripts. They can map the more complex business questions.
Database administrators (DBA)A database administrator maintains the database of the source system, with the schema’s, tables and so on. It’s important that the Data Warehouse Team cooperates with DBA. Database maintenance is critical for the organization. A breakdown of failure in database traffic can have huge consequences.
DATA WAREHOUSE TEAMData Warehouse ManagerThe data warehouse manager has overall responsibility. The manager defines, plans, schedules, and controls. The DWH-plan must include tasks, deliverables and resources – the people who will perform the tasks. The manager will monitor and coordinate the activities of the team, and will review their deliverables. If contractors and consultants are used, the project manager assigns the tasks, monitors activities and deliverables and assures that knowledge transfer is indeed taking place.
The manager will estimate the cost of the project and monitor conformance to the cost estimates. The manager will also project the benefits, measure the effectiveness of the data warehouse and report on the benefits and costs.
The most important task for the manager is recruiting the right people; people with the requisite skills and people who work well with the users and each other. Getting the right people will require an understanding from management on the importance of the project and their cooperation in the recruiting process. The problem is that the good people are already taken and a smart manager outside of the data warehouse area is loath to give up an outstanding performer. Hiring people from the outside has its own risks and time delays. An outside hire will need some time to become familiar with the organization, to learn how things are done, to understand the minefields, and to learn about the source data.
User SupportUser support is the first line of defense when the user has problems. These people must be customer/client-oriented and know that the calls from the users are not just annoyances but their reason for being. They must be knowledgeable about the tools, must understand the database structure, must know the data and must be familiar with frequently executed queries and reports. They must know the users’ concerns and problems and know the frequently asked questions and the answers to them. They must have a profile of the users, know the power users and the more casual users. They must be patient and responsive. If time elapses between hearing about and resolving the problem, the users must be informed on how the resolution is progressing. Just as in operational systems, an incident tracking system should be in place.
User support will create and maintain frequently asked questions (FAQs) that could be accessible through the Internet. User support should make extensive use of meta data to train, inform and answer user questions. A large percentage of questions that come to help desks involve problems of the users not understanding the meaning of the data, the timeliness of the data or the source of the data. Most of these questions can be answered with meta data.
Intake assistantRaised requests are entered in a problem/change management system. The intake assistant judges the correctness of the reported type of request, assigns the issue to the relevant person and gives the requester feedback about the planning. If possible, the assistant creates a solution himself.
The duty includes the setup of user IDs and system access roles for each person or group, which is given access to the data warehouse or data mart.
Report builderSometimes User Support creates some of the canned queries and reports. It should be a task of the (power) user but organizations can experience a passive attitude in this in spite of transfer of knowledge. This is not unusual; reporting is not the core activity of users.
User Support shouldn’t run queries with a structural character. In practice, it can happen that a change request to publish the query as a standard report has not finished. In the meantime User Support can run the query and distribute the result.
Job MonitorSome of the problems that are seen by user support relate to the update/load (refresh) process, such as whether jobs are completed on time and whether they ran successfully. These processes must be monitored, and any variance in availability or timeliness of the data should be communicated to the users. Other problems relate to performance. User support must have a feel for performance, monitor performance and report performance degradation to database administration.
TrainerThe responsibilities can include training end users to use data query/analysis/ reporting tools, assisting end users in development of queries and reports, assisting in development of production queries/reports and assisting with evaluation and selection of end-user data access tools. This role is more likely to be placed within the IT data warehouse team. The trainer(s) can then work with end users in many departments which might be utilizing the query/reporting/ analysis tools selected for use by the company’s personnel.
Data Access Analyst
This role performs a variety of tasks, which help provide end users with the ability to access and utilize the data stored in the data warehouse or its dependent data marts. This includes evaluating and selecting business intelligence, OLAP and other query/reporting tools, identifying and prioritizing data access projects, analyzing data access requests to determine if and how the request can be met, educating end users on data warehouse capabilities and data availability, establishing data access standards, etc. This role can sometimes be filled by a power user, but most often will be filled by a member of User Support familiar with the application data who has been involved with the data warehouse design and definition process.
DATA ACQUISITIONData ModelerThe person(s) in this role prepares data models for the source systems based on information provided by the business and/or data analysts. Additionally, the data modeler may assist with the development of the data model(s) for the DWH or a data mart guided by the data warehouse architect.
Data Warehouse AdviserThis role will require some level of familiarity with the source systems, the extract processes placed within the transaction processing systems, the content and design of the data warehouse and the load processes and business rules used for data integration and summarization, and the use of the data warehouse by end users or data mart build processes which may have been developed in conjunction with the data warehouse. This person or persons may also be required to have a fundamental knowledge of the data access tools and work with end users to design and develop data access functions for use within the departments or by the company as a whole. This role if fulfilled if a senior consultant Business Intelligence / Data Warehousing is needed for advice.
Data Warehouse ArchitectThese job responsibilities encompass definition of overall data warehouse architectures and standards, definition of data models for the data warehouse and all data marts, evaluation and selection of infrastructure components including hardware, DBMS, networking facilities, ETL (extract, transform and load) software, performing applications design and related tasks.
Data Warehouse DeveloperThe person(s) in this role develops various functional components of the data warehouse applications including extract programs on source systems, ETL applications, data cleansing functions, system management functions including load automation, data acquisition functions and others. The transformation, data cleansing and load functions will usually be developed by team members who are trained in the use of specific ETL and data cleansing tools.
The back-end ETL application developer is responsible for the acquisition process that constitutes the major effort for any data warehouse – the extract/transform/load process. It has been estimated that these tasks are 60 to 80 percent of the total time and effort to implement a data warehouse. Extract/transform/load will be performed either with an ETL tool, with conventional programming or with a combination of both. The back-end ETL application developers will be responsible for the process regardless of which approach is used. Do not assume a tool will eliminate the work or effort involved with this process. The required analysis is still time-consuming and describing the complicated transformation logic to the ETL tool can be difficult. The back-end ETL application developers are responsible for tie-outs, which are the controls of numbers of records extracted matched to the number of records loaded, controls on totals, on errors and on the types of errors. Data cleansing, whether it is done with a cleansing tool, with an ETL product or with conventional coding, is also the responsibility of the back-end ETL application developers. Neither the data warehouse nor the back-end ETL application developers are responsible for fixing the operational systems that feed the data warehouse.
The ETL process is always time- and resource-consuming, even with a data warehouse of moderate size. Application development must be aware of the performance implications of their architecture, design and coding.
DATA ACCESSThe second part of the DW project team is a group whose purpose is to get data, or rather, information out of the data warehouse or a data mart.
Data Access DeveloperThe person in this position develops data access queries/programs using BI tools such as Cognos Impromptu, PowerPlay and Cognos Query to run in production on a regular basis and assists users with development of complex ad hoc queries and analyses. This role may be a power user, trained in the use of the query/reporting/analysis tool or a member of User Support. The front-end application developers play a critical role in delivering the data from the data warehouse. The front-end application developers are usually heavily involved with the users. One of the selling points of the data warehouse is to empower users to write their own queries. If you do not use Data Access Developers you have to write incredibly complex queries – often to the detriment of performance. In case of a clean and correct Data Warehouse with documented logical and technical metadata a Power User can be able to write their own queries and publish reports. However, there are many users who are not capable of writing all the queries they need. In these cases, the front-end application developers will be responsible to write the queries for them. Developers must be aware of the importance of writing functional and technical documents.
ADMINISTRATION AND MAINTENANCEData Warehouse Infrastructure ManagerIf a project is large enough to require dedicated resources for system administration and database administrators (DBAs), it is possible you will want a person who will provide leadership and direction for these efforts. This would be someone who is familiar with the hardware and software likely to be used, experienced in administration of these areas and who can direct tuning and optimization efforts as warehouse development and use moves forward in the organization. Including the infrastructure team within the large data warehousing group helps ensure that the needed resources are available as needed to ensure that the project stays on track and within budgeT.
System AdministratorThis position is responsible for maintaining hardware reliability, system level security, system level performance monitoring and tuning, and automation of production activities including extract and load functions, repetitively produced queries/reports, etc. In many cases, the system administrator is responsible for ensuring that appropriate disaster recovery functions such as system level backups are performed correctly and on an accepted schedule. This person is also responsible for all data acquisition and –access software updates and migrations. Some of the problems relate to the update/load (refresh) process, such as whether it completed on time and whether it ran successfully. These processes must be monitored, and any variance in availability or timeliness of the data should be communicated to the User Group.
The architecture consists of Development/Test, Acceptance and Production.
Database AdministratorThe person in this role is involved in database design, especially the physical design used to implement the data warehouse or data marts. The DBA monitors and tunes DBMS performance, administers end-user access and security at the DBMS level, ensures that DW data is appropriately protected via database backup and related strategies, assists with load process automation, and evaluates and selects infrastructure components. The DBA should be proficient in the technology of both the DBMS and the operating system chosen for the data warehouse or data mart. Other problems relate to performance. The Database Administrator has a feel for performance, monitor performance, report performance degradation to database administration, spot poor query techniques that could cause bad performance and help the users write more efficient queries and reports.
Data AdministratorThe primary roles of data administration are key to a successful data warehouse.
This role is responsible for identifying, documenting and administering the corporate data asset. This includes working with end users and IT staff to gain consensus on standard definition of common data attributes, developing and maintaining the corporate data model, identifying and documenting data sources and maintaining the meta data repository. The Data Administrator constantly monitors the accuracy and completeness of the metadata.
The data administrator may not be a reporting member of the data warehousing organization but is an integral part of the data identification and analysis needed to build a data warehouse or mart.
Data administrators model the business data according to the business rules and policies governing the data relationships. They also document and maintain the mapping between the logical data model and the physical database design and between the source data and the data warehouse target data. Data administrators understand the source files and know which are the appropriate source files for extraction. Data administration establishes and maintains the naming standards. They communicate the availability of data and, when appropriate, suggest that departments share their data. Data administration is responsible for administering the tool used in data modeling and administering the tool libraries although in some organizations, this is the responsibility of the application development team.
Data administration is responsible for administering the meta data repository. Meta data will come from a number of sources: the modeling tools, the extract/transform/load tool, the query tools, from reengineering the source file data definitions and direct entry into a meta data repository tool. The meta data repository could potentially be updated as well as accessed from the each of these source systems. Tight controls are necessary to minimize
corruption of the meta data. Data administration is responsible for controlling entry, updates and deletes.
A Data Administrator administers the libraries that store scripts, queries and reports. Administration includes developing and implementing the standards for admission to the library and for maintaining the scripts. Version Control software must be used for the sources and documents to prevent the existence of different sources with the same name.
Test ManagerChanges like new queries/reports, DWH-build procedures and ETL-implementations must be well documented and must be thoroughly tested. A functional acceptance-test will be done by the Business Analyst (coordinated by the Test Manager). Team members with (PL)/SQL-knowledge do the technical tests.
Data quality is everyone’s job but it is the primary focus of the Test Manager. Although information about data quality problems often originate in IT, it most often comes from the users. Reconciliation must be a regular part of the job. The responsibilities includes monitoring and reporting on data quality, and searching out causes of incompatibility between the various applications which collect and utilize the company’s data. The Test Manager is responsible for finding and reporting problems relating to data quality, for tracking these problems and assuring that the resolution is assigned to a responsible party. Some of the discovered problems must be reported to data administration where the data exceptions can be properly incorporated into the logical data model. The Test Manager can be involved in writing the programming specifications for the transformation logic that needs to occur during the ETL process.
Not all problems can be resolved nor should they be. It might cost more to fix the problem than it’s worth. The Test Manager along with other interested parties, including the business unit, should determine the criticality of the problem. Analytical requirements could determine the quality requirements. The cost to fix the problem should be estimated and the priority would then be assigned.
The Test Manager analyst should be proactive in trying to find problems before they surface. Data quality analysis tools or simple queries could identify many of the problems.
The most important role for the Test Manager will be evaluating and improving the processes in the data warehouse, specifically the ETL process. By improving the processes, the quality of data is likely to improve. The Test Manager might also seek out cleaner sources of data to replace those that have proven troublesome.
Publish AdministratorThe publish administrator publishes all developed sources to the acceptance and –production platform. This regards all DWH-related sources like DWH-build scripts, ETL-scripts and reporting sources. The Publish Administrator has the responsibility to establish an environment where result sets can be distributed (on the Web), where reports can be made available (on the Web) and where queries can be launched (from the Web).
This is not been done by the developers themselves because the changes from development to acceptance and production must be judged on test results and functional/technical documentation. This prevents production misbehavior and knowledge gaps. If a source must be rebuild, documentation has to be sufficient to make that done. Besides that, the data administrator must be guaranteed that definitions have not changed and no new definitions are created without approval of the business. The Data Warehouse
Infrastructure Manager monitors the chosen solution: is everything compatible with the Data Warehouse principles (for example minor transformations in the front-end).
Tool AdministratorThe tool administrator has responsibility for dealing with the tool vendor, assuring excellent support and getting answers to questions. The tool administrator deals with problems associated with the tool including queries that produce incorrect results and queries that perform badly. The tool administrator makes sure the tool environment is properly protected. He has responsibility for new releases and migrations.
In most cases this responsibility is allocated to the involved developers/administrators (Query and OLAP tools assigned to the Data Access Developer; ETL tools to the Data Warehouse Developer and so on).
Administration versus DevelopmentIf the data warehouse team is implemented with substantial functional en technical documents with configuration management and release management procedures it’s possible to hand over development knowledge to an administration team. In that case it’s possible that administration is concerned with problem requests and development with change requests. Then, data acquisition and data access roles must also be introduced to Administration. If not, production development, including problem requests, should stay at the original developers.
CHANGE MANAGEMENTRepresentatives in a ‘change meeting’ will judge the prerequisites for a migration to acceptance and production. Administration, Development and the head of User Group take place. Changes will be judged on the presence and completeness of the required documentation and the correctness of the followed procedures.
Below you find a graphical presentation of the different roles. Again, some roles may be combined, one person may fill more than one role or more than one person may fill a role.