Finance Management System

SFSU Business Intelligence > Technical Overview


Business Intelligence Technical Overview

How Does The Technology Work?

ETL (Extract, Transform, and Load)

The majority of work in building a data warehouse falls within ETL on the staging server. Often, the information from different sources is placed in an ODS (Operational Data Store) which is then used as the source of data for data warehousing projects. Commonly, BI systems collect information about the state of the building process, which is called an audit trail. This information can be used to track the progress at a fairly detailed level.

Also, most BI systems come with metadata repositories which are maintained during ETL. Metadata is used by the end user reporting and analysis tools. Metadata is data about data. For example, "Last Name" is the metadata describing the data while "Smith" is the data itself.

Modern graphical tools can simplify the development of most, if not all, of the required ETL functionality.

Extract

This phase deals with copying the relevant information from the source systems into the staging area. Typically, only the information that has changed since the last extract is retrieved. Before the extraction can begin, source data issues must be resolved such as future birth dates, codes with more than one meaning, etc. The data retrieval can be done with SQL (Structured Query Language), or by using available tools. The data transfer can be accomplished with FTP (File Transfer Protocol), a file drop, or other mechanisms. Data transfer can be initiated or scheduled either at the source or from the staging server.

Transformation

Transformation performs the bulk of the workload including removing mistakes, correcting missing data, correlating the data from multiple sources, documenting data quality checks, conforming dimensions, source to target mapping, maintaining historical data, applying complex business rules, summarizing and aggregation, and much more. All of this is driven by the business requirements, which must be guided by the business users and subject matter experts. The ultimate goal of this phase is to mold the data into a business friendly dimensional format for load into the data warehouse.

Load

The transformed data is loaded directly into the local data store/data warehouse. Usually, this is done in a single batch at a predefined time as dictated by the business. An interrupted load is typically restarted from scratch due to high data quality requirements – the business cannot afford to expose potentially invalid data.

SFSU Home | Search SFSU
1600 Holloway Avenue, San Francisco, CA 94132
Last modified January 14, 2008 by IS Projects Web Team