Data Warehouse

A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.

- Bill Inmon, Building the Data Warehouse, 1996

Data warehousing is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.

OLTP	Data Warehouse
Application Oriented	Subject Oriented
Used to Run Business	Used to Analyze Business
Detailed Data	Summarized and Refined
Current, Up to Date	Snapshot Data
Isolated Data	Integrated Data
Repetitive Access	Ad-hoc Access
User Specific	Business User
Performance Sensitive	Performance Relaxed
Few Records Accessed at a Time	Large Volumes at a Time
Read/Update Access	Mostly Read (Batch Update)
No Data Redundancy (Normalized)	Redundancy Present (Denormalized)
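
As a rough illustration of the access-pattern rows in the comparison above, the sketch below uses Python's built-in sqlite3 module. The orders table and its columns are assumptions made up for this example. The OLTP-style statements touch a single record by key, while the warehouse-style query reads many records and summarizes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "East", 100.0, "NEW"),
        (2, "East", 250.0, "NEW"),
        (3, "West", 75.0, "NEW"),
    ],
)

# OLTP style: read/update a single record, located by its key.
conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = ?", (2,))
oltp_row = conn.execute(
    "SELECT status FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Warehouse style: read-only summary computed across many records.
dw_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

In a real setup the two workloads would usually run on separate systems, which is what the ETL step connects.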

ETL (Extraction, Transformation, and Loading) is the process by which data from the operational systems is integrated, transformed, and loaded into the Data Warehouse environment.
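
The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the inline CSV stands in for an operational source, and the dim_customer table, its columns, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system (inline CSV for the sketch).
SOURCE_CSV = """customer_id,name,balance_cents
1, alice ,1050
2,BOB,200
"""

def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: clean up names and convert cents to a decimal amount."""
    return [
        (int(r["customer_id"]), r["name"].strip().title(), int(r["balance_cents"]) / 100)
        for r in rows
    ]

def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute(
    "SELECT * FROM dim_customer ORDER BY customer_id"
).fetchall()
```

Keeping the three steps as separate functions makes it easy to swap the source or the target without touching the transformation rules.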

Operational Data Challenges

Data Transformation

Conversions of Data - Data type changes / standardization to common units (currency, measurements)
Classification - Changing continuous values to discrete ranges (temperature to temperature ranges)
Splitting of Fields
Merging
Aggregations
Derivations (Percentages, Ratios, Indicators)
Four Classes - Structure, Format, Conversions, Classifications
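
The transformation types listed above can each be shown in a line or two of plain Python. The records, field names, and temperature thresholds below are assumptions invented for this sketch, not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical source records; field names are assumptions for this sketch.
records = [
    {"full_name": "Ada Lovelace", "city": "London", "country": "UK", "temp_f": "50", "sales": 200.0},
    {"full_name": "Alan Turing", "city": "Leeds", "country": "UK", "temp_f": "86", "sales": 300.0},
]

def classify_temp(celsius):
    """Classification: map a continuous value to a discrete range."""
    if celsius < 15:
        return "cold"
    if celsius < 25:
        return "mild"
    return "hot"

transformed = []
for rec in records:
    # Conversion: change type (str -> float) and standardize units (F -> C).
    celsius = (float(rec["temp_f"]) - 32) * 5 / 9
    # Splitting: one source field becomes two target fields.
    first, last = rec["full_name"].split(" ", 1)
    transformed.append({
        "first_name": first,
        "last_name": last,
        # Merging: two source fields become one target field.
        "location": f'{rec["city"]}, {rec["country"]}',
        "temp_range": classify_temp(celsius),
        "sales": rec["sales"],
    })

# Aggregation: total sales per temperature range.
sales_by_range = defaultdict(float)
for row in transformed:
    sales_by_range[row["temp_range"]] += row["sales"]

# Derivation: each row's share of total sales, as a percentage.
total_sales = sum(row["sales"] for row in transformed)
for row in transformed:
    row["sales_pct"] = 100 * row["sales"] / total_sales
```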

Guiding Principles

ETL Methodologies

Kimball / Star Schema - The right way to do it; takes longer to develop, uses less space.
Inmon / 3rd Normal Form - The wrong way; keeps the same structure as the source and puts a lot of work on Business Analysts.

Kimball vs Inmon

Ralph Kimball's approach stressed the importance of data marts, which are repositories of data belonging to particular lines of business. The data warehouse is simply a combination of different data marts that facilitates reporting and analysis. The Kimball data warehouse uses a "bottom-up" approach.
Bill Inmon regarded the data warehouse as the centralized repository for all enterprise data. In this approach, an organization first creates a normalized data warehouse model. Dimensional data marts are then created from it, based on subject areas. This is a "top-down" approach.
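
To make the top-down flow concrete, here is a minimal sketch using Python's built-in sqlite3 module. It first creates a few normalized tables, then derives a small dimensional data mart (one dimension table, one fact table) from them. All table and column names are hypothetical. In a bottom-up flow, tables like dim_customer and fact_sales would instead be built first, directly from the source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized enterprise warehouse tables (hypothetical names).
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, sale_date TEXT, amount REAL);

INSERT INTO region VALUES (1, 'East'), (2, 'West');
INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
INSERT INTO sale VALUES
    (100, 10, '2024-01-05', 50.0),
    (101, 10, '2024-01-06', 25.0),
    (102, 11, '2024-01-06', 40.0);

-- Sales data mart derived from the normalized model: a small star schema.
-- The dimension flattens customer and region into one denormalized table.
CREATE TABLE dim_customer AS
    SELECT c.customer_id, c.name, r.region_name
    FROM customer c JOIN region r ON r.region_id = c.region_id;
CREATE TABLE fact_sales AS
    SELECT sale_id, customer_id, sale_date, amount FROM sale;
""")

# A typical mart query: one join from the fact table to a dimension.
mart_report = conn.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.region_name
    ORDER BY d.region_name
""").fetchall()
```

The denormalized dimension repeats the region name for every customer, which is the redundancy the comparison with OLTP refers to; in exchange, reports need fewer joins.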
