- A Data Warehouse is a subject-oriented integrated time-varying non-volatile COLLECTION OF DATA THAT is used PRIMARILY IN ORGANIZATIONAL DECISION MAKING.
- Bill Inmon,Building the Data Warehouse 1996
- A process of transforming data into information and making it available to users in a timely enough manner to make a difference
- Forrester Research April 1996
- Data Warehouse Architecture
- OLTP vs Data Warehouse
- ETL (Extraction ,Transformation and Loading) is a process by which data is integrated and transformed from the operational systems into the Data Warehouse environment
Operational Data Challenges
- Data from Heterogeneous sources
- Format Differences
- Data Variations
- Across Locations same code represents different customers
- Across period of time a product code have been re-used.
Data Transformation
- Conversions of Data - Data Type change / Standardized to common units (currency/measurements)
- Classification -Changing continuous values to discrete ranges (temperature to temperature ranges)
- Splitting of Fields
- Merging
- Aggregations
- Derivations(Percentages,Ratios,Indicators)
- Four Classes - Structure ,Format,Conversions,Classifications
Guiding Principles
- Single Version of Truth
- Integration of Data
- Non Redundant Data
- Established Standards
- Methodology specific to data warehousing
- Business Oriented data model with metadata
- Detailed atomic data
- Uniform data meanings
- Time-dimensioned data with years of history
- Adhoc Access
- Scaled for growth of data,users and speed
- Any Questions ,any data, at any time
ETL Methodologies
- Kimball /Star Schema - the Right way to do it , takes longer to develop , use less space
- Inmon / 3rd Normal Form - The wrong way ,keep the same structure as source , puts lot of work on Business Analysts.
Kimball vs Inmon
- Ralph Kimball approch stressed the importance of data marts , which are repositories of data belonging to particular lines of Business .The data warehouse is simply a combination of different data marts that facilitates reporting and analysis. the Kimball data warehouse uses a "bottom-up" approch.
- Bill Inmon regarded the data warehouse as the centralized repository for all enterprise data. In this approch , an organization first creates a normalized data warehouse model.Dimensional data marts are then created based on the subjects , it uses "top-down" approch.
Comments
Post a Comment