Difficult General Datawarehousing Interview Preparation Guide

A general data warehousing guide for job interview preparation. It covers frequently asked questions (FAQs) that come up in data warehousing interviews.

40 General Datawarehousing Questions and Answers:


1 :: What are conformed dimensions?

Conformed dimensions are dimensions that are shared across cubes. (A cube here means a schema that contains fact and dimension tables.)

Suppose Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4, where F denotes a fact table and D a dimension table.
Here D1 and D2 are the conformed dimensions. One dimension can be shared by several fact tables through primary key and foreign key relationships.
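
The Cube-1/Cube-2 example can be sketched with an in-memory SQLite database. This is only an illustration: the table names (d1_date, f1_sales, f2_inventory), columns, and data are hypothetical stand-ins for D1, F1, and F2. Because both fact tables reference the same dimension, their measures can be reported side by side.

```python
import sqlite3

# Hypothetical schema: d1_date stands in for conformed dimension D1,
# f1_sales and f2_inventory stand in for fact tables F1 and F2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d1_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE f1_sales (date_key INTEGER REFERENCES d1_date(date_key), amount REAL);
CREATE TABLE f2_inventory (date_key INTEGER REFERENCES d1_date(date_key), units INTEGER);
INSERT INTO d1_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO f1_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
INSERT INTO f2_inventory VALUES (1, 20), (2, 35);
""")

# Aggregate each fact table separately to the shared grain (month),
# then combine the two results on the conformed dimension attribute.
rows = conn.execute("""
SELECT s.month, s.total_sales, i.total_units
FROM (SELECT d.month, SUM(f.amount) AS total_sales
      FROM f1_sales f JOIN d1_date d ON d.date_key = f.date_key
      GROUP BY d.month) s
JOIN (SELECT d.month, SUM(f.units) AS total_units
      FROM f2_inventory f JOIN d1_date d ON d.date_key = f.date_key
      GROUP BY d.month) i ON i.month = s.month
ORDER BY s.month
""").fetchall()

for row in rows:
    print(row)
```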

2 :: How do you load the time dimension?

A time dimension is used to represent data or measures over a period of time. The server time dimension is the most widely used type; it lets data be presented hierarchically, for example year -> quarter -> month -> week. A time dimension is typically loaded by a script or procedure that generates one row per calendar day for the required date range, rather than from source transactions.
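
One common way to load a time dimension is to generate its rows from a date range. The sketch below assumes one row per calendar day; the function name build_time_dimension and the column names are hypothetical, and a real load would write the rows to the warehouse table instead of keeping them in a list.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240131
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "week": current.isocalendar()[1],  # ISO week number
        })
        current += timedelta(days=1)
    return rows

time_dim = build_time_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(time_dim), time_dim[0])
```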

3 :: Explain why OLTP database designs are generally not a good idea for a data warehouse.

In OLTP, tables are normalized, so analytical queries need many joins and respond slowly for the end user. OLTP systems also do not usually hold years of data, so long-term history cannot be analyzed.

4 :: Explain what conformed dimensions are.

Conformed dimensions mean exactly the same thing in every fact table to which they are joined.
Ex: A Date dimension is connected to all facts, such as Sales facts, Inventory facts, etc.

5 :: Explain what the level of granularity of a fact table signifies.

Granularity
The first step in designing a fact table is to determine its granularity. By granularity, we mean the lowest level of information that will be stored in the fact table. This involves two steps:

1. Determine which dimensions will be included.
2. Determine where along the hierarchy of each dimension the information will be kept.

The determining factors usually go back to the requirements.
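
The choice of grain matters because data can be rolled up from a fine grain to a coarser one, but not the other way around. A minimal sketch, assuming hypothetical fact rows stored at the grain "one row per product per day" (the names daily_sales and roll_up_to_month are illustrative only):

```python
from collections import defaultdict

# Hypothetical fact rows stored at the grain "one row per product per day".
daily_sales = [
    {"date": "2024-01-05", "product": "widget", "amount": 10.0},
    {"date": "2024-01-20", "product": "widget", "amount": 15.0},
    {"date": "2024-02-03", "product": "widget", "amount": 7.0},
    {"date": "2024-02-03", "product": "gadget", "amount": 4.0},
]

def roll_up_to_month(rows):
    """Aggregate daily-grain rows to the coarser grain (product, month)."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM"
        totals[(row["product"], month)] += row["amount"]
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

If only monthly_sales had been stored, the individual daily amounts could no longer be recovered.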

6 :: What is a Fact table?

A fact table contains the measurements, metrics, or facts of a business process. If your business process is "Sales", then a measurement of this business process, such as "monthly sales number", is captured in the fact table. A fact table also contains the foreign keys to the dimension tables.

A table in which facts are organized is called a fact table.
A fact is a numeric value or a business measure.
Not every numeric value is a fact; a numeric value that serves as a key performance indicator is called a fact.
A fact table contains facts at the lowest granularity level.

7 :: What is data warehousing?

Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated....This makes it much easier and more efficient to run queries over data that originally came from different sources.

Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

A data warehouse is defined in two ways by two authors, Ralph Kimball and W. H. Inmon.

According to Ralph Kimball:
1) A DWH is a relational database that is specially designed for business analysis, not for running the business.
2) An enterprise DWH is designed to support the decision-making process. Hence it is called a Decision Support System.
3) A data warehouse is designed only for the read operations required for business analysis, not for transaction processing. Hence it is called a read-only database.

According to W. H. Inmon, a data warehouse is a:
1) Time-variant database
2) Non-volatile database
3) Integrated database
4) Subject-oriented database
A data warehouse is also a historical database.

8 :: Explain what non-additive facts are.

Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table. Ratios and percentages, such as a profit margin, are typical examples.
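
A small worked example shows why such facts cannot be summed. It assumes hypothetical rows with additive revenue and cost measures and a derived margin percentage; the usual approach is to sum the additive components first and compute the ratio afterwards.

```python
# Hypothetical fact rows: revenue and cost are additive, margin_pct is not.
rows = [
    {"store": "A", "revenue": 100.0, "cost": 50.0},
    {"store": "B", "revenue": 900.0, "cost": 810.0},
]
for row in rows:
    row["margin_pct"] = 100.0 * (row["revenue"] - row["cost"]) / row["revenue"]

# Misleading: adding the stored percentages together.
summed_margin = sum(row["margin_pct"] for row in rows)

# Meaningful: sum the additive components first, then derive the ratio.
total_revenue = sum(row["revenue"] for row in rows)
total_cost = sum(row["cost"] for row in rows)
overall_margin = 100.0 * (total_revenue - total_cost) / total_revenue

print(summed_margin, overall_margin)
```

Store A has a 50% margin and store B a 10% margin, but the combined margin is 14%, not 60%.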

9 :: Name some of the modeling tools available in the market.

These tools are used for data/dimensional modeling:

1. Oracle Designer
2. ERwin (Entity Relationship for Windows)
3. Informatica (Cubes/Dimensions)
4. Embarcadero
5. PowerDesigner (Sybase)

10 :: What is an ER Diagram?

The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.

Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.

Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:

It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.

11 :: Explain why you should put your data warehouse on a different system than your OLTP system.

An OLTP system is basically "data oriented" (ER model) and not "subject oriented" (dimensional model). That is why we design a separate, subject-oriented OLAP system.

Moreover, a complex query fired on an OLTP system puts heavy overhead on the OLTP server, which directly affects day-to-day business.

12 :: List some of the real-time data warehousing tools.

ETL:
Informatica,
Ab Initio,
DataStage, etc.

OLAP:
BusinessObjects,
Cognos,
MicroStrategy,
Hyperion, etc.

DW:
Oracle,
DB2,
Teradata,
Sybase,
Greenplum, etc.

13 :: What is a general purpose scheduling tool?

The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

14 :: Explain how dimension tables are designed.

Most dimension tables are denormalized and satisfy normalization principles only up to 2NF. In some instances, such as snowflake schemas, they are further normalized to 3NF.

The main design steps are:

1. Find where the data for this dimension is located.
2. Figure out how to extract this data.
3. Determine how to maintain changes to this dimension (see slowly changing dimensions in question 18).
4. Change the fact table and DW population routines.

15 :: What is a dimension table?

A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains mainly textual, descriptive attributes.

16 :: Explain which columns go to the fact table and which columns go to the dimension table.

The Primary Key columns of the Tables(Entities) go to the Dimension Tables as Foreign Keys.

The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.

17 :: What is a Star Schema?

A star schema is a way of organizing tables so that results can be retrieved from the database easily and quickly in a warehouse environment. A star schema usually consists of one or more dimension tables arranged around a fact table. The layout looks like a star, which is how it got its name.
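
A minimal star schema can be sketched with an in-memory SQLite database. The table names (fact_sales, dim_product, dim_store), columns, and data below are hypothetical; the point is only that a typical query joins the central fact table to each surrounding dimension and aggregates a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (the points of the star).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table (the center): foreign keys to the dimensions plus a measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales VALUES (1, 1, 5), (1, 2, 3), (2, 1, 4), (1, 1, 2);
""")

# Join the fact table to each dimension and aggregate the measure.
result = conn.execute("""
SELECT p.name, s.city, SUM(f.quantity)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store s ON s.store_key = f.store_key
GROUP BY p.name, s.city
ORDER BY p.name, s.city
""").fetchall()

for row in result:
    print(row)
```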

18 :: Explain what slowly changing dimensions are.

SCD stands for slowly changing dimensions. Slowly changing dimensions are of three types:

SCD1: only the updated (current) values are maintained.

Ex: when a customer address is modified, we update the existing record with the new address.

SCD2: maintains historical and current information by using

A) Effective dates

B) Versions

C) Flags

or a combination of these.

SCD3: maintains historical and current information by adding new columns to the target table (typically holding only the previous value).
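
The SCD2 approach can be sketched as follows. This is an illustrative in-memory version that combines all three mechanisms (effective dates, versions, and a current-row flag); the function name apply_scd2 and the column names are hypothetical, and a real implementation would run as an update plus an insert against the dimension table.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_address, change_date):
    """Expire the current row for customer_id and append a new version."""
    current = next(
        (r for r in dimension
         if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if current["address"] == new_address:
            return  # nothing changed, keep the current row
        current["is_current"] = False      # flag marks the row as historical
        current["end_date"] = change_date  # effective date range is closed
    version = current["version"] + 1 if current else 1
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "customer_id": customer_id,
        "address": new_address,
        "version": version,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })

customer_dim = []
apply_scd2(customer_dim, 42, "1 Old Street", date(2023, 1, 1))
apply_scd2(customer_dim, 42, "9 New Avenue", date(2024, 6, 1))
for row in customer_dim:
    print(row)
```

Under SCD1 the same change would simply overwrite the address on the existing row, so the old value would be lost.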

19 :: What is data mining?

Data mining is the process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be mined to find the highest-risk people to insure in a certain geographical area.

20 :: What are modeling tools available in the Market?

There are a number of data modeling tools:

Tool Name          Company Name
ERwin              Computer Associates
Embarcadero        Embarcadero Technologies
Rational Rose      IBM Corporation
PowerDesigner      Sybase Corporation
Oracle Designer    Oracle Corporation

21 :: What are the advantages of RAID 1, 1/0, and 5? On what type of RAID setup would you put your TX logs?

Transaction logs are written sequentially and are rarely read during normal operation. The ideal is to put them on RAID 1/0 because it has much better write performance than RAID 5.

RAID 1 is also good for TX logs and costs less than 1/0 to implement. It has a tad less reliability, and performance is generally a little worse.

RAID 5 is generally best for data because of its lower cost and because it provides great read capability.

22 :: Explain what type of indexing mechanism we need to use for a typical data warehouse.

On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

To my knowledge, SQL Server does not support user-created bitmap indexes, whereas Oracle does.
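
The idea behind a bitmap index can be sketched in a few lines. This is a conceptual illustration only, not how any particular database stores its indexes: each distinct column value gets a bitmap with one bit per row, and a multi-column filter becomes a bitwise AND. The function names and sample columns are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality fact table columns, one entry per row.
region = ["east", "west", "east", "north", "west", "east"]
channel = ["web", "web", "store", "web", "store", "web"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'east' AND channel = 'web' becomes a bitwise AND.
hits = matching_rows(region_idx["east"] & channel_idx["web"])
print(hits)
```

This is why bitmap indexes suit columns with few distinct values: the number of bitmaps stays small and combining filters is cheap.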

23 :: What is pre-emptive and non-pre-emptive?

Pre-emptive means taken as a measure against something possible, anticipated, or feared; preventive; deterrent. For example: a pre-emptive tactic against a ruthless business rival.

Non-pre-emptive is the exact opposite of pre-emptive: no such preventive measures have been taken.

24 :: What are the various ETL tools in the market?

Various ETL tools used in the market are:

1. Informatica
2. DataStage
3. MS SQL Server DTS (Integration Services in SQL Server 2005)
4. Ab Initio
5. SQL*Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction

25 :: Explain what real-time data warehousing is.

Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.