Vital Data Warehousing Basics Interview Preparation Guide

A guide to Data Warehousing Basics for job interview preparation. Explore a list of frequently asked questions (FAQs) from Data Warehousing Basics interviews, with answers.

37 Basics Data Warehouse Questions and Answers:


1 :: What are Data Marts?

A data mart is a collection of tables focused on a specific business group or department. It may be multidimensional or normalized. Data marts are usually built from a larger data warehouse or from operational data.

2 :: What are the various ETL tools in the market?

Various ETL tools used in the market are:

Informatica
DataStage
Oracle Warehouse Builder
Ab Initio
Data Junction

BusinessObjects Data Integrator is another ETL tool.

3 :: Explain the definition of normalized and denormalized view and what are the differences between them?

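Normalization is the process of removing redundancies by splitting data into related tables, so that each fact is stored only once.

Denormalization is the process of deliberately allowing redundancies, typically to reduce the number of joins and speed up reads.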
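
The difference can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names (city, customer, customer_flat) and the sample rows are hypothetical and chosen only for illustration. Both layouts answer the same question, but the normalized one needs a join while the denormalized one repeats the city name on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each city name is stored once and referenced by key.
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                           city_id INTEGER REFERENCES city(city_id));
    INSERT INTO city VALUES (1, 'Pune');
    INSERT INTO customer VALUES (10, 'Asha', 1), (11, 'Ravi', 1);

    -- Denormalized: the city name is repeated on every customer row.
    CREATE TABLE customer_flat (customer_id INTEGER PRIMARY KEY, name TEXT,
                                city_name TEXT);
    INSERT INTO customer_flat VALUES (10, 'Asha', 'Pune'), (11, 'Ravi', 'Pune');
""")

# The normalized layout needs a join to show the city name.
normalized_rows = conn.execute(
    "SELECT c.name, ci.city_name FROM customer c "
    "JOIN city ci ON ci.city_id = c.city_id ORDER BY c.customer_id"
).fetchall()

# The denormalized layout reads it from a single table.
flat_rows = conn.execute(
    "SELECT name, city_name FROM customer_flat ORDER BY customer_id"
).fetchall()
```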

4 :: What is a surrogate key? Where do we use it? Explain with an example.

A surrogate key is a substitute for the natural primary key.

It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the primary keys of dimension tables. The values can come from an Informatica Sequence Generator, an Oracle sequence, or SQL Server IDENTITY values.

It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, which makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users). Not only can these change, but indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.

2. Adapted from response by Vincent on Thursday, March 13, 2003

Another benefit you can get from surrogate keys (SIDs) is:

Tracking the SCD (Slowly Changing Dimension).

Let me give you a simple, classic example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has turnover allocated to him in Business Unit 'BU1'. But on the 2nd of June, Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should still belong to Business Unit 'BU1'.

If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.

If you use surrogate keys, you can create a new record for Employee 'E1' in your Employee Dimension on the 2nd of June, with a new surrogate key.

This way, in your fact table, your old data (before the 2nd of June) carries the SID of Employee 'E1' + 'BU1'. All new data (after the 2nd of June) takes the SID of Employee 'E1' + 'BU2'.

You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the Employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit: 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from simply enlarging the natural key is that you might not have all parts of the new key within your fact table, so you might not be able to join on the enlarged key, and therefore you need another ID.

A surrogate key is a system generated sequential number which acts as a primary key.
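
The employee example above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (employee_dim, turnover_fact), the helper add_version, and the dates and amounts are all assumptions made up for the sketch, not part of the original answer. Each change of business unit gets a new dimension row with its own surrogate key, so old facts stay attached to 'BU1'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (
        employee_sid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        employee_code TEXT,                               -- natural key
        business_unit TEXT
    );
    CREATE TABLE turnover_fact (employee_sid INTEGER, sale_date TEXT, amount REAL);
""")

def add_version(code, unit):
    """Insert a new dimension row and return its surrogate key (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO employee_dim (employee_code, business_unit) VALUES (?, ?)",
        (code, unit))
    return cur.lastrowid

# 1 January 2002: E1 belongs to BU1; turnover is booked against that row.
sid_bu1 = add_version("E1", "BU1")
conn.execute("INSERT INTO turnover_fact VALUES (?, '2002-03-15', 100.0)", (sid_bu1,))

# 2 June 2002: E1 moves to BU2, so a new row with a new surrogate key is created.
sid_bu2 = add_version("E1", "BU2")
conn.execute("INSERT INTO turnover_fact VALUES (?, '2002-07-01', 250.0)", (sid_bu2,))

# Joining on the surrogate key keeps old turnover with BU1 and new turnover with BU2.
turnover_by_unit = conn.execute("""
    SELECT d.business_unit, SUM(f.amount)
    FROM turnover_fact f JOIN employee_dim d USING (employee_sid)
    GROUP BY d.business_unit ORDER BY d.business_unit
""").fetchall()
```

Had the fact table stored only the natural key 'E1', there would be no way to tell which business unit each amount belongs to.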

5 :: What is the datatype of the surrogate key?

The datatype of the surrogate key is either integer or numeric. It is always generated by the system because the surrogate key works as the primary key. Surrogate keys help us distinguish between versions of a record and store the data history.

6 :: What is incremental loading?
What is batch processing?
What is a cross-reference table?
What is an aggregate fact table?

Incremental loading means loading only the ongoing changes from the OLTP system.

An aggregate table contains the measure values aggregated (grouped or summed up) to some level of the hierarchy.

Batch processing means executing more than one session in a single run. The sessions can be executed in two ways:
Linear: executing one after another
Parallel: executing more than one session at a time
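
One common way to implement incremental loading is a "last loaded" watermark; the sketch below shows that technique with Python's sqlite3 module. It is an assumption-laden illustration, not the only approach: the tables oltp_orders and dw_orders, the updated_at column, and the function incremental_load are hypothetical names invented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oltp_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE dw_orders   (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    INSERT INTO oltp_orders VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
""")

def incremental_load(last_loaded):
    """Copy only rows changed after the watermark; return the new watermark."""
    rows = conn.execute(
        "SELECT order_id, amount, updated_at FROM oltp_orders WHERE updated_at > ?",
        (last_loaded,)).fetchall()
    # INSERT OR REPLACE overwrites warehouse rows whose source row was updated.
    conn.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", rows)
    return max((r[2] for r in rows), default=last_loaded)

watermark = incremental_load("")            # first run loads everything

# The source system changes: one order is updated and one is added.
conn.execute("UPDATE oltp_orders SET amount = 25.0, updated_at = '2024-01-03' WHERE order_id = 2")
conn.execute("INSERT INTO oltp_orders VALUES (3, 30.0, '2024-01-03')")

watermark = incremental_load(watermark)     # second run picks up only the changes
loaded = conn.execute("SELECT order_id, amount FROM dw_orders ORDER BY order_id").fetchall()
```

The sketch assumes the source table reliably maintains updated_at; real systems may instead rely on change data capture or log-based methods.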

8 :: What is metadata in the context of a data warehouse and why is it important?

Metadata is data about data. A business analyst or data modeler usually captures information about the data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also maintained at the data mart level and for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis and decision support (DSS).

9 :: What are static and local variables?

A static variable is not created on the function stack but in the data segment (initialized data or BSS), so its value persists and is shared across multiple calls of the same function. Using static variables within a function is generally not thread-safe unless access is synchronized.

A local (or auto) variable, on the other hand, is created on the function stack, is valid only in the context of the function call, and is not shared across function calls.

10 :: What is the main difference between a schema in an RDBMS and schemas in a data warehouse?

RDBMS Schema
* Used for OLTP systems
* Traditional schema design
* Normalized
* Harder for business users to understand and navigate
* Not well suited to complex analytical queries
* Modelled for transaction processing rather than analysis

DWH Schema
* Used for OLAP systems
* Newer, dimensional schema design
* Denormalized
* Easy to understand and navigate
* Complex analytical queries can be answered easily
* Modelled for analysis and reporting

11 :: Explain a linked cube?

A linked cube is a cube in which a subset of the data can be analysed in great detail. The linking ensures that the data in the cubes remains consistent.

12 :: What is the main difference between the Inmon and Kimball philosophies of data warehousing?

The two differ in their approach to building the data warehouse.

According to Kimball:

Kimball views the data warehouse as a collection of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is formed by linking the data marts through conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at a local, departmental level.

According to Inmon:

Inmon believes in creating a data warehouse on a subject-by-subject basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.

i.e.,

Kimball: first data marts -> combined -> data warehouse

Inmon: first data warehouse -> later -> data marts

13 :: How do you explain a (sales) project in an interview? Where does a report developer's work start?

If you are a report developer:
1. Specify the front-end and back-end tools used for creating the reports.
2. Describe the purpose of the project, i.e. what you are going to achieve using the reports.
3. Explain the back-end part, which is important. For example, say which facts and dimensions are going to be used.
4. Once the facts and dimensions are identified, you might want to restructure them using views. You also have to decide which schema to use, whether a snowflake schema or a star schema.
5. Once the schema has been finalised, you might want to include certain KPIs (Key Performance Indicators) in it.
6. Once the cube is ready, you move on to deploying the cube.
7. After the deployment has completed successfully, use a front-end tool such as ProClarity and feed the cube into it.
8. Once the cube has been fed into ProClarity, you can create various reports that help the business. In our case, I identified the top 10 customers by sales and the successful sales periods over time, using trend analysis and a ranking KPI to rank the customers.

14 :: Explain the difference between snowflake and star schema. In which situations is a snowflake schema better than a star schema, and when is the opposite true?

Star schema

It contains dimension tables arranged around one or more fact tables.

It is a denormalised model.

There is no need for complicated joins.

Queries return results quickly.

Snowflake schema

It is the normalised form of the star schema.

It contains deeper joins because the tables are split into many pieces. Modifications can easily be made directly in the tables.

We have to use more complicated joins, since there are more tables.

There will be some delay in processing the query.

A star schema looks like a centrally located fact table surrounded by dimension tables. It looks like a star, which is why it is called a star schema.
In a star schema the dimension tables are denormalised, but the fact table is normalised.
In a snowflake schema the dimension tables are split into one or more additional tables.
Because the dimension tables are normalised, they save some table space.
However, there are more joins, so performance degrades.
We use a star or snowflake schema depending on the client's requirements; the client may ask for the data to be normalised or denormalised.
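
The extra join that a snowflake schema requires can be seen in a small sketch built with Python's sqlite3 module. All table names (sales_fact, product_dim_star, product_dim_snow, category_dim) and the sample data are hypothetical. Both queries return the same totals per category; the snowflake version simply needs one more join to reach the category name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (product_id INTEGER, amount REAL);
    INSERT INTO sales_fact VALUES (1, 5.0), (2, 7.0), (1, 3.0);

    -- Star: one denormalized product dimension holding the category name.
    CREATE TABLE product_dim_star (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
    INSERT INTO product_dim_star VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');

    -- Snowflake: the category is split out into its own table.
    CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE product_dim_snow (product_id INTEGER PRIMARY KEY, product TEXT, category_id INTEGER);
    INSERT INTO category_dim VALUES (1, 'Stationery'), (2, 'Kitchen');
    INSERT INTO product_dim_snow VALUES (1, 'Pen', 1), (2, 'Mug', 2);
""")

# Star schema: a single join from the fact table to the dimension.
star = conn.execute("""
    SELECT d.category, SUM(f.amount) FROM sales_fact f
    JOIN product_dim_star d ON d.product_id = f.product_id
    GROUP BY d.category ORDER BY d.category
""").fetchall()

# Snowflake schema: two joins are needed to reach the category name.
snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM sales_fact f
    JOIN product_dim_snow p ON p.product_id = f.product_id
    JOIN category_dim c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()
```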

15 :: What are the methodologies of data warehousing?

Every company has a methodology of its own. To name a few, the SDLC methodology and the AIM methodology are commonly used. Other methodologies include AMM, World Class methodology and many more.

For building the DW itself, there are two basic approaches:
1. Top-down
2. Bottom-up

16 :: Explain the data types present in BO. What happens if we implement a view in the Designer and in a report?

There are three different data types: Dimension, Measure and Detail.

A view is nothing but an alias, and it can be used to resolve loops in the universe.

17 :: Explain VLDB?

VLDB stands for Very Large Database. The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered a VLDB.

18 :: Explain the difference between a view and a materialized view?

View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.

Materialized view: stores the results of the SQL in table form in the database. The SQL statement executes only when the materialized view is built or refreshed; after that, every time you run the query, the stored result set is used. Pros include quick query results.

Views: At run time, the query is executed against the database.

Materialized views: The data for the materialized view query is generated when the materialized view is built or refreshed, not when it is queried.

Mviews can be created in the following ways:
1. Immediate - the mview is created along with its data.
2. Deferred - only the mview structure is created. Data is populated only when you refresh the mview.

We have the option of refreshing mviews. When the data in the master table used in the mview query changes, refreshing the mview brings in the updated (new) data.

An mview behaves very much like a table. At run time, data is retrieved from the stored result set just as it is retrieved from a table. Retrieval is typically much faster than with a view.
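
The refresh behaviour can be imitated in a small sketch with Python's sqlite3 module. SQLite has no materialized views, so the sketch stands in for one with an ordinary table rebuilt by a hypothetical refresh_sales_mv function; the names sales, sales_v and sales_mv are likewise assumptions. The point is only to show that a view always reflects current data, while a stored result set stays stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 10.0), ('West', 20.0);
    -- A plain view: the query runs every time the view is read.
    CREATE VIEW sales_v AS SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh_sales_mv():
    """Rebuild the stored result set (a stand-in for a materialized view refresh)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_mv;
        CREATE TABLE sales_mv AS SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)

def totals(relation):
    """Read the totals from either the view or the stored table."""
    return conn.execute(f"SELECT region, total FROM {relation} ORDER BY region").fetchall()

refresh_sales_mv()                                   # initial build with data
conn.execute("INSERT INTO sales VALUES ('East', 5.0)")

# The view sees the new row immediately; the stored result does not.
view_now, mv_stale = totals("sales_v"), totals("sales_mv")

refresh_sales_mv()                                   # refresh picks up the change
mv_fresh = totals("sales_mv")
```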

19 :: Can a dimension table contain numeric values?

Yes, a dimension can have numeric values. For example, the surrogate key holds a numeric value that uniquely identifies each record in the dimension.

20 :: Explain Dimensional Modelling?

Dimensional modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. A fact table contains the facts/measurements of the business, and a dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.

21 :: What are the different types of data warehousing?

Types of data warehousing

1. Enterprise Data Warehouse

2. ODS (Operational Data Store)

3. Data Mart

22 :: What is a junk dimension?
What is the difference between a junk dimension and a degenerate dimension?

Junk dimension: a grouping of miscellaneous flags and text attributes that are moved out of a dimension into a separate sub-dimension.

Degenerate dimension: control information kept on the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.

23 :: Explain ER Diagram?

The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.

Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.

Since Chen wrote his paper, the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:

It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

24 :: What are the steps to build the data warehouse?

As far as I know:

Gathering business requirements
Identifying sources
Identifying facts
Defining dimensions
Defining attributes
Redefining dimensions and attributes
Organising the attribute hierarchy and defining relationships
Assigning unique identifiers
Additional conventions: cardinality / adding ratios

Alternatively:

Gather requirements
Identify source tables
Identify destination tables
Data modelling
Source-to-target matrix preparation
ETL flow preparation and scheduling
Reporting

25 :: What is a data warehousing hierarchy?

Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.

Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.

Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.

When designing hierarchies, you must consider the relationships in business structures, for example, a divisional multilevel sales organization.

Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.

Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.

Hierarchies are also essential components in enabling more complex query rewrites. For example, the database can aggregate existing sales revenue on a quarterly basis to a yearly aggregation when the dimensional dependencies between quarter and year are known.
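
The month -> quarter -> year roll-up described above can be sketched in plain Python. The sales figures and the helper quarter_of are made up for illustration. Note how the yearly totals are computed from the quarterly aggregate rather than from the monthly data, which is possible only because each quarter is known to belong to exactly one year.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, month).
monthly_sales = {(2023, 1): 100, (2023, 2): 120, (2023, 4): 90, (2024, 1): 110}

def quarter_of(month):
    """Parent of a month in the month -> quarter -> year hierarchy."""
    return (month - 1) // 3 + 1

# Roll months up to their parent quarter.
quarterly = defaultdict(int)
for (year, month), amount in monthly_sales.items():
    quarterly[(year, quarter_of(month))] += amount

# Yearly totals are derived from the quarterly aggregate, not from the months.
yearly = defaultdict(int)
for (year, _quarter), amount in quarterly.items():
    yearly[year] += amount
```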