Data Warehousing Interview Preparation Guide
Sharpen your Data Warehousing interview expertise with our hand-picked set of 131 questions. They are selected to challenge and extend your knowledge of Data Warehousing, suit all proficiency levels, and are meant to support focused study and build confidence.
131 Data Warehousing Questions and Answers:
1 :: What is metadata?
Metadata is data about data. For example, if a data mart receives a file, the metadata describes that file: how many columns it has, whether it is fixed-width or delimited, the order of the fields, the data type of each field, and so on.
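To make the idea concrete, here is a minimal sketch, assuming a hypothetical fixed-width file with three fields. The layout list (`FILE_LAYOUT`, `FieldMeta`) is the metadata: it records field order, width and data type, and the parser relies on it instead of hard-coded offsets. All names and the sample record are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical metadata for an incoming fixed-width file: each entry records
# a field's name, its width in characters, and its data type. The list order
# is the field order in the file.
@dataclass(frozen=True)
class FieldMeta:
    name: str
    width: int
    dtype: type

FILE_LAYOUT = [
    FieldMeta("account_id", 6, int),
    FieldMeta("branch", 4, str),
    FieldMeta("balance", 10, float),
]

def parse_fixed_width(line: str, layout: list) -> dict:
    """Split one fixed-width record using only the metadata."""
    expected = sum(f.width for f in layout)
    if len(line) != expected:
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    record, pos = {}, 0
    for field in layout:  # field order comes from the metadata
        raw = line[pos:pos + field.width].strip()
        record[field.name] = field.dtype(raw)  # convert using the declared type
        pos += field.width
    return record

print(parse_fixed_width("000042NYC1   1500.25", FILE_LAYOUT))
# {'account_id': 42, 'branch': 'NYC1', 'balance': 1500.25}
```

If the file layout changes, only the metadata needs updating, not the parsing code.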
2 :: Briefly state the difference between a data warehouse and a data mart.
A data warehouse (DWH) is made up of many data marts and covers many subject areas, whereas a data mart generally focuses on a single subject area. For example, a bank's DWH could have one data mart for accounts, another for loans, and so on. These are high-level definitions.
3 :: What is a galaxy schema?
A galaxy schema is also known as a fact constellation schema. In it, multiple fact tables share the same dimension tables. In data warehousing, people mainly work with conceptual hierarchies.
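A minimal sketch using Python's built-in `sqlite3` module: two fact tables (`fact_sales` and `fact_shipments`) share the `dim_date` and `dim_product` dimensions. The table names, columns and sample rows are hypothetical; the point is that shared dimensions let measures from different facts be compared on the same keys.

```python
import sqlite3

# Hypothetical galaxy (fact constellation) schema: two fact tables
# share the same date and product dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
CREATE TABLE fact_shipments (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units       INTEGER
);

INSERT INTO dim_date    VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO fact_sales     VALUES (1, 10, 100.0), (2, 10, 50.0), (2, 20, 75.0);
INSERT INTO fact_shipments VALUES (1, 10, 4), (2, 20, 3);
""")

# Because both facts use the same product dimension, their measures can be
# reported side by side at the product level.
rows = conn.execute("""
SELECT p.product_name,
       (SELECT SUM(s.amount) FROM fact_sales s     WHERE s.product_key = p.product_key) AS sales,
       (SELECT SUM(h.units)  FROM fact_shipments h WHERE h.product_key = p.product_key) AS units
FROM dim_product p
ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 75.0, 3), ('Widget', 150.0, 4)]
```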
4 :: Suppose you are filtering rows with a Filter transformation, so only the rows that meet the condition pass to the target. Where do the rows that do not meet the condition go?
They are dropped: the Filter transformation discards rows that do not meet the condition, and they are not passed on to the next stage or written to the target. In Informatica, the default filter condition is 1 (TRUE), which passes every row. If you place a breakpoint on the Filter transformation and run the mapping in Debugger mode, you will see the value 1 or 0 for each row passing through the filter. If you change a 0 to 1, that row will be passed to the next stage.
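As a rough illustration only (plain Python, not Informatica code or its API), the sketch below mimics the behaviour described above: each row's condition is evaluated to 1 or 0, rows with 1 move on to the target, and rows with 0 are simply discarded. The function name `filter_transformation` and the sample rows are hypothetical.

```python
# Hypothetical sketch of filter semantics: the condition evaluates to
# 1 (TRUE) or 0 (FALSE) per row; rows that evaluate to 0 are dropped.
def filter_transformation(rows, condition):
    passed, dropped = [], 0
    for row in rows:
        flag = 1 if condition(row) else 0
        if flag == 1:
            passed.append(row)  # forwarded to the next stage / target
        else:
            dropped += 1        # discarded; not forwarded anywhere
    return passed, dropped

rows = [{"id": 1, "amount": 500}, {"id": 2, "amount": 50}, {"id": 3, "amount": 900}]
target, dropped = filter_transformation(rows, lambda r: r["amount"] > 100)
print([r["id"] for r in target], dropped)  # [1, 3] 1
```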
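5 :: After we create an SCD table, can we use that dimension as a dimension table in a star schema?
Yes. An SCD table is still a dimension table: fact rows reference it through its surrogate key, so it can serve as a dimension in a star schema like any other.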
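A minimal sketch of a Type 2 slowly changing dimension, assuming a hypothetical customer dimension whose `city` can change. Each change expires the current row and adds a new row with a new surrogate key; when facts are loaded, `lookup_sk` picks the version valid on the transaction date, which is how the SCD table plugs into a star schema. The names and dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical SCD Type 2 customer dimension.
@dataclass
class CustomerDim:
    customer_sk: int          # surrogate key referenced by fact tables
    customer_id: str          # natural/business key from the source
    city: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current row

dim = []

def apply_change(customer_id: str, city: str, as_of: date) -> None:
    """Version the row if the tracked attribute changed."""
    current = next((r for r in dim if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == city:
        return  # no change, nothing to version
    if current is not None:
        current.valid_to = as_of  # expire the old version
    dim.append(CustomerDim(len(dim) + 1, customer_id, city, as_of, None))

def lookup_sk(customer_id: str, on: date) -> int:
    """Return the surrogate key valid on a given date (used when loading facts)."""
    for r in dim:
        if r.customer_id == customer_id and r.valid_from <= on and (r.valid_to is None or on < r.valid_to):
            return r.customer_sk
    raise KeyError((customer_id, on))

apply_change("C1", "Pune", date(2023, 1, 1))
apply_change("C1", "Mumbai", date(2024, 1, 1))
# A 2023 sale references version 1; a 2024 sale references version 2.
print(lookup_sk("C1", date(2023, 6, 1)), lookup_sk("C1", date(2024, 6, 1)))  # 1 2
```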
6 :: What is a core dimension?
A core dimension is a dimension table dedicated to a single fact table or data mart. A conformed dimension, by contrast, is a dimension table used across fact tables or data marts.
7 :: How much data can one universe hold?
A universe does not hold any data itself. In practice, however, universes are known to run into problems once the number of objects exceeds about 6,000.
8 :: Can anyone explain core dimensions, balanced dimensions, and dirty dimensions?
A dirty dimension is simply a junk dimension. Core dimensions are dedicated to a single fact table or data mart, while conformed dimensions are used across fact tables or data marts.
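A minimal sketch of how a junk (dirty) dimension is commonly built, assuming three hypothetical low-cardinality flags. Every combination of flag values becomes one row with its own key (generated here with `itertools.product`), so a fact row stores a single `junk_key` instead of several separate flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags that fit no other dimension.
FLAGS = {
    "payment_type": ["cash", "card"],
    "is_gift": ["Y", "N"],
    "order_channel": ["web", "store"],
}

# One junk-dimension row per combination of flag values, keyed from 1.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(*FLAGS.values()), start=1)
}

def junk_key(payment_type: str, is_gift: str, order_channel: str) -> int:
    """Return the key a fact row would store for these flag values."""
    return junk_dim[(payment_type, is_gift, order_channel)]

print(len(junk_dim))                  # 8 (2 * 2 * 2 combinations)
print(junk_key("card", "N", "web"))   # 7
```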
9 :: Can anyone explain hierarchy levels in data warehousing?
In data warehousing, levels are the columns available in a dimension table, and levels have attributes. Hierarchies are used for navigation and can be defined top-down or bottom-up. There are two types of hierarchies:
1. Natural hierarchy: The best example is the Time dimension (Year, Month, Day, etc.). In a natural hierarchy, a definite relationship exists between each level.
2. Navigational hierarchy: You can have levels such as:
Ex - Production cost of product, Sales cost of product.
Ex - Lead time defined to procure, Actual procurement time.
In this type, two levels need not have a relationship; the hierarchy is created purely for navigation.
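A natural hierarchy lets a measure be rolled up level by level because each day belongs to exactly one month and each month to exactly one year. The sketch below assumes hypothetical daily sales data and a hypothetical `roll_up` helper; moving from "year" to "month" corresponds to drilling down the Time hierarchy.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale date, amount).
sales = [
    (date(2024, 1, 15), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 3), 70),
    (date(2025, 1, 9), 30),
]

# Natural Time hierarchy: each level's key extends the level above it.
LEVELS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}

def roll_up(rows, level: str) -> dict:
    """Aggregate the measure at the requested hierarchy level."""
    key_of = LEVELS[level]
    totals = defaultdict(int)
    for d, amount in rows:
        totals[key_of(d)] += amount
    return dict(totals)

print(roll_up(sales, "year"))   # {(2024,): 220, (2025,): 30}
print(roll_up(sales, "month"))  # {(2024, 1): 150, (2024, 2): 70, (2025, 1): 30}
```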
10 :: What is data cleaning? How is it done?
Data cleaning is largely self-explanatory. Most data warehouses source data from multiple systems, many of which were built long before data warehousing was well understood and therefore without any plan to consolidate their data in a single repository. In such a scenario, the following problems can arise:
► Missing information for a column from one of the data sources;
► Inconsistent information among different data sources;
► Orphan records;
► Outlier data points;
► Different data types for the same information among various data sources, leading to improper conversion;
► Data that breaches business rules.
To ensure that the data warehouse is not contaminated by any of these discrepancies, it is important to cleanse the data using a set of business rules before it is loaded into the data warehouse.
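A minimal rule-based cleansing sketch, assuming hypothetical field names, a hypothetical set of valid customer keys, and a hypothetical outlier threshold. Each incoming record is checked for several of the problems listed above (missing values, orphan keys, inconsistent data types, business-rule violations and outliers) and is either passed on cleaned or rejected with the reasons. Real projects would load these rules from configuration rather than hard-coding them.

```python
from typing import Optional

VALID_CUSTOMERS = {"C1", "C2"}  # hypothetical keys present in the customer dimension
MAX_AMOUNT = 10_000.0           # hypothetical outlier threshold

def clean(record: dict) -> tuple:
    """Return (cleaned record, or None if rejected, and the list of issues found)."""
    issues = []
    out = dict(record)
    # Missing information: the customer key is mandatory.
    if not out.get("customer_id"):
        issues.append("missing customer_id")
    # Orphan record: the key must exist in the dimension.
    elif out["customer_id"] not in VALID_CUSTOMERS:
        issues.append("orphan customer_id")
    # Different data types across sources: normalise amount to float.
    try:
        out["amount"] = float(str(out.get("amount", "")).replace(",", ""))
    except ValueError:
        issues.append("amount not numeric")
    else:
        if out["amount"] < 0:             # business rule: no negative amounts
            issues.append("negative amount")
        elif out["amount"] > MAX_AMOUNT:  # outlier data point
            issues.append("outlier amount")
    cleaned: Optional[dict] = None if issues else out
    return cleaned, issues

print(clean({"customer_id": "C1", "amount": "1,250.50"}))
# ({'customer_id': 'C1', 'amount': 1250.5}, [])
print(clean({"customer_id": "C9", "amount": 20}))
# (None, ['orphan customer_id'])
```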