Data Warehousing Basics Interview Questions And Answers
Download Basics Data Warehouse Interview Questions and Answers PDF
Refine your Basics Data Warehouse interview skills with our 37 critical questions. Each question is designed to test and expand your Basics Data Warehouse expertise. Suitable for all experience levels, these questions will help you prepare thoroughly. Don't miss out on our free PDF download, containing all 37 questions to help you succeed in your Basics Data Warehouse interview. It's an invaluable tool for reinforcing your knowledge and building confidence.
37 Basics Data Warehouse Questions and Answers:
Basics Data Warehouse Job Interview Questions Table of Contents:
1 :: Explain Differences between star and snowflake schemas?
star schema is a logical structure that can be arranged with fact and dimension tables
in a star formation.it looks like a star with fact table at the core of the star and the dimension tables along the spikes of the star.the dimension model is therefore called a STAR SCHEMA.
SNOWFLAKING is a method of normalizing the dimension tables in a star schema.
Star Schema: can have less number of joins.
Snow flake: can have more number of joins.
Star Schema : has data redundancy, so the query performace is good.
Snow flake: is normalized, so does not has data redundancey. can have perfomance issues.
Read Morein a star formation.it looks like a star with fact table at the core of the star and the dimension tables along the spikes of the star.the dimension model is therefore called a STAR SCHEMA.
SNOWFLAKING is a method of normalizing the dimension tables in a star schema.
Star Schema: can have less number of joins.
Snow flake: can have more number of joins.
Star Schema : has data redundancy, so the query performace is good.
Snow flake: is normalized, so does not has data redundancey. can have perfomance issues.
2 :: Explain the data type of the surrogate key?
Data type of the surrogate key is either integer or numeric or number
There is no data type for a Surrogate Key.
Requirement of a surrogate Key:UNIQUE
Recommended data type of a Surrogate key is NUMERIC.
Read MoreThere is no data type for a Surrogate Key.
Requirement of a surrogate Key:UNIQUE
Recommended data type of a Surrogate key is NUMERIC.
3 :: What is a CUBE in datawarehousing concept?
Cubes are logical representation of multidimensional data.The edge of the cube contains dimension members and the body of the cube contains data values.
Read More4 :: Explain Why fact table is in normal form?
Basically the fact table consists of the Index keys of the dimension/ook up tables and the measures.
so when ever we have the keys in a table .that itself implies that the table is in the normal form.
Read Moreso when ever we have the keys in a table .that itself implies that the table is in the normal form.
5 :: Explain What are the possible data marts in Retail sales?
Product information,sales information
Retailor_site_Dim, product_dim, Time_dim, Sales_staff_dim,Return_method_dim,
Read MoreRetailor_site_Dim, product_dim, Time_dim, Sales_staff_dim,Return_method_dim,
6 :: Explain degenerate dimension table?
Degenerate Dimensions : If a table contains the values, which r neither dimesion nor measures is called degenerate dimensions.Ex : invoice id,empno
Degenerate dimensions are those without any attributes or measures
and they are included inside the fact table
Eg orderno
Read MoreDegenerate dimensions are those without any attributes or measures
and they are included inside the fact table
Eg orderno
7 :: Explain hybrid slowly changing dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2.
It may happen that in a table, some columns are important and we need to track changes for them i.e capture the historical data for them whereas in some columns even if the data changes, we don't care.
For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.
Read MoreIt may happen that in a table, some columns are important and we need to track changes for them i.e capture the historical data for them whereas in some columns even if the data changes, we don't care.
For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.
8 :: Explain conformed fact?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly
IF A DIMENSION IS EQUALLY SHARED BETWEEN DIFFERENT DATAMARTS HAVING COOMON DATA VALUES IS CALLED CONFORMED DIMENSION.
Read MoreIF A DIMENSION IS EQUALLY SHARED BETWEEN DIFFERENT DATAMARTS HAVING COOMON DATA VALUES IS CALLED CONFORMED DIMENSION.
9 :: Explain BUS Schema?
BUS Schema is composed of a master suite of confirmed dimension and standardized definition if facts.
In a BUS schema we would eventually have conformed dimensions and facts defined to be shared across all enterprise data marts. This way all Data Marts can use the conformed dimensions and facts without having them locally. This is the first step towards building an enterprise Data Warehouse from Kimball's perspective. For (e.g) we may have different data marts for Sales, Inventory and Marketing and we need common entities like Customer, Product etc to be seen across these data marts and hence would be ideal to have these as Conformed objects. The challenge here is that some times each line of business may have different definitions for these conformed objects and hence choosing conformed objects have to be designed with some extra care.
Read MoreIn a BUS schema we would eventually have conformed dimensions and facts defined to be shared across all enterprise data marts. This way all Data Marts can use the conformed dimensions and facts without having them locally. This is the first step towards building an enterprise Data Warehouse from Kimball's perspective. For (e.g) we may have different data marts for Sales, Inventory and Marketing and we need common entities like Customer, Product etc to be seen across these data marts and hence would be ideal to have these as Conformed objects. The challenge here is that some times each line of business may have different definitions for these conformed objects and hence choosing conformed objects have to be designed with some extra care.
10 :: Explain What are the advantages data mining over traditional approaches?
Data Mining is used for the estimation of future. For example, if we take a company/business organization, by using the concept of Data Mining, we can predict the future of business interms of Revenue (or) Employees (or) Cutomers (or) Orders etc.
Traditional approches use simple algorithms for estimating the future. But, it does not give accurate results when compared to Data Mining.
Traditional approach is also algorithmic prespective and its advantage is convert larger algorithms into smaller ones.In this approach entire software is procedure
Read MoreTraditional approches use simple algorithms for estimating the future. But, it does not give accurate results when compared to Data Mining.
Traditional approach is also algorithmic prespective and its advantage is convert larger algorithms into smaller ones.In this approach entire software is procedure
11 :: Explain Difference between E-R Modeling and Dimentional Modeling?
Basic diff is E-R modeling will have logical and physical model. Dimensional model will have only physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
Read MoreE-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
12 :: What is a Fact, Dimension, Measure?
Fact:It is a measure,EX:Sales,Accounts etc.
Dimension:It deals with the details of the data.EX:Geography,Time etc.
Measure:I deals with the quantity of the data..
Read MoreDimension:It deals with the details of the data.EX:Geography,Time etc.
Measure:I deals with the quantity of the data..
13 :: What is Data warehosuing Hierarchy?
Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex rewrites. For example, the database can aggregate an existing sales revenue on a quarterly base to a yearly aggregation when the dimensional dependencies between quarter and year are known.
Read MoreHierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex rewrites. For example, the database can aggregate an existing sales revenue on a quarterly base to a yearly aggregation when the dimensional dependencies between quarter and year are known.
14 :: Explain What are the steps to build the data warehouse?
As far I know...
Gathering bussiness requiremnts
Identifying Sources
Identifying Facts
Defining Dimensions
Define Attribues
Redefine Dimensions & Attributes
Organise Attribute Hierarchy & Define Relationship
Assign Unique Identifiers
Additional convetions:Cardinality/Adding ratios
Gather requirements,
Identify Source tables,
Identify Destination tables,
Data modelling
Source to target matrix preparation,
ETL flow preparation and scheduling,
Reporting
Read MoreGathering bussiness requiremnts
Identifying Sources
Identifying Facts
Defining Dimensions
Define Attribues
Redefine Dimensions & Attributes
Organise Attribute Hierarchy & Define Relationship
Assign Unique Identifiers
Additional convetions:Cardinality/Adding ratios
Gather requirements,
Identify Source tables,
Identify Destination tables,
Data modelling
Source to target matrix preparation,
ETL flow preparation and scheduling,
Reporting
15 :: Explain ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views.
Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.
Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:
it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.
Read MoreSimply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.
Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:
it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.
16 :: what is junk dimension?
what is the difference between junk dimension and degenerated dimension?
Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table inorder eliminate unneccessary joins while retrieving order information..
Read MoreDegenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table inorder eliminate unneccessary joins while retrieving order information..
17 :: Explain What are the different types of data warehousing?
Types of data warehousing
1. Enterprise Data warehousing
2. ODS (Operational Data Store)
3. Data Mart
Read More1. Enterprise Data warehousing
2. ODS (Operational Data Store)
3. Data Mart
18 :: Explain Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.
Read MoreDimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.
19 :: Explain can a dimension table contains numeric values?
Yes dimension can have numeric values, that is surrogate Key which holds numeric value for unique identification of records in the dimension
Read More20 :: Explain the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Everytime you access the view, the SQL statement executes.
Materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that everytime you run the query, the stored result set is used. Pros include quick query results.
Views: At run time, the query will be executed against the database.
Materialized views: The data for the materialized view query will be generated at compile time.
Mviews can be created by the following ways:
1. Immediate - mview will be created along with data.
2. Deferred - Mview structure alone will be created. Data will be populated only when you refresh the mview.
We have the option of refreshing the mviews. It means when the data in the master table used in the mview query changes, the refreshing of mviews helps to get the updated (new) data for the mview.
Mview will behave very much like a table. At run time, data will be retrieved from the result set just as retrieved from a table. The retrieval time will be very fast unlike the views.
Read MoreMaterialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that everytime you run the query, the stored result set is used. Pros include quick query results.
Views: At run time, the query will be executed against the database.
Materialized views: The data for the materialized view query will be generated at compile time.
Mviews can be created by the following ways:
1. Immediate - mview will be created along with data.
2. Deferred - Mview structure alone will be created. Data will be populated only when you refresh the mview.
We have the option of refreshing the mviews. It means when the data in the master table used in the mview query changes, the refreshing of mviews helps to get the updated (new) data for the mview.
Mview will behave very much like a table. At run time, data will be retrieved from the result set just as retrieved from a table. The retrieval time will be very fast unlike the views.
21 :: Explain VLDB?
The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.
Read More22 :: Explain the data types present in bo n wht happens if we implement view in the designer n report?
Three different data types: Dimensions,Measure and Detail.
View is nothing but an alias and it can be used to resolve the loops in the universe.
Read MoreView is nothing but an alias and it can be used to resolve the loops in the universe.
23 :: Explain What are the methodologies of Data Warehousing?
Every company has methodology of their own. But to name a few SDLC Methodology, AIM methodology are stardadly used. Other methodologies are AMM, World class methodology and many more.
There are only two methodologies in building DW they are
1.Top down
2.Bottom-up
Read MoreThere are only two methodologies in building DW they are
1.Top down
2.Bottom-up
24 :: Explain Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?
Star schema contains the dimesion tables mapped around one or more fact tables.
It is a denormalised model.
No need to use complicated joins.
Queries results fastly.
Snowflake schema
It is the normalised form of Star schema.
contains indepth joins ,bcas the tbales r splitted in to many pieces.We can easily do modification directly in the tables.
We hav to use comlicated joins ,since we hav more tables .
There will be some delay in processing the Query .
in star schema look like a centerally locate fact table and surrounded by dimention tables . its look like a star thats why people colled as a starschema
in star schema dimention tables are de_normalised but fact table is normalised table
in snow flake schema dimention tables are splitted one or more tables
dimention tables are quit bit a table bit space
here dimention tables are normalised
here having the more no of joins
so the performance degrades
as per the client requirement we used star or snow flake schema
client may ask like data normalised or de_normalised
Read MoreIt is a denormalised model.
No need to use complicated joins.
Queries results fastly.
Snowflake schema
It is the normalised form of Star schema.
contains indepth joins ,bcas the tbales r splitted in to many pieces.We can easily do modification directly in the tables.
We hav to use comlicated joins ,since we hav more tables .
There will be some delay in processing the Query .
in star schema look like a centerally locate fact table and surrounded by dimention tables . its look like a star thats why people colled as a starschema
in star schema dimention tables are de_normalised but fact table is normalised table
in snow flake schema dimention tables are splitted one or more tables
dimention tables are quit bit a table bit space
here dimention tables are normalised
here having the more no of joins
so the performance degrades
as per the client requirement we used star or snow flake schema
client may ask like data normalised or de_normalised
25 :: Explain clearly how to explain any (sales) project in interview.actually feom where report developer work starts?pls reply as soon as possible?
if you are a Report developer
1,you have to specify the front end and back end tool used for creating the reports
2,Then you have to tell the purpose of the project..what you are going to acheive using the reports.
3,Then you can explain the backend part which is important.FOr example,you have to tell what are all the facts and dimension going to be used
4, Once the facts and dimension are identified yo might want to restructure the fact and dimension using the views.Also have to decide on the schemas you are going to use whether it s an snowflake schema or an star schema.
5,Once the schema has been finalised we might want to include certain KPI(Key Performance Indicators) into it.
6,Once the cube has been ready now we are into the deployement of the cube.
7,After the deployement has been done sucessfuly now we have to use our front end tool such as prolclarity and feed the cube into it.
8.Once the cube has been feed into the proclarity we can create various reports that can helps us for the business.In our case i had identified the top 10 customers in sales , successful saples period over time by using trend analysis and using ranking KPI for rank the customer.
Read More1,you have to specify the front end and back end tool used for creating the reports
2,Then you have to tell the purpose of the project..what you are going to acheive using the reports.
3,Then you can explain the backend part which is important.FOr example,you have to tell what are all the facts and dimension going to be used
4, Once the facts and dimension are identified yo might want to restructure the fact and dimension using the views.Also have to decide on the schemas you are going to use whether it s an snowflake schema or an star schema.
5,Once the schema has been finalised we might want to include certain KPI(Key Performance Indicators) into it.
6,Once the cube has been ready now we are into the deployement of the cube.
7,After the deployement has been done sucessfuly now we have to use our front end tool such as prolclarity and feed the cube into it.
8.Once the cube has been feed into the proclarity we can create various reports that can helps us for the business.In our case i had identified the top 10 customers in sales , successful saples period over time by using trend analysis and using ranking KPI for rank the customer.