Difficult Data Architect Interview Preparation Guide
Download PDF

Data Architect Frequently Asked Questions in various Data Architect job interviews by interviewer. The set of questions are here to ensures that you offer a perfect answer posed to you. So get preparation for your new job interview

41 Data Architect Questions and Answers:

Table of Contents:

Difficult  Data Architect Job Interview Questions and Answers
Difficult Data Architect Job Interview Questions and Answers

1 :: Tell us what is degenerated dimension?

A degenerated dimension is a dimension that is derived from fact table and does not have its own dimension table.

A dimension key, such as transaction number, receipt number, Invoice number etc. does not have any more associated attributes and hence can not be designed as a dimension table.

2 :: Explain me what do you understand by dimension and attribute?

Dimensions represent qualitative data. For example– plan, product, class are all dimensions. A dimension table contains descriptive or textual attributes. For example, product category & product name are the attributes of product dimension.

3 :: Please explain semi Additive Measures?

Semi-additive measures are those where only a subset of aggregation function can be applied. Let’s say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.

4 :: Please explain what is dimensional modeling?

Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualifies the data. The goal of Dimensional model is not to achieve high degree of normalization but to facilitate easy and faster data retrieval.

5 :: What is role Playing Dimension?

These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.

6 :: What is the conceptual model?

The conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.

7 :: Explain me what is the difference between OLTP and OLAP?

OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.

OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.

8 :: Can you explain me what is dimension?

A dimension is something that qualifies a quantity (measure).

For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.

9 :: Can you tell me what is SCD?

SCD stands for slowly changing dimension, i.e. the dimensions where data is slowly changing. These can be of many types, e.g. Type 0, Type 1, Type 2, Type 3 and Type 6, although Type 1, 2 and 3 are most common.

10 :: What is slowly Changing Dimension (SCD)?

These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs

11 :: Tell me which schema is better – star or snowflake?

The choice of a schema always depends upon the project requirements & scenarios.

Since star schema is in de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. Coming to the snowflake schema, since it is in normalized form, it will require a number of joins as compared to a star schema, the query will be complex and execution will be slower than star schema.

12 :: Explain me what is a role-playing dimension?

Dimensions are often reused for multiple applications within the same database with different contextual meaning. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a 'role-playing dimension'

13 :: Tell us what is data mart?

Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.

14 :: Can you distinguish between OLTP and OLAP?

OLTP stands for Online Transaction Processing system & OLAP stands for Online Analytical processing system. OLTP maintains the transactional data of the business & is highly normalized generally. On the contrary, OLAP is for analysis and reporting purpose & it is in de-normalized form.

This difference between OLAP and OLTP also gives you the way to choosing the design of schema. If your system is OLTP, you should go with star schema design and if your system is OLAP, you should go with snowflake schema.

15 :: Tell me what are the different design schemas in Data Modelling? Explain with the example?

There are two different kinds of schemas in data modeling

Star Schema
Snowflake Schema
Now I will be explaining each of these schemas one by one.

The simplest of the schemas is star schema where we have a fact table in the center which references multiple dimension tables around it.

All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.

16 :: Explain me non-additive Measures?

Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.

17 :: Tell us what are the benefits of data warehouse?

A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.

18 :: Tell me what are the different types of dimension?

In a data warehouse model, dimension can be of following types,

☛ Conformed Dimension
☛ Junk Dimension
☛ Degenerated Dimension
☛ Role Playing Dimension


Based on how frequently the data inside a dimension changes, we can further classify dimension as
☛ Unchanging or static dimension (UCD)
☛ Slowly changing dimension (SCD)
☛ Rapidly changing Dimension (RCD)

19 :: What is the logical model?

The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.

20 :: Tell me your idea regarding factless fact? And why do we use it?

Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.

At times, certain situations may arise in the business where you need to have factless fact table. For example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.

21 :: Explain me what is your understanding of different data models?

There are three types of data models – conceptual, logical and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.

The conceptual model shows a very basic high level of design while the physical data model shows a very detailed view of design.

The conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.

The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.

The physical data model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.

22 :: Tell me what is snow-flake schema?

This is another logical arrangement of tables in dimensional modeling where a centralized fact table references number of other dimension tables; however, those dimension tables are further normalized into multiple related tables.

Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys from customer, product and time dimension tables will flow into the fact table. Additionally all the products can be further grouped under different product families stored in a different table so that primary key of product family tables also goes into the product table as a foreign key. Such construct will be called a snow-flake schema as product table is further snow-flaked into product family.

23 :: Please explain what are the different types of dimensions you have come across? Explain each of them in detail with an example?

There are typically five types of dimensions.

1) Conformed dimensions: A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. For example, if subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as conformed dimension.

2) Junk Dimension: It is a dimension table comprising of attributes that don’t have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y’ or ‘N’ or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.

3) Role Playing Dimension: These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.

4) Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs

24 :: Tell us what is a mini dimension?

Mini dimensions can be used to handle rapidly changing dimension scenario. If a dimension has a huge number of rapidly changing attributes it is better to separate those attributes in different table called mini dimension. This is done because if the main dimension table is designed as SCD type 2, the table will soon outgrow in size and create performance issues. It is better to segregate the rapidly changing members in different table thereby keeping the main dimension table small and performing.

25 :: Tell me what is Star-schema?

This schema is used in data warehouse models where one centralized fact table references number of dimension tables so as the keys (primary key) from all the dimension tables flow into the fact table (as foreign key) where measures are stored. This entity-relationship diagram looks like a star, hence the name.
Data Architect Interview Questions and Answers
41 Data Architect Interview Questions and Answers