General Datawarehousing Interview Questions And Answers
40 General Datawarehousing Questions and Answers:
1 :: What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependencies between the attributes (data items) of user views. It reduces a complex user view to a set of small, stable subgroups of fields (relations).
1NF: Repeating groups are eliminated, all key attributes are defined, and dependencies can be identified.
2NF: The table is already in 1NF and includes no partial dependencies, meaning no attribute depends on only a portion of the primary key. It may still exhibit transitive dependencies, where attributes are functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
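As a minimal sketch of the step from 2NF to 3NF, the Python snippet below removes a transitive dependency by splitting one table into two. The table and column names (order_id, customer_id, customer_city) are hypothetical and chosen only for illustration.

```python
# Hypothetical order rows in 2NF: customer_city depends on customer_id,
# a non-key attribute, so the table has a transitive dependency.
orders_2nf = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Pune"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Pune"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Oslo"},
]

def to_3nf(rows):
    """Split the rows into an orders table and a customers table."""
    orders = [{"order_id": r["order_id"], "customer_id": r["customer_id"]}
              for r in rows]
    customers = {}
    for r in rows:
        # Each customer's city is now stored exactly once.
        customers[r["customer_id"]] = r["customer_city"]
    return orders, customers

orders, customers = to_3nf(orders_2nf)
```

After the split, the city of customer C1 is stored once instead of once per order, which is the redundancy that 3NF is meant to remove.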
2 :: Explain piconet?
The original Piconet was a USB-style expansion port on RM Nimbus computers.
These days, a piconet is an ad-hoc computer network linking a user group of devices using Bluetooth technology protocols, allowing one master device to interconnect with up to seven active slave devices (because active devices are identified by a three-bit address). Up to 255 further slave devices can be inactive, or parked, and the master device can bring them into active status at any time.
A piconet typically has a range of about 10 m and a transfer rate between about 400 and 700 kbit/s, depending on whether a synchronous or asynchronous connection is used.
All parked slaves have an 8-bit parked member address (PMA), and all active slaves have a 3-bit active member address (AMA). The master uses the AMA to send packets to a specific slave and to identify which slave has sent a response packet.
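The device limits quoted above follow from the address widths. The arithmetic can be sketched as below, assuming that one value of each address field (all zeros) is reserved and not assigned to a slave; the function name is hypothetical.

```python
def usable_addresses(bits, reserved=1):
    """Number of addresses an n-bit field offers after reserved values."""
    return 2 ** bits - reserved

active_slaves = usable_addresses(3)  # 3-bit AMA
parked_slaves = usable_addresses(8)  # 8-bit PMA
```

Under that assumption the 3-bit AMA yields seven active slaves and the 8-bit PMA yields 255 parked slaves, matching the figures in the answer.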
3 :: What are the different methods of loading dimension tables?
Conventional load:
Before the data is loaded, every row is checked against all of the table constraints.
Direct load (faster loading):
All constraints are disabled and the data is loaded directly. Afterwards the data is checked against the table constraints, and rows that fail the checks (bad data) are not indexed.
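The difference in the order of checking and loading can be sketched in plain Python. This is a simulation only, not any loader's real API; the constraint in is_valid and the function names are hypothetical.

```python
def is_valid(row):
    # Hypothetical constraint: the key is present and the amount is non-negative.
    return row.get("id") is not None and row.get("amount", 0) >= 0

def conventional_load(table, rows):
    """Check every row against the constraint before it is written."""
    rejected = []
    for row in rows:
        if is_valid(row):
            table.append(row)
        else:
            rejected.append(row)
    return rejected

def direct_load(table, rows):
    """Write everything first, then validate and report the bad rows."""
    table.extend(rows)
    return [row for row in table if not is_valid(row)]

rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}]
conventional_table, direct_table = [], []
rejected = conventional_load(conventional_table, rows)
flagged = direct_load(direct_table, rows)
```

The conventional path never stores the bad row, while the direct path stores it and flags it afterwards, which is why it is faster for large batches.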
4 :: What is ODS?
1. ODS means Operational Data Store.
Submitted by Francis C. ( xxchen74 @ hotmail . com )
2. A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to ensure that summarized and derived data is calculated properly. The ODS may further become the enterprise's shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.
5 :: What are Data Marts?
Data marts are designed to help managers make strategic decisions about their business.
A data mart is a subset of the corporate-wide data that is of value to a specific group of users.
There are two types of data marts:
1. Independent data marts: sourced from data captured from OLTP systems, from external providers, or from data generated locally within a particular department or geographic area.
2. Dependent data marts: sourced directly from enterprise data warehouses.
6 :: What is the level of granularity of a fact table?
The level of granularity is the level of detail that you store in the fact table of a data warehouse. For example, based on the design you can decide to store sales data for each transaction. The level of granularity then determines how much detail is kept for each transactional fact: you might store every individual product sale, or aggregate sales up to the minute and store one row per minute.
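The choice between transaction grain and minute grain can be sketched as a simple roll-up. The sample rows and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-grain sales facts: (timestamp, product, amount).
transactions = [
    (datetime(2024, 1, 1, 9, 15, 10), "widget", 5.0),
    (datetime(2024, 1, 1, 9, 15, 40), "widget", 7.0),
    (datetime(2024, 1, 1, 9, 16, 5), "widget", 3.0),
]

def roll_up_to_minute(rows):
    """Aggregate transaction-grain facts to one row per product per minute."""
    totals = defaultdict(float)
    for ts, product, amount in rows:
        minute = ts.replace(second=0, microsecond=0)
        totals[(minute, product)] += amount
    return dict(totals)

minute_facts = roll_up_to_minute(transactions)
```

Three transaction rows become two minute-level rows. The coarser grain saves space, but the individual transactions can no longer be recovered from it.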
7 :: What is VLDB?
VLDB stands for Very Large DataBase.
It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information.
8 :: What are SCD1, SCD2 and SCD3?
SCD stands for slowly changing dimensions.
SCD1: only the updated (current) values are maintained.
Ex: when a customer's address changes, we update the existing record with the new address.
SCD2: both historical and current information are maintained by using
A) Effective dates
B) Versions
C) Flags
or a combination of these.
SCD3: new columns are added to the target table to hold the previous value alongside the current one, so historical and current information are kept in the same row.
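A minimal sketch of an SCD2 change, combining all three mechanisms listed above (effective dates, a version number and a current-row flag). The column names and the helper apply_scd2 are hypothetical.

```python
from datetime import date

def apply_scd2(dimension, key, new_address, change_date):
    """Type 2: close the current row and append a new version."""
    current = next(r for r in dimension
                   if r["customer_id"] == key and r["is_current"])
    current["is_current"] = False
    current["end_date"] = change_date
    dimension.append({
        "customer_id": key,
        "address": new_address,
        "version": current["version"] + 1,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })

dim = [{"customer_id": 42, "address": "1 Old Road", "version": 1,
        "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
apply_scd2(dim, 42, "9 New Street", date(2024, 6, 1))
```

The old address survives as a closed row, so facts recorded before the change can still be joined to the address that was valid at the time. An SCD1 change would instead overwrite the address in the single existing row.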
9 :: What are semi-additive and factless facts, and in which scenarios will you use such fact tables?
Semi-additive facts can be summed across some dimensions but not across others. Snapshot facts such as balances are semi-additive: they can be added across accounts, but across time they are averaged rather than summed.
Ex: Average daily balance
A fact table without numeric fact columns (measures) is called a factless fact table.
Ex: Promotion facts
It is used to record that an event or association occurred, such as which products were covered by a promotion (for example, product samples), where there is no measure to store.
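The semi-additive behaviour of a balance can be sketched with a few snapshot rows. The data and function names are hypothetical.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]

def total_balance_on(day):
    """Summing across accounts for a single day is meaningful."""
    return sum(bal for d, _, bal in snapshots if d == day)

def average_daily_balance(account):
    """Across time, balances are averaged rather than summed."""
    values = [bal for _, acct, bal in snapshots if acct == account]
    return sum(values) / len(values)
```

Adding account A's two daily balances would give 220, a figure that was never in the account at any time, which is why the time dimension uses an average.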
10 :: Explain SSL?
The Secure Sockets Layer (SSL) is a protocol for managing the security of a message transmission on the Internet. SSL has been succeeded by Transport Layer Security (TLS), which is based on SSL. SSL uses a program layer located between the Internet's Hypertext Transfer Protocol (HTTP) and Transmission Control Protocol (TCP) layers. SSL was included as part of both the Microsoft and Netscape browsers and most Web server products. Developed by Netscape, SSL also gained the support of Microsoft and other Internet client/server developers and became the de facto standard until it evolved into Transport Layer Security. The "sockets" part of the term refers to the sockets method of passing data back and forth between a client and a server program in a network, or between program layers in the same computer. SSL uses the public-and-private key encryption system from RSA, which also includes the use of a digital certificate.
TLS and SSL are an integral part of most Web browsers (clients) and Web servers. If a Web site is on a server that supports SSL, SSL can be enabled and specific Web pages can be identified as requiring SSL access. Historically, a Web server could be SSL-enabled by using Netscape's SSLRef program library, which could be downloaded for noncommercial use or licensed for commercial use.
TLS and SSL are not interoperable. However, TLS 1.0 includes a mechanism by which a TLS implementation can fall back to SSL 3.0 when the other side does not support TLS.
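As an illustration of how a client opts into TLS today, the sketch below uses Python's standard ssl module. The TLS 1.2 floor is an illustrative policy choice, not something stated in the answer above.

```python
import ssl

# Build a client-side context with the library's recommended defaults
# (certificate verification and hostname checking are switched on).
context = ssl.create_default_context()

# Refuse anything older than TLS 1.2 (an illustrative policy choice).
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A socket wrapped with this context will verify the server's digital certificate and will not negotiate down to the older SSL protocol versions.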
11 :: What are the various reporting tools in the market?
1. MS Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, PowerPlay)
4. MicroStrategy
5. MS Reporting Services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (Brio)
9. Oracle Express OLAP
10. ProClarity
12 :: Explain the difference between OLTP and OLAP?
The main differences between OLTP and OLAP are:
1. User and System Orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision-making process.
3. Database Design
OLTP: adopts an entity-relationship (ER) model and an application-oriented database design.
OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.
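The contrast in query style can be sketched with Python's built-in sqlite3 module. The sales table and its rows are hypothetical, and a single table is used for both queries only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, "
    "year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "East", 2023, 100.0), (2, "East", 2024, 150.0), (3, "West", 2024, 80.0)],
)

# OLTP style: fetch one current, detailed record by its key.
oltp_row = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (2,)
).fetchone()

# OLAP style: summarize many historical rows at a coarser grain.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The first query touches a single row through the primary key, while the second scans and aggregates across years, which is the access pattern that star and snowflake designs are built to serve.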
13 :: What is a Snowflake Schema?
In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that joins to the fact table.
14 :: What is a lookup table?
A lookup table is one that is used when updating a warehouse. When the lookup is placed on the target table (fact table or warehouse table) and keyed on the target's primary key, incoming rows are compared against it so that only new or changed records are passed through, as inserts or updates, according to the lookup condition.
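The lookup step can be sketched as a dictionary keyed on the target's primary key. The function name, the key column id and the sample rows are hypothetical, and this is not any ETL tool's actual API.

```python
def classify_against_target(target, incoming, key="id"):
    """Use a lookup on the target's key to separate inserts from updates."""
    lookup = {row[key]: row for row in target}
    inserts, updates = [], []
    for row in incoming:
        existing = lookup.get(row[key])
        if existing is None:
            inserts.append(row)   # new record
        elif existing != row:
            updates.append(row)   # changed record
        # unchanged records are skipped
    return inserts, updates

target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
incoming = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"},
            {"id": 1, "name": "Ann"}]
inserts, updates = classify_against_target(target, incoming)
```

Only the new row and the changed row reach the target, and the unchanged row for id 1 is dropped by the lookup condition.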
15 :: What are the differences between star and snowflake schemas?
Star schema - all dimensions are linked directly to a fact table.
Snowflake schema - dimensions may be interlinked or may have one-to-many relationships with other tables.
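The structural difference can be sketched with sqlite3 by modeling the same product dimension both ways. All table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product dimension is one denormalized table.
CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Snowflake: category moves to its own table linked from the dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery');
INSERT INTO dim_category VALUES (10, 'Stationery');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 10);
INSERT INTO fact_sales VALUES (1, 2.5), (1, 4.0);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category""").fetchall()

# Snowflake: an extra join through the normalized category table.
snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category""").fetchall()
```

Both queries return the same totals. The snowflake form stores each category name once at the cost of an additional join.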
16 :: What is real-time data warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
17 :: What are the various ETL tools in the market?
Various ETL tools used in the market are:
1. Informatica
2. DataStage
3. MS SQL Server DTS (Integration Services in SQL Server 2005)
4. Ab Initio
5. SQL*Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction
18 :: What is pre-emptive and non-pre-emptive?
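Pre-emptive means taken as a measure against something possible, anticipated, or feared; preventive; deterrent. For example: a pre-emptive tactic against a ruthless business rival.
Non-pre-emptive is the exact opposite of pre-emptive: no such preventive measures have been taken. In computing, the terms usually describe scheduling: a pre-emptive scheduler can interrupt a running task to run another, while a non-pre-emptive scheduler lets each task run until it finishes or yields.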
19 :: What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is generally best to use bitmap indexes, particularly on low-cardinality columns such as dimension keys. Dimension tables can use bitmap indexes and/or the other index types: clustered or non-clustered, unique or non-unique.
To my knowledge, SQL Server does not offer user-defined bitmap indexes. Oracle is the best-known database that supports them.
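The idea behind a bitmap index can be sketched in Python using integers as bit vectors. This illustrates the concept only and does not reflect how any particular database stores its bitmaps; the function names and sample columns are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into row numbers."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

region = build_bitmap_index(["East", "West", "East", "North"])
status = build_bitmap_index(["open", "open", "closed", "open"])

# WHERE region = 'East' AND status = 'open' becomes a bitwise AND.
hits = matching_rows(region["East"] & status["open"])
```

Combining predicates is a cheap bitwise operation, which is why bitmap indexes suit the multi-condition, read-mostly queries that are typical against fact tables.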
20 :: What are the advantages of RAID 1, 1/0, and 5? On what type of RAID setup would you put your TX logs?
Transaction logs are written sequentially and are rarely read during normal operation. The ideal is to put them on RAID 1/0 because it has much better write performance than RAID 5.
RAID 1 is also a good choice for TX logs and costs less than 1/0 to implement. Generally speaking, it offers slightly less reliability and performance.
RAID 5 is generally best for data because of its lower cost and the great read capability it provides.
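The cost and write-performance trade-off can be put into rough numbers. The sketch below assumes equal-size disks and two-way mirroring, and the write penalties are the commonly quoted rule-of-thumb values for small random writes rather than measurements.

```python
def usable_fraction(level, disks):
    """Fraction of raw capacity left for data, assuming equal-size disks."""
    if level in ("1", "10"):
        return 0.5                   # every block is mirrored once
    if level == "5":
        return (disks - 1) / disks   # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {level}")

# Commonly quoted back-end I/Os per small random write (rule of thumb).
WRITE_PENALTY = {"1": 2, "10": 2, "5": 4}
```

With four disks, RAID 5 leaves 75% of the raw capacity usable against 50% for RAID 1/0, but each small write costs roughly twice as many disk operations, which is the reason for steering write-heavy transaction logs away from it.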
21 :: What are modeling tools available in the Market?
There are a number of data modeling tools:
Tool Name          Company Name
ERwin              Computer Associates
Embarcadero        Embarcadero Technologies
Rational Rose      IBM Corporation
PowerDesigner      Sybase Corporation
Oracle Designer    Oracle Corporation
22 :: What is data mining?
Data mining is the process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be mined to find the highest-risk people to insure in a certain geographical area.
23 :: What are slowly changing dimensions?
SCD stands for slowly changing dimensions. Slowly changing dimensions are of three types:
SCD1: only the updated (current) values are maintained.
Ex: when a customer's address changes, we update the existing record with the new address.
SCD2: both historical and current information are maintained by using
A) Effective dates
B) Versions
C) Flags
or a combination of these.
SCD3: new columns are added to the target table to hold the previous value alongside the current one, so historical and current information are kept in the same row.
24 :: What is a Star Schema?
A star schema is a way of organising tables so that results can be retrieved from the database easily and quickly in a warehouse environment. A star schema usually consists of one or more dimension tables arranged around a fact table, which looks like a star and gives the schema its name.
25 :: Which columns go to the fact table and which columns go to the dimension table?
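The primary key columns of the source tables (entities) go to the dimension tables as foreign keys.
The primary key columns of the dimension tables go to the fact tables as foreign keys. In addition, numeric measures belong in the fact table, while descriptive attributes belong in the dimension tables.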
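The key relationship between a fact table and a dimension table can be sketched with sqlite3. The table names are hypothetical, and the PRAGMA is needed because SQLite enforces foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE fact_sales (
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        amount REAL
    )""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO fact_sales VALUES (1, 9.99)")  # key exists in the dimension

try:
    conn.execute("INSERT INTO fact_sales VALUES (2, 5.00)")  # no such dimension row
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The fact row carries the dimension's primary key as a foreign key plus a measure (amount), and a fact row pointing at a missing dimension key is rejected.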