Important ETL (Extract, transform, load) Interview Preparation Guide

This guide collects frequently asked Extract, transform, load (ETL) interview questions together with their answers. Use it to prepare for ETL interviews, and feel free to post suggestions, questions, or alternative answers on any item through the comment feature available on the page.

37 Extract, transform, load (ETL) Questions and Answers:

Table of Contents:

Important Extract, transform, load (ETL) Job Interview Questions and Answers

1 :: What is an ODS (operational data store)?

ODS stands for Operational Data Store.

The ODS sits between the staging area and the data warehouse. Data in the ODS is held at a low level of granularity, that is, at a detailed level.

Once data has been populated in the ODS, aggregated data is loaded from the ODS into the EDW (enterprise data warehouse).

An ODS is similar to a small data warehouse (DWH) that helps analysts analyze the business. It holds data for a limited number of days, generally around 30 to 45. As in a DWH, primary keys are generated and error and reject handling is performed.

2 :: If a flat file contains 1000 records, how can I get only the first and last records?

Use an Aggregator transformation with the FIRST and LAST functions to get the first and last records.
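
Outside of Informatica, the same idea can be sketched in plain Python. This is only an illustration of "keep the first row, keep overwriting the last row", not the Aggregator transformation itself; the function name and the sample data are hypothetical.

```python
import csv
import io


def first_and_last(lines):
    """Return the first and last rows from an iterable of CSV lines."""
    first = last = None
    for row in csv.reader(lines):
        if first is None:
            first = row  # remember the very first row only once
        last = row       # overwritten on every row, so it ends as the final row
    return first, last


# Hypothetical sample data standing in for the 1000-record flat file.
sample = io.StringIO("".join(f"{i},customer_{i}\n" for i in range(1, 1001)))
first, last = first_and_last(sample)
print(first, last)
```

The file is read in a single pass and only two rows are held in memory, so the approach also works for much larger files.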

3 :: What are the various tools? Name a few.

A few of them:
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI (Extreme Insight)
- SAP Business Warehouse
- SAS Enterprise ETL Server

SAS tools include:
- SAS Information Delivery Portal
- SAS Data Integration Studio
- Business Intelligence
- SAS Enterprise Guide

4 :: What is the latest version of PowerCenter / PowerMart?

The latest version is 7.2.

Another answer: Informatica 8.0.

5 :: How to calculate fact table granularity?

Granularity is the level of detail that the fact table describes. For example, in a time-based analysis the granularity may be daily, monthly, or yearly.
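
As a rough illustration of what a change of grain means, the sketch below rolls hypothetical day-level fact rows up to month and year level. The data, the `roll_up` function, and the grain names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-grain fact rows: (sale_date, amount).
daily_facts = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 75.0),
]


def roll_up(facts, grain):
    """Aggregate day-grain facts to a coarser grain ('month' or 'year')."""
    totals = defaultdict(float)
    for day, amount in facts:
        if grain == "month":
            key = (day.year, day.month)
        elif grain == "year":
            key = (day.year,)
        else:
            raise ValueError(f"unsupported grain: {grain}")
        totals[key] += amount
    return dict(totals)


monthly = roll_up(daily_facts, "month")
yearly = roll_up(daily_facts, "year")
print(monthly, yearly)
```

A fact table stored at day grain can always be rolled up like this, but a table stored at month grain cannot be broken back down into days, which is why the grain is usually fixed at the lowest level the analysis needs.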

6 :: What are the modules in PowerMart?

1. PowerMart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager

7 :: How to extract SAP data using Informatica? What is ABAP? What are IDOCs?

SAP data can be loaded into Informatica in the form of flat files.
Condition:
The column sequence of the Informatica Source Qualifier must match that of the SAP source file.

8 :: Explain the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way.

Ex: Informatica, DataStage, etc.

An OLAP tool is meant for reporting. In OLAP, data is available in a multidimensional model, so you can write simple queries to extract data from the database.

Ex: Business Objects, Cognos, etc.

9 :: What are the various transformations available?

Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Transformation

10 :: Explain the process of extracting data from source systems, storing it in an ODS, and how data modeling is done?

There are various ways of extracting data from source systems. For example, you can use a DATA step or an import process. The choice depends on the style of your input data and on the kind of file or database it resides in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, which again depends on the file and data format you want your output to be in.

11 :: Explain techniques of error handling: ignoring, rejecting bad records to a flat file, loading the records and reviewing them (default values)?

Records can be rejected either by the database, because of a key constraint violation, or by the Informatica server when it writes data to the target table. The rejected records can be found in the bad files folder, where a reject file is created for each session, and we can check why a record was rejected. In the reject file, the first column is a row indicator and the second column is a column indicator.
The column indicators are of four types:
D - valid data,
O - overflowed data,
N - null data,
T - truncated data.
Depending on these indicators, we can make changes so that the data loads successfully into the target.
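
The general technique of rejecting bad records to a flat file can be sketched in Python as follows. This is a minimal illustration and does not reproduce Informatica's reject-file layout; the validation rules, the column layout (id, name, amount), and the function names are all hypothetical.

```python
import csv
import io


def validate(row):
    """Return None if the row is acceptable, otherwise a short rejection reason."""
    cust_id, _name, amount = row
    if not cust_id:
        return "null key"
    try:
        float(amount)
    except ValueError:
        return "bad amount"
    return None


def load_with_rejects(rows, target, reject_file):
    """Append valid rows to target; write invalid rows plus a reason to reject_file."""
    writer = csv.writer(reject_file)
    for row in rows:
        reason = validate(row)
        if reason is None:
            target.append(row)
        else:
            writer.writerow([reason, *row])


# Hypothetical input: one good row, one with a missing key, one with a bad amount.
rows = [["1", "Ann", "10.5"], ["", "Bob", "3"], ["3", "Cy", "abc"]]
target = []
rejects = io.StringIO()  # stands in for a reject file on disk
load_with_rejects(rows, target, rejects)
print(target)
print(rejects.getvalue())
```

Writing the reason next to each rejected record makes it possible to review the reject file, correct the data, and reload only the failed rows.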

12 :: What are the different versions of Informatica?

Here are some popular versions of Informatica.

Informatica PowerCenter 4.1, Informatica PowerCenter 5.1, Informatica PowerCenter 6.1.2, Informatica PowerCenter 7.1.2, Informatica PowerCenter 8.1, Informatica PowerCenter 8.5, Informatica PowerCenter 8.6.

14 :: When do we analyze tables? How do we do it?

The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose among the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used, so it is best to compute your statistics.

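Ex:

-- Count analyzed and not-analyzed tables per owner.
-- NUM_ROWS is NULL for tables whose statistics have not been gathered.
select OWNER,
       sum(decode(NUM_ROWS, null, 0, 1)) analyzed,
       sum(decode(NUM_ROWS, null, 1, 0)) not_analyzed,
       count(TABLE_NAME) total
from dba_tables
where OWNER not in ('SYS', 'SYSTEM')
group by OWNER;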

15 :: Explain and compare ETL and manual development?

ETL - Extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP) is much simpler with the help of tools.
Manual - Loading data from sources other than flat files and Oracle tables needs more effort.

ETL - High and clear visibility of logic.
Manual - Complex logic with less user-friendly visibility.

ETL - Contains metadata, and changes can be made easily.
Manual - No metadata concept, and changes need more effort.

ETL - Error handling, log summaries, and load progress make life easier for the developer and maintainer.
Manual - Needs maximum effort from a maintenance point of view.

ETL - Can handle historic data very well.
Manual - As data grows, the processing time degrades.

These are some of the differences between manual and ETL development.

17 :: Explain the various test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded in Informatica?

The best procedure is to use the Debugger, where we can monitor every step of the mapping and see how data is loaded, based on conditional breakpoints.

18 :: Explain the different Lookup methods used in Informatica?

1. Connected lookup

2. Unconnected lookup

A connected lookup receives input from the pipeline, sends output to the pipeline, and can return any number of values. It does not contain a return port.

An unconnected lookup can return only one column. It contains a return port.

19 :: How can we determine what records to extract?

When reading a table, some dimension key must reflect the need for a record to be extracted. Mostly this comes from the time dimension (e.g. date >= 1st of the current month) or from a transaction flag (e.g. Order Invoiced Stat). A foolproof approach is to add an archive flag to each record that is reset whenever the record changes.
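
The selection rules described above might be sketched like this in Python. The field names (`order_date`, `archived`), the sample rows, and both functions are hypothetical; a real extract would normally apply the same predicate in the source query.

```python
from datetime import date


def needs_extract(row, cutoff):
    """True if the row is dated on or after the cutoff, or is not archived yet."""
    return row["order_date"] >= cutoff or not row["archived"]


def select_for_extract(rows, today):
    """Return the rows to extract, using the 1st of the current month as cutoff."""
    cutoff = today.replace(day=1)
    return [row for row in rows if needs_extract(row, cutoff)]


# Hypothetical source rows; 'archived' is reset to False whenever a row changes.
orders = [
    {"id": 1, "order_date": date(2024, 2, 10), "archived": True},
    {"id": 2, "order_date": date(2024, 3, 2), "archived": True},
    {"id": 3, "order_date": date(2024, 1, 5), "archived": False},
]

selected = select_for_extract(orders, date(2024, 3, 15))
selected_ids = [row["id"] for row in selected]
print(selected_ids)
```

Order 3 is old but is still picked up because its archive flag has been reset, which is the case a pure date filter would miss.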

20 :: Explain where we use semi-additive and non-additive facts?

Additive: A measure that can participate in arithmetic calculations across all dimensions.

Ex: Sales profit

Semi-additive: A measure that can participate in arithmetic calculations across some dimensions only.

Ex: Account balance (it can be summed across accounts, but not across time)

Non-additive: A measure that cannot participate in arithmetic calculations across any dimension.

Ex: Temperature
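
A commonly cited semi-additive measure is an account balance: it can be summed across accounts for a single day, but summing it across days gives a meaningless number. The sketch below illustrates this with hypothetical snapshot data; the function names are assumptions made for the example.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 200.0),
    ("2024-01-02", "A", 150.0),
    ("2024-01-02", "B", 250.0),
]


def total_by_day(rows):
    """Sum balances across accounts for each day (a valid aggregation)."""
    totals = {}
    for day, _account, balance in rows:
        totals[day] = totals.get(day, 0.0) + balance
    return totals


def closing_balance(rows):
    """Across time, take the latest day's total instead of summing every day."""
    latest = max(day for day, _account, _balance in rows)
    return sum(balance for day, _account, balance in rows if day == latest)


per_day = total_by_day(snapshots)
closing = closing_balance(snapshots)
naive_sum = sum(balance for _day, _account, balance in snapshots)
print(per_day, closing, naive_sum)
```

Here `naive_sum` adds up balances from both days and therefore does not describe anything held at any point in time, while `closing` gives the actual total at the end of the period.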

21 :: Explain the difference between PowerCenter and PowerMart?

PowerCenter - ability to organize repositories into a data mart domain and share metadata across repositories.

PowerMart - only a local repository can be created.

22 :: What is a three-tier data warehouse?

A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.

23 :: What are full load and incremental or refresh load?

Full Load: completely erasing the contents of one or more tables and reloading with fresh data.

Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.

Refresh Load: the table is truncated and the data is loaded again. This method is used to load static dimension tables or type tables.
Incremental Load: a method that captures only the newly created or updated records. The load is performed based on a flag or a date.
Full Load: when data is loaded for the first time, whether as a base load or a history load, the whole set of records is loaded in one stretch, depending on the volume.
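
The difference between the load types can be sketched in Python with an in-memory dictionary standing in for the target table. The row layout (`id`, `name`, `updated`), the date-based watermark, and both functions are assumptions made for this illustration.

```python
def full_load(target, source_rows):
    """Erase the target completely and reload every source row."""
    target.clear()
    for row in source_rows:
        target[row["id"]] = dict(row)


def incremental_load(target, source_rows, last_loaded):
    """Insert or update only rows changed after last_loaded; return the new watermark."""
    watermark = last_loaded
    for row in source_rows:
        if row["updated"] > last_loaded:
            target[row["id"]] = dict(row)
            watermark = max(watermark, row["updated"])
    return watermark


# Hypothetical source data; ISO date strings compare correctly as text.
source = [
    {"id": 1, "name": "a", "updated": "2024-01-01"},
    {"id": 2, "name": "b", "updated": "2024-01-02"},
]
target = {}
full_load(target, source)

# Later, row 2 is modified and row 3 is created in the source.
source[1] = {"id": 2, "name": "b2", "updated": "2024-01-05"}
source.append({"id": 3, "name": "c", "updated": "2024-01-06"})
new_watermark = incremental_load(target, source, "2024-01-02")
print(new_watermark, sorted(target))
```

The returned watermark is stored and passed to the next run, so each incremental load touches only the rows that changed since the previous one.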

24 :: Explain a mapping, session, worklet, workflow, mapplet?

A mapping represents dataflow from sources to targets.
A mapplet is a reusable object that contains a set of transformations.

A workflow is a set of instructions that tell the Informatica server how to execute the tasks.

A worklet is an object that represents a set of tasks.

A session is a set of instructions that describe how and when to move data from sources to targets.

25 :: What are full load and incremental or refresh load?

Full Load: completely erasing the contents of one or more tables and reloading with fresh data.

Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.

The first time we load the data is called the initial load or full load.
The second and later loads, which bring in only new or modified data, are called incremental loads or delta loads.