Friday, May 22, 2009

BODI FOR INTERVIEW PURPOSES

1)Auto Correct Load :

It prevents duplicate data from entering the target table. It works like SCD Type 1: insert else update, i.e. rows that do not match an existing target row are inserted and rows that do match are updated.
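To make the insert-else-update idea concrete, here is a rough Python sketch (not DI code; the key and column names are made up for illustration):

```python
# Illustrative sketch of insert-else-update ("auto correct load") logic.
# The target is modeled as a dict keyed by the primary key; names are hypothetical.
target = {}  # pk -> row

def auto_correct_load(rows, key="cust_id"):
    for row in rows:
        pk = row[key]
        if pk in target:
            target[pk].update(row)   # matching key: update in place (Type 1 overwrite)
        else:
            target[pk] = dict(row)   # non-matching key: insert

auto_correct_load([{"cust_id": 1, "city": "Pune"},
                   {"cust_id": 1, "city": "Mumbai"}])  # second row updates, no duplicate
print(target)  # {1: {'cust_id': 1, 'city': 'Mumbai'}}
```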

2) SQL :

Performs the indicated SQL query operation.
Datastore name : The name of the datastore that Data Integrator uses to access the tables referred to in SQL text.
Array fetch size : Indicates the number of rows retrieved in a single request to a source database. The default value is 1000. Higher numbers reduce the number of requests, lowering network traffic and possibly improving performance (see the sketch after this list). The maximum value is 5000.
Join rank : The highest ranked source is accessed first to construct the join.
Cache : Select this check box to hold output from the transform in memory for use in
subsequent transforms.
SQL Text : The text of the SQL query.
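As a rough illustration of why a larger array fetch size reduces round trips, here is a sketch using the standard Python DB-API fetchmany() pattern; the throwaway in-memory SQLite table simply stands in for a remote source database:

```python
# Sketch: fetching source rows in batches, analogous to "Array fetch size".
# SQLite here is only a stand-in for the real source database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(i, f"v{i}") for i in range(2500)])

cur = conn.cursor()
cur.arraysize = 1000                      # like Array fetch size = 1000
cur.execute("SELECT id, val FROM src")
while True:
    batch = cur.fetchmany()               # one request returns up to 1000 rows
    if not batch:
        break
    # process the batch ...
```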

3)Table Comparison :

Compares two data sets and produces the difference between them as a data
set with rows flagged as INSERT, UPDATE, or DELETE.

• Row-by-row select - Look up the target table using SQL every time the transform receives an input row. This option is best if the target table is large.
• Cached comparison table - Load the comparison table into memory. This option is best when the table fits into memory and you are comparing the entire target table.
• Sorted input - Read the comparison table in the order of the primary key column(s) using a sequential read. This option improves performance because Data Integrator reads the comparison table only once. Add a query between the source and the Table_Comparison transform, then drag the primary key columns from the query's input schema into the query's Order By box.

Input primary key column(s)
The input data set columns that uniquely identify each row. These columns
must be present in the comparison table with the same column names and
data types.

Input contains duplicate keys
Generated Key column
Detect Deleted row(s) from comparison table
Detect all rows
Detect row with largest generated key value.

If the option Input contains duplicate keys is checked, then in the Generated key column we select the key column whose largest generated value identifies which of the duplicate rows is used for the comparison.

If the option Detect deleted row(s) from comparison table is selected, rows that exist in the comparison table but are missing from the input data set are flagged as DELETE.
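Putting these options together, here is a plain Python sketch of what Table_Comparison produces (illustrative only, not DI code):

```python
# Sketch of what Table_Comparison produces: each input row is flagged
# INSERT or UPDATE against a comparison table, and missing rows can be flagged DELETE.
def table_comparison(input_rows, comparison, key, compare_cols, detect_deleted=True):
    flagged = []
    seen = set()
    for row in input_rows:
        pk = row[key]
        seen.add(pk)
        old = comparison.get(pk)
        if old is None:
            flagged.append(("INSERT", row))
        elif any(row[c] != old[c] for c in compare_cols):
            flagged.append(("UPDATE", row))
        # rows identical in the compare columns produce no output
    if detect_deleted:                      # "Detect deleted row(s) from comparison table"
        for pk, old in comparison.items():
            if pk not in seen:
                flagged.append(("DELETE", old))
    return flagged

comparison = {1: {"id": 1, "city": "Pune"}, 2: {"id": 2, "city": "Delhi"}}
print(table_comparison([{"id": 1, "city": "Mumbai"}, {"id": 3, "city": "Goa"}],
                       comparison, key="id", compare_cols=["city"]))
```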

4)History Preserving :

The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row. You indicate the columns in which the transform identifies changes to be preserved.
If the value of any of those columns changes, the transform creates a new row for each row flagged as UPDATE in the input data set.
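A rough Python sketch of the idea, assuming hypothetical valid_from/valid_to columns for the row versions (illustrative only, not the transform's actual options):

```python
# Sketch of history preservation: an UPDATE whose compare columns changed
# becomes an INSERT of a new row version instead of overwriting the old one.
from datetime import date

def history_preserving(flagged_rows, compare_cols, old_rows_by_key, key):
    out = []
    for op, row in flagged_rows:
        old = old_rows_by_key.get(row[key])
        if op == "UPDATE" and old and any(row[c] != old[c] for c in compare_cols):
            out.append(("UPDATE", {**old, "valid_to": date.today()}))    # close old version
            out.append(("INSERT", {**row, "valid_from": date.today()}))  # add new version
        else:
            out.append((op, row))
    return out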




5) Hierarchy Flattening :

Constructs a complete hierarchy from parent/child relationships, then produces a description of the hierarchy in vertically or horizontally flattened format.

Parent column, Child Column
Parent Attributes, Child Attributes.
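For example, vertical flattening can be sketched in Python as expanding the parent/child pairs into every ancestor/descendant pair together with its depth (illustrative only):

```python
# Sketch of vertical flattening: expand parent/child pairs into all
# ancestor/descendant relationships with their depth.
def flatten_vertically(parent_child_pairs):
    children = {}
    for parent, child in parent_child_pairs:
        children.setdefault(parent, []).append(child)

    rows = []
    def walk(ancestor, node, depth):
        for child in children.get(node, []):
            rows.append({"ancestor": ancestor, "descendant": child, "depth": depth})
            walk(ancestor, child, depth + 1)

    for parent in children:
        walk(parent, parent, 1)
    return rows

print(flatten_vertically([("A", "B"), ("B", "C")]))
# A-B depth 1, A-C depth 2, B-C depth 1
```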

6)Map-Operation :

The Map_Operation transform allows you to change operation codes on data sets to produce the desired output.
operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.

Insert -----> Normal / Insert / Update / Delete / Discard
Update -----> Normal / Insert / Update / Delete / Discard
Normal -----> Normal / Insert / Update / Delete / Discard

If a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result is that UPDATE rows are converted to INSERT rows, which preserves the existing row in the target. For example:

Map_Transform-1 : Normal ----> Update
Map_Transform-2 : Update ----> Insert
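A minimal Python sketch of remapping operation codes, chaining two maps as in the example above (illustrative only, not DI code):

```python
# Sketch of Map_Operation: rewrite each row's operation code via a mapping.
def map_operation(flagged_rows, mapping):
    out = []
    for op, row in flagged_rows:
        new_op = mapping.get(op, op)      # unmapped codes pass through unchanged
        if new_op != "DISCARD":
            out.append((new_op, row))
    return out

rows = [("NORMAL", {"id": 1})]
step1 = map_operation(rows,  {"NORMAL": "UPDATE"})   # Map_Transform-1
step2 = map_operation(step1, {"UPDATE": "INSERT"})   # Map_Transform-2
print(step2)  # [('INSERT', {'id': 1})]
```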



7)Query :

Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
Select : Distinct rows
Outer Join : Outer source ----> Inner source
Where : You can put any kind of condition here, e.g. lookup() or any built-in function that returns a single value.
Propose Join : DI automatically joins the columns that have the same name and data type.
Group By : Drag in the columns that are needed for grouping.
Order By : Drag in the columns you want to sort by.

Within a Query you can:
Use sql().
Use lookup(), lookup_ext(), lookup_seq() or other built-in functions.
Use relational operators in conditions.
Use global variables.
Use custom functions.
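As a rough illustration of the row-set side of a Query (filter, group/aggregate, sort), here is a small Python sketch with made-up column names:

```python
# Sketch of what a Query transform does to a row set:
# filter (WHERE), group/aggregate (GROUP BY), and sort (ORDER BY).
from itertools import groupby

rows = [{"dept": "HR", "sal": 100}, {"dept": "HR", "sal": 200}, {"dept": "IT", "sal": 150}]

filtered = [r for r in rows if r["sal"] > 50]                  # WHERE sal > 50
filtered.sort(key=lambda r: r["dept"])                         # groupby needs sorted input
grouped = [{"dept": d, "total_sal": sum(r["sal"] for r in g)}  # GROUP BY dept
           for d, g in groupby(filtered, key=lambda r: r["dept"])]
ordered = sorted(grouped, key=lambda r: r["total_sal"])        # ORDER BY total_sal
print(ordered)
```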

8)Merge:

Combines incoming data sets, producing a single output data set with the same schema as the input data sets.

All sources must have the same schema, including:
• The same number of columns
• The same column names
• The same data types of columns
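A minimal sketch of the Merge behaviour (a UNION ALL with a schema check), in Python for illustration only:

```python
# Sketch of Merge: concatenate inputs (like UNION ALL) after checking
# that every input shares the same schema (column names).
def merge(*inputs):
    schema = set(inputs[0][0].keys())
    for rows in inputs:
        assert all(set(r.keys()) == schema for r in rows), "schemas must match"
    return [row for rows in inputs for row in rows]

print(merge([{"id": 1}], [{"id": 2}], [{"id": 3}]))
```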



9)Target Table Options :

Rows per commit : Specifies the transaction size in number of rows. If set to 1000, Data Integrator sends a commit to the underlying database every 1000 rows (see the sketch after this list).
Column comparison : compare_by_position, compare_by_name
Delete data from table before loading.
Drop & Recreate Table.
Number of loaders : Loading with one loader is known as "single loader loading." Loading when the number of loaders is greater than one is known as "parallel loading." The default number of loaders is 1. The maximum number of loaders is 5.
Use overflow file : This option is used for recovery purposes. If a row cannot be loaded, it is written to a file. When this option is selected, options are enabled for the file name and the file format (Write Data / Write SQL).
Update Control : Ignore columns with null , Use input keys , Update key columns , Auto correct load
Transaction Control : Include in Transaction, Transaction Order
Transaction order indicates where this table falls in the loading order of the tables being loaded. By default, there is no ordering. All loaders have a transaction order of zero. If you specify orders among the tables, the loading operations are applied according to the order. Tables with the same transaction order are loaded together.

Other Tabs : Bulk Loader Options, Load Triggers, Pre Load Commands, Post Load Commands.
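A rough Python sketch of two of these loader behaviours, Rows per commit and Use overflow file; the insert/commit callbacks are hypothetical stand-ins for the database loader:

```python
# Sketch of two target-loader behaviours: commit every N rows
# ("Rows per commit") and divert failed rows to an overflow file.
def load(rows, insert_fn, commit_fn, rows_per_commit=1000, overflow_path="overflow.dat"):
    pending = 0
    with open(overflow_path, "w") as overflow:
        for row in rows:
            try:
                insert_fn(row)
                pending += 1
            except Exception as exc:                 # row failed: write it out for recovery
                overflow.write(f"{row}\t{exc}\n")
                continue
            if pending >= rows_per_commit:           # transaction size reached
                commit_fn()
                pending = 0
        if pending:
            commit_fn()                              # commit the final partial batch

loaded = []
load([{"id": i} for i in range(5)], insert_fn=loaded.append,
     commit_fn=lambda: None, rows_per_commit=2)
```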

10) Use Input Keys :
When this option is set, DI uses the primary keys defined in the input (source) schema, rather than the target table's keys, to update the data in the target table.

11)Case :
Specifies multiple paths in a single transform.
DEFAULT is the path used when all other CASE expressions evaluate to false. To enable DEFAULT, select the Produce default output with label when all expressions are false check box.
Row can be TRUE for one case only :
If this option is enabled, the row is passed to the first case whose expression returns TRUE. Otherwise, the row is passed to all the cases whose expressions return TRUE.
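A small Python sketch of the Case routing rules described above (labels and expressions are made up):

```python
# Sketch of the Case transform: route each row to the case label(s) whose
# expression is true; optionally stop at the first match, else use a default.
def case_route(rows, cases, default_label=None, true_for_one_case_only=False):
    routed = {label: [] for label, _ in cases}
    if default_label:
        routed[default_label] = []
    for row in rows:
        matched = False
        for label, predicate in cases:
            if predicate(row):
                routed[label].append(row)
                matched = True
                if true_for_one_case_only:      # "Row can be TRUE for one case only"
                    break
        if not matched and default_label:       # "Produce default output ..."
            routed[default_label].append(row)
    return routed

print(case_route([{"region": "EU"}, {"region": "XX"}],
                 [("europe", lambda r: r["region"] == "EU")],
                 default_label="other"))
```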

12)Pivot (Columns to Rows)

For each value in each pivot column, Data Integrator produces a row in the output data set. You can create pivot sets to specify more than one pivot column.

Pivot Sequence Column : For each row created from a pivot column, Data Integrator increments and stores a sequence number.
Non-Pivot Columns : The columns in the source that are to appear in the target without modification.
Pivot set : The number that identifies a pivot set. Each pivot set must have a group of pivot columns, a unique Data field column, and a unique Header column. Data Integrator automatically saves this information.
Data field column : Contains the pivoted data. This column holds the values of all of the pivot columns.
Header column : Lists the name of the column where each corresponding data value originated.
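A Python sketch of the columns-to-rows pivot described above (column names are made up for illustration):

```python
# Sketch of Pivot (columns to rows): each pivot column of an input row
# becomes its own output row with a header, a data value, and a sequence number.
def pivot(rows, non_pivot_cols, pivot_cols,
          header_col="QUARTER", data_col="AMOUNT", seq_col="PIVOT_SEQ"):
    out = []
    for row in rows:
        for seq, col in enumerate(pivot_cols, start=1):
            new_row = {c: row[c] for c in non_pivot_cols}
            new_row[seq_col] = seq
            new_row[header_col] = col      # name of the column the value came from
            new_row[data_col] = row[col]   # the pivoted value itself
            out.append(new_row)
    return out

print(pivot([{"year": 2009, "q1": 10, "q2": 20}],
            non_pivot_cols=["year"], pivot_cols=["q1", "q2"]))
```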

13)Reverse Pivot (Rows to Columns) :

Creates one row of data from several existing rows.
The Reverse Pivot transform allows you to combine data from several rows
into one row by creating new columns.


Input data is grouped

Non-pivot columns : The columns in the source table that will appear in the target table without modification.
Pivoted columns : A set of columns will be created for each unique value in the Pivot axis column.
Pivot axis column : The column that determines what new columns are needed in the output table. At run time, a new column is created for each pivoted column and each unique value in this column.
Duplicate value : The action taken when a collision occurs. A collision occurs when there is more than one row with the same key and the same value in the Pivot axis column. In this case, you can store either the first row or the last row, or you can abort the transform process.
Axis value : The value of the pivot axis column that represents a particular set of output columns.
Column prefix : Text added to the front of the pivoted column names when creating new column names for the rotated data.
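And the reverse direction, sketched in Python (again with made-up column names):

```python
# Sketch of Reverse Pivot (rows to columns): rows sharing the same key are
# combined into one row, with a new column per pivot-axis value.
def reverse_pivot(rows, key_cols, axis_col, value_col, prefix="", duplicate="last"):
    combined = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        target = combined.setdefault(key, {c: row[c] for c in key_cols})
        new_col = prefix + str(row[axis_col])           # e.g. "amt_q1"
        if new_col in target and duplicate == "first":  # "Duplicate value" handling
            continue
        target[new_col] = row[value_col]
    return list(combined.values())

print(reverse_pivot([{"year": 2009, "q": "q1", "amt": 10},
                     {"year": 2009, "q": "q2", "amt": 20}],
                    key_cols=["year"], axis_col="q", value_col="amt", prefix="amt_"))
```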


14) LOOKUPS :
The following three types of lookup return a value from a lookup table or file based on the inputs you give them.

LOOKUP : Retrieves a value in a table or file based on the values in a different source table or file.

LOOKUP EXT :

• Has a graphical editor in the function wizard.
• Retrieves a value in a table or file based on the values in a different source table or file, but also provides extended functionality allowing you to:
• Return multiple columns from a single lookup
• Choose from more operators to specify a lookup condition
• Perform multiple (including recursive) lookups
• Call lookup_ext in scripts and custom functions (which also lets you
reuse the lookup(s) packaged inside scripts)
• Define custom SQL, using the SQL_override parameter, to populate
the lookup cache, narrowing large quantities of data to only the sections
relevant for your lookup(s)
• Use lookup_ext to dynamically execute SQL
• Call lookup_ext, using the function wizard, in the query output mapping
to return multiple columns in a Query transform
• Design jobs to use lookup_ext without having to hard-code the name of
the translation file at design time.
• Use lookup_ext with memory datastore tables

LOOKUP SEQ :

Retrieves a value in a table or file based on the values in a different source
table or file and a particular sequence value.
NEW FUNCTION CALL :

** Works the same as a normal lookup_ext call.

** The only difference is that a New Function Call remains editable through the function wizard, whereas a lookup_ext call written directly into a column mapping is not.

** There is no need to use global/local variables for the output ports.
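The core idea behind all three lookup flavours can be sketched in plain Python with a made-up lookup table (this is not the DI lookup() signature, just the concept):

```python
# Sketch of a lookup: fetch a result column from a lookup table based on
# compare-column values, falling back to a default when there is no match.
lookup_table = [{"cust_id": 1, "cust_name": "Acme"},
                {"cust_id": 2, "cust_name": "Globex"}]

def simple_lookup(compare_col, value, result_col, default=None):
    for row in lookup_table:
        if row[compare_col] == value:
            return row[result_col]
    return default

print(simple_lookup("cust_id", 1, "cust_name", default="UNKNOWN"))  # Acme
print(simple_lookup("cust_id", 9, "cust_name", default="UNKNOWN"))  # UNKNOWN
```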


15) CACHES :

NO_CACHE — Does not cache any values.
• PRE_LOAD_CACHE — Preloads the result column and compare column
into memory (it loads the values before executing the lookup).
Use this option if the table can fit in memory.
• DEMAND_LOAD_CACHE — Loads the result column and compare
column into memory as the function executes.
Use this option when looking up highly repetitive values that are a small
subset of the data and when missing values are unlikely.
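A Python sketch of the three cache policies, with fetch_from_db() standing in for a round trip to the lookup source (illustrative only):

```python
# Sketch of the three lookup cache policies against a slow source.
def fetch_from_db(key):
    ...  # hypothetical stand-in: one database query per call

def lookup_no_cache(key):
    return fetch_from_db(key)                 # NO_CACHE: hit the source every time

def make_pre_load_cache(all_rows):
    cache = dict(all_rows)                    # PRE_LOAD_CACHE: load (key, value) pairs up front
    return lambda key: cache.get(key)

def make_demand_load_cache():
    cache = {}
    def lookup(key):                          # DEMAND_LOAD_CACHE: cache as the function
        if key not in cache:                  # executes; good for highly repetitive keys
            cache[key] = fetch_from_db(key)
        return cache[key]
    return lookup
```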


16) PERFORMANCE TUNING :

The following sections describe ways you can adjust Data Integrator
performance:
• Source-based performance options
  • Using array fetch size
  • Caching data
  • Join ordering
  • Minimizing extracted data
• Target-based performance options
  • Loading method and rows per commit
  • Staging tables to speed up auto-correct loads
• Job design performance options
  • Improving throughput
  • Maximizing the number of pushed-down operations
  • Minimizing data type conversion
  • Minimizing locale conversion
  • Improving Informix repository performance


17) EXECUTING A DI JOB :

1) Manually from the Designer.
2) Through scheduling in the Web Administrator.
3) Export Execution Command, which exports the job as a batch file that can itself be executed (and scheduled) outside the Designer.







18) LOCAL VARIABLES, GLOBAL VARIABLES, PARAMETERS.


Dataflow Level---Only Parameters.


Workflow Level---Parameters,Local Variables.

Job Level --- Local Variables, Global Variables.

You create local variables, parameters, and global variables using the Variables and Parameters window in the Designer.

You can set values for local or global variables in script objects. You can also
set global variable values using external job, execution, or schedule properties.


19) HIERARCHY :

JOB ---> TRY/CATCH ---> SCRIPTS ---> WORKFLOW ---> DATAFLOW

20) Architecture :

While designing a job, you can run it from the Designer which tells the Job
Server to run the job. The Job Server gets the job from its associated
repository, then starts a Data Integrator engine to process the job.

When Data Integrator jobs are executed, the Job Server starts Data Integrator engine processes to perform data extraction, transformation, and movement. Data Integrator engine processes use parallel processing and in-memory data transformations.


Data Integrator Access Server

The Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame.


Data Integrator Administrator

The Administrator provides browser-based administration of Data Integrator
resources including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via Web services

Data Integrator Metadata Reports application

The Metadata Reports application provides browser-based analysis and
reporting capabilities on metadata that is associated with:
• your Data Integrator jobs
• other Business Objects applications associated with Data Integrator
Metadata Reports provide three modules for exploring your metadata:

• Impact and lineage analysis
• Operational dashboards
• Auto documentation

Operational Dashboard reports
Operational dashboard reports provide graphical depictions of Data Integrator job execution statistics. This feedback allows you to view at a glance the status and performance of your job executions for one or more repositories over a given time period. You can then use this information to streamline and
monitor your job scheduling and management for maximizing overall efficiency and performance.
Data Integrator Web Server

The Data Integrator Web Server supports browser access to the Administrator and the Metadata Reporting tool. It uses a Tomcat servlet engine to provide this browser access.

Data Integrator Service
The Data Integrator Service is installed when Data Integrator Job and Access
Servers are installed. The Data Integrator Service starts Job Servers and
Access Servers when you restart your system.

Data Integrator SNMP Agent

Data Integrator error events can be communicated using applications supported by simple network management protocol (SNMP) for better error monitoring. Install a Data Integrator SNMP agent on any computer running a Job Server. The Data Integrator SNMP agent monitors and records information about the Job Servers and jobs running on the computer where the agent is installed. You can configure network management software (NMS) applications to communicate with the Data Integrator SNMP agent. Thus, you can use your NMS application to monitor the status of Data Integrator jobs.






SCD TYPE 2

One way:

In the data flow where the intermediate table and the ultimate target table are joined, we create an output column named DML_FLAG that takes the value 0, 1, or 2, based on the following:

1) The SCD columns retrieved from the intermediate table and the ultimate target table are compared.
2) If they do not match, DML_FLAG is set to 2.
   If they match, DML_FLAG is set to 1.
   If the surrogate key of the ultimate target table is NULL (no existing target row), DML_FLAG is set to 0.

For rows where DML_FLAG has the value 2, we have two data flows: one for inserting the new row and one for updating the existing row in the target table.
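A Python sketch of that DML_FLAG derivation (column names such as surrogate_key are hypothetical):

```python
# Sketch of the DML_FLAG derivation described above.
def dml_flag(src_row, tgt_row, scd_cols):
    if tgt_row is None or tgt_row.get("surrogate_key") is None:
        return 0                                   # no existing target row -> plain insert
    if all(src_row[c] == tgt_row[c] for c in scd_cols):
        return 1                                   # SCD columns match -> no history change
    return 2                                       # SCD columns changed -> close old row, insert new

print(dml_flag({"city": "Pune"}, None, ["city"]))                                   # 0
print(dml_flag({"city": "Pune"}, {"surrogate_key": 7, "city": "Pune"}, ["city"]))   # 1
print(dml_flag({"city": "Delhi"}, {"surrogate_key": 7, "city": "Pune"}, ["city"]))  # 2
```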

Other way:

SRC ----> TABLE COMPARISON ----> HISTORY PRESERVING ----> KEY GENERATION ----> TARGET

Table Comparison : Primary key columns, Compare columns (SCD columns)
History Preserving : Compare columns (SCD columns)
