May 6, 2020

Partitions


Informatica PowerCenter Session Partitioning can be used effectively for parallel data processing to achieve faster data delivery. Parallel processing performance depends heavily on the additional hardware power available. In addition to that, it is important to choose the appropriate partitioning algorithm, or partition type. In this article, let's discuss the optimal session partition settings.

Business Use Case

Partition Tutorial Series
Part I : Partition Introduction.
Part II : Partition Implementation.
Part III : Dynamic Partition.
Let's consider a business use case to explain the implementation of appropriate partition algorithms and configuration.
Daily sales data generated from three sales regions needs to be loaded into an Oracle data warehouse. The sales volume from the three regions differs widely, so the number of records processed for each region also varies widely. The warehouse target table is partitioned based on product line.

Below is the simple structure of the mapping that implements this functionality.

Pass-through Partition

A pass-through partition at the Source Qualifier transformation is used to split the source data into three parallel processing data sets. The image below shows how to set up a pass-through partition for the three sales regions.
Once the partition is set up at the Source Qualifier, you get an additional Source Filter option to restrict the data that corresponds to each partition. Be sure to write the filter conditions so that no row is processed by more than one partition and no data is duplicated. The image below shows the three Source Filters, one per partition.
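To make this concrete, if we assume the source rows carry a SALES_REGION column with three values (the column name and region values are hypothetical), the three Source Filter conditions could look like this:

Partition #1: SALES_REGION = 'NORTH'
Partition #2: SALES_REGION = 'SOUTH'
Partition #3: SALES_REGION = 'WEST'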

Round Robin Partition

Since the data volume from the three sales regions is not the same, use the round robin partition algorithm at the next transformation in the pipeline, so that the data, and hence the processing load, is distributed equally among the three partitions. Round robin partitioning can be set up as shown in the image below.

Hash Auto Key Partition

At the Aggregator transformation, data needs to be redistributed across the partitions to avoid splitting aggregate groups. The hash auto key partition algorithm redistributes rows so that records with the same key end up in the same partition. The algorithm identifies the keys based on the group by ports of the transformation.

Processing records of the same aggregate group in different partitions would produce incorrect results.

Key Range Partition

Use key range partitioning when you need to distribute records among partitions based on the range of values of one or more ports.

Here the target table is range partitioned on product line, so create a key range partition on the target definition on the PRODUCT_LINE_ID port to get the best write throughput.
The images below show the steps involved in setting up the key range partition.

Click on Edit Keys to define the ports on which the key range partition is defined.
A pop-up window shows the list of ports in the transformation; choose the ports on which the key range partition is required.
Now give the start and end range values for each partition as shown below.
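For context, the kind of range-partitioned Oracle target assumed in this example could be created with DDL along these lines (the table name, columns and boundary values are illustrative only); the key ranges defined in the session would then mirror the table's partition boundaries:

CREATE TABLE SALES_FACT (
  PRODUCT_LINE_ID  NUMBER,
  SALE_AMOUNT      NUMBER,
  SALE_DATE        DATE
)
PARTITION BY RANGE (PRODUCT_LINE_ID) (
  PARTITION PL_LOW  VALUES LESS THAN (100),
  PARTITION PL_MID  VALUES LESS THAN (200),
  PARTITION PL_HIGH VALUES LESS THAN (MAXVALUE)
);
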
We did not need the Hash User Key or Database partition algorithms in the use case discussed here.

The hash user key partition algorithm lets you choose the ports used to group rows among partitions. It can be used in most places where the hash auto key algorithm is appropriate.

The database partition algorithm queries the database system for table partition information and reads partitioned data from the corresponding nodes in the database. It can be applied on either the source or the target definition.


Apr 25, 2020

PL/SQL Interview Q&A


1.What are the various types of Exceptions?
User defined and Predefined Exceptions.

2.Can we define exceptions twice in same block?
No.

3.What is the difference between a procedure and a function?
Functions return a single value, whereas procedures do not return a value. Instead, procedures can return multiple values by passing variables by reference through their OUT parameters.

4.Can you have two functions with the same name in a PL/SQL block?
Yes.

5.Can you have two stored functions with the same name?
Yes.

6.Can you call a stored function in the constraint of a table?
No.

7.What are the various types of parameter modes in a procedure?
IN, OUT and IN OUT.

8.What is Over Loading and what are its restrictions?
Overloading means a subprogram name performing different functions depending upon the number or the data types of the parameters passed to it. Restrictions: two overloaded subprograms cannot differ only in parameter names or modes, only in subtypes of the same base type, or (for functions) only in return type.

9.Can functions be over loaded?
Yes.

10.Can 2 functions have same name & input parameters but differ only by return data type?
No.

11.What are the constructs of a procedure, function or a package?
The constructs of a procedure, function or a package are:
variables and constants, cursors, exceptions

12.Why create or Replace and not Drop and recreate procedures?
So that Grants are not dropped.

13.Can you pass parameters in packages? How?
Yes. You can pass parameters to procedures or functions in a package.

14.What are the parts of a database trigger?
The parts of a trigger are:
A triggering event or statement
A trigger restriction
A trigger action

15.What are the various types of database triggers ?
There are 12 types of triggers, they are combination of :
Insert, Delete and Update Triggers.
Before and After Triggers.
Row and Statement Triggers.
(3*2*2=12)

16.What is the advantage of a stored procedure over a database trigger ?
We have control over the firing of a stored procedure but we have no control over the firing of a trigger.

17.What is the maximum no. of statements that can be specified in a trigger statement ?
One.

18.Can views be specified in a trigger statement?
No

19.What are the values of :new and :old in Insert/Delete/Update Triggers ?
INSERT : new = new value, old = NULL
DELETE : new = NULL, old = old value
UPDATE : new = new value, old = old value

20.What are cascading triggers? What is the maximum no of cascading triggers at a time?
When a statement in a trigger body causes another trigger to be fired, the triggers are said to be cascading. Max = 32.

21.What are mutating triggers ?
A trigger that queries (SELECTs from) the table on which the trigger itself is written.

22.What are constraining triggers ?
A trigger issuing an Insert/Update on a table that has a referential integrity constraint on the triggering table.

23.Describe Oracle database's physical and logical structure ?
Physical : Data files, Redo Log files, Control file.
Logical : Tables, Views, Tablespaces, etc.

24.Can you increase the size of a tablespace ? How ?
Yes, by adding datafiles to it.


25.Can you increase the size of datafiles? How?
No (for Oracle 7.0)
Yes (from Oracle 7.3 onwards, by using the RESIZE clause).

26.What is the use of Control files ?
Contains pointers to locations of various data files, redo log files, etc.

27.What is the use of Data Dictionary ?
Used by Oracle to store information about various physical and logical Oracle structures e.g. Tables, Tablespaces, datafiles, etc

28.What are the advantages of clusters ?
Access time reduced for joins.

29.What are the disadvantages of clusters ?
The time for Insert increases.

30.Can Long/Long RAW be clustered ?
No.

31.Can null keys be entered in cluster index, normal index ?
Yes.

32.Can Check constraint be used for self referential integrity ? How ?
Yes. In the CHECK condition for a column of a table, we can reference some other column of the same table and thus enforce self referential integrity.
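For example (the table and column names are hypothetical), a CHECK constraint can compare two columns of the same row:

ALTER TABLE EMP
  ADD CONSTRAINT CHK_NOT_OWN_MANAGER CHECK (MGR_ID IS NULL OR MGR_ID <> EMPNO);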

33.What are the min. extents allocated to a rollback extent ?
Two

34.What are the states of a rollback segment ? What is the difference between partly available and needs recovery ?
The various states of a rollback segment are :
ONLINE, OFFLINE, PARTLY AVAILABLE, NEEDS RECOVERY and INVALID.

35.What is the difference between unique key and primary key ?
Unique key can be null; Primary key cannot be null.

36.An insert statement followed by a create table statement followed by rollback ? Will the rows be inserted ?
Yes. The CREATE TABLE statement is DDL, which implicitly commits the preceding insert, so the subsequent rollback has no effect.

37.Can you define multiple savepoints ?
Yes.

38.Can you Rollback to any savepoint ?
Yes.
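A small illustrative sequence (the table and values are made up):

INSERT INTO EMP (EMPNO, ENAME) VALUES (100, 'SMITH');
SAVEPOINT after_insert;
UPDATE EMP SET ENAME = 'JONES' WHERE EMPNO = 100;
SAVEPOINT after_update;
ROLLBACK TO after_insert;   -- undoes the update but keeps the insert
COMMIT;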

39.What is the maximum no. of columns a table can have ?
254.

40.What is the significance of the & and && operators in PL/SQL ?
The & operator makes SQL*Plus prompt for a user-supplied value each time the variable is referenced. The && operator prompts only once and reuses that value for every later reference to the same variable.

If a transaction is very large, and the rollback segment is not able to hold the rollback information, will the transaction span across different rollback segments or will it terminate ?
It will terminate, because a single transaction cannot span more than one rollback segment.

41.Can you pass a parameter to a cursor ?
Explicit cursors can take parameters, as the example below shows. A cursor parameter can appear in a query wherever a constant can appear.
CURSOR c1 (median IN NUMBER) IS
SELECT job, ename FROM emp WHERE sal > median;
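A minimal way to open and fetch from such a cursor, restating the declaration so the block is self-contained (the emp table and the median value of 2500 are assumed for illustration):

DECLARE
  CURSOR c1 (median IN NUMBER) IS
    SELECT job, ename FROM emp WHERE sal > median;
  v_job   emp.job%TYPE;
  v_ename emp.ename%TYPE;
BEGIN
  OPEN c1(2500);   -- the actual parameter supplies the value of median
  LOOP
    FETCH c1 INTO v_job, v_ename;
    EXIT WHEN c1%NOTFOUND;
  END LOOP;
  CLOSE c1;
END;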

42.What are the various types of RollBack Segments ?
Public : available to all instances
Private : available to a specific instance

43.Can you use %RowCount as a parameter to a cursor ?
Yes

44.Is the query below allowed :
Select sal, ename Into x From emp Where ename = 'KING'
(Where x is a record of Number(4) and Char(15))
Yes

45.Is the assignment given below allowed :
ABC = PQR (Where ABC and PQR are records)
Yes

46.Is this for loop allowed :
For x in &Start..&End Loop
Yes

47.How many rows will the following SQL return :
Select * from emp Where rownum < 10;
9 rows

48.How many rows will the following SQL return :
Select * from emp Where rownum = 10;
No rows

49.Which symbol precedes the path to the table in the remote database ?
@
50.Are views automatically updated when base tables are updated ?
Yes

51.Can a trigger be written for a view ?
No

52.If all the values from a cursor have been fetched and another fetch is issued, the output will be : error, last record or first record ?
Last Record

53.A table has the following data : [5, Null, 10]. What will the average function return ?
7.5

54.Is Sysdate a system variable or a system function?
System Function

55.Consider a sequence whose currval is 1 and gets incremented by 1 by using the nextval reference we get the next number 2. Suppose at this point we issue an rollback and again issue a nextval. What will the output be ?
3

56.Definition of relational DataBase by Dr. Codd (IBM)?
A Relational Database is a database where all data visible to the user is organized strictly as tables of data values and where all database operations work on these tables.

57.What is Multi Threaded Server (MTS) ?
In a single-threaded architecture (a dedicated server configuration) the database manager creates a separate server process for each database user. With MTS, the database manager can assign multiple users (multiple user processes) to a single dispatcher (server process), a controlling process that queues requests for work, thus reducing the database's memory requirements and resources.

58.Which were the initial RDBMS, hierarchical and network databases ?
RDBMS - System R
Hierarchical - IMS
Network - DBTG

59.Difference between Oracle 6 and Oracle 7
ORACLE 7 vs ORACLE 6:
· Cost based optimizer vs rule based optimizer
· Shared SQL area vs a SQL area allocated for each user
· Multi Threaded Server vs Single Threaded Server
· Hash clusters vs only B-Tree indexing
· Rollback segment size adjustment vs no provision
· TRUNCATE command vs no provision
· Database integrity constraints vs provision at application level only
· Stored procedures, functions, packages and triggers vs no provision
· Resource profile limits (prevent a user from running away with system resources) vs no provision
· Distributed database vs distributed query only
· Table replication and snapshots vs no provision
· Client/Server technology vs no provision

60.What is Functional Dependency
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value has associated with it precisely one Y-value in R.

61.What is Auditing ?
The database has the ability to audit all actions that take place within it:
a) Login attempts, b) Object access, c) Database actions

What is the result of GREATEST(1, NULL) or LEAST(1, NULL) ?
NULL

62.While designing in client/server what are the 2 imp. things to be considered ?
Network Overhead (traffic), Speed and Load of client server

63.What are the disadvantages of SQL ?
Disadvantages of SQL are :
· Cannot drop a field
· Cannot rename a field
· Cannot manage memory
· Procedural Language option not provided
· Index on view or index on index not provided
· View updation problem

64.When to create indexes ?
Create indexes when queries retrieve only a small fraction of a table's rows - roughly 2% to 4% of the rows for large tables, and up to about 25% for small tables.

65.How can you avoid indexes ?
To make an index access path unavailable:
· Use the FULL hint to force the optimizer to do a full table scan.
· Use the INDEX or AND_EQUAL hint to make the optimizer use one index or set of indexes instead of another.
· Use an expression (or a function on the indexed column) in the WHERE clause of the SQL.

66.What is the result of the following SQL :
Select 1 from dual
UNION
Select 'A' from dual;
Error

67.Can a database trigger be written on a synonym of a table? If it can, what is the effect when the original table is accessed?
Yes; the database trigger would fire.

68.Can you alter a synonym of a view, or the view itself ?
No

69.Can you create index on view?.
No.

70.What is the difference between a view and a synonym?
A synonym is just a second name for a table, used for multiple links to a database. A view can be created over many tables, with virtual columns and with conditions. A synonym, however, can also be created on a view.

71.What is the difference between alias and synonym?
An alias is temporary and is used within one query only. A synonym is permanent and is not used as an alias.

72.What is the effect of synonym and table name used in same Select statement?
Valid

73.What's the length of SQL integer?
32 bit length

74.What is the difference between foreign key and reference key?
A foreign key is the attribute (or set of attributes) that refers to the primary key of another table.
The reference key is the primary key of the table that is referred to by another table.

75.Can dual table be deleted, dropped or altered or updated or inserted ?
Yes

76.If content of dual is updated to some value computation takes place or not ?
Yes

77.If any other table same as dual is created would it act similar to dual?
Yes

78.For which relational operators in a WHERE clause is an index not used ?
<> (not equal), LIKE with a leading wildcard ('%...'), IS NULL / IS NOT NULL, NOT, and any function or expression applied to the indexed column (e.g. column + constant, column || '').

79.Assume that there are multiple databases running on one machine. How can you switch from one to another?
Changing the ORACLE_SID

80.What are the advantages of Oracle?
Portability: Oracle is ported to more platforms than any of its competitors, running on more than 100 hardware platforms and 20 networking protocols.
Market Presence: Oracle is by far the largest RDBMS vendor and spends more on R & D than most of its competitors earn in total revenue. This market clout means that you are unlikely to be left in the lurch by Oracle and there are always lots of third party interfaces available.
Backup and Recovery: Oracle provides industrial strength support for on-line backup and recovery and good software fault tolerance to disk failure. You can also do point-in-time recovery.
Performance: Speed of a 'tuned' Oracle Database and application is quite good, even with large databases. Oracle can manage > 100GB databases.
Multiple database support: Oracle has a superior ability to manage multiple databases within the same transaction using a two-phase commit protocol.

81.What is a forward declaration? What is its use?
PL/SQL requires that you declare an identifier before using it, so you must declare a subprogram before calling it. Declaring a subprogram's specification ahead of its body, earlier in the same declarative section, is called a forward declaration. A forward declaration consists of the subprogram specification terminated by a semicolon.
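A minimal sketch of a forward declaration inside an anonymous block (the procedure names are illustrative):

DECLARE
  -- forward declaration: the specification only, terminated by a semicolon
  PROCEDURE log_message (p_text VARCHAR2);

  PROCEDURE do_work IS
  BEGIN
    log_message('working');   -- legal, because log_message is already declared above
  END;

  -- the full body appears later in the same declarative section
  PROCEDURE log_message (p_text VARCHAR2) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(p_text);
  END;
BEGIN
  do_work;
END;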

82.What are actual and formal parameters?
Actual Parameters: Subprograms pass information using parameters. The variables or expressions referenced in the parameter list of a subprogram call are actual parameters. For example, the following procedure call lists two actual parameters named emp_num and amount:
Eg. raise_salary(emp_num, amount);
Formal Parameters : The variables declared in a subprogram specification and referenced in the subprogram body are formal parameters. For example, the following procedure declares two formal parameters named emp_id and increase:
Eg. PROCEDURE raise_salary (emp_id INTEGER, increase REAL) IS current_salary REAL;

83.What are the types of Notation?
Positional, named and mixed notation.

84.What all important parameters of the init.ora are supposed to be increased if you want to increase the SGA size?
In our case, db_block_buffers was changed from 60 to 1000 (std values are 60, 550 & 3500) shared_pool_size was changed from 3.5MB to 9MB (std values are 3.5, 5 & 9MB) open_cursors was changed from 200 to 300 (std values are 200 & 300) db_block_size was changed from 2048 (2K) to 4096 (4K) {at the time of database creation}.
The initial SGA was around 4MB when the server RAM was 32MB and The new SGA was around 13MB when the server RAM was increased to 128MB.

85.If I have an execute privilege on a procedure in another users schema, can I execute his procedure even though I do not have privileges on the tables within the procedure ?
Yes

86.What are various types of joins?
Equijoins, Non-equijoins, self join, outer join

87.What is a package cursor?
A package cursor is a cursor which you declare in the package specification without an SQL statement. The SQL statement for the cursor is attached dynamically at runtime from calling procedures.
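As a sketch of this declaration style (the package, cursor and table names are hypothetical), the package specification can expose only the cursor's name and return type while the body supplies the query:

CREATE OR REPLACE PACKAGE emp_pkg AS
  CURSOR c_emp RETURN emp%ROWTYPE;        -- specification only, no SQL here
END emp_pkg;

CREATE OR REPLACE PACKAGE BODY emp_pkg AS
  CURSOR c_emp RETURN emp%ROWTYPE IS      -- the query is supplied in the body
    SELECT * FROM emp;
END emp_pkg;
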
88.If you insert a row in a table, then create another table and then say Rollback. In this case will the row be inserted?
Yes, because CREATE TABLE is a DDL statement, which commits automatically as soon as it is executed. The DDL commits the transaction even if the CREATE statement itself fails at execution time (e.g. a 'table already exists' error), as long as it is not a syntax error.

89.What are the various types of queries?
Normal Queries
Sub Queries
Co-related queries
Nested queries
Compound queries

90.What is a transaction?
A transaction is a set of SQL statements that is treated as a single logical unit of work and is ended by a COMMIT or a ROLLBACK.

91.What is implicit cursor and how is it used by Oracle ?
An implicit cursor is a cursor which is internally created by Oracle. It is created by Oracle for each individual SQL.

92.Which of the following is not a schema object : Indexes, tables, public synonyms, triggers and packages ?
Public synonyms

93.What is the difference between a view and a snapshot ?
A view is a stored query; it holds no data of its own, and its defining query runs against the base tables every time the view is accessed. A snapshot (materialized view) physically stores the result of a query, typically of a remote table, and is refreshed at defined intervals.

94.What is PL/SQL?
PL/SQL is Oracle's Procedural Language extension to SQL. The language includes object oriented programming techniques such as encapsulation, function overloading, information hiding (all but inheritance), and so, brings state-of-the-art programming to the Oracle database server and a variety of Oracle tools.

95.Is there a PL/SQL Engine in SQL*Plus?
No. Unlike Oracle Forms, SQL*Plus does not have a PL/SQL engine. Thus, all your PL/SQL is sent directly to the database engine for execution. This makes it much more efficient, as SQL statements are not stripped out and sent to the database individually.

96.Is there a limit on the size of a PL/SQL block?
Currently, the maximum parsed/compiled size of a PL/SQL block is 64K and the maximum code size is 100K. You can run the following select statement to query the size of an existing package or procedure.
SQL> select * from dba_object_size where name = 'procedure_name'

97.Can one read/write files from PL/SQL?
Included in Oracle 7.3 is a UTL_FILE package that can read and write files. The directory you intend writing to has to be in your INIT.ORA file (see UTL_FILE_DIR=... parameter). Before Oracle 7.3 the only means of writing a file was to use DBMS_OUTPUT with the SQL*Plus SPOOL command.
DECLARE
fileHandler UTL_FILE.FILE_TYPE;
BEGIN
fileHandler := UTL_FILE.FOPEN('/home/oracle/tmp', 'myoutput','W');
UTL_FILE.PUTF(fileHandler, 'Value of func1 is %s\n', func1(1));
UTL_FILE.FCLOSE(fileHandler);
END;

98.How can I protect my PL/SQL source code?
PL/SQL V2.2, available with Oracle7.2, implements a binary wrapper for PL/SQL programs to protect the source code. This is done via a standalone utility that transforms the PL/SQL source code into portable binary object code (somewhat larger than the original). This way you can distribute software without having to worry about exposing your proprietary algorithms and methods. SQL*Plus and SQL*DBA will still understand and know how to execute such scripts. Just be careful, there is no "decode" command available.
The syntax is:
wrap iname=myscript.sql oname=xxxx.yyy

99.Can one use dynamic SQL within PL/SQL? OR Can you use a DDL in a procedure ? How ?
From PL/SQL V2.1 one can use the DBMS_SQL package to execute dynamic SQL statements.
Eg: CREATE OR REPLACE PROCEDURE DYNSQL
AS
cur integer;
rc integer;
BEGIN
cur := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(cur,'CREATE TABLE X (Y DATE)', DBMS_SQL.NATIVE);
rc := DBMS_SQL.EXECUTE(cur);
DBMS_SQL.CLOSE_CURSOR(cur);
END;
Additional questions:
1. What is the Referential Integrity rule?
Differentiate between:
2. Delete & Truncate command.
3. Implicit Cursor & Explicit Cursor.
4. Ref. key & Foreign key.
5. Where & Having Clause.
6. What are the various kinds of Integrity Constraints in Oracle?
7. What are the various kinds of joins?
8. What is Raise_Application_Error?
9. What are the various kinds of exceptions in Oracle?
10. Normal Forms

Tuning Mappings for Better Performance


Challenge

In general, mapping-level optimization takes time to implement, but can significantly boost performance. Sometimes the mapping is the biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping.
Before deciding on the best route to optimize the mapping architecture, you need to resolve some basic issues. Tuning mappings is a tiered process. The first tier can be of assistance almost universally, bringing about a performance increase in all scenarios. The second tier of tuning processes may yield only a small performance increase, or may be of significant value, depending on the situation.
Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/ hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning mappings.

Description

Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache. Too many Lookups and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option.
Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping causes contention of disk and memory, which can lead to thrashing, leaving insufficient memory to run a mapping efficiently.
Consider Single-Pass Reading
If several mappings use the same data source, consider a single-pass reading. Consolidate separate mappings into one mapping with either a single Source Qualifier Transformation or one set of Source Qualifier Transformations as the data source for the separate data flows.
Similarly, if a function is used in several mappings, a single-pass reading will reduce the number of times that function will be called in the session.
Optimize SQL Overrides
When SQL overrides are required in a Source Qualifier, Lookup Transformation, or in the update override of a target object, be sure the SQL statement is tuned. The extent to which and how SQL can be tuned depends on the underlying source or target database system.  See the section Tuning SQL Overrides and Environment for Better Performance  for more information.
Scrutinize Datatype Conversions
PowerCenter Server automatically makes conversions between compatible datatypes. When these conversions are performed unnecessarily, performance slows. For example, if a mapping moves data from an Integer port to a Decimal port and then back to an Integer port, the conversion may be unnecessary.
In some instances however, datatype conversions can help improve performance. This is especially true when integer values are used in place of other datatypes for performing comparisons using Lookup and Filter transformations.
Eliminate Transformation Errors
Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors, the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and logs the error in the session log.
Transformation errors can be caused by many things, including conversion errors, conflicting mapping logic, any condition that is specifically set up as an error, and so on. The session log can help point out the cause of these errors. If errors recur consistently for certain transformations, re-evaluate the constraints for those transformations. Any source of errors should be traced and eliminated.
Optimize Lookup Transformations
There are a number of ways to optimize lookup transformations that are set up in a mapping.

When to Cache Lookups

When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.
NOTE: All the tuning options mentioned in this Best Practice assume that memory and cache sizing for lookups are sufficient to ensure that caches will not page to disks. Practices regarding memory and cache sizing for Lookup transformations are covered in Best Practice: Tuning Sessions for Better Performance.
In general, if the lookup table needs less than 300MB of memory, lookup caching should be enabled.
A better rule of thumb than memory size is to determine the size of the potential lookup cache relative to the number of rows expected to be processed, as in the following example.
In Mapping X, the source and lookup contain the following number of records:
ITEMS (source): 5000 records
MANUFACTURER: 200 records
DIM_ITEMS: 100000 records

Number of Disk Reads
                          Cached Lookup    Un-cached Lookup
LKP_Manufacturer
  Build Cache                     200                    0
  Read Source Records            5000                 5000
  Execute Lookup                    0                 5000
  Total # of Disk Reads          5200                10000
LKP_DIM_ITEMS
  Build Cache                  100000                    0
  Read Source Records            5000                 5000
  Execute Lookup                    0                 5000
  Total # of Disk Reads        105000                10000

Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, then it will take a total of 10,000 total disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed. So this lookup should be cached. This is the more likely scenario.
Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, then the disk reads would total 10,000. In this case the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed. Thus the lookup should not be cached.
Use the following eight step method to determine if a lookup should be cached:
  1. Code the lookup into the mapping.
  2. Select a standard set of data from the source. For example, add a where clause on a relational source to load a sample 10,000 rows.
  3. Run the mapping with caching turned off and save the log.
  4. Run the mapping with caching turned on and save the log to a different name than the log created in step 3.
  5. Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
  6. In the non-cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: NON-CACHED ROWS PER SECOND = NRS.
  7. In the cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: CACHED ROWS PER SECOND = CRS.
  8. Use the following formula to find the breakeven row point:

    (LS*NRS*CRS)/(CRS-NRS)  = X

    Where X is the breakeven point. If your expected source records is less than X, it is better to not cache the lookup. If your expected source records is more than X, it is better to cache the lookup.

    For example:

    Assume the lookup takes 166 seconds to cache (LS=166).
    Assume with a cached lookup the load is 232 rows per second (CRS=232).
    Assume with a non-cached lookup the load is 147 rows per second (NRS = 147).

    The formula would result in: (166*147*232)/(232-147) = 66,603.

    Thus, if the source has less than 66,603 records, the lookup should not be cached. If it has more than 66,603 records, then the lookup should be cached.
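
As a quick sanity check on the arithmetic, the breakeven calculation above can be reproduced in a small PL/SQL block (the numbers are simply the values from the example):

DECLARE
  ls  NUMBER := 166;   -- seconds needed to build the lookup cache (LS)
  nrs NUMBER := 147;   -- non-cached rows per second (NRS)
  crs NUMBER := 232;   -- cached rows per second (CRS)
  x   NUMBER;
BEGIN
  x := (ls * nrs * crs) / (crs - nrs);
  DBMS_OUTPUT.PUT_LINE('Breakeven source rows: ' || ROUND(x));   -- prints roughly 66603
END;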

Sharing Lookup Caches

There are a number of methods for sharing lookup caches.
  • Within a specific session run for a mapping, if the same lookup is used multiple times in a mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping will be more resource intensive with each successive instance.  If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to setup the multiple lookups to bring back the same columns even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.
  • Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from cache files instead of the database. This feature should only be used when the lookup table is not expected to change between session runs.
  • Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.

Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.
NOTE: If you use a SQL override in a lookup, the lookup must be cached.
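For example, a lookup SQL override of the following form would cache only the rows that satisfy the extra predicate (the table and column names are purely illustrative):

SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM   CUSTOMER_DIM
WHERE  ACTIVE_FLAG = 'Y'   -- only active rows are loaded into the lookup cache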

Optimizing the Lookup Condition

In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.

Indexing the Lookup Table

The PowerCenter Server must query, sort and compare values in the lookup condition columns. As a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups.
  • In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
  • In the case of an un-cached lookup, since a SQL statement is created for each row passing into the lookup transformation, performance can be helped by indexing the columns in the lookup condition.
Optimize Filter and Router Transformations
Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter Transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter Transformation immediately after the source qualifier to improve performance.
Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer or TRUE/FALSE expression is used in the filter condition.
Filters or routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be saved.
Replace multiple filter transformations with a router transformation. This reduces the number of transformations in the mapping and makes the mapping easier to follow.
Optimize Aggregator Transformations
Aggregator Transformations often slow performance because they must group data before processing it.
Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.
Use the Sorted Input option in the aggregator. This option requires that data sent to the aggregator be sorted in the order in which the ports are used in the aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an aggregator, calculations can be performed and information passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Input option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option.
Use an Expression and Update Strategy instead of an Aggregator Transformation. This technique can only be used if the source data can be sorted. Further, using this option assumes that a mapping is using an Aggregator with Sorted Input option. In the Expression Transformation, the use of variable ports is required to hold data from the previous row of data processed. The premise is to use the previous row of data to determine whether the current row is a part of the current group or is the beginning of a new group. Thus, if the row is a part of the current group, then its data would be used to continue calculating the current group function. An Update Strategy Transformation would follow the Expression Transformation and set the first row of a new group to insert and the following rows to update.
Optimize Joiner Transformations
Joiner transformations can slow performance because they need additional space in memory at run time to hold intermediate results.
Define the rows from the smaller set of data in the joiner as the Master rows. The Master rows are cached to memory and the detail records are then compared to rows in the cache of the Master rows. In order to minimize memory requirements, the smaller set of data should be cached and thus set as Master.
Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.
Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform the join more quickly than the Informatica Server, so a SQL override or a join condition should be used when joining multiple tables from the same database schema.
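For instance, a Source Qualifier SQL override along these lines lets the database perform the join (the schema and table names are made up for illustration):

SELECT O.ORDER_ID, O.ORDER_DATE, C.CUSTOMER_NAME
FROM   ORDERS O, CUSTOMERS C
WHERE  C.CUSTOMER_ID = O.CUSTOMER_ID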
Optimize Sequence Generator Transformations
Sequence Generator transformations need to determine the next available sequence number, so increasing the Number of Cached Values property can increase performance. This property determines the number of values the Informatica Server caches at one time. If it is set to cache no values, the Informatica Server must query the Informatica repository each time to determine the next number that can be used. Consider configuring the Number of Cached Values to a value greater than 1000. Note that any cached values not used in the course of a session are lost, since the sequence generator value in the repository is advanced to the start of the next set of cached values the next time it is called.
Avoid External Procedure Transformations
For the most part, making calls to external procedures slows down a session. If possible, avoid the use of these Transformations, which include Stored Procedures, External Procedures and Advanced External Procedures.
Field Level Transformation Optimization
As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus on complex expressions for possible simplification.
To help isolate slow expressions, do the following:
1. Time the session with the original expression.
2. Copy the mapping and replace half the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
Processing field level transformations takes time. If the transformation expressions are complex, then processing will be slower. It's often possible to get a 10-20% performance improvement by optimizing complex field level transformations. Use the target table mapping reports or the Metadata Reporter to examine the transformations. Likely candidates for optimization are the fields with the most complex expressions. Keep in mind that there may be more than one field causing performance problems.
Factoring out Common Logic
This can reduce the number of times a mapping performs the same logic. If a mapping performs the same logic multiple times in a mapping, moving the task upstream in the mapping may allow the logic to be done just once. For example, a mapping has five target tables. Each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move the lookup to a position before the data flow splits.
Minimize Function Calls
Anytime a function is called it takes resources to process. There are several common examples where function calls can be reduced or eliminated.
Aggregate function calls can sometime be reduced. In the case of each aggregate function call, the Informatica Server must search and group the data.
Thus the following expression:
SUM(Column A) + SUM(Column B)
Can be optimized to:
SUM(Column A + Column B)
In general, operators are faster than functions, so operators should be used whenever possible.
For example if you have an expression which involves a CONCAT function such as:
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
It can be optimized to:
FIRST_NAME || ' ' || LAST_NAME
Remember that IIF() is a function that returns a value, not just a logical test. This allows many logical statements to be written in a more compact fashion.
For example:
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='Y', VAL_A+VAL_B+VAL_C,
 IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='N', VAL_A+VAL_B,
  IIF(FLG_A='Y' and FLG_B='N' and FLG_C='Y', VAL_A+VAL_C,
   IIF(FLG_A='Y' and FLG_B='N' and FLG_C='N', VAL_A,
    IIF(FLG_A='N' and FLG_B='Y' and FLG_C='Y', VAL_B+VAL_C,
      IIF(FLG_A='N' and FLG_B='Y' and FLG_C='N', VAL_B,
       IIF(FLG_A='N' and FLG_B='N' and FLG_C='Y', VAL_C,
        IIF(FLG_A='N' and FLG_B='N' and FLG_C='N', 0.0))))))))
Can be optimized to:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
The original expression had 8 IIFs, 16 ANDs and 24 comparisons. The optimized expression results in 3 IIFs, 3 comparisons and two additions.
Be creative in making expressions more efficient. The following is an example of rework of an expression which eliminates three comparisons down to one:
For example:
IIF(X=1 OR X=5 OR X=9, 'yes', 'no')
Can be optimized to:
IIF(MOD(X, 4) = 1, 'yes', 'no')
Calculate Once, Use Many Times
Avoid calculating or testing the same value multiple times. If the same sub-expression is used several times in a transformation, consider making the sub-expression a local variable. The local variable can be used only within the transformation but by calculating the variable only once can speed performance.
Choose Numeric versus String Operations
The Informatica Server processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.
Optimizing Char-Char and Char-Varchar Comparisons
When the Informatica Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row.  The Treat CHAR as CHAR On Read option can be set in the Informatica Server setup so that the Informatica Server does not trim trailing spaces from the end of CHAR source fields.
Use DECODE instead of LOOKUP
When a LOOKUP function is used, the Informatica Server must lookup a table in the database. When a DECODE function is used, the lookup values are incorporated into the expression itself so the Informatica Server does not need to lookup a separate table. Thus, when looking up a small set of unchanging values, using DECODE may improve performance.
Reduce the Number of Transformations in a Mapping
Whenever possible, the number of transformations should be reduced, as there is always overhead involved in moving data between transformations. Along the same lines, unnecessary links between transformations should be removed to minimize the amount of data moved. This is especially important with data being pulled from the Source Qualifier Transformation.
Use Pre- and Post-Session SQL Commands
You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes.
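For example, a hypothetical pre-session command could drop an index on the target, and the matching post-session command could rebuild it (the index and table names are illustrative only):

-- Pre-session SQL on the target:
DROP INDEX IDX_TGT_SALES_PRODUCT;
-- Post-session SQL on the target:
CREATE INDEX IDX_TGT_SALES_PRODUCT ON TGT_SALES (PRODUCT_LINE_ID);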
Apply the following guidelines when using the SQL statements:
  • You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.
  • You can use mapping parameters and variables in SQL executed against the source, but not against the target.
  • Use a semi-colon (;) to separate multiple statements.
  • The PowerCenter Server ignores semi-colons within single quotes, double quotes, or within /* ...*/.
  • If you need to use a semi-colon outside of quotes or comments, you can escape it with a back slash (\).
  • The Workflow Manager does not validate the SQL.
Use Environmental SQL
For relational databases, you can execute SQL commands in the database environment when connecting to the database. You can use this for source, target, lookup, and stored procedure connection. For instance, you can set isolation levels on the source and target systems to avoid deadlocks. Follow the guidelines mentioned above for using the SQL statements.
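For instance, an Oracle connection's environment SQL might contain statements such as the following (illustrative settings only):

ALTER SESSION SET ISOLATION_LEVEL = READ COMMITTED;
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';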