Feb 7, 2016

Teradata: Primary Index

Primary Index

Every table must have at least one column as the Primary Index. The Primary Index is defined when the table is created.There are two reasons you might pick a different Primary Index then your Primary Key. They are (1) for Performance reasons and (2) known access paths.

Primary Index Rules
Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified.
Rule 5: The Primary Index of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Two Types of Primary Indexes (UPI or NUPI)

Unique Primary Index(UPI)
A Unique Primary Index (UPI) is unique and cannot have any duplicates.
If you try and insert a row with a Primary Index value that is already in the table, the row will be rejected. An UPI enforces UNIQUENESS for a column.
 
A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs. UPI access is always aone-AMP operation.
 
We have selected EMP_NO to be our Primary Index. Because we have designated EMP_NO to be a Unique Primary Index, there can be no duplicate employee numbers in the table.       

Non-Unique Primary Index (NUPI) 
A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique. Duplicate values can exist.

A Non-Unique Primary Index will almost never spread the table rows evenly.
An All-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over an UPI because the NUPI column may be more effective for query access and joins.

We have selected LAST_NAME to be our Primary Index. Because we have designated LAST_NAME to be a Non-Unique Primary Index we are anticipating that there will be individuals in the table with the same last name.




Multi-Column Primary Indexes:
Teradata allows more than one column to be designated as the Primary Index. It is still only one Primary Index, but it is merely made up by combining multiple columns together. Teradata allows up to 64 combined columns to make up the one Primary Index required for a table.

On the following page you can see we have designated First_Name and Last_Name combined to make up the Primary Index.
 This is often done for two reasons:
 (1) To get better data distribution among the AMPs
 (2) Users often use multiple keys consistently to query

Data distribution using Primary Index 
When a user submits an SQL request against a table using a Primary Index, the request becomes a one-AMP operation, which is the most direct and efficient way for the system to find a row. The process is explained below.

Hashing Process
1.The primary index value goes into the hashing algorithm.
2.The output of the hashing algorithm is the row hash value.
3.The hash map points to the specific AMP where the row resides.
4.The PE sends the request directly to the identified AMP.
5.The AMP locates the row(s) on its vdisk.
6.The data is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

NOTE:
If you forget to define PRIMARY INDEX while doing a CREATE TABLE, the default will be use i.e. on the following:

* PRIMARY key
* First UNIQUE constraint
* First column

PRIMARY KEY
PRIMARY INDEX
It cannot be NULL
NULL is possible
It is not mandatory in Teradata
It is mandatory in Teradata
No data distribution
Helps in data distribution
It has to be unique
Unique or non-unique
It is logical implementation
Physical implementation
Cannot be changed
 can be changed

1 comment:

  1. Let's assume that my table has a tentative UPI of Coupon_id and Source_id (combination of 2 attributes will always be unique)

    The access paths will be by coupon or by coupon and source_id.

    Why shall I choose a PI over a UPI? Will searching by just coupon on this table that has a composite UPI be worse than having the table with NON UPI of just coupon?

    Thanks

    ReplyDelete