Feb 9, 2016

Oracle: Analytical Functions

ROW_NUMBER is an analytic function. It assigns a unique number to each row to which it is applied (either each row in the partition or each row returned by the query),
in the ordered sequence of rows specified in the order_by_clause, beginning with 1.

HR>SELECT d.department_name, e.last_name, e.salary
          ,ROW_NUMBER() OVER ( PARTITION BY e.department_id ORDER BY e.salary) AS rownumber
     FROM employees e, departments d
    WHERE e.department_id = d.department_id
      AND d.department_id IN ('20','30', '40')
    ORDER BY d.department_name, e.salary, e.last_name, rownumber;

DEPARTMENT LAST_NAME   SALARY  ROWNUMBER
---------- ---------- ------- ----------
Human Reso       Mavris        6500          1
Marketing           Fay             6000          1
Marketing           Hartstein    13000         2
Purchasing         Colmenares 2500          1
Purchasing         Himuro       2600          2


LAG and LEAD functions have the same usage, as shown below.

LAG  (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)
LEAD (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

value_expression - Can be a column or a built-in function, except for other analytic functions.
offset - The number of rows preceeding/following the current row, from which the data is to be retrieved. The default value is 1.
default - The value returned if the offset is outside the scope of the window. The default value is NULL.

LAG

The LAG function is used to access data from a previous row. The following query returns the salary from the previous row to calculate the difference between the salary of the current row and that of the previous row. Notice that the ORDER BY of the LAG function is used to order the data by salary.

SELECT empno,
       ename,
       job,
       sal,
       LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_prev,
       sal - LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_diff
FROM   emp;

     EMPNO ENAME      JOB          SAL   SAL_PREV   SAL_DIFF
---------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK            800          0           800
      7900 JAMES      CLERK            950        800        150
      7876 ADAMS    CLERK           1100        950        150

The LEAD function is used to return data from the next row. The following query returns the salary from the next row to calulate the difference between the salary of the current row and the following row.

SELECT empno,
       ename,
       job,
       sal,
       LEAD(sal, 1, 0) OVER (ORDER BY sal) AS sal_next,
       LEAD(sal, 1, 0) OVER (ORDER BY sal) - sal AS sal_diff
FROM   emp;

     EMPNO ENAME      JOB           SAL   SAL_NEXT   SAL_DIFF
---------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK            800        950        150
      7900 JAMES      CLERK            950       1100       150
      7876 ADAMS    CLERK           1100      1250       150

RANK

Let's assume we want to assign a sequential order, or rank, to people within a department based on salary, we might use the RANK function like.

SELECT empno,
       deptno,
       sal,
       RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM   emp;

     EMPNO  DEPTNO        SAL       rank
---------- ---------- ---------- ----------
      7900         30        950          1
      7654         30       1250          2
      7521         30       1250          2
      7844         30       1500          4

DENSE_RANK

The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks.

SELECT empno,
       deptno,
       sal,
       DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) "rank"
FROM   emp;

     EMPNO     DEPTNO        SAL       rank
---------- ---------- ---------- ----------
      7900         30        950          1
      7654         30       1250          2
      7521         30       1250          2
      7844         30       1500          3

FIRST and LAST

The FIRST and LAST functions can be used to return the first or last value from an ordered sequence. Say we want to display the salary of each employee, along with the lowest and highest within their department we may use something like.

SELECT empno,
       deptno,
       sal,
       MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) "Lowest",
       MAX(sal) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) "Highest"
FROM   emp
ORDER BY deptno, sal;

     EMPNO     DEPTNO        SAL     Lowest    Highest
---------- ---------- ---------- ---------- ----------
      7934         10       1300       1300       5000
      7782         10       2450       1300       5000
      7839         10       5000       1300       5000


FIRST_VALUE

The FIRST_VALUE analytic function is similar to the FIRST analytic function, allowing you to return the first result from an ordered set.

SELECT empno,
       deptno,
       sal,
       FIRST_VALUE(sal) IGNORE NULLS
         OVER (PARTITION BY deptno ORDER BY sal) AS lowest_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL LOWEST_IN_DEPT
---------- ---------- ---------- --------------
      7934         10       1300           1300
      7782         10       2450           1300
      7839         10       5000           1300

The "{RESPECT | IGNORE} NULLS" clause indicates if NULLs are considered when determining results.

The windowing clause can be used to alter the window of operation. The following example uses "ROWS 1 PRECEDING" to give a result similar, but not quite the same, to a LAG of 1 row.

SELECT empno,
       deptno,
       sal,
       FIRST_VALUE(sal) IGNORE NULLS
         OVER (PARTITION BY deptno ORDER BY sal ROWS 1 PRECEDING) AS preceding_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL PRECEDING_IN_DEPT
---------- ---------- ---------- -----------------
      7934         10       1300              1300
      7782         10       2450              1300
      7839         10       5000              2450

LAST_VALUE

The LAST_VALUE analytic function is similar to the LAST analytic function, allowing you to return the last result from an ordered set. Using the default windowing clause the result can be a little unexpected.

SELECT empno,
       deptno,
       sal,
       LAST_VALUE(sal) IGNORE NULLS
         OVER (PARTITION BY deptno ORDER BY sal) AS highest_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL HIGHEST_IN_DEPT
---------- ---------- ---------- ---------------
      7934         10       1300            1300
      7782         10       2450            2450
      7839         10       5000            5000

This is because the default windowing clause is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", which in this example means the first row with the same value as that of the current row will always be the last row considered. Altering the windowing clause to "RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" gives us the result we probably expected.

SELECT empno,
       deptno,
       sal,
       LAST_VALUE(sal) IGNORE NULLS
         OVER (PARTITION BY deptno ORDER BY sal RANGE BETWEEN
           UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL HIGHEST_IN_DEPT
---------- ---------- ---------- ---------------
      7934         10       1300            5000
      7782         10       2450            5000
      7839         10       5000            5000

Reference:
https://oracle-base.com/articles/misc/analytic-functions

No comments:

Post a Comment