Rank Functions introduced in SQL 2005/2008

Hello friends good morning..Today we are going to learn new inbuilt functions those were introduced by Microsoft in SQL 2005 and later versions. We will talk about each Rank function one by one with examples.
Lets have a look on rank functions available in SQL server 2005 and later versions.
  • Row_Number()
  • Rank()
  • Dense_Rank()
  • Ntile()
The above four functions were introduced as rank function in SQL server 2005 and later versions.
Lets take one by one and see the difference.

Row_Number() :- Row_Number() function generate a sequence of numbers starting from 1 up to no of records we are selecting. It means it generates a unique id for each record in a sequence. Lets take an example to make it more understandable.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[Employee](

[EID] [int] IDENTITY(10,5) NOT NULL,
[Name] [varchar](50) NULL
) ON [PRIMARY]

GO


SET ANSI_PADDING OFF

GO

I have created a table named Employee having two columns in it. Now I am going to insert 5 rows in it.


insert into Employee(Name) values('neeraj')

insert into Employee(Name) values('vikas')
insert into Employee(Name) values('raj')
insert into Employee(Name) values('shyam')
insert into Employee(Name) values('dheeraj')

Syntax of using Row_Number() function :- Row_Number() over(partition by [field Name] order by [Field name])

Whenever we use Row_Number() function we need to use Over clause, It means on which field we want to generate the row number. Over clause is mandatory and there are two parameters in over clause where one parameter is mandatory to specify. 


Partition by [field Name]--> Optional
Order by [field Name]--> Mandatory

In the above picture, I have written a query using row_number() function and included order by parameter in my query but not included the partition by parameter in my query. you can see the result set that row_number is returning a sequence of numbers starting from 1 to 5. As we have 5 records in the Employee table, I hope you got the point. Now lets take another example to see what happens when "partition by" parameter is used in the same query. I am going to add few rows in the table.


insert into Employee(Name) values('raj')

insert into Employee(Name) values('raj')
insert into Employee(Name) values('raj')
insert into Employee(Name) values('neeraj')
insert into Employee(Name) values('neeraj')

Lets see what happens, if I include the second parameter "partition by" in the over clause.

select ROW_NUMBER() over(partition by Name order by EID) ,EID ,Name  from Employee 



In the above query result set following are the observations.

1)In first row, row_number function has returned value 1.
2)In 2nd to 4th row, row_number function has returned sequential no starting from 1 to 3.
3)In 5th to 8th row, row_number function has returned sequential no starting from 1 to 4.
4)In 9th and 10 row, row_number function has returned 1 and 1.

Now coming to the query, we have used Partition by parameter with field(column) "Name", It means that SQL server will make a group of  same values present in the column "Name" and treat that group as one partition. In that way we will have(5 groups, "dheeraj","neeraj","raj","shayam" and "vikas").


As we all know that row_number function generate a sequence no for each row, but here we have included partition parameter as well so first SQL server will break the result set in partition and then it will apply sequence no to each row in each partition. But the only difference is that for each partition, It resets the row_number value to 1. Hence for "dheeraj" we have row_number value 1, then for "neeraj" we have row_number value(1 to 3)  and so on. 


Rank() :- Rank() function used for ranking and ranking could be in ascending or descending  order.

Lets take an example. I am going to run the below query in management studio and lets see what result comes up.
select RANK() over(order by Name),EID ,Name  from Employee



Lets have a look on the result set, I have used "Name" column as a order by parameter, Now look at the result of Rank() function, following are the observations.


1)1st row is returning value 1.

2)2nd to 4th row is returning value(2).
3)5th to 8th row is returning value (5). 
4)9th row is returning value(9).
5)10th row is returning value(10).

Now Lets understand why it is showing this result, since we have included column "name" in order by parameter, Rank() function will give rank based on "Name" Column.

So first row will be ranked as value(1) and 2nd to 4th row will be ranked as 2 as names are same in all 3 rows. Similarly in 5th to 8th row "name" value is same hence these rows will have same rank value i.e. 5.

Question:- Why rank value is 5 for 5th to 8th row, while earlier rank was 2 , hence it should be 3.

Answer:- Formula for rank function is :->> Next Rank=Last Rank+No of rows with same rank(Last rank value)

In the above case last rank was 2 and there were three entries with same rank, hence next rank is 2+3=5.

similarly next rank is 9= Last rank(5) + No of rows (4) =5+4

Lets take another example :




In the above query, I have changed the order by parameter as desc. hence the result set is different. 

First result set was sorted in descending order than ranked each row.

select RANK() over(partition by Name order by EID),EID ,Name  from Employee


Lets take another example with "Partition by" parameter in Rank function.
and see how SQL server process the above query.



Step 1:- As we have put "name" column in the "partition by" parameter, same names will form a group or partition.

Step 2:- Since in "order by" parameter we have put "EID" SQL Server sort the EID value in ascending order in each group or partition.
Step 3:-  Now rank function give the rank based on EID as we are order the result set with EID.

First row is getting rank value 1. Now 2nd row is also getting same rank value(1), this is happened because its a different partition or group based on Name. Similarly 5th row is also getting rank value(1) , because it belongs to other partition and we know that in each partition rank functions reset their values.


Dense_Rank() :- There is not much difference between Rank() and Dense_Rank() function, the only difference between Rank() and Dense_Rank() is that in case of same record the above formula explained by me for next rank got failed.
 Formula for rank function is :->> Next Rank=Last Rank+No of rows with same rank(Last rank value)

New formula would be --> Next Rank=Last Rank +1 


lets see the result set :-




In the above result set you can see following observations.


1)1st row is having dense_rank() value 1.

2)2nd to 4th row is having dense_rank() value 2.
3)5th to 8th row is having dense_rank value 3 (which was 5 for rank() function)-->only difference between Rank() and Dense_Rank()

NTILE() :- NTILE function does logical grouping of records and assign sequential no to each group and each record within a group has same group no. Lets take an example and see how NTILE()function works.




NTILE() function takes an additional parameter NTILE(n) where n is the no of logical grouping we want to form. I have set the value n=2. Lets have a look on the result set and understand the formula.


Total no of records in the table =10

No of logical grouping(n)=2
Formula =Total no of records in the table / No of logical grouping 
NTILE= 10/2= 5 

If the above formula returns a whole no, then NTILE function will create two logical group having five rows in each group and each group will be having grouping no in sequence starting from 1. You can see the result set first five rows are having logical grouping no 1 and rest five with logical grouping no 2.

Lets take another example :- I am going to set the value of n=3 , lets see what would be the formula.
Total no of records in the table =10
No of logical grouping(n)=3
Formula= [Total no of records in the table] / [No of logical grouping]
NTILE=10/3=3.33 

If the above formula returns a fraction value then below would be the formula for NTILE function.

first logical group :- 10/3=3.33 take the whole no from the result and add 1= 3+1=4
Rows left for grouping :- 10-4=6
No of logical grouping left=total no of grouping(3)-1(as we have already created the one logical grouping)=2
Second logical group= 6/2=3 since result is coming a whole value, hence second and third logical group will have 3 rows in each group. lets see the below result set.

The above picture clearly depicts that first four rows are having 1 as logical group no and second logical group is having 3 rows with logical no 2. and third logical group is having 3 rows with logical no 3.

Lets take another example where n=4

first group =10/4=2.5(not a whole no), hence 2 +1 =3
second group =10-3=7/3(total logical grouping -1)=2.33(not a whole no), hence 2+1=3
third group= 10-6=4/2(total logical grouping -2)=2

Hence first group will have 3 rows second group will also have 3 rows and rest(third, fourth) group will have 2 rows in each. Below is the result set.


I hope this article will help you a lot to understand Rank functions. I will talk about CTE(Common table expression) in my next article. Till then take care and have a wonderful day.

No comments: