I recently came across a problem with a query in Sql Server which I thought about sharing in my blog as this could be useful to many. Following is a summary of the problem.
A similar problem is discussed in the Blog Entry First Non Null Puzzle by Itzik Ben-Gan. But my own problem had a little extension to the Puzzle discussed by Itzik Ben-Gan, even though I must admit it helped a lot in finding solution to the problem.
Problem.
Given a Table myTable
, with 3 columns ID (BigInt,PK, Identity) , FirstValue (Varchar,Nullable) and SecondValue(Varchar,Nullable), the business logic requires an entry in the FirstValue and SecondValue to be NULL
if there is no change in the value in particular column from the preceding row. For example, for the following data
// Actual Data
| Id | FirstValue | SecondValue |
|----|------------|------------:|
| 1 | Value 1001 | Value 2001 |
| 2 | Value 1001 | Value 2002 |
| 3 | Value 1003 | Value 2002 |
The bussiness logic requires our table to be as following.
// Representation in Db
| Id | FirstValue | SecondValue |
|----|------------|------------:|
| 1 | Value 1001 | Value 2001 |
| 2 | NULLL | Value 2002 |
| 3 | Value 1003 | NULL |
Given the state of our table, the problem arises when we query the table and need to fill in the gaps created by the NULL
values. While retrieving the results, we would like to fetch the result as shown in the first Table.
First Solution
This is where Itzik Ben-Gan’s puzzle comes to the our rescue. Extending the solution proposed in the blog for two columns, we get the following query.
WITH
Expr1 AS
(
SELECT id, FirstValue,
MAX( CASE WHEN FirstValue IS NOT NULL THEN Id END )
OVER( ORDER BY Id
ROWS UNBOUNDED PRECEDING ) AS group1
FROM myTable
),
Expr2 AS
(
SELECT id, SecondValue,
MAX( CASE WHEN SecondValue IS NOT NULL THEN Id END )
OVER( ORDER BY Id
ROWS UNBOUNDED PRECEDING ) AS group2
FROM myTable
)
SELECT Expr1.Id,
MAX(FirstValue) OVER( PARTITION BY group1 ORDER BY Expr1.Id ROWS UNBOUNDED PRECEDING ) AS FirstValue,
MAX(SecondValue) OVER( PARTITION BY group2 ORDER BY Expr2.Id ROWS UNBOUNDED PRECEDING ) AS SecondValue
FROM Expr1 Join Expr2
On Expr1.Id = Expr2.Id;
This would yield us the desired result.
The Problem Extended
As mentioned earlier, while the blog provides us direction for our query, there is a little problem of our own which is different from the blog entry. For explaining the scenario, let us add more data to our table.
// Representation in Db
| Id | FirstValue | SecondValue |
|----|------------|------------:|
| 1 | Value 1001 | Value 2001 |
| 2 | NULLL | Value 2002 |
| 3 | Value 1003 | NULL |
| 4 | NULL | NULL |
| 5 | Value 1005 | Value 2005 |
| 6 | Value 1006 | NULL |
For the above data, if we need to retrieve data only for rows which satisfies the condition ID>=3
, our initial approach would be add the condition and expect it to be working as desired.
WITH
Expr1 AS
(
SELECT id, FirstValue,
MAX( CASE WHEN FirstValue IS NOT NULL THEN Id END )
OVER( ORDER BY Id
ROWS UNBOUNDED PRECEDING ) AS group1
FROM myTable
Where Id >= 3
),
Expr2 AS
(
SELECT id, SecondValue,
MAX( CASE WHEN SecondValue IS NOT NULL THEN Id END )
OVER( ORDER BY Id
ROWS UNBOUNDED PRECEDING ) AS group2
FROM myTable
Where Id >= 3
)
SELECT Expr1.Id,
MAX(FirstValue) OVER( PARTITION BY group1 ORDER BY Expr1.Id ROWS UNBOUNDED PRECEDING ) AS FirstValue,
MAX(SecondValue) OVER( PARTITION BY group2 ORDER BY Expr2.Id ROWS UNBOUNDED PRECEDING ) AS SecondValue
FROM Expr1 Join Expr2
On Expr1.Id = Expr2.Id;
The result of above query would be as following.
// Representation in Db
| Id | FirstValue | SecondValue |
|----|------------|------------:|
| 3 | Value 1003 | NULL |
| 4 | NULL | NULL |
| 5 | Value 1005 | Value 2005 |
| 6 | Value 1006 | NULL |
However this result is not as per our business expectation. SecondValue
for Id = 3
is supposed to be Value 2002
. This would be repeated in the succeeding row as well since value is not changing. However, since the scope of our query begins with a Null
Value, our desired results are not correct.
Final Solution
Thanks to Stackoverflow, I managed to find a solution to the problem. As expected, we would need to modify our query to ensure we fetch last non-null value from the preceding rows using a sub query. For example
WITH
Expr1 AS
(
SELECT Id, FirstValue
, MAX(CASE WHEN FirstValue IS NOT NULL THEN Id END) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) AS group1
, (SELECT TOP 1 FirstValue FROM myTable t2 WHERE t2.Id < t1.Id AND FirstValue IS NOT NULL ORDER BY T2.Id DESC) PreviousValue
FROM myTable t1
WHERE Id >= 3
),
Expr2 AS
(
SELECT Id, SecondValue
, MAX(CASE WHEN SecondValue IS NOT NULL THEN Id END) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) AS group2
, (SELECT TOP 1 SecondValue FROM myTable t2 WHERE t2.Id < t1.Id AND SecondValue IS NOT NULL ORDER BY T2.Id DESC) PreviousValue
FROM myTable t1
WHERE Id >= 3
)
SELECT Expr1.Id,
CASE WHEN group1 IS NULL THEN Expr1.PreviousValue ELSE MAX(Expr1.FirstValue) OVER (PARTITION BY group1 ORDER BY Expr1.Id ROWS UNBOUNDED PRECEDING) END AS FirstValue,
CASE WHEN group2 IS NULL THEN Expr2.PreviousValue ELSE MAX(Expr2.SecondValue) OVER (PARTITION BY group2 ORDER BY Expr2.Id ROWS UNBOUNDED PRECEDING) END AS SecondValue
FROM Expr1 join Expr2
On Expr1.Id = Expr2.Id;
This would produce our desired result of fetching the Last Non-Null Value even when the previous non-value is out of the scope of resultant rows.