SQL SERVER PICK RANDOM (OR FIRST) VALUE WITH AGGREGATION

ezra heywood

Total Post:145

Points:1019
Posted by  ezra heywood
 1181  View(s)
Ratings:
Rate this:
Hi Everyone!

How can I get SQL Server to return the first value (any one, I don't care, it just needs to be fast) it comes across when aggregating?

For example, let's say I have:

ID      Group
1       A
2       A
3       A
4       B
5       B
and I need to get any one of the ID's for each group. I can do this as follows:

Select 
max(id)
,group 
from Table 
group by group
which returns

ID      Group
3       A
5       B
That does the job, but it seems stupid to me to ask SQL Server to calculate the highest ID when all it really needs to do is to pick the first ID it comes across.

Thanks

PS - the fields are indexed, so maybe it doesn't really make a difference?

  1. AVADHESH PATEL

    Post:604

    Points:4228
    Re: SQL Server pick random (or first) value with aggregation

    Hi Ezra!


    There is an undocumented aggregate called ANY which is not valid syntax but is possible to get to appear in your execution plans. This does not provide any performance advantage however.

    Assuming the following table and index structure

    CREATE TABLE T
    (
    id int identity primary key,
    [group] char(1) 
    )

    CREATE NONCLUSTERED INDEX ix ON T([group])

    INSERT INTO T
    SELECT TOP 1000000 CHAR( 65 + ROW_NUMBER() OVER (ORDER BY @@SPID) % 3)
    FROM sys.all_objects o1, sys.all_objects o2, sys.all_objects o3
    I have also populated with sample data such that there are many rows per group.

    Your original query

    SELECT MAX(id),
           [group]
    FROM   T
    GROUP  BY [group]  
    Gives Table 'T'. Scan count 1, logical reads 1367 and the plan

      |--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([Expr1003]=MAX([[T].[id])))
           |--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)
    Rewritten to get the ANY aggregate...

    ;WITH cte AS
    (
    SELECT *,
            ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [group] ) AS RN
    FROM T)
    SELECT id,
           [group]
    FROM    cte     
    WHERE RN=1
    Gives Table 'T'. Scan count 1, logical reads 1367 and the plan

      |--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([[T].[id]=ANY([[T].[id])))
           |--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)
    Even though potentially SQL Server could stop processing the group as soon as the first value is found and skip to the next one it doesn't. It still processes all rows and the logical reads are the same.

    For this particular example with many rows in the group a more efficient version would be a recursive CTE.

    WITH    RecursiveCTE
    AS      (
            SELECT TOP 1 id, [group]
            FROM T
            ORDER BY [group]
            UNION   ALL
            SELECT  R.id, R.[group]
            FROM    (
                    SELECT  T.*,
                            rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
                    FROM    T
                    JOIN    RecursiveCTE R
                            ON  R.[group] < T.[group]
                    ) R
            WHERE   R.rn = 1
            )
    SELECT  *
    FROM    RecursiveCTE
    OPTION  (MAXRECURSION 0);
    Which gives

    Table 'Worktable'. Scan count 2, logical reads 19
    Table 'T'. Scan count 4, logical reads 12
    The logical reads are much less as it retrieves the first row per group then seeks into the next group rather than reading a load of records that don't contribute to the final result.

Answer

NEWSLETTER

Enter your email address here always to be updated. We promise not to spam!