
TSQL 'lag' analytic function - unexpectedly poor performance


In a previous job we had to compare item x with item x-1 for a lot of data (~a billion rows). As this was done on SQL Server 2008 R2, we had to use a self-join. It was slow.

I thought I'd experiment with the lag function; this would be hugely valuable if fast. I found it 2-3 times faster, but since it ought to be a trivially simple operation under the hood, and since its query plan/table scanning was much simpler/greatly reduced, I was very disappointed. Code to reproduce below.

Create the database:

IF EXISTS (SELECT name 
           FROM sys.databases 
           WHERE name = N'TestDBForLag')
   DROP DATABASE TestDBForLag
GO

create database TestDBForLag
ALTER DATABASE TestDBForLag SET RECOVERY SIMPLE 
go

use TestDBForLag
go

set nocount on

create table big (g binary(16) not null)
go

begin transaction

declare @c int = 0

while @c < 100
begin
    insert into big(g) values(cast(newid() as binary(16)))
    set @c += 1
end
commit

go 10000 -- n repeats of last batch, "big" now has 1,000,000 rows

alter table big
    add constraint clustered_PK primary key clustered (g)

The queries:

set statistics time on
set statistics io on

-- new style
select  
    g, lag(g, 1) over (order by g) as xx
from big
order by g

-- old style
select  obig.g, 
(
    select max(g)
    from big as ibig
    where ibig.g < obig.g
) as xx
from big as obig
order by g

You can look at the actual/estimated query plans yourself, but here are the results from statistics (the queries were run twice to discount compilation time):

(1000000 row(s) affected)
Table 'Worktable'. {edit: everything zero here}.

**Table 'big'. Scan count 1, logical reads 3109**, {edit: everything else is zero here}.

SQL Server Execution Times: CPU time = 1045 ms,  elapsed time = 3516 ms.

---

(1000000 row(s) affected)

**Table 'big'. Scan count 1000001, logical reads 3190609**, {edit: everything else is zero here}.

SQL Server Execution Times:CPU time = 2683 ms,  elapsed time = 3439 ms.

So, lag takes 1 scan + 3109 reads and ~1 sec of CPU time, while the complicated self-join that has to repeatedly walk the btree takes 1 million scans + 3.2 million reads and ~2.7 sec of CPU.

I can see no reason whatsoever for this poor performance. Any ideas?

This runs on a ThinkServer 140 with 8G ram (so it's entirely memory-resident), dual core, no disk contention. I'm satisfied the time to transfer the result set to the client, which runs on the same machine, is negligible.

select @@version 

yields:

Microsoft SQL Server 2014 - 12.0.4213.0 (X64) Developer Edition (64-bit) 
on Windows NT 6.1  (Build 7601: Service Pack 1)

Edit:

Per @vnov's comment: before I posted, I did take care to discount client-side overhead; I'm looking at CPU time, not overall time. The test:

select *
from big

Table 'big'. Scan count 1, logical reads 3109, {rest zero}
SQL Server Execution Times: CPU time = 125 ms,  elapsed time = 2840 ms.

select count(*)
from big

Table 'big'. Scan count 1, logical reads 3109, {rest zero}
SQL Server Execution Times: CPU time = 109 ms,  elapsed time = 129 ms.

lag just shouldn't add anything significant AFAICS, let alone an order of magnitude.



EDIT2:

@Frisbee didn't see why I thought lag was poor. Basically, the algorithm is to remember a previous value and deliver it n rows later. If n = 1 it's even more trivial, so I made some code using cursors, with and without a handrolled lag, and measured. I also just summed the results so a huge result set isn't returned, per vnov's point. Both the cursors and the selects give the same results: sumg = 127539666, sumglag = 127539460. The code uses the same DB + table as created above.
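The remember-and-deliver algorithm is trivial to sketch outside of SQL. Here is a minimal Python model of lag(n) over an already-ordered sequence (my illustration of the algorithm I have in mind, not SQL Server's actual implementation):

```python
from collections import deque

def lag(rows, n=1, default=None):
    # Emit (current, value-from-n-rows-earlier) pairs over an
    # already-ordered sequence, using an n-slot ring buffer.
    buf = deque([default] * n, maxlen=n)
    for row in rows:
        yield row, buf[0]
        buf.append(row)

# lag(1) pairs each value with its predecessor.
print(list(lag([1, 2, 3, 4, 5])))
# [(1, None), (2, 1), (3, 2), (4, 3), (5, 4)]
```

A single pass, one buffer of size n - which is why I expected lag to cost little more than a plain scan.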

The select versions:

select 
    sum(cast(g as tinyint)) as sumg
from (
    select g
    from big
) as xx


select 
    sum(cast(g as tinyint)) as sumg, 
    sum(cast(glag as tinyint)) as sumglag
from (
    select g, lag(g, 1) over (order by g) as glag
    from big
) as xx

I didn't measure these in bulk, but by observation the plain select vs the lag one were fairly consistent here at ~360-400ms vs ~1700-1900ms, so lag is 4 or 5 times slower.

For the cursors, the top one mimics the first select and the bottom one mimics the select with lag:

---------- nonlagging batch --------------
use TestDBForLag
set nocount on

DECLARE crsr CURSOR FAST_FORWARD READ_ONLY FOR 
select g from big order by g 

DECLARE @g binary(16), @sumg int = 0
OPEN crsr

FETCH NEXT FROM crsr INTO @g
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN
        set @sumg += cast(@g as tinyint)
    END
    FETCH NEXT FROM crsr INTO @g
END

CLOSE crsr
DEALLOCATE crsr

select @sumg as sumg

go 300


---------- lagging batch --------------
use TestDBForLag
set nocount on

DECLARE crsr CURSOR FAST_FORWARD READ_ONLY FOR 
select g from big order by g

DECLARE @g binary(16), @sumg int = 0 
DECLARE @glag binary(16) = 0, @sumglag int = 0
OPEN crsr

FETCH NEXT FROM crsr INTO @g
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN
        set @sumg += cast(@g as tinyint)
        set @sumglag += cast(@glag as tinyint)  -- the only ...
        set @glag = @g  -- ... differences
    END
    FETCH NEXT FROM crsr INTO @g
END

CLOSE crsr
DEALLOCATE crsr

select @sumg as sumg, @sumglag as sumglag

go 300

Run SQL Profiler over the above (removing the SQL:Batch Starting events), which takes about 2.5 hours for me, save the trace as a table named "trace", then run this to get the average durations:

-- the trace saves durations as microseconds,
-- divide by 1000 to get back to millis
select 
    cast(textdata as varchar(8000)) as textdata, 
    avg(duration/1000) as avg_duration_ms
from trace
group by cast(textdata as varchar(8000))

For me, the non-lagging cursor takes 13.65 secs on average and the cursor emulating lag takes 16.04 secs. The greater part of the latter's extra time is the overhead of the interpreter dealing with the extra statements (I'd expect it to be far less if implemented in C), but in any case calculating the lag adds less than 20%.
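The "only two extra statements per row" point can be double-checked by mirroring the pair of cursor bodies in a few lines of Python (a sketch of the same bookkeeping, not a benchmark):

```python
def sums_with_lag(rows):
    # Mirror of the two cursor batches in one pass: a running sum of
    # each value, plus a running sum of its one-row-lagged value
    # (the lag seeded with 0, as @glag is in the T-SQL above).
    sumg = sumglag = glag = 0
    for g in rows:
        sumg += g
        sumglag += glag  # the only ...
        glag = g         # ... extra work per row
    return sumg, sumglag

# sumglag trails sumg by exactly the last value in the sequence.
print(sums_with_lag([5, 1, 4]))  # (10, 6)
```

Whatever the language, the lagged sum is one extra addition and one extra assignment per row on top of the plain sum - nothing that should multiply the cost several times over.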

So, does that explanation sound reasonable, and can anyone suggest why lag performs so badly in a select statement?



1> Vladimir Bar..:

Check the execution plans of both variants and you will see what is going on. I use the free version of SQL Sentry Plan Explorer for this.

I'm comparing these three queries (plus one more with OUTER APPLY):

select count(*)
from big;

-- new style
select  
    g, lag(g) over (order by g) as xx
from big
order by g;

-- old style
select  obig.g, 
(
    select max(g)
    from big as ibig
    where ibig.g < obig.g
) as xx
from big as obig
order by g;

Statistics (screenshots Q1, Q2, Q3)

1) LAG is implemented using a Window Spool, which puts twice the number of rows (1,999,999) into a temporary worktable (in this case an in-memory one, but still). The Window Spool doesn't cache all 1,000,000 rows in the worktable, it caches only the window size.

The Window Spool operator expands each row into the set of rows that represents the window associated with it.

There are quite a few other not-so-heavy operators in the plan as well. The point here is that LAG is not implemented the way it is in your cursor test.
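To see where the 1,999,999 spooled rows come from: for every input row the spool replays that row's window, which for LAG(g, 1) is the previous row plus the current one. A rough Python model of that expansion (my illustration, not the actual operator):

```python
def window_spool_lag1(rows):
    # For each input row, emit its frame: the previous row (if any)
    # followed by the current row -- the rows a LAG(x, 1) window needs.
    out = []
    prev = None
    for row in rows:
        if prev is not None:
            out.append(prev)
        out.append(row)
        prev = row
    return out

print(window_spool_lag1(["a", "b", "c"]))  # ['a', 'a', 'b', 'b', 'c']
# 1,000,000 input rows -> 1 + 2 * 999,999 = 1,999,999 spooled rows:
# every row except the first flows through the spool twice.
```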

2) The plan for the old-style query is pretty good. The optimizer is smart enough to scan the table once and do an index seek with TOP for each row to calculate the MAX. Yes, it is a million seeks, but everything is in memory, so it is relatively fast.

3) Hover over the thick arrows between plan operators and you will see the actual data size. It is twice as large for the Window Spool. So, when everything is in memory and CPU-bound, it becomes significant.

4) Your old-style query can be rewritten as:

select  obig.g, A.g
from big as obig
OUTER APPLY
(
    SELECT TOP(1) ibig.g
    FROM big as ibig
    WHERE ibig.g < obig.g
    ORDER BY ibig.g DESC
) AS A
order by obig.g;

(screenshot Q4)

which is somewhat more efficient (see the CPU column in the screenshot).


So, LAG is pretty efficient in the number of pages read, but uses quite a lot of CPU.
