In Haskell there is a language feature known as the "as"-pattern (sometimes called an alias). The idea is the following: say you have a function that takes a list as input and should return all of its tails; you could implement it as:
tails a@(_:xs) = a : tails xs
tails []       = [[]]
By using the @, you get a reference to the whole argument as well as references to parts of its structure. This is mainly a performance hack: rebuilding (x:xs) in the body of the first line would, unless it is optimized away by the compiler, result in allocating a new cons cell, setting its fields, and so on. See here for more information.
I was wondering whether Prolog has something similar: for instance, if you wanted to implement tails in Prolog, it could be done as follows:
tails([H|T],[[H|T]|TA]) :-
    tails(T,TA).
tails([],[[]]).
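For concreteness, here is a sample query against this definition (a small sketch; the exact answer layout depends on the Prolog system used):

?- tails([1,2,3], Ts).
Ts = [[1,2,3], [2,3], [3], []].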
But if there were an "as" operator, it could be done more efficiently:
tails(L@[_|T],[L|TA]) :-     % This does not compile
    tails(T,TA).
tails([],[[]]).
Is there such a construct or a language extension for it?
TL;DR: Nice idea1! However, the speedup appears to be limited to about 20% (for most list sizes).
In this answer we compare three different predicates, which differ with regard to @-style data re-use:
list_tails([], [[]]).                  % (1) like `tails/2` given by the OP ...
list_tails([E|Es], [[E|Es]|Ess]) :-    %     ... but with a better name :-)
    list_tails(Es, Ess).

list_sfxs1(Es, [Es|Ess]) :-            % (2) "re-use, mutual recursion"
    aux_list_sfxs1(Es, Ess).           %     "sfxs" is short for "suffixes"

aux_list_sfxs1([], []).
aux_list_sfxs1([_|Es], Ess) :-
    list_sfxs1(Es, Ess).

list_sfxs2([], [[]]).                  % (3) "re-use, direct recursion"
list_sfxs2(Es0, [Es0|Ess]) :-
    Es0 = [_|Es],
    list_sfxs2(Es, Ess).
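As a quick plausibility check (a sketch only; the answer layout varies between Prolog systems), all three variants describe the same relation:

?- list_tails([a,b,c], Tss),
   list_sfxs1([a,b,c], Tss),
   list_sfxs2([a,b,c], Tss).
Tss = [[a,b,c], [b,c], [c], []].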
To measure the runtimes, we use the following code:
:- ( dif(D,sicstus), current_prolog_flag(dialect,D)
   ; use_module(library(between))
   ).

run_benchs(P_2s, P_2, L, N, T_ms) :-
    between(1, 6, I),
    L is 10^I,
    N is 10^(8-I),
    length(Xs, L),
    member(P_2, P_2s),
    garbage_collect,
    call_walltime(run_bench_core(P_2,Xs,N), T_ms).

run_bench_core(P_2, Xs, N) :-
    between(1, N, _),
    call(P_2, Xs, _),
    false.
run_bench_core(_, _, _).
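Note that run_benchs/5 pairs the list length L with the repetition count N so that L*N is always 10^8, which keeps the total amount of list traversal per benchmark roughly constant. As a small sketch (using only goals from the code above), the generated pairs are:

?- between(1, 6, I), L is 10^I, N is 10^(8-I).
I = 1, L = 10,      N = 10000000 ;
I = 2, L = 100,     N = 1000000 ;
I = 3, L = 1000,    N = 100000 ;
I = 4, L = 10000,   N = 10000 ;
I = 5, L = 100000,  N = 1000 ;
I = 6, L = 1000000, N = 100.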
To measure wall-clock time2, we use call_walltime/2, a variation of call_time/2:
call_walltime(G, T_ms) :-
    statistics(walltime, [T0|_]),
    G,
    statistics(walltime, [T1|_]),
    T_ms is T1 - T0.
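As a small sanity check (a sketch using only the predicates defined above; the resulting number is machine-dependent), a single benchmark run can be timed directly, with T_ms bound to the elapsed wall time in milliseconds:

?- length(Xs, 1000),
   call_walltime(run_bench_core(list_tails, Xs, 10000), T_ms).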
Let's put the code variants shown above to the test ... using different list lengths L ... and running each test N times (for better accuracy).
First, we use SWI-Prolog version 7.3.14 (64-bit):
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms).
P_2 = list_sfxs1, L*N = 10*10000000,   T_ms = 7925 ;
P_2 = list_sfxs2, L*N = 10*10000000,   T_ms = 7524 ;
P_2 = list_tails, L*N = 10*10000000,   T_ms = 6936 ;
P_2 = list_sfxs1, L*N = 100*1000000,   T_ms = 6502 ;
P_2 = list_sfxs2, L*N = 100*1000000,   T_ms = 5861 ;
P_2 = list_tails, L*N = 100*1000000,   T_ms = 5618 ;
P_2 = list_sfxs1, L*N = 1000*100000,   T_ms = 6434 ;
P_2 = list_sfxs2, L*N = 1000*100000,   T_ms = 5817 ;
P_2 = list_tails, L*N = 1000*100000,   T_ms = 9916 ;
P_2 = list_sfxs1, L*N = 10000*10000,   T_ms = 6328 ;
P_2 = list_sfxs2, L*N = 10000*10000,   T_ms = 5688 ;
P_2 = list_tails, L*N = 10000*10000,   T_ms = 9442 ;
P_2 = list_sfxs1, L*N = 100000*1000,   T_ms = 10255 ;
P_2 = list_sfxs2, L*N = 100000*1000,   T_ms = 10296 ;
P_2 = list_tails, L*N = 100000*1000,   T_ms = 14592 ;
P_2 = list_sfxs1, L*N = 1000000*100,   T_ms = 6955 ;
P_2 = list_sfxs2, L*N = 1000000*100,   T_ms = 6534 ;
P_2 = list_tails, L*N = 1000000*100,   T_ms = 9738.
Then we repeat the previous query3 using SICStus Prolog version 4.3.2 (64-bit):
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms).
P_2 = list_sfxs1, L*N = 10*10000000,   T_ms = 1580 ;
P_2 = list_sfxs2, L*N = 10*10000000,   T_ms = 1610 ;
P_2 = list_tails, L*N = 10*10000000,   T_ms = 1580 ;
P_2 = list_sfxs1, L*N = 100*1000000,   T_ms = 710 ;
P_2 = list_sfxs2, L*N = 100*1000000,   T_ms = 750 ;
P_2 = list_tails, L*N = 100*1000000,   T_ms = 840 ;
P_2 = list_sfxs1, L*N = 1000*100000,   T_ms = 650 ;
P_2 = list_sfxs2, L*N = 1000*100000,   T_ms = 660 ;
P_2 = list_tails, L*N = 1000*100000,   T_ms = 740 ;
P_2 = list_sfxs1, L*N = 10000*10000,   T_ms = 620 ;
P_2 = list_sfxs2, L*N = 10000*10000,   T_ms = 650 ;
P_2 = list_tails, L*N = 10000*10000,   T_ms = 740 ;
P_2 = list_sfxs1, L*N = 100000*1000,   T_ms = 670 ;
P_2 = list_sfxs2, L*N = 100000*1000,   T_ms = 650 ;
P_2 = list_tails, L*N = 100000*1000,   T_ms = 750 ;
P_2 = list_sfxs1, L*N = 1000000*100,   T_ms = 12610 ;
P_2 = list_sfxs2, L*N = 1000000*100,   T_ms = 12560 ;
P_2 = list_tails, L*N = 1000000*100,   T_ms = 33460.
Summary:
The aliasing hack can, and does, improve performance noticeably.
In the tests above, SICStus Prolog with its JIT compiler4 was about 10 times faster than SWI-Prolog!
Footnote 1: Why go for the (@)/2 trick in the rule head and end up with non-idiomatic Prolog code?
Footnote 2: We are interested in the total runtime. Why? Because garbage-collection costs show up with larger amounts of data!
Footnote 3: The answer sequences have been post-processed for the sake of readability.
Footnote 4: Available since release 4.3.0. Current target architectures include IA-32 and AMD64.