What is a plain English explanation of "Big O" notation?

Author: 保佑欣疼你的芯疼 | 2023-08-31 13:54

I'd prefer as little formal definition as possible and simple mathematics.

1> cletus..：

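Quick note: this answer almost certainly confuses Big O notation (which is an upper bound) with Theta notation (which is a two-sided bound). In my experience, this is typical of discussions in non-academic settings. Apologies for any confusion caused.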

Big O complexity can be visualized with this graph:

[Figure: Big O Analysis]

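The simplest definition I can give for Big-O notation is this:

Big-O notation is a relative representation of the complexity of an algorithm.

There are some important and deliberately chosen words in that sentence:

relative: you can only compare apples to apples. You can't compare an algorithm that does arithmetic multiplication to an algorithm that sorts a list of integers. But a comparison of two algorithms that do arithmetic operations (one multiplication, one addition) will tell you something meaningful;

representation: Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if comparison is cheap but swapping is expensive? It changes the comparison; and

complexity: if it takes me one second to sort 10,000 elements, how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.

Come back and reread the above when you've read the rest.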

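The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learned in school were:

addition;

subtraction;

multiplication; and

division.

Each of these is an operation or a problem. A method of solving these is called an algorithm.

Addition is the simplest. You line the numbers up (to the right) and add the digits in a column, writing the last digit of that addition in the result. The "tens" part of that number is carried over to the next column.

Let's assume that the addition of these digits is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100-digit numbers together we have to do 100 additions. If we add two 10,000-digit numbers we have to do 10,000 additions.

See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number, and so on through each digit. So to multiply our two 6-digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one-million-digit numbers we need to do one trillion (10¹²) multiplications and two million adds.

As the algorithm scales with n squared, this is O(n²) or quadratic complexity. This is a good time to introduce another important concept:

We only care about the most significant portion of complexity.

The astute may have realized that we could express the number of operations as n² + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).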
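As a quick check of the 0.0002% figure, here is a minimal sketch that uses the answer's rough model of n² multiplications plus 2n additions. The function name `schoolbook_operations` is made up for this illustration, and the model is an approximation rather than an exact count of every carry.

```python
def schoolbook_operations(n: int) -> tuple[int, int]:
    """Return (multiplications, additions) for two n-digit numbers,
    using the rough n*n + 2n model from the text above."""
    return n * n, 2 * n


for n in (6, 100, 1_000_000):
    mults, adds = schoolbook_operations(n)
    share = adds / (mults + adds)
    # For n = 1,000,000 the additions are about 0.0002% of the total.
    print(f"n={n:>9,}: {mults:,} multiplications, {adds:,} additions, "
          f"additions are {share:.4%} of all operations")
```

The share of the 2n term shrinks as n grows, which is why only the n² term is kept.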

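One can notice that we've assumed the worst-case scenario here. When multiplying, if one of the numbers has 4 digits and the other has 6 digits, then we only have 24 multiplications. Still, we calculate the worst case for that 'n', i.e. when both are 6-digit numbers. Hence Big-O notation is about the worst-case scenario of an algorithm.

The Telephone Book

The next best example I can think of is the telephone book, normally called the White Pages or similar, but it varies from country to country. I'm talking about the one that lists people by surname and then initials or first name, possibly address, and then telephone number.

Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's start (let's assume you can't), what would you do?

A typical implementation might be to open up to the middle, take the 500,000th entry and compare it to "Smith". If it happens to be "Smith, John", we just got really lucky. Far more likely is that "John Smith" will be before or after that name. If it's after, we then divide the last half of the phone book in half and repeat. If it's before, we divide the first half of the phone book in half and repeat. And so on.

This is called a binary search and is used every day in programming whether you realize it or not.

So if you want to find a name in a phone book of a million names you can find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.

For a phone book of 3 names it takes 2 comparisons (at most).

For 7 it takes at most 3.

For 15 it takes 4.

...

For 1,000,000 it takes 20.

That is staggeringly good, isn't it?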
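The halving procedure described above can be sketched as follows. This is an illustrative implementation, not code from the answer: `binary_search` and `max_probes` are names chosen here, and each look at a middle entry is counted as one comparison.

```python
def binary_search(sorted_names, target):
    """Return (index, probes); index is -1 when target is absent."""
    low, high = 0, len(sorted_names) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one look at the middle entry
        if sorted_names[mid] == target:
            return mid, probes
        if sorted_names[mid] < target:
            low = mid + 1   # target can only be in the later half
        else:
            high = mid - 1  # target can only be in the earlier half
    return -1, probes


def max_probes(n):
    """Worst-case number of probes for n entries: floor(log2(n)) + 1."""
    return n.bit_length()


for size in (3, 7, 15, 1_000_000):
    print(size, max_probes(size))
```

The loop prints 2, 3, 4 and 20 for the four sizes, matching the figures in the list above.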

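In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log₁₀, log₂ or some other base. It doesn't matter; it's still O(log n), just like O(2n²) and O(100n²) are still both O(n²).

It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:

Best case: in the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;

Expected case: as discussed above, this is O(log n); and

Worst case: this is also O(log n).

Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.

Back to the telephone book.

What if you have a phone number and want to find a name? The police have a reverse phone book, but such look-ups are denied to the general public. Or are they? Technically you can reverse look up a number in an ordinary phone book. How?

You start at the first name and compare the number. If it's a match, great; if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number, anyway).

So to find a name given the phone number (reverse lookup):

Best case: O(1);

Expected case: O(n) (for 500,000); and

Worst case: O(n) (for 1,000,000).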
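A minimal sketch of the reverse lookup, assuming the phone book is a list of (name, number) pairs sorted by name. The function name and the sample entries are invented for illustration.

```python
def reverse_lookup(phone_book, number):
    """Scan (name, number) entries in order.

    Returns (name, entries_checked); name is None if the number is absent.
    """
    for checked, (name, entry_number) in enumerate(phone_book, start=1):
        if entry_number == number:
            return name, checked
    return None, len(phone_book)


book = [("Adams", "555-0101"), ("Jones", "555-0102"), ("Smith", "555-0103")]
print(reverse_lookup(book, "555-0101"))  # best case: found on the first entry
print(reverse_lookup(book, "555-0103"))  # worst case: every entry checked
```

Because the entries are not ordered by number, nothing can be skipped, so the work grows in step with the size of the book.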

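The Traveling Salesman

This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to one or more other towns by a road of a certain distance. The Traveling Salesman problem is to find the shortest tour that visits every town.

Sounds simple? Think again.

If you have 3 towns A, B and C with roads between all pairs, then you could go: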

A→B→C

A→C→B

B→C→A

B→A→C

C→A→B

C→B→A

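Well, actually there are fewer than that because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).

In actuality there are 3 possibilities.

Take this to 4 towns and you have (iirc) 12 possibilities.

With 5 it's 60.

6 becomes 360.

This is a function of a mathematical operation called a factorial. Basically: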

5!= 5×4×3×2×1 = 120

6!= 6×5×4×3×2×1 = 720

7!= 7×6×5×4×3×2×1 = 5040

...

25!= 25×24×...×2×1 = 15,511,210,043,330,985,984,000,000

...

50!= 50×49×...×2×1 = 3.04140932×10 ⁶⁴

So the Big-O of the Traveling Salesman problem is O(n!), or factorial or combinatorial complexity.
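The route counts quoted above (3, 12, 60, 360) can be reproduced by brute force. The sketch below only counts routes; it does not compute distances. The name `distinct_routes` is invented here, and a route and its reverse are treated as the same route, as in the answer.

```python
from itertools import permutations


def distinct_routes(towns):
    """All visiting orders, counting a route and its reverse only once."""
    routes = set()
    for order in permutations(towns):
        # Keep one canonical representative of each forward/reverse pair.
        routes.add(min(order, order[::-1]))
    return routes


for towns in ("ABC", "ABCD", "ABCDE", "ABCDEF"):
    print(len(towns), len(distinct_routes(towns)))
```

The count is n!/2 for n towns, so it grows factorially, which is the point the answer is making.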

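By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.

Something to think about.

Polynomial Time

Another point I wanted to make a quick mention of is that any algorithm that has a complexity of O(n^a) is said to have polynomial complexity or to be solvable in polynomial time.

O(n), O(n²) etc. are all polynomial time. Some problems cannot be solved in polynomial time. Certain things are used in the world because of this. Public key cryptography is a prime example. It is computationally hard to find two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.

Anyway, that's it for my (hopefully plain English) explanation of Big O (revised).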

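While the other answers focus on explaining the differences between O(1), O(n^2) et al., yours is the one that details how algorithms get classified into n^2, n log(n) etc. +1 for a good answer that helped me understand Big O notation.

-1: This is blatantly wrong: _"BigOh is a relative representation of the complexity of an algorithm"_. No. BigOh is an asymptotic upper bound and exists quite well independently of computer science. O(n) is linear. No, you are confusing BigOh with theta. log n is O(n). 1 is O(n). The number of upvotes on this answer (and the comments), which makes the basic mistake of confusing Theta with BigOh, is quite embarrassing...

_"By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers."_ When is the universe going to end?

One might want to add that big-O represents an upper bound (given by an algorithm), big-Omega gives a lower bound (usually given as a proof independent of a specific algorithm), and big-Theta means that an "optimal" algorithm reaching that lower bound is known.

This is good if you're looking for the longest answer, but not for the answer that explains Big-O in the simplest manner.

I'm with @jimifiki. Big-O is only useful when N is large. When N is small the prefactor is often important. A good example is [insertion sort](http://en.wikipedia.org/wiki/Insertion_sort). Insertion sort is O(N^2) but has very good cache locality. This makes it faster than many O(N log N) algorithms when the list is small (<10 elements). Similarly, lookups in a binary tree are often faster than in a hash table for small N. A good hash function can chew up a lot of cycles, making the prefactor very important.

@Moberg The fact that f(n) is O(g(n)) does not exclude the possibility that f(n) is O(h(n)) for some h(n) different from g(n). Trivially, n is O(n) and n is also O(2n). In fact, O(log(n)) is a subset of O(n). So in the log(n) example, log(n) is in both O(log(n)) and O(n). You are confusing Theta notation with big O notation.

@Isaac It doesn't really matter: `200! nanoseconds ~= 1.8×10^348 × age of the universe` https://www.wolframalpha.com/input/?i=200%21+nanoseconds

Viewers, don't forget that this answer errs in confusing Big-O, Omega and Theta. Read this answer, appreciate it, then look up Theta (roughly the expected case) and Omega (a rough lower bound), because Big-O is exclusively a rough upper bound.

@Paul Fisher: NP-hard does not mean "harder than the hardest NP-complete problem", it means "at least as hard as an NP-complete problem". There is a big difference!

I guess he should cover Omega and Theta too, so that all the comments above are answered. I would also suggest changing the question to the difference between bigOh, omega and theta.

@Aryabhatta What do you mean, log(n) is O(n)? log(n) is obviously O(log(n)), isn't it? (See http://en.wikipedia.org/wiki/Big_oh_notation)

@JacobAkkerboom Ah yes, that's also true. I was reading the post the right way. It's just not that clear. Although I didn't confuse it with Theta, since I had never heard of Theta before. But apparently it is the average rather than the upper bound.

`Traditional computers can solve polynomial-time problems.` Can they or can't they?

@Josh `log(n^c) = c*log(n)` and `O(c*log(n)) = O(log(n))` when c is a constant. So `O(log(n^2)) = O(log(n^3)) = O(log(n))`. Therefore changing the log base does not affect big O notation, and the statement you quoted is correct.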

2> Ray Hidayat..：

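It shows how an algorithm scales.

O(n²): known as quadratic complexity

1 item: 1 second

10 items: 100 seconds

100 items: 10000 seconds

Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10². Basically, n = 10, so O(n²) gives us the scaling factor n², which is 10².

O(n): known as linear complexity

1 item: 1 second

10 items: 10 seconds

100 items: 100 seconds

This time the number of items increases by a factor of 10, and so does the time. n = 10, so O(n)'s scaling factor is 10.

O(1): known as constant complexity

1 item: 1 second

10 items: 1 second

100 items: 1 second

The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.

O(log n): known as logarithmic complexity

1 item: 1 second

10 items: 2 seconds

100 items: 3 seconds

1000 items: 4 seconds

10000 items: 5 seconds

The number of computations only increases by the log of the input value. So in this case, assuming each computation takes 1 second, the time required is the log of the input n, hence log n.

That's the gist of it. They reduce the maths down, so it might not be exactly n² or whatever they say it is, but that will be the dominating factor in the scaling.

Not seconds, operations. Also, you missed out factorial and logarithmic time.

This doesn't explain very well that O(n^2) could be describing an algorithm that runs in precisely .01*n^2 + 999999*n + 999999. It's important to know that algorithms are compared using this scale, and that the comparison works when n is 'sufficiently large'. Python's timsort actually uses insertion sort (worst/average case O(n^2)) for small arrays because it has a small overhead.

This answer also confuses big O notation and Theta notation. The function of n that returns 1 for all its inputs (usually simply written as 1) is actually in O(n^2) (even though it is also in O(1)). Similarly, an algorithm that only has to do one step which takes a constant amount of time is considered to be an O(1) algorithm, but also an O(n) and an O(n^2) algorithm. But maybe mathematicians and computer scientists don't agree on the definition :-/.

What exactly does this definition mean? (The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.)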

3> ninjagecko..：

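Big-O notation (also called "asymptotic growth" notation) is what functions "look like" when you ignore constant factors and stuff near the origin. We use it to talk about how things scale.

Basics

For "sufficiently" large inputs...

f(x) ∈ O(upperbound) means f "grows no faster than" upperbound

f(x) ∈ Θ(justlikethis) means f "grows exactly like" justlikethis

f(x) ∈ Ω(lowerbound) means f "grows no slower than" lowerbound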

big-O notation doesn't care about constant factors: the function 9x² is said to "grow exactly like" 10x². Neither does big-O asymptotic notation care about non-asymptotic stuff ("stuff near the origin" or "what happens when the problem size is small"): the function 10x² is said to "grow exactly like" 10x² - x + 2.

Why would you want to ignore the smaller parts of the equation? Because they become completely dwarfed by the big parts of the equation as you consider larger and larger scales; their contribution becomes dwarfed and irrelevant. (See example section.)

Put another way, it's all about the ratio as you go to infinity. If you divide the actual time it takes by the O(...), you will get a constant factor in the limit of large inputs. Intuitively this makes sense: functions "scale like" one another if you can multiply one to get the other. That is, when we say...

actualAlgorithmTime(N) ∈ O(bound(N))
                                       e.g. "time to mergesort N elements 
                                             is O(N log(N))"

... this means that for "large enough" problem sizes N (if we ignore stuff near the origin), there exists some constant (e.g. 2.5, completely made up) such that:

actualAlgorithmTime(N)                 e.g. "mergesort_duration(N)       "
────────────────────── < constant            ───────────────────── < 2.5 
       bound(N)                                    N log(N)

There are many choices of constant; often the "best" choice is known as the "constant factor" of the algorithm... but we often ignore it like we ignore non-largest terms (see Constant Factors section for why they don't usually matter). You can also think of the above equation as a bound, saying "In the worst-case scenario, the time it takes will never be worse than roughly N*log(N), within a factor of 2.5 (a constant factor we don't care much about)".

In general, O(...) is the most useful one because we often care about worst-case behavior. If f(x) represents something "bad" like processor or memory usage, then "f(x) ∈ O(upperbound)" means "upperbound is the worst-case scenario of processor/memory usage".
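For readers who want the symbols behind the informal "grows no faster than" wording, the usual textbook definitions can be written as below. This is the standard formulation rather than something quoted from the answer; it assumes f and g are non-negative for large N, and the names c and N_0 are the conventional ones.

```latex
\begin{align*}
f \in O(g)      &\iff \exists\, c > 0,\ \exists\, N_0 \ \text{such that}\ f(N) \le c \cdot g(N) \ \text{for all } N \ge N_0 \\
f \in \Omega(g) &\iff \exists\, c > 0,\ \exists\, N_0 \ \text{such that}\ f(N) \ge c \cdot g(N) \ \text{for all } N \ge N_0 \\
f \in \Theta(g) &\iff f \in O(g) \ \text{and}\ f \in \Omega(g)
\end{align*}
```

The constant c plays the role of the "2.5" in the mergesort example, and N_0 is what "large enough" means.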

Applications

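As a purely mathematical construct, big-O notation is not limited to talking about processing time and memory. You can use it to discuss the asymptotics of anything where scaling is meaningful, such as:

the number of possible handshakes among N people at a party (Θ(N²), specifically N(N-1)/2, but what matters is that it "scales like" N²)

the probabilistic expected number of people who have seen some viral marketing, as a function of time

how website latency scales with the number of processing units in a CPU, GPU or computer cluster

how heat output on a CPU scales as a function of transistor count, voltage, etc.

how much time an algorithm needs to run, as a function of input size

how much space an algorithm needs to run, as a function of input size

Example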

For the handshake example above, everyone in a room shakes everyone else's hand. In that example, #handshakes ∈ Θ(N²). Why?

Back up a bit: the number of handshakes is exactly n-choose-2 or N*(N-1)/2 (each of N people shakes the hands of N-1 other people, but this double-counts handshakes so divide by 2):

[Figure: adjacency matrix in which everyone shakes hands with everyone else. Image credit and license per Wikipedia/Wikimedia Commons, "adjacency matrix".]

However, for very large numbers of people, the linear term N is dwarfed and effectively contributes 0 to the ratio (in the chart: the fraction of empty boxes on the diagonal over total boxes gets smaller as the number of participants becomes larger). Therefore the scaling behavior is order N², or the number of handshakes "grows like N²".

#handshakes(N)
────────────── ≈ 1/2
     N²

It's as if the empty boxes on the diagonal of the chart (N*(N-1)/2 checkmarks) weren't even there (N² checkmarks asymptotically).

(temporary digression from "plain English":) If you wanted to prove this to yourself, you could perform some simple algebra on the ratio to split it up into multiple terms (lim means "considered in the limit of", just ignore it if you haven't seen it, it's just notation for "and N is really really big"):

    N²/2 - N/2         (N²)/2   N/2         1/2
lim ────────── = lim ( ────── - ─── ) = lim ─── = 1/2
N→∞     N²       N→∞     N²     N²      N→∞  1
                               └───┘
             this is 0 in the limit of N→∞:
             graph it, or plug in a really large number for N

tl;dr: The number of handshakes 'looks like' x² so much for large values, that if we were to write down the ratio #handshakes/x², the fact that we don't need exactly x² handshakes wouldn't even show up in the decimal for an arbitrarily large while.

e.g. for x=1million, ratio #handshakes/x²: 0.499999...
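The ratio can also be checked numerically. A small sketch, with `handshakes` as a name chosen here for the exact N(N-1)/2 count:

```python
def handshakes(n: int) -> int:
    """Exact number of handshakes among n people: n choose 2."""
    return n * (n - 1) // 2


for n in (10, 1_000, 1_000_000):
    # The ratio approaches 1/2 as n grows.
    print(f"N={n:>9,}  handshakes/N^2 = {handshakes(n) / n**2}")
```

For N = 10 the ratio is 0.45, and for N = 1,000,000 it is 0.4999995, which is the "0.499999..." quoted above.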

Building Intuition

This lets us make statements like...

"For large enough inputsize=N, no matter what the constant factor is, if I double the input size...

... I double the time an O(N) ("linear time") algorithm takes."

N → (2N) = 2(N)

... I double-squared (quadruple) the time an O(N²) ("quadratic time") algorithm takes." (e.g. a problem 100x as big takes 100²=10000x as long... possibly unsustainable)

N² → (2N)² = 4(N²)

... I double-cubed (octuple) the time an O(N³) ("cubic time") algorithm takes." (e.g. a problem 100x as big takes 100³=1000000x as long... very unsustainable)

cN³ → c(2N)³ = 8(cN³)

... I add a fixed amount to the time an O(log(N)) ("logarithmic time") algorithm takes." (cheap!)

c log(N) → c log(2N) = (c log(2))+(c log(N)) = (fixed amount)+(c log(N))

... I don't change the time an O(1) ("constant time") algorithm takes." (the cheapest!)

c*1 → c*1

... I "(basically) double" the time an O(N log(N)) algorithm takes." (fairly common)

it's less than O(N^1.000001), which you might be willing to call basically linear

... I ridiculously increase the time a O(2^N) ("exponential time") algorithm takes." (you'd double (or triple, etc.) the time just by increasing the problem by a single unit)

2^N → 2^(2N) = (4^N)............put another way...... 2^N → 2^(N+1) = 2^N 2¹ = 2 2^N

[for the mathematically inclined, you can mouse over the spoilers for minor sidenotes]

(with credit to /sf/ask/17360801/ )

(technically the constant factor could maybe matter in some more esoteric examples, but I've phrased things above (e.g. in log(N)) such that it doesn't)

These are the bread-and-butter orders of growth that programmers and applied computer scientists use as reference points. They see these all the time. (So while you could technically think "Doubling the input makes an O(√N) algorithm 1.414 times slower," it's better to think of it as "this is worse than logarithmic but better than linear".)
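The doubling rules listed above can be tried out on idealized cost functions. This is only a sketch: the `growth` table and `doubling_ratio` helper are invented for illustration, and real running times also include constant factors and lower-order terms.

```python
import math

# Idealized cost functions, one per order of growth discussed above.
growth = {
    "log N": lambda n: math.log2(n),
    "N": lambda n: n,
    "N log N": lambda n: n * math.log2(n),
    "N^2": lambda n: n**2,
    "N^3": lambda n: n**3,
}


def doubling_ratio(f, n):
    """How many times larger f becomes when the input size doubles."""
    return f(2 * n) / f(n)


for name, f in growth.items():
    print(f"{name:8} ratio at N=1024: {doubling_ratio(f, 1024):.3f}")
```

At N = 1024 the linear, quadratic and cubic ratios are 2, 4 and 8; N log N comes out at 2.2 ("basically double"); and log N rises from 10 to 11, a fixed additive step rather than a multiple.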

Constant factors

Usually we don't care what the specific constant factors are, because they don't affect the way the function grows. For example, two algorithms may both take O(N) time to complete, but one may be twice as slow as the other. We usually don't care too much unless the factor is very large, since optimizing is tricky business ( When is optimisation premature? ); also the mere act of picking an algorithm with a better big-O will often improve performance by orders of magnitude.

Some asymptotically superior algorithms (e.g. a non-comparison O(N log(log(N))) sort) can have so large a constant factor (e.g. 100000*N log(log(N))), or overhead that is relatively large like O(N log(log(N))) with a hidden + 100*N, that they are rarely worth using even on "big data".

Why O(N) is sometimes the best you can do, i.e. why we need datastructures

O(N) algorithms are in some sense the "best" algorithms if you need to read all your data. The very act of reading a bunch of data is an O(N) operation. Loading it into memory is usually O(N) (or faster if you have hardware support, or no time at all if you've already read the data). However if you touch or even look at every piece of data (or even every other piece of data), your algorithm will take O(N) time to perform this looking. No matter how long your actual algorithm takes, it will be at least O(N) because it spent that time looking at all the data.

The same can be said for the very act of writing. All algorithms which print out N things will take N time, because the output is at least that long (e.g. printing out all permutations (ways to rearrange) a set of N playing cards is factorial: O(N!)).

This motivates the use of data structures: a data structure requires reading the data only once (usually O(N) time), plus some arbitrary amount of preprocessing (e.g. O(N) or O(N log(N)) or O(N²)) which we try to keep small. Thereafter, modifying the data structure (insertions/deletions/etc.) and making queries on the data take very little time, such as O(1) or O(log(N)). You then proceed to make a large number of queries! In general, the more work you're willing to do ahead of time, the less work you'll have to do later on.

For example, say you had the latitude and longitude coordinates of millions of road segments, and wanted to find all street intersections.

Naive method: If you had the coordinates of a street intersection, and wanted to examine nearby streets, you would have to go through the millions of segments each time, and check each one for adjacency.

If you only needed to do this once, it would not be a problem to have to do the naive method of O(N) work only once, but if you want to do it many times (in this case, N times, once for each segment), we'd have to do O(N²) work, or 1000000²=1000000000000 operations. Not good (a modern computer can perform about a billion operations per second).

If we use a simple structure called a hash table (an instant-speed lookup table, also known as a hashmap or dictionary), we pay a small cost by preprocessing everything in O(N) time. Thereafter, it only takes constant time on average to look up something by its key (in this case, our key is the latitude and longitude coordinates, rounded into a grid; we search the adjacent gridspaces of which there are only 9, which is a constant).

Our task went from an infeasible O(N²) to a manageable O(N), and all we had to do was pay a minor cost to make a hash table.
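The grid idea described above might look like the following sketch. Everything here is illustrative: the function names, the `cell_size` parameter and the sample points are assumptions, and a Python dict of lists stands in for the hash table.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, cell_size):
    """Round a coordinate down to the grid cell that contains it."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def build_grid(points, cell_size):
    """One O(N) pass: bucket every point by its grid cell."""
    grid = defaultdict(list)
    for lat, lon in points:
        grid[cell_of(lat, lon, cell_size)].append((lat, lon))
    return grid


def nearby(grid, lat, lon, cell_size):
    """Look only at the 9 cells around a location, not at every point."""
    row, col = cell_of(lat, lon, cell_size)
    found = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            found.extend(grid.get((row + d_row, col + d_col), []))
    return found


points = [(0.5, 0.5), (1.5, 0.5), (5.5, 5.5)]
grid = build_grid(points, cell_size=1.0)
print(nearby(grid, 0.2, 0.2, cell_size=1.0))  # the far-away point is never examined
```

Each query touches a constant number of cells, so N queries cost roughly O(N) in total, provided the points are spread out enough that no single cell holds a large share of them.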

analogy: The analogy in this particular case is a jigsaw puzzle: We created a data structure which exploits some property of the data. If our road segments are like puzzle pieces, we group them by matching color and pattern. We then exploit this to avoid doing extra work later (comparing puzzle pieces of like color to each other, not to every other single puzzle piece).

The moral of the story: a data structure lets us speed up operations. Even more advanced data structures can let you combine, delay, or even ignore operations in incredibly clever ways. Different problems would have different analogies, but they'd all involve organizing the data in a way that exploits some structure we care about, or which we've artificially imposed on it for bookkeeping. We do work ahead of time (basically planning and organizing), and now repeated tasks are much much easier!

Practical example: visualizing orders of growth while coding

Asymptotic notation is, at its core, quite separate from programming. Asymptotic notation is a mathematical framework for thinking about how things scale, and can be used in many different fields. That said... this is how you apply asymptotic notation to coding.

The basics: Whenever we interact with every element in a collection of size A (such as an array, a set, all keys of a map, etc.), or perform A iterations of a loop, that is a multiplicative factor of size A. Why do I say "a multiplicative factor"?--because loops and functions (almost by definition) have multiplicative running time: the number of iterations, times work done in the loop (or for functions: the number of times you call the function, times work done in the function). (This holds if we don't do anything fancy, like skip loops or exit the loop early, or change control flow in the function based on arguments, which is very common.) Here are some examples of visualization techniques, with accompanying pseudocode.

(here, the xs represent constant-time units of work, processor instructions, interpreter opcodes, whatever)

Example 1:

for(i=0; i<A; i++)        // A * ...
    some O(1) operation     // 1

--> A*1 --> O(A) time

visualization:

|<------ A ------->|
1 2 3 4 5 x x ... x

other languages, multiplying orders of growth:
  javascript, O(A) time and space
    someListOfSizeA.map((x,i) => [x,i])               
  python, O(rows*cols) time and space
    [[r*c for c in range(cols)] for r in range(rows)]

Example 2:

for every x in listOfSizeA:   // A * (...
    some O(1) operation         // 1
    some O(B) operation         // B
    for every y in listOfSizeC: // C * (...
        some O(1) operation       // 1))

--> O(A*(1 + B + C))
    O(A*(B+C))        (1 is dwarfed)

visualization:

|<------ A ------->|
1 x x x x x x ... x

2 x x x x x x ... x ^
3 x x x x x x ... x |
4 x x x x x x ... x |
5 x x x x x x ... x B  <-- A*B
x x x x x x x ... x |
................... |
x x x x x x x ... x v

x x x x x x x ... x ^
x x x x x x x ... x |
x x x x x x x ... x |
x x x x x x x ... x C  <-- A*C
x x x x x x x ... x |
................... |
x x x x x x x ... x v
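Example 2 can be made concrete by counting steps instead of doing real work. In this sketch the function name is made up, and adding `b` to the counter stands in for "some O(B) operation".

```python
def example_2_steps(a, b, c):
    """Count unit steps for the loop structure shown in Example 2."""
    steps = 0
    for _ in range(a):          # A * (...
        steps += 1              # the O(1) operation
        steps += b              # stand-in for an O(B) operation
        for _ in range(c):      # C * (...
            steps += 1          # the inner O(1) operation
    return steps


print(example_2_steps(4, 5, 6))  # 4 * (1 + 5 + 6)
```

The count is exactly A*(1 + B + C), and once B or C is large the "1" stops mattering, which gives O(A*(B+C)).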

Example 3:

function nSquaredFunction(n) {
    total = 0
    for i in 1..n:        // N *
        for j in 1..n:      // N *
            total += i*j      // 1
    return total
}
// O(n^2)

function nCubedFunction(a) {
    for i in 1..a:                // A *
        print(nSquaredFunction(a))  // A^2
}
// O(a^3)

If we do something slightly complicated, you might still be able to imagine visually what's going on:

for x in range(A):
    for y in range(1, x + 1):
        simpleOperation(x*y)

x x x x x x x x x x |
x x x x x x x x x   |
x x x x x x x x     |
x x x x x x x       |
x x x x x x         |
x x x x x           |
x x x x             |
x x x               |
x x                 |
x___________________|

Here, the smallest recognizable outline you can draw is what matters; a triangle is a two dimensional shape (0.5 A^2), just like a square is a two-dimensional shape (A^2); the constant factor of two here remains in the asymptotic ratio between the two, however we ignore it like all factors... (There are some unfortunate nuances to this technique I don't go into here; it can mislead you.)

Of course this does not mean that loops and functions are bad; on the contrary, they are the building blocks of modern programming languages, and we love them. However, we can see that the way we weave loops and functions and conditionals together with our data (control flow, etc.) mimics the time and space usage of our program! If time and space usage becomes an issue, that is when we resort to cleverness, and find an easy algorithm or data structure we hadn't considered, to reduce the order of growth somehow. Nevertheless, these visualization techniques (though they don't always work) can give you a naive guess at a worst-case running time.

Here is another thing we can recognize visually:

<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x
x x x x x x x x
x x x x
x x
x

We can just rearrange this and see it's O(N):

<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x|x x x x x x x x|x x x x|x x|x
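The rearrangement above is the geometric series N + N/2 + N/4 + ... in picture form. A small sketch (the function name is invented here) that adds up the row lengths:

```python
def halving_total(n):
    """Total work when each pass handles half as many items as the last."""
    total = 0
    while n >= 1:
        total += n   # work done on this pass
        n //= 2      # the next pass sees half as many items
    return total


print(halving_total(32))  # 32 + 16 + 8 + 4 + 2 + 1
```

For N = 32 the total is 63, and in general it stays below 2N, so the whole process is O(N) even though there are about log N passes.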

Or maybe you do log(N) passes of the data, for O(N*log(N)) total time:

   <----------------------------- N ----------------------------->
 ^  x x x x x x x x x x x x x x x x|x x x x x x x x x x x x x x x x
 |  x x x x x x x x|x x x x x x x x|x x x x x x x x|x x x x x x x x
lgN x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x
 |  x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x
 v  x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x

Unrelatedly but worth mentioning again: If we perform a hash (e.g. a dictionary/hashtable lookup), that is a factor of O(1). That's pretty fast.

[myDictionary.has(x) for x in listOfSizeA]
 \----- O(1) ------/    

--> A*1 --> O(A)

If we do something very complicated, such as with a recursive function or divide-and-conquer algorithm, ~~you can use the Master Theorem (usually works), or in ridiculous cases the Akra-Bazzi Theorem (almost always works)~~ you look up the running time of your algorithm on Wikipedia.

But, programmers don't think like this because eventually, algorithm intuition just becomes second nature. You will start to code something inefficient, and immediately think "am I doing something grossly inefficient?". If the answer is "yes" AND you foresee it actually mattering, then you can take a step back and think of various tricks to make things run faster (the answer is almost always "use a hashtable", rarely "use a tree", and very rarely something a bit more complicated).

Amortized and average-case complexity

There is also the concept of "amortized" and/or "average case" (note that these are different).

Average Case: This is no more than using big-O notation for the expected value of a function, rather than the function itself. In the usual case where you consider all inputs to be equally likely, the average case is just the average of the running time. For example with quicksort, even though the worst-case is O(N^2) for some really bad inputs, the average case is the usual O(N log(N)) (the really bad inputs are very small in number, so few that we don't notice them in the average case).

Amortized Worst-Case: Some data structures may have a worst-case complexity that is large, but guarantee that if you do many of these operations, the average amount of work you do will be better than worst-case. For example you may have a data structure that normally takes constant O(1) time. However, occasionally it will 'hiccup' and take O(N) time for one random operation, because maybe it needs to do some bookkeeping or garbage collection or something... but it promises you that if it does hiccup, it won't hiccup again for N more operations. The worst-case cost is still O(N) per operation, but the amortized cost over many runs is O(N)/N = O(1) per operation. Because the big operations are sufficiently rare, the massive amount of occasional work can be considered to blend in with the rest of the work as a constant factor. We say the work is "amortized" over a sufficiently large number of calls that it disappears asymptotically.

The analogy for amortized analysis:

You drive a car. Occasionally, you need to spend 10 minutes going to the gas station and then spend 1 minute refilling the tank with gas. If you did this every time you went anywhere with your car (spend 10 minutes driving to the gas station, spend a few seconds filling up a fraction of a gallon), it would be very inefficient. But if you fill up the tank once every few days, the 11 minutes spent driving to the gas station is "amortized" over a sufficiently large number of trips, that you can ignore it and pretend all your trips were maybe 5% longer.
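One common structure that behaves like this "hiccup" description is an array that doubles its storage when it fills up. The answer does not name a specific structure, so the class below is an assumed illustration that counts how many element copies the occasional resizes cost.

```python
class GrowableArray:
    """Append-only array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None]
        self.copies = 0  # total elements moved during all resizes

    def append(self, value):
        if self.size == self.capacity:
            # The occasional O(N) "hiccup": move everything to a bigger block.
            self.capacity *= 2
            new_slots = [None] * self.capacity
            for i in range(self.size):
                new_slots[i] = self.slots[i]
                self.copies += 1
            self.slots = new_slots
        self.slots[self.size] = value  # the usual O(1) work
        self.size += 1


arr = GrowableArray()
for i in range(1000):
    arr.append(i)
print(arr.copies, arr.copies / arr.size)
```

After 1,000 appends the resizes have copied 1,023 elements in total, about one copy per append, which is the sense in which the O(N) hiccups amortize to O(1).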

Comparison between average-case and amortized worst-case:

Average-case: We make some assumptions about our inputs; i.e. if our inputs have different probabilities, then our outputs/runtimes will have different probabilities (which we take the average of). Usually we assume that our inputs are all equally likely (uniform probability), but if the real-world inputs don't fit our assumptions of "average input", the average output/runtime calculations may be meaningless. If you anticipate uniformly random inputs though, this is useful to think about!

Amortized worst-case: If you use an amortized worst-case data structure, the performance is guaranteed to be within the am

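Presumably, people other than the OP might have an interest in the answers to this question. Isn't that the guiding principle of the site?

An excellent mathematical answer, but the OP asked for a plain English answer. This level of mathematical description isn't required to understand the answer, though for people who are particularly mathematically minded it may be a lot simpler to understand than "plain English". However, the OP asked for the latter.

While I can see why people might skim my answer and think it is too mathy (especially the "math is the new plain English" snide remarks, since removed), the original question asks about big-O, which is about functions, so I try to be explicit and talk about functions in a way that complements the plain-English intuition. The math here can often be glossed over, or understood with a high-school math background. I do feel that people may look at the math addenda at the end and assume they are part of the answer, when they are merely there to show what the *real* math looks like.

This is a fantastic answer; much better IMO than the one with the most votes. The "math" required doesn't go beyond what's needed to understand the expressions in the parentheses after the "O", which no reasonable explanation that uses any examples can avoid.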

4> Jon Skeet..：

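EDIT: Quick note: this almost certainly confuses Big O notation (which is an upper bound) with Theta notation (which is both an upper and a lower bound). In my experience, this is typical of discussions in non-academic settings. Apologies for any confusion caused.

In one sentence: as the size of your job goes up, how much longer does it take to complete it?

Obviously that only uses "size" as the input and "time taken" as the output - the same idea applies if you want to talk about memory usage etc.

Here's an example where we have N T-shirts which we want to dry. We'll assume it's incredibly quick to get them into the drying position (i.e. the human interaction is negligible). That's not the case in real life, of course...

Using a washing line outside: assuming you have an infinitely large back yard, washing dries in O(1) time. However much you have of it, it'll get the same sun and fresh air, so the size doesn't affect the drying time.

Using a tumble dryer: you put 10 shirts in each load, and they're done an hour later. (Ignore the actual numbers here - they're irrelevant.) So drying 50 shirts takes about 5 times as long as drying 10 shirts.

Putting everything in an airing cupboard: if we put everything in one big pile and just let general warmth do it, it will take a long time for the middle shirts to get dry. I wouldn't like to guess at the detail, but I suspect this is at least O(N^2) - as you increase the wash load, the drying time increases faster.

One important aspect of "big O" notation is that it doesn't say which algorithm will be faster for a given size. Take a hashtable (string key, integer value) vs an array of pairs (string, integer). Is it faster to find a key in the hashtable or an element in the array, based on a string? (i.e. for the array, "find the first element where the string part matches the given key.") Hashtables are generally amortised (~= "on average") O(1) - once they're set up, it should take about the same time to find an entry in a 100-entry table as in a 1,000,000-entry table. Finding an element in an array (based on content rather than index) is linear, i.e. O(N) - on average, you're going to have to look at half the entries.

Does this make a hashtable faster than an array for lookups? Not necessarily. If you've got a very small collection of entries, an array may well be faster - you may be able to check all the strings in the time it takes just to calculate the hashcode of the one you're looking for. As the data set grows larger, however, the hashtable will eventually beat the array.

Jon's explanation is very much on point for the question, I think. This is exactly how one could explain it to a mom, and she would eventually understand it, I think :) I like the clothes example (especially the last one, where it explains the exponential growth of complexity).

A hashtable requires an algorithm to run to calculate the index of the actual array (depending on the implementation). An array is just O(1) because it's just an address. But this has nothing to do with the question, just an observation :)

Filip: I'm not talking about addressing an array by index, I'm talking about finding a matching entry in an array. Could you reread the answer and see if it's still unclear?

@Filip Ekberg I think you're thinking of a direct-address table where each index maps directly to a key, hence O(1); however, I believe Jon is talking about an unsorted array of key/value pairs which you have to search through linearly.

@RBT: No, it's not a binary look-up. It can get to the right hash *bucket* immediately, just based on a transformation from hash code to bucket index. After that, finding the right hash code in the bucket may be linear or it may be a binary search... but by that time you're down to a very small proportion of the total size of the dictionary.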

5> starblue..：

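Big O describes an upper limit on the growth behaviour of a function, for example the runtime of a program, when inputs become large.

Examples:

O(n): if I double the input size, the runtime doubles

O(n²): if the input size doubles, the runtime quadruples

O(log n): if the input size doubles, the runtime increases by one

O(2ⁿ): if the input size increases by one, the runtime doubles

The input size is usually the space in bits needed to represent the input.

Incorrect! For example O(n): if I double the input size, the runtime will be multiplied by a finite non-zero constant. I mean O(n) = O(n + n)

I'm talking about the f in f(n) = O(g(n)), not the g as you seem to understand.

You should add an example for O(n log n).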

6> cdiggins..：

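Big O notation is most commonly used by programmers as an approximate measure of how long a computation (algorithm) will take to complete, expressed as a function of the size of the input set.

Big O is useful for comparing how well two algorithms will scale up as the number of inputs is increased.

More precisely, Big O notation is used to express the asymptotic behavior of a function. That means how the function behaves as it approaches infinity.

In many cases the "O" of an algorithm will fall into one of the following cases:

O(1) - Time to complete is the same regardless of the size of the input set. An example is accessing an array element by index.

O(Log N) - Time to complete increases roughly in line with log2(n). For example, 1024 items take roughly twice as long as 32 items, because Log2(1024) = 10 and Log2(32) = 5. An example is finding an item in a binary search tree (BST).

O(N) - Time to complete scales linearly with the size of the input set. In other words, if you double the number of items in the input set, the algorithm takes roughly twice as long. An example is counting the number of items in a linked list.

O(N Log N) - Time to complete increases by the number of items times the result of Log2(N). Examples are heap sort and quick sort.

O(N^2) - Time to complete is roughly equal to the square of the number of items. An example is bubble sort.

O(N!) - Time to complete is the factorial of the size of the input set. An example is the brute-force solution to the traveling salesman problem.

Big O ignores factors that do not contribute in a meaningful way to the growth curve of a function as the input size increases towards infinity. This means that constants that are added to or multiplied by the function are simply ignored.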

7> Filip Ekberg..：

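Big O is just a way to "express" yourself in a common way: "How much time/space does it take to run my code?".

You may often see O(n), O(n²), O(n log n) and so forth; all of these are just ways to show how an algorithm changes.

O(n) means Big O is n, and now you might think, "What is n!?" Well, "n" is the number of elements. Imagine you want to search for an item in an array. You would have to look at each element and ask "Are you the correct element/item?" In the worst case the item is at the last index, which means it took as much time as there are items in the list, so to be generic we say "oh hey, n is a fair given amount of values!".

So then you might understand what "n²" means, but to be even more specific, imagine you have a simple sorting algorithm, the simplest of them all: bubble sort. This algorithm needs to look through the whole list for each item.

My list

The flow here would be:

Compare 1 and 6, which is bigger? OK, 6 is in the right position, moving forward!

Compare 6 and 3, oh, 3 is less! Let's move that. OK, the list changed, so we need to start from the beginning now!

This is O(n²) because you need to look at all the items in the list, and there are "n" items. For each item, you look at all the items once more for comparing, which is also "n", so for every item you look "n" times, meaning n*n = n².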
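The "for each item, look at the whole list again" idea can be sketched as below. This is a plain fixed-pass bubble sort with a comparison counter, written for illustration. It does not reproduce the "start from the beginning" variant described in the walkthrough, and the sample list [1, 6, 3] is inferred from the numbers mentioned there.

```python
def bubble_sort(items):
    """Return (sorted copy, comparisons made) using a simple bubble sort."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for _ in range(n):              # one pass per item: n passes
        for j in range(n - 1):      # each pass walks the whole list
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


print(bubble_sort([1, 6, 3]))
print(bubble_sort(range(10, 0, -1))[1])
```

This version makes n*(n-1) comparisons: 6 for three items and 90 for ten, growing with the square of the list length.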

I hope this is as simple as you wanted it.

But remember, Big O is just a way to express yourself in terms of time and space.

8> Wedge..：

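Big O describes the fundamental scaling nature of an algorithm.

There is a lot of information that Big O does not tell you about a given algorithm. It cuts to the bone and gives only information about the scaling nature of an algorithm, specifically how the resource use (think time or memory) of an algorithm scales in response to the "input size".

Consider the difference between a steam engine and a rocket. They are not merely different varieties of the same thing (as, say, a Prius engine vs. a Lamborghini engine); at their core they are dramatically different kinds of propulsion systems. A steam engine may be faster than a toy rocket, but no steam piston engine will be able to reach the speeds of an orbital launch vehicle. This is because these systems have different scaling characteristics with regard to the fuel required ("resource usage") to reach a given speed ("input size").

Why is this so important? Because software deals with problems that may differ enormously in data size. Consider that for a moment. The ratio between the speed necessary to travel to the Moon and human walking speed is less than 10,000:1, and that is absolutely tiny compared to the range of input sizes software may face. And because software may face an astronomical range of input sizes, the Big O complexity of an algorithm, its fundamental scaling nature, has the potential to trump any implementation details.

Consider the canonical sorting example. Bubble sort is O(n²) while merge sort is O(n log n). Let's say you have two sorting applications, application A which uses bubble sort and application B which uses merge sort, and let's say that for input sizes of around 30 elements application A is 1,000x faster than application B at sorting. If you never have to sort much more than 30 elements, then you should obviously prefer application A, as it is much faster at these input sizes. However, if you find that you may have to sort ten million items, then what you'd expect is that application B actually ends up being thousands of times faster than application A in this case, entirely due to the way each algorithm scales.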

9> Andrew Prock..：

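Here is the plain English bestiary I tend to use when explaining the common varieties of Big-O.

In all cases, prefer algorithms higher up on the list to those lower on the list. However, the cost of moving to a more expensive complexity class varies significantly.

O(1):

No growth. Regardless of how big the problem is, you can solve it in the same amount of time. This is somewhat analogous to broadcasting, where it takes the same amount of energy to broadcast over a given distance regardless of the number of people within the broadcast range.

O(log n):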

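This complexity is the same as O(1) except that it's just a little bit worse. For all practical purposes, you can consider this as a very large constant scaling. The difference in work between processing 1 thousand and 1 billion items is only a factor of three (log 10⁹ / log 10³ = 3).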

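O(n):

The cost of solving the problem is proportional to the size of the problem. If your problem doubles in size, then the cost of the solution doubles. Since most problems have to be scanned into the computer in some way, as data entry, disk reads, or network traffic, this is generally an affordable scaling factor.

O(n log n):

This complexity is very similar to O(n). For all practical purposes, the two are equivalent. This level of complexity is generally still considered scalable. By tweaking assumptions, some O(n log n) algorithms can be transformed into O(n) algorithms. For example, bounding the size of keys reduces sorting from O(n log n) to O(n).

O(n²):

Grows as a square, where n is the length of the side of the square. This is the same growth rate as the "network effect", where everyone in a network might know everyone else in the network. Growth is expensive. Most scalable solutions cannot use algorithms with this level of complexity without doing significant gymnastics. This generally applies to all other polynomial complexities - O(n^k) - as well.

O(2ⁿ):

Does not scale. You have no hope of solving any non-trivially sized problem. Useful for knowing what to avoid, and for experts to find approximate algorithms that are in O(n^k).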
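The remark under O(n log n) about bounding key sizes can be illustrated with counting sort. The answer does not name a particular algorithm, so this is one possible instance, assuming the keys are integers between 0 and a known `max_key`.

```python
def counting_sort(values, max_key):
    """Sort non-negative integers no larger than max_key without comparisons."""
    counts = [0] * (max_key + 1)
    for value in values:            # one pass over the input: O(n)
        counts[value] += 1
    result = []
    for key, count in enumerate(counts):   # one pass over the key range: O(k)
        result.extend([key] * count)
    return result


print(counting_sort([3, 1, 2, 3, 0], max_key=3))
```

The running time is O(n + k), where k is the key range, so it is linear in n only while k stays bounded; for arbitrary keys a comparison sort at O(n log n) is the usual fallback.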

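Could you please consider a different analogy for O(1)? The engineer in me wants to start a discussion about RF impedance due to obstructions.

I did use the word "somewhat".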

10> Brownie..：

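Big O is a measure of how much time/space an algorithm uses relative to the size of its input.

If an algorithm is O(n), then the time/space will increase at the same rate as its input.

If an algorithm is O(n²), then the time/space increases at the rate of its input squared.

And so on.

I've always believed it can be either time or space, but not both at the same time.

Complexity most definitely can be about space. Have a look at this: http://en.wikipedia.org/wiki/PSPACE

This answer is the most "plain" one here. The previous ones assume readers already know enough to understand them, but the writers are not aware of it. They think theirs are simple and plain, which they absolutely are not. Writing a lot of text with pretty formatting and making fancy artificial examples that are hard for non-CS people is not plain and simple; it is just attractive to the mostly CS people who vote it up. Explaining a CS term in plain English needs no code or math at all. +1 for this answer, though it is still not good enough.

It's not about space. It's about complexity, which means time.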

11> James Oravec..：

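What is a plain English explanation of Big O? With as little formal definition as possible and simple mathematics.

A plain English explanation of the need for Big-O notation:

When we program, we are trying to solve a problem. What we code is called an algorithm. Big O notation allows us to compare the worst-case performance of our algorithms in a standardized way. Hardware specs vary over time, and improvements in hardware can reduce the time an algorithm takes to run. But replacing the hardware does not mean our algorithm is any better or has improved over time, as our algorithm is still the same. So in order to compare different algorithms and determine whether one is better or not, we use Big O notation.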