我正在寻找最快的方法来确定一个long
值是否是一个完美的正方形(即它的平方根是另一个整数):
我通过使用内置Math.sqrt()
函数以简单的方式完成了它,但我想知道是否有办法通过将自己限制为仅整数域来更快地完成它.
维护查找表是不切实际的(因为大约有2 31.5个整数,其平方小于2 63).
这是我现在正在做的非常简单直接的方式:
public final static boolean isPerfectSquare(long n) { if (n < 0) return false; long tst = (long)(Math.sqrt(n) + 0.5); return tst*tst == n; }
注意:我在许多Project Euler问题中使用此函数.因此,没有其他人必须维护此代码.而这种微优化实际上可以产生影响,因为部分挑战是在不到一分钟的时间内完成每个算法,并且在某些问题中需要将此函数调用数百万次.
我尝试过不同的问题解决方案:
经过详尽的测试后,我发现0.5
不需要添加Math.sqrt()的结果,至少在我的机器上没有.
的快速逆平方根增快,但它给了不正确的结果对于n> = 410881.然而,所建议BobbyShaftoe,我们可以使用FISR劈对于n <410881.
牛顿的方法慢了一点Math.sqrt()
.这可能是因为Math.sqrt()
使用类似牛顿方法的东西,但在硬件中实现,因此它比Java快得多.此外,牛顿的方法仍然需要使用双打.
一个修改过的牛顿方法,它使用了一些技巧,只涉及整数数学,需要一些黑客来避免溢出(我希望这个函数适用于所有正64位有符号整数),并且它仍然比它慢Math.sqrt()
.
二进制斩甚至更慢.这是有道理的,因为二进制斩波平均需要16遍才能找到64位数的平方根.
根据John的测试,or
在C++中使用语句比使用语句更快switch
,但在Java和C#中,or
和之间似乎没有区别switch
.
我还尝试制作一个查找表(作为64个布尔值的私有静态数组).然后or
我会说,而不是开关或声明if(lookup[(int)(n&0x3F)]) { test } else return false;
.令我惊讶的是,这只是(稍微)慢了.这是因为在Java中检查了数组边界.
A. Rex.. 701
我发现一种方法比你的6bits + Carmack + sqrt代码快〜35%,至少用我的CPU(x86)和编程语言(C/C++).您的结果可能会有所不同,尤其是因为我不知道Java因素将如何发挥作用.
我的方法有三个方面:
首先,筛选出明显的答案.这包括负数并查看最后4位.(我发现看到最后六个没有帮助.)我也回答是0.(在阅读下面的代码时,请注意我的输入是int64 x
.)
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true;
接下来,检查它是否是方模255 = 3*5*17.因为这是三个不同质数的乘积,所以只有大约1/8的mod 255残差是正方形.但是,根据我的经验,调用模运算符(%)的成本高于获得的效益,因此我使用涉及255 = 2 ^ 8-1的位技巧来计算残差.(无论好坏,我没有使用从单词中读取单个字节的技巧,只是按位 - 并且移位.)
int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); // At this point, y is between 0 and 511. More code can reduce it farther.要实际检查残差是否为正方形,我在预先计算的表格中查找答案.
if( bad255[y] ) return false; // However, I just use a table of size 512
最后,尝试使用类似于Hensel引理的方法计算平方根.(我不认为它直接适用,但它可以进行一些修改.)在这之前,我用二分搜索来划分2的所有幂:
if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2;此时,对于我们的数字为正方形,它必须是1 mod 8.
if((x & 7) != 1) return false;Hensel引理的基本结构如下.(注意:未经测试的代码;如果不起作用,请尝试t = 2或8.)
int64 t = 4, r = 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; // Repeat until t is 2^33 or so. Use a loop if you want.我们的想法是,在每次迭代中,将一位添加到r,即x的"当前"平方根; 每个平方根的精确模数为2的越来越大的幂,即t/2.最后,r和t/2-r将是x modulo t/2的平方根.(注意,如果r是x的平方根,那么-r也是如此.即使是模数也是如此,但要注意,以某些数为模,事物甚至可以有2个以上的平方根;值得注意的是,这包括2的幂. )因为我们的实际平方根小于2 ^ 32,所以我们实际上可以检查r或t/2-r是否是真正的平方根.在我的实际代码中,我使用以下修改循环:
int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) );这里的加速以三种方式获得:预先计算的起始值(相当于循环的~10次迭代),循环的早期退出以及跳过一些t值.对于最后一部分,我看一下
z = r - x * x
,并将t设置为2分割z的最大功率.这允许我跳过不会影响r值的t值.在我的情况下,预先计算的起始值选出"最小正"平方根模8192.
即使这段代码对你来说效率不高,我希望你喜欢它包含的一些想法.随后是完整的,经过测试的代码,包括预先计算的表格.
typedef signed long long int int64; int start[1024] = {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11, 1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203, 129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395, 1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587, 257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779, 1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971, 385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163, 1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355, 513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547, 1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739, 641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931, 1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973, 769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781, 1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589, 897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397, 1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205, 1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013, 959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821, 1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629, 831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437, 1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245, 703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53, 1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139, 575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331, 1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523, 447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715, 1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907, 319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099, 1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291, 191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483, 1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675, 63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867, 2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037, 65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845, 1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653, 193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461, 1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269, 321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077, 1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885, 449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693, 1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501, 577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309, 1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117, 705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75, 1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267, 833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459, 1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651, 961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843, 1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035, 1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227, 895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419, 1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611, 767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803, 1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995, 639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909, 1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717, 511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525, 1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333, 383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141, 1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949, 255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757, 1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565, 127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373, 1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181}; bool bad255[512] = {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0}; inline bool square( int64 x ) { // Quickfail if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true; // Check mod 255 = 3 * 5 * 17, for fun int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); if( bad255[y] ) return false; // Divide out powers of 4 using binary search if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2; if((x & 7) != 1) return false; // Compute sqrt using something like Hensel's lemma int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) ); return false; }
哇,这很美.我之前看到Hensel解除了(计算多项式模数为素数的根)但我甚至没有意识到引理可以一直小心地降低以计算数字的平方根; 这是......令人振奋:) (77认同)
Maartinus发布[快2倍的解决方案](http://stackoverflow.com/a/18686659/616460)(并且更短)下面,稍后,似乎没有得到太多的爱. (49认同)
哇!我将尝试将其转换为Java并进行比较,以及对结果进行准确性检查.我会让你知道我发现了什么. (5认同)
@nightcracker没有.`9 <0 => false`,`9&2 => 0`,`9&7 == 5 => false`,`9&11 == 8 => false`. (3认同)
通过滤除明显的正方形,似乎可以获得不同解决方案中的许多速度优势.有没有人通过Maartinus的解决方案对过滤掉的情况进行基准测试,然后只使用sqrt函数作为内置函数? (3认同)
maaartinus.. 352
我参加派对的时间已经很晚了,但我希望能提供更好的答案; 更短和(假设我的基准测试是正确的)也快得多.
long goodMask; // 0xC840C04048404040 computed below { for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i); } public boolean isSquare(long x) { // This tests if the 6 least significant bits are right. // Moving the to be tested bit to the highest position saves us masking. if (goodMask << x >= 0) return false; final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x); // Each square ends with an even number of zeros. if ((numberOfTrailingZeros & 1) != 0) return false; x >>= numberOfTrailingZeros; // Now x is either 0 or odd. // In binary each odd square ends with 001. // Postpone the sign test until now; handle zero in the branch. if ((x&7) != 1 | x <= 0) return x == 0; // Do it in the classical way. // The correctness is not trivial as the conversion from long to double is lossy! final long tst = (long) Math.sqrt(x); return tst * tst == x; }
第一次测试快速捕获大多数非正方形.它使用一个64项的表包装,因此没有数组访问成本(间接和边界检查).对于均匀随机long
,在此结束的概率为81.25%.
第二个测试捕获了因子分解中具有奇数个二进制数的所有数字.该方法Long.numberOfTrailingZeros
非常快,因为它被JIT编辑为单个i86指令.
在删除尾随零之后,第三个测试以二进制形式处理以011,101或111结尾的数字,这些数字不是完美的正方形.它也关心负数,也处理0.
最终的测试可以追溯到double
算术.由于double
只有53位尾数,转换long
为double
包括舍入为大值.尽管如此,测试是正确的(除非证明是错误的).
试图纳入mod255的想法是不成功的.
我发现一种方法比你的6bits + Carmack + sqrt代码快〜35%,至少用我的CPU(x86)和编程语言(C/C++).您的结果可能会有所不同,尤其是因为我不知道Java因素将如何发挥作用.
我的方法有三个方面:
首先,筛选出明显的答案.这包括负数并查看最后4位.(我发现看到最后六个没有帮助.)我也回答是0.(在阅读下面的代码时,请注意我的输入是int64 x
.)
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true;
接下来,检查它是否是方模255 = 3*5*17.因为这是三个不同质数的乘积,所以只有大约1/8的mod 255残差是正方形.但是,根据我的经验,调用模运算符(%)的成本高于获得的效益,因此我使用涉及255 = 2 ^ 8-1的位技巧来计算残差.(无论好坏,我没有使用从单词中读取单个字节的技巧,只是按位 - 并且移位.)
int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); // At this point, y is between 0 and 511. More code can reduce it farther.要实际检查残差是否为正方形,我在预先计算的表格中查找答案.
if( bad255[y] ) return false; // However, I just use a table of size 512
最后,尝试使用类似于Hensel引理的方法计算平方根.(我不认为它直接适用,但它可以进行一些修改.)在这之前,我用二分搜索来划分2的所有幂:
if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2;此时,对于我们的数字为正方形,它必须是1 mod 8.
if((x & 7) != 1) return false;Hensel引理的基本结构如下.(注意:未经测试的代码;如果不起作用,请尝试t = 2或8.)
int64 t = 4, r = 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; // Repeat until t is 2^33 or so. Use a loop if you want.我们的想法是,在每次迭代中,将一位添加到r,即x的"当前"平方根; 每个平方根的精确模数为2的越来越大的幂,即t/2.最后,r和t/2-r将是x modulo t/2的平方根.(注意,如果r是x的平方根,那么-r也是如此.即使是模数也是如此,但要注意,以某些数为模,事物甚至可以有2个以上的平方根;值得注意的是,这包括2的幂. )因为我们的实际平方根小于2 ^ 32,所以我们实际上可以检查r或t/2-r是否是真正的平方根.在我的实际代码中,我使用以下修改循环:
int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) );这里的加速以三种方式获得:预先计算的起始值(相当于循环的~10次迭代),循环的早期退出以及跳过一些t值.对于最后一部分,我看一下
z = r - x * x
,并将t设置为2分割z的最大功率.这允许我跳过不会影响r值的t值.在我的情况下,预先计算的起始值选出"最小正"平方根模8192.
即使这段代码对你来说效率不高,我希望你喜欢它包含的一些想法.随后是完整的,经过测试的代码,包括预先计算的表格.
typedef signed long long int int64; int start[1024] = {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11, 1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203, 129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395, 1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587, 257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779, 1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971, 385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163, 1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355, 513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547, 1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739, 641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931, 1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973, 769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781, 1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589, 897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397, 1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205, 1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013, 959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821, 1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629, 831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437, 1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245, 703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53, 1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139, 575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331, 1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523, 447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715, 1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907, 319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099, 1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291, 191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483, 1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675, 63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867, 2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037, 65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845, 1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653, 193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461, 1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269, 321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077, 1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885, 449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693, 1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501, 577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309, 1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117, 705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75, 1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267, 833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459, 1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651, 961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843, 1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035, 1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227, 895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419, 1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611, 767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803, 1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995, 639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909, 1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717, 511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525, 1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333, 383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141, 1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949, 255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757, 1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565, 127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373, 1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181}; bool bad255[512] = {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0}; inline bool square( int64 x ) { // Quickfail if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true; // Check mod 255 = 3 * 5 * 17, for fun int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); if( bad255[y] ) return false; // Divide out powers of 4 using binary search if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2; if((x & 7) != 1) return false; // Compute sqrt using something like Hensel's lemma int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) ); return false; }
我参加派对的时间已经很晚了,但我希望能提供更好的答案; 更短和(假设我的基准测试是正确的)也快得多.
long goodMask; // 0xC840C04048404040 computed below { for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i); } public boolean isSquare(long x) { // This tests if the 6 least significant bits are right. // Moving the to be tested bit to the highest position saves us masking. if (goodMask << x >= 0) return false; final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x); // Each square ends with an even number of zeros. if ((numberOfTrailingZeros & 1) != 0) return false; x >>= numberOfTrailingZeros; // Now x is either 0 or odd. // In binary each odd square ends with 001. // Postpone the sign test until now; handle zero in the branch. if ((x&7) != 1 | x <= 0) return x == 0; // Do it in the classical way. // The correctness is not trivial as the conversion from long to double is lossy! final long tst = (long) Math.sqrt(x); return tst * tst == x; }
第一次测试快速捕获大多数非正方形.它使用一个64项的表包装,因此没有数组访问成本(间接和边界检查).对于均匀随机long
,在此结束的概率为81.25%.
第二个测试捕获了因子分解中具有奇数个二进制数的所有数字.该方法Long.numberOfTrailingZeros
非常快,因为它被JIT编辑为单个i86指令.
在删除尾随零之后,第三个测试以二进制形式处理以011,101或111结尾的数字,这些数字不是完美的正方形.它也关心负数,也处理0.
最终的测试可以追溯到double
算术.由于double
只有53位尾数,转换long
为double
包括舍入为大值.尽管如此,测试是正确的(除非证明是错误的).
试图纳入mod255的想法是不成功的.
你必须做一些基准测试.最佳算法取决于输入的分布.
您的算法可能几乎是最优的,但您可能希望在调用平方根例程之前快速检查以排除某些可能性.例如,通过逐位"和"来查看十六进制数字的最后一位数字.完美的正方形只能在基数16中以0,1,4或9结束.因此,对于75%的输入(假设它们是均匀分布的),您可以避免调用平方根来换取一些非常快速的比特.
Kip对实现hex技巧的以下代码进行了基准测试.测试数字1到100,000,000时,此代码的运行速度是原始代码的两倍.
public final static boolean isPerfectSquare(long n) { if (n < 0) return false; switch((int)(n & 0xF)) { case 0: case 1: case 4: case 9: long tst = (long)Math.sqrt(n); return tst*tst == n; default: return false; } }
当我在C++中测试类似代码时,它实际上比原来运行得慢.但是,当我删除switch语句时,十六进制技巧再次使代码快两倍.
int isPerfectSquare(int n) { int h = n & 0xF; // h is the last hex "digit" if (h > 9) return 0; // Use lazy evaluation to jump out of the if statement as soon as possible if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8) { int t = (int) floor( sqrt((double) n) + 0.5 ); return t*t == n; } return 0; }
消除switch语句对C#代码几乎没有影响.
我正在思考我在数值分析课程中度过的可怕时光.
然后我记得,这个函数围绕着来自Quake源代码的网络:
float Q_rsqrt( float number ) { long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // wtf? y = * ( float * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed #ifndef Q3_VM #ifdef __linux__ assert( !isnan(y) ); // bk010122 - FPE? #endif #endif return y; }
这基本上使用牛顿近似函数计算平方根(不能记住确切的名称).
它应该是可用的,甚至可能更快,它来自一个非凡的id软件的游戏!
它是用C++编写的,但是一旦你理解了它,在Java中重用相同的技术应该不会太难:
我最初发现于:http://www.codemaestro.com/reviews/9
牛顿的方法在维基百科上解释:http://en.wikipedia.org/wiki/Newton%27s_method
您可以按照链接获取有关其工作原理的更多说明,但如果您不太在意,那么这大致是我在阅读博客和参加数值分析课程时所记得的:
的* (long*) &y
基本上是一个快速转换到长功能,所以整数运算可以在原始字节来施加.
该0x5f3759df - (i >> 1);
行是近似函数的预先计算的种子值.
该* (float*) &i
值转换回浮点.
该y = y * ( threehalfs - ( x2 * y * y ) )
行基本上再次迭代函数的值.
在函数迭代结果时,近似函数会给出更精确的值.在Quake的情况下,一次迭代"足够好",但如果它不适合你......那么你可以根据需要添加尽可能多的迭代.
这应该更快,因为它减少了在朴素平方根中进行的除法运算的数量,直到简单除以2(实际上是* 0.5F
乘法运算),而是用几个固定数量的乘法运算代替它.
我不确定它是否会更快,甚至更准确,但你可以使用John Carmack的魔法平方根算法来更快地求解平方根.您可以轻松地测试所有可能的32位整数,并验证您实际上得到了正确的结果,因为它只是一个适应性.然而,现在我考虑一下,使用双打也是近似的,所以我不确定它会如何发挥作用.
如果你做一个二进制印章试图找到"正确的"平方根,你可以很容易地检测你得到的值是否足够接近告诉:
(n+1)^2 = n^2 + 2n + 1 (n-1)^2 = n^2 - 2n + 1
所以计算后n^2
,选项是:
n^2 = target
:完成,返回true
n^2 + 2n + 1 > target > n^2
:你很亲密,但并不完美:回归虚假
n^2 - 2n + 1 < target < n^2
:同上
target < n^2 - 2n + 1
:二进制剁低 n
target > n^2 + 2n + 1
:二进制剁更高 n
(对不起,这n
用作您当前的猜测,以及target
参数.为混淆道歉!)
我不知道这是否会更快,但值得一试.
编辑:二进制印章也不必考虑整个整数范围(2^x)^2 = 2^(2x)
,所以一旦你找到目标中的顶部设置位(这可以通过一个有点蠢的技巧完成;我忘记了如何)您可以快速获得一系列潜在答案.请注意,一个天真的二进制文章仍然只需要31或32次迭代.
我对这个线程中的几个算法进行了自己的分析,并得出了一些新的结果.你可以在这个答案的编辑历史中看到那些旧的结果,但它们不准确,因为我犯了一个错误,浪费时间分析几个不接近的算法.然而,从几个不同的答案中吸取教训,我现在有两种算法可以粉碎这个线程的"赢家".这是我做的核心事情与其他人不同:
// This is faster because a number is divisible by 2^4 or more only 6% of the time // and more than that a vanishingly small percentage. while((x & 0x3) == 0) x >>= 2; // This is effectively the same as the switch-case statement used in the original // answer. if((x & 0x7) != 1) return false;
但是,这个简单的行(大多数时候会添加一个或两个非常快的指令)大大简化了switch-case
语句到一个if语句.但是,如果许多测试数字具有显着的两个幂因子,它可以添加到运行时.
以下算法如下:
互联网 - 基普发布的答案
Durron - 我使用一遍答案作为基础的修改后的答案
DurronTwo - 我使用双遍答案(@JohnnyHeggheim)修改了答案,还有其他一些细微的修改.
如果使用生成数字,这是一个示例运行时 Math.abs(java.util.Random.nextLong())
0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials benchmark us linear runtime Internet 39.7 ============================== Durron 37.8 ============================ DurronTwo 36.0 =========================== vm: java trial: 0
如果它仅在第一百万个长度上运行,那么这是一个示例运行时:
0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials benchmark ms linear runtime Internet 2.93 =========================== Durron 2.24 ===================== DurronTwo 3.16 ============================== vm: java trial: 0
正如你所看到的,DurronTwo
对大型输入做得更好,因为它经常使用魔术技巧,但与第一种算法相比会受到破坏,Math.sqrt
因为数字要小得多.与此同时,更简单的Durron
是一个巨大的赢家,因为它在第一百万个数字中绝不会多次划分4次.
这是Durron
:
public final static boolean isPerfectSquareDurron(long n) { if(n < 0) return false; if(n == 0) return true; long x = n; // This is faster because a number is divisible by 16 only 6% of the time // and more than that a vanishingly small percentage. while((x & 0x3) == 0) x >>= 2; // This is effectively the same as the switch-case statement used in the original // answer. if((x & 0x7) == 1) { long sqrt; if(x < 410881L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); i = 0x5f3759df - ( i >> 1 ); y = Float.intBitsToFloat(i); y = y * ( 1.5F - ( x2 * y * y ) ); sqrt = (long)(1.0F/y); } else { sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
和 DurronTwo
public final static boolean isPerfectSquareDurronTwo(long n) { if(n < 0) return false; // Needed to prevent infinite loop if(n == 0) return true; long x = n; while((x & 0x3) == 0) x >>= 2; if((x & 0x7) == 1) { long sqrt; if (x < 41529141369L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); //using the magic number from //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf //since it more accurate i = 0x5f375a86 - (i >> 1); y = Float.intBitsToFloat(i); y = y * (1.5F - (x2 * y * y)); y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate sqrt = (long) ((1.0F/y) + 0.2); } else { //Carmack hack gives incorrect answer for n >= 41529141369. sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
我的基准测试工具:(需要谷歌卡尺0.1-rc5)
public class SquareRootBenchmark { public static class Benchmark1 extends SimpleBenchmark { private static final int ARRAY_SIZE = 10000; long[] trials = new long[ARRAY_SIZE]; @Override protected void setUp() throws Exception { Random r = new Random(); for (int i = 0; i < ARRAY_SIZE; i++) { trials[i] = Math.abs(r.nextLong()); } } public int timeInternet(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++; } } return trues; } public int timeDurron(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++; } } return trues; } public int timeDurronTwo(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++; } } return trues; } } public static void main(String... args) { Runner.main(Benchmark1.class, args); } }
更新:我已经制作了一个新算法,在某些情况下更快,在其他情况下更慢,我根据不同的输入得到了不同的基准.如果我们计算模数0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241
,我们可以消除不能为正方形的97.82%的数字.这可以(一种)在一行中完成,具有5个按位操作:
if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
得到的指数是1)残基,2)残基+ 0xFFFFFF
,或3)残基+ 0x1FFFFFE
.当然,我们需要有一个模拟0xFFFFFF
残差的查找表,它大约是一个3mb文件(在这种情况下存储为ascii文本十进制数字,不是最佳的,但可以用a ByteBuffer
等清楚地改进.但是因为那是预先计算它不会'非常重要.你可以在这里找到文件(或自己生成):
public final static boolean isPerfectSquareDurronThree(long n) { if(n < 0) return false; if(n == 0) return true; long x = n; while((x & 0x3) == 0) x >>= 2; if((x & 0x7) == 1) { if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false; long sqrt; if(x < 410881L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); i = 0x5f3759df - ( i >> 1 ); y = Float.intBitsToFloat(i); y = y * ( 1.5F - ( x2 * y * y ) ); sqrt = (long)(1.0F/y); } else { sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
我将它加载到这样的boolean
数组中:
private static boolean[] goodLookupSquares = null; public static void initGoodLookupSquares() throws Exception { Scanner s = new Scanner(new File("24residues_squares.txt")); goodLookupSquares = new boolean[0x1FFFFFE]; while(s.hasNextLine()) { int residue = Integer.valueOf(s.nextLine()); goodLookupSquares[residue] = true; goodLookupSquares[residue + 0xFFFFFF] = true; goodLookupSquares[residue + 0x1FFFFFE] = true; } s.close(); }
示例运行时.它Durron
在我跑的每次试验中都击败了(第一版).
0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials benchmark us linear runtime Internet 40.7 ============================== Durron 38.4 ============================ DurronThree 36.2 ========================== vm: java trial: 0
使用牛顿方法计算整数平方根应该快得多,然后将此数字平方并检查,就像在当前解决方案中一样.牛顿方法是其他一些答案中提到的Carmack解决方案的基础.您应该能够获得更快的答案,因为您只对根的整数部分感兴趣,允许您更快地停止近似算法.
您可以尝试的另一个优化:如果数字的数字根不是以1,4,7 或9结尾,则该数字不是完美的正方形.在应用较慢的平方根算法之前,这可以用作消除60%输入的快速方法.
我希望此函数能够与所有正64位有符号整数一起使用
Math.sqrt()
使用双精度作为输入参数,因此对于大于2 ^ 53的整数,您将无法获得准确的结果.
仅供记录,另一种方法是使用素数分解.如果分解的每个因子都是偶数,则该数字是完美的平方.所以你想要的是看一个数字是否可以作为素数平方的乘积进行分解.当然,您不需要获得这样的分解,只是为了查看它是否存在.
首先建立一个低于2 ^ 32的素数平方表.这远远小于达到此限制的所有整数的表.
解决方案就是这样的:
boolean isPerfectSquare(long number) { if (number < 0) return false; if (number < 2) return true; for (int i = 0; ; i++) { long square = squareTable[i]; if (square > number) return false; while (number % square == 0) { number /= square; } if (number == 1) return true; } }
我想这有点神秘.它的作用是在每一步中检查素数的平方除以输入数.如果确实如此,则只要可能就将数字除以平方,从主要分解中移除该正方形.如果通过这个过程,我们得到1,那么输入数是素数平方的分解.如果正方形变得比数字本身大,那么这个正方形或任何更大的正方形都不能分割它,因此该数字不能是素数平方的分解.
鉴于目前在硬件中完成的sqrt以及在这里计算素数的需要,我想这个解决方案要慢一些.但是它应该比使用sqrt的解决方案提供更好的结果,这不会超过2 ^ 54,正如mrzl在他的回答中所说的那样.
整数问题值得整数解决方案.从而
对(非负)整数进行二进制搜索以找到最大整数t t**2 <= n
.然后测试是否r**2 = n
完全.这需要时间O(log n).
如果您不知道如何二进制搜索正整数,因为该集合是无界的,这很容易.你首先计算你的增加函数f(上面f(t) = t**2 - n
)的2的幂.当你看到它变成正面时,你就找到了一个上限.然后你可以做标准的二进制搜索.
有人指出,d
完美正方形的最后几位只能取某些值.d
数字的最后数字(以基数表示b
)与除以n
时的余数相同,即.用C表示法.n
b
d
n % pow(b, d)
这可以推广到任何模数m
,即.n % m
可以用来排除一些百分比的数字来自完美的正方形.您当前使用的模数为64,即12,即.19%的剩余部分,尽可能为正方形.通过一点点编码,我发现了模数110880,它只允许2016年,即.1.8%的剩余可能是正方形.因此,根据模数运算(即除法)的成本和表查找与机器上的平方根,使用此模数可能会更快.
顺便说一下,如果Java有一种方法来存储查找表的打包位数组,请不要使用它.110880这些天32位字的RAM并不多,而取一个机器字比取一个位要快.
对于性能,您经常需要做一些compromsies.其他人已经表达了各种方法,但是,你注意到Carmack的黑客攻击速度要快到N的某些值.然后,你应该检查"n",如果它小于N,请使用Carmack的黑客,否则使用其他方法描述在这里的答案.
以下对maaartinus解决方案的简化似乎可以减少运行时间的几个百分点,但我在基准测试方面做得不够好,无法产生我可以信任的基准:
long goodMask; // 0xC840C04048404040 computed below { for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i); } public boolean isSquare(long x) { // This tests if the 6 least significant bits are right. // Moving the to be tested bit to the highest position saves us masking. if (goodMask << x >= 0) return false; // Remove an even number of trailing zeros, leaving at most one. x >>= (Long.numberOfTrailingZeros(x) & (-2); // Repeat the test on the 6 least significant remaining bits. if (goodMask << x >= 0 | x <= 0) return x == 0; // Do it in the classical way. // The correctness is not trivial as the conversion from long to double is lossy! final long tst = (long) Math.sqrt(x); return tst * tst == x; }
值得检查省略第一次测试的方法,
if (goodMask << x >= 0) return false;
会影响表现.
这是我能够提出的最快的Java实现,使用此线程中其他人建议的技术组合.
Mod-256测试
不精确的mod-3465测试(以一些误报为代价避免整数除法)
浮点平方根,舍入并与输入值进行比较
我也尝试过这些修改,但它们没有帮助提高性能:
额外的mod-255测试
将输入值除以4的幂
快速反向平方根(对于N的高值,它需要3次迭代,足以使它比硬件平方根函数慢.)
public class SquareTester { public static boolean isPerfectSquare(long n) { if (n < 0) { return false; } else { switch ((byte) n) { case -128: case -127: case -124: case -119: case -112: case -111: case -103: case -95: case -92: case -87: case -79: case -71: case -64: case -63: case -60: case -55: case -47: case -39: case -31: case -28: case -23: case -15: case -7: case 0: case 1: case 4: case 9: case 16: case 17: case 25: case 33: case 36: case 41: case 49: case 57: case 64: case 65: case 68: case 73: case 81: case 89: case 97: case 100: case 105: case 113: case 121: long i = (n * INV3465) >>> 52; if (! good3465[(int) i]) { return false; } else { long r = round(Math.sqrt(n)); return r*r == n; } default: return false; } } } private static int round(double x) { return (int) Double.doubleToRawLongBits(x + (double) (1L << 52)); } /** 3465-1 modulo 264 */ private static final long INV3465 = 0x8ffed161732e78b9L; private static final boolean[] good3465 = new boolean[0x1000]; static { for (int r = 0; r < 3465; ++ r) { int i = (int) ((r * r * INV3465) >>> 52); good3465[i] = good3465[i+1] = true; } } }
你应该从一开始就摆脱N的2次幂部分.
第二编辑 下面m的神奇表达应该是
m = N - (N & (N-1));
而不是书面的
第二次编辑结束
m = N & (N-1); // the lawest bit of N N /= m; byte = N & 0x0F; if ((m % 2) || (byte !=1 && byte !=9)) return false;
第一编辑:
小改进:
m = N & (N-1); // the lawest bit of N N /= m; if ((m % 2) || (N & 0x07 != 1)) return false;
第1次编辑结束
现在像往常一样继续.这样,当你到达浮点部分时,你已经摆脱了2个幂部分为奇数(大约一半)的所有数字,然后你只考虑剩下1/8的数字.即你在6%的数字上运行浮点部分.
标签中提到了Euler项目,其中的许多问题都需要检查数字>> 2 ^ 64.使用80字节缓冲区时,上面提到的大多数优化都不起作用.
我使用了java BigInteger和一个稍微修改过的牛顿方法,一个对整数有效的方法.问题是精确的正方形n ^ 2收敛到(n-1)而不是n,因为n ^ 2-1 =(n-1)(n + 1),最后的误差只比最终除数和步数低一步算法终止.在计算错误之前,通过在原始参数中添加一个来轻松修复.(为立方根添加两个,等等)
这个算法的一个很好的属性是你可以立即判断数字是否是一个完美的正方形 - 牛顿方法中的最终误差(不是校正)将为零.通过简单的修改,您还可以快速计算楼层(sqrt(x))而不是最接近的整数.这对于几个欧拉问题很方便.
这是旧的Marchant计算器算法的十进制到二进制的返工(对不起,我没有参考),在Ruby中,专门针对这个问题进行了改编:
def isexactsqrt(v) value = v.abs residue = value root = 0 onebit = 1 onebit <<= 8 while (onebit < residue) onebit >>= 2 while (onebit > residue) while (onebit > 0) x = root + onebit if (residue >= x) then residue -= x root = x + onebit end root >>= 1 onebit >>= 2 end return (residue == 0) end
这是类似的一些测试(请不要因为编码风格/气味或笨重的O/O而投票给我 - 这是重要的算法,而C++不是我的母语).在这种情况下,我们正在寻找残留== 0:
#includeusing namespace std; typedef unsigned long long int llint; class ISqrt { // Integer Square Root llint value; // Integer whose square root is required llint root; // Result: floor(sqrt(value)) llint residue; // Result: value-root*root llint onebit, x; // Working bit, working value public: ISqrt(llint v = 2) { // Constructor Root(v); // Take the root }; llint Root(llint r) { // Resets and calculates new square root value = r; // Store input residue = value; // Initialise for subtracting down root = 0; // Clear root accumulator onebit = 1; // Calculate start value of counter onebit <<= (8*sizeof(llint)-2); // Set up counter bit as greatest odd power of 2 while (onebit > residue) {onebit >>= 2; }; // Shift down until just < value while (onebit > 0) { x = root ^ onebit; // Will check root+1bit (root bit corresponding to onebit is always zero) if (residue >= x) { // Room to subtract? residue -= x; // Yes - deduct from residue root = x + onebit; // and step root }; root >>= 1; onebit >>= 2; }; return root; }; llint Residue() { // Returns residue from last calculation return residue; }; }; int main() { llint big, i, q, r, v, delta; big = 0; big = (big-1); // Kludge for "big number" ISqrt b; // Make q sqrt generator for ( i = big; i > 0 ; i /= 7 ) { // for several numbers q = b.Root(i); // Get the square root r = b.Residue(); // Get the residue v = q*q+r; // Recalc original value delta = v-i; // And diff, hopefully 0 cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n"; }; return 0; };
正如已经提到的那样,sqrt调用并不完全准确,但它的有趣和有启发性的是它不会在速度方面打破其他答案.毕竟,sqrt的汇编语言指令序列很小.英特尔有一个硬件指令,我不相信它,因为它不符合IEEE.
那么为什么它会变慢?因为Java实际上是通过JNI调用C例程,所以实际上这比调用Java子例程要慢,后者本身比内联更慢.这非常烦人,Java应该提出一个更好的解决方案,即在必要时构建浮点库调用.那好吧.
在C++中,我怀疑所有复杂的替代方案都会失去速度,但我还没有检查过它们.我做了什么,以及Java人员会发现什么有用,是一个简单的黑客攻击,是A. Rex建议的特殊案例测试的扩展.使用单个long值作为位数组,但不检查边界.这样,你有64位布尔查找.
typedef unsigned long long UVLONG UVLONG pp1,pp2; void init2() { for (int i = 0; i < 64; i++) { for (int j = 0; j < 64; j++) if (isPerfectSquare(i * 64 + j)) { pp1 |= (1 << j); pp2 |= (1 << i); break; } } cout << "pp1=" << pp1 << "," << pp2 << "\n"; } inline bool isPerfectSquare5(UVLONG x) { return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false; }
例程isPerfectSquare5在我的core2 duo机器上运行大约1/3.我怀疑沿着相同的线路进一步调整可以平均缩短时间,但每次检查时,你都会进行更多的测试以获得更多的消除,所以你不能在这条路上走得更远.
当然,您可以以相同的方式检查高6位,而不是单独测试负数.
请注意,我正在做的就是消除可能的正方形,但是当我有一个潜在的情况时,我必须调用原始的内联isPerfectSquare.
调用init2例程一次以初始化pp1和pp2的静态值.请注意,在我的C++实现中,我使用的是unsigned long long,所以既然你已经签名了,你就必须使用>>>运算符.
没有必要对数组进行边界检查,但是Java的优化器必须很快地解决这个问题,所以我不会因此而责怪它们.
我喜欢在一些输入上使用几乎正确的方法的想法.这是一个具有更高"偏移量"的版本.代码似乎工作并通过我的简单测试用例.
只需更换你的:
if(n < 410881L){...}
代码与这一个:
if (n < 11043908100L) { //John Carmack hack, converted to Java. // See: http://www.codemaestro.com/reviews/9 int i; float x2, y; x2 = n * 0.5F; y = n; i = Float.floatToRawIntBits(y); //using the magic number from //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf //since it more accurate i = 0x5f375a86 - (i >> 1); y = Float.intBitsToFloat(i); y = y * (1.5F - (x2 * y * y)); y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate sqrt = Math.round(1.0F / y); } else { //Carmack hack gives incorrect answer for n >= 11043908100. sqrt = (long) Math.sqrt(n); }
考虑到一般的位长(尽管我在这里使用了特定的类型),我试图设计简单的算法,如下所示.最初需要简单明了地检查0,1,2或<0.以下是简单的意义,它不会尝试使用任何现有的数学函数.大多数运算符可以用逐位运算符替换.我没有测试过任何基准测试数据.我不是数学或计算机算法设计的专家,我很乐意看到你指出问题.我知道那里有很多改进机会.
int main() { unsigned int c1=0 ,c2 = 0; unsigned int x = 0; unsigned int p = 0; int k1 = 0; scanf("%d",&p); if(p % 2 == 0) { x = p/2; } else { x = (p/2) +1; } while(x) { if((x*x) > p) { c1 = x; x = x/2; }else { c2 = x; break; } } if((p%2) != 0) c2++; while(c2 < c1) { if((c2 * c2 ) == p) { k1 = 1; break; } c2++; } if(k1) printf("\n Perfect square for %d", c2); else printf("\n Not perfect but nearest to :%d :", c2); return 0; }
当观察到正方形的最后n位时,我检查了所有可能的结果.通过连续检查更多位,可以消除多达5/6的输入.我实际上是设计这个来实现Fermat的分解算法,并且它非常快.
public static boolean isSquare(final long val) { if ((val & 2) == 2 || (val & 7) == 5) { return false; } if ((val & 11) == 8 || (val & 31) == 20) { return false; } if ((val & 47) == 32 || (val & 127) == 80) { return false; } if ((val & 191) == 128 || (val & 511) == 320) { return false; } // if((val & a == b) || (val & c == d){ // return false; // } if (!modSq[(int) (val % modSq.length)]) { return false; } final long root = (long) Math.sqrt(val); return root * root == val; }
伪代码的最后一位可用于扩展测试以消除更多值.上述测试适用于k = 0,1,2,3
a的形式为(3 << 2k) - 1
b的形式(2 << 2k)
c的形式为(2 << 2k + 2) - 1
d的形式为(2 << 2k - 1)*10
它首先测试它是否具有功率模数为2的平方残差,然后根据最终模数进行测试,然后使用Math.sqrt进行最终测试.我从最高职位上提出了这个想法,并尝试对其进行扩展.我感谢任何意见或建议.
更新:使用模数,(modSq)和模数基数44352进行测试,我的测试在OP更新中的96%的时间内运行,数字最多为1,000,000,000.