In an earlier question about formatting a double[][] as CSV, Marc Gravell suggested that using StringBuilder would be faster than using String.Join. Is that true?
Short answer: it depends.
Long answer: if you already have an array of strings to concatenate (with a delimiter), String.Join is the fastest way to do it.
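String.Join can look through all of the strings to work out the exact length it needs, then go through them again and copy all the data. This means no extra copying is involved. The only downside is that it has to go through the strings twice, which potentially means blowing the memory cache more often than necessary.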
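The two-pass idea described above can be sketched roughly as follows. This is only an illustration, not the framework's actual implementation: `JoinSketch` and `TwoPassJoin` are hypothetical names, the inputs are assumed to be non-null, and the final `new string(buffer)` adds one copy that a real implementation writing directly into the result would not need.

```csharp
using System;

static class JoinSketch
{
    // Hypothetical helper illustrating the "measure first, then copy" approach.
    // Assumes separator and every element of values are non-null.
    public static string TwoPassJoin(string separator, string[] values)
    {
        if (values.Length == 0) return string.Empty;

        // Pass 1: work out the exact length of the result.
        int total = separator.Length * (values.Length - 1);
        foreach (string value in values) total += value.Length;

        // Pass 2: copy each piece once into a buffer of exactly that size.
        char[] buffer = new char[total];
        int position = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, position, separator.Length);
                position += separator.Length;
            }
            values[i].CopyTo(0, buffer, position, values[i].Length);
            position += values[i].Length;
        }

        // Extra copy here that exists only in this sketch.
        return new string(buffer);
    }
}
```

Because the buffer is sized up front, it never has to grow; the cost is that every input string is visited twice.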
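If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and then calling String.Join may well be faster.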
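To make the two options concrete for the case where the strings arrive as a sequence rather than an array, here is a minimal sketch. The class and method names (`SequenceJoinSketch`, `WithBuilder`, `WithArrayThenJoin`) are made up for illustration; which one is faster depends on the data and would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SequenceJoinSketch
{
    // Option A: append each value as it arrives; the builder grows as needed.
    public static string WithBuilder(IEnumerable<string> values, string separator)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (string value in values)
        {
            if (!first) sb.Append(separator);
            sb.Append(value);
            first = false;
        }
        return sb.ToString();
    }

    // Option B: collect the values into an array first, then let
    // String.Join size the result exactly before copying.
    public static string WithArrayThenJoin(IEnumerable<string> values, string separator)
    {
        string[] array = new List<string>(values).ToArray();
        return String.Join(separator, array);
    }
}
```

Both methods produce the same string; they differ only in where the copying happens.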
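EDIT: This is in terms of a single call to String.Join versus a bunch of calls to StringBuilder.Append. In the original question we had two different levels of String.Join calls, so each of the nested calls would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either approach "win" significantly (in complexity terms) with typical data.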
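EDIT: When I'm at home, I'll write a benchmark that is as painful as possible for StringBuilder. Basically, if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy on every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.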
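As a rough way to observe the growth pattern described above (not the benchmark promised in the answer), one could count how often a StringBuilder's Capacity changes while appending strings whose lengths double each time. `GrowthSketch` and `CountCapacityChanges` are hypothetical names, and how the internal buffer grows, and whether growth involves copying, is an implementation detail that differs between runtime versions, so treat the count as indicative only.

```csharp
using System;
using System.Text;

static class GrowthSketch
{
    // Appends strings of length 1, 2, 4, ... and counts how many times
    // the builder's reported Capacity changes along the way.
    public static int CountCapacityChanges(int elementCount, out int finalLength)
    {
        StringBuilder sb = new StringBuilder();
        int changes = 0;
        int lastCapacity = sb.Capacity;
        int length = 1;

        for (int i = 0; i < elementCount; i++)
        {
            sb.Append(new string('x', length));
            if (sb.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = sb.Capacity;
            }
            length *= 2; // each element is twice the size of the previous one
        }

        finalLength = sb.Length;
        return changes;
    }
}
```

String.Join has no equivalent growth step, because it knows the total length before it allocates anything.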
Here's my test rig, using int[][] for simplicity; results first:

    Join: 9420ms (chk: 210710000
    OneBuilder: 9021ms (chk: 210710000

(Update for double results:)

    Join: 11635ms (chk: 210710000
    OneBuilder: 11385ms (chk: 210710000

(Update re 2048 * 64 * 150)

    Join: 11620ms (chk: 206409600
    OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

    Join: 11180ms (chk: 206409600
    OneBuilder: 10784ms (chk: 206409600

So StringBuilder is faster, but not massively so; the rig (run at the console, in release mode, etc.):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text;

    namespace ConsoleApplication2
    {
        class Program
        {
            // Force a full collection so garbage from one test does not skew the next.
            static void Collect()
            {
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
            }

            static void Main(string[] args)
            {
                const int ROWS = 500, COLS = 20, LOOPS = 2000;

                // Build the test data; the fixed seed keeps runs repeatable.
                int[][] data = new int[ROWS][];
                Random rand = new Random(123456);
                for (int row = 0; row < ROWS; row++)
                {
                    int[] cells = new int[COLS];
                    for (int col = 0; col < COLS; col++)
                    {
                        cells[col] = rand.Next();
                    }
                    data[row] = cells;
                }

                // Time the nested String.Join version.
                Collect();
                int chksum = 0;
                Stopwatch watch = Stopwatch.StartNew();
                for (int i = 0; i < LOOPS; i++)
                {
                    chksum += Join(data).Length;
                }
                watch.Stop();
                Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

                // Time the single-StringBuilder version.
                Collect();
                chksum = 0;
                watch = Stopwatch.StartNew();
                for (int i = 0; i < LOOPS; i++)
                {
                    chksum += OneBuilder(data).Length;
                }
                watch.Stop();
                Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

                Console.WriteLine("done");
                Console.ReadLine();
            }

            // Two levels of String.Join: one intermediate string per row.
            public static string Join(int[][] array)
            {
                return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                        row => String.Join(",",
                            Array.ConvertAll(row, x => x.ToString()))));
            }

            // One StringBuilder for the whole output.
            public static string OneBuilder(IEnumerable<int[]> source)
            {
                StringBuilder sb = new StringBuilder();
                bool firstRow = true;
                foreach (var row in source)
                {
                    if (firstRow)
                    {
                        firstRow = false;
                    }
                    else
                    {
                        sb.AppendLine();
                    }
                    if (row.Length > 0)
                    {
                        sb.Append(row[0]);
                        for (int i = 1; i < row.Length; i++)
                        {
                            sb.Append(',').Append(row[i]);
                        }
                    }
                }
                return sb.ToString();
            }
        }
    }

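I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing in advance the total size of the string to be created, so it doesn't need any reallocation.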
I have created two test methods to compare them:

    public static string TestStringJoin(double[][] array)
    {
        return String.Join(Environment.NewLine,
            Array.ConvertAll(array,
                row => String.Join(",",
                    Array.ConvertAll(row, x => x.ToString()))));
    }

    public static string TestStringBuilder(double[][] source)
    {
        // based on Marc Gravell's code
        // Note: unlike TestStringJoin, this does not append a line break between rows.
        StringBuilder sb = new StringBuilder();
        foreach (var row in source)
        {
            if (row.Length > 0)
            {
                sb.Append(row[0]);
                for (int i = 1; i < row.Length; i++)
                {
                    sb.Append(',').Append(row[i]);
                }
            }
        }
        return sb.ToString();
    }

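I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays: one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single-core, no HT, running Release mode from CMD):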

    // with zeros:
    TestStringJoin    took 00:00:02.2755280
    TestStringBuilder took 00:00:02.3536041

    // with random values:
    TestStringJoin    took 00:00:05.6412147
    TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10, gave me the following results:

    // with zeros:
    TestStringJoin    took 00:00:03.7146628
    TestStringBuilder took 00:00:03.8886978

    // with random values:
    TestStringJoin    took 00:00:09.4991765
    TestStringBuilder took 00:00:09.3033365

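The results are repeatable (almost; there are small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).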
This is the code I used for testing:

    // Requires: using System; using System.Diagnostics; using System.Threading;
    const int Iterations = 50;
    const int Rows = 2048;
    const int Cols = 64; // 512

    static void Main()
    {
        OptimizeForTesting(); // set process priority to RealTime

        // test 1: zeros
        double[][] array = new double[Rows][];
        for (int i = 0; i < array.Length; ++i)
            array[i] = new double[Cols];
        CompareMethods(array);

        // test 2: random values (every row shares the same template row)
        Random random = new Random();
        double[] template = new double[Cols];
        for (int i = 0; i < template.Length; ++i)
            template[i] = random.NextDouble();
        for (int i = 0; i < array.Length; ++i)
            array[i] = template;
        CompareMethods(array);
    }

    static void CompareMethods(double[][] array)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; ++i)
            TestStringJoin(array);
        stopwatch.Stop();
        Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

        stopwatch.Reset();
        stopwatch.Start();
        for (int i = 0; i < Iterations; ++i)
            TestStringBuilder(array);
        stopwatch.Stop();
        Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);
    }

    static void OptimizeForTesting()
    {
        Thread.CurrentThread.Priority = ThreadPriority.Highest;
        Process currentProcess = Process.GetCurrentProcess();
        currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
        if (Environment.ProcessorCount > 1)
        {
            // use last core only
            currentProcess.ProcessorAffinity
                = new IntPtr(1 << (Environment.ProcessorCount - 1));
        }
    }

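Unless the 1% difference turns into something significant in terms of the time the whole program takes to run, this looks like micro-optimization. I'd write the code that is most readable and understandable and not worry about a 1% performance difference.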