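I have a large dataset (about 1.1M documents) that I need to run mapreduce on.
The field to group by is an array named xref. Because of the size of the collection, and because I am doing this in a 32-bit environment, I am trying to reduce the collection into another collection in a new database.
First, here is a sample of the data: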
{ "_id" : ObjectId("4ec6d3aa61910ad451f12e01"), "bii" : -32.9867, "class" : 2456, "decdeg" : -82.4856, "lii" : 297.4896, "name" : "HD 22237", "radeg" : 50.3284, "vmag" : 8, "xref" : ["HD 22237", "CPD -82 65", "-82 64", "PPM 376283", "SAO 258336", "CP-82 65", "GC 4125" ] }
{ "_id" : ObjectId("4ec6d44661910ad451f78eba"), "bii" : -32.9901, "class" : 2450, "decdeg" : -82.4781, "decpm" : 0.013, "lii" : 297.4807, "name" : "PPM 376283", "radeg" : 50.3543, "rapm" : 0.0357, "vmag" : 8.4, "xref" : ["HD 22237", "CPD -82 65", "-82 64", "PPM 376283", "SAO 258336", "CP-82 65", "GC 4125" ] }
{ "_id" : ObjectId("4ec6d48a61910ad451feae04"), "bii" : -32.9903, "class" : 2450, "decdeg" : -82.4779, "decpm" : 0.027, "hd_component" : 0, "lii" : 297.4806, "name" : "SAO 258336", "radeg" : 50.3543, "rapm" : 0.0355, "vmag" : 8, "xref" : ["HD 22237", "CPD -82 65", "-82 64", "PPM 376283", "SAO 258336", "CP-82 65", "GC 4125" ] }
Here are the map and reduce functions (for now I am only using the lii and bii fields):
function map() {
    try {
        emit(this.xref, {lii: this.lii, bii: this.bii});
    } catch(e) { }
}

function reduce(key, values) {
    var result = {xref: key, lii: 0.0, bii: 0.0};
    try {
        values.forEach(function(value) {
            if (value.lii && value.bii) {
                result.lii += value.lii;
                result.bii += value.bii;
            }
        });
        result.bii /= values.length;
        result.lii /= values.length;
    } catch(e) { }
    return result;
}
Unfortunately, running it ends with an error message:
db.catalog.mapReduce(map, reduce, {out:{replace:"catalog2", db:"astro2"}});
Wed Nov 23 10:12:25 uncaught exception: map reduce failed:{
    "assertion" : "_id cannot be an array",
    "assertionCode" : 10099,
    "errmsg" : "db assertion failure",
    "ok" : 0
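The xref field is an array, but that array holds the same values in each of the sample documents. Is mapreduce trying to use that array as the _id field in the new collection?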
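Yes. An _id cannot be an array, because arrays get special treatment in indexes. The key you emit is used as the _id in the output collection. Emitting an array key might work only with the "inline" output mode, and only if the result is small, because inline results are never written to a collection. Ideally, though, you would convert the array to a string (for example, by concatenating the values) and use that as the _id, or make the key a sub-object instead of an array.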
Also note that the value returned by the reduce function should not contain the key. Return {lii:..,bii:..} instead.
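Putting both points together, here is a minimal sketch of how the functions might look. It rests on a few assumptions that are not in the original post: the "|" separator is assumed never to appear inside an xref entry, documents missing lii or bii are skipped in map, and the averaging is moved into a hypothetical finalize function with a count field carried along, since reduce may be called again on its own output and dividing inside it could skew the mean.

```javascript
// Sketch only. Inside mapReduce, MongoDB supplies `emit`, and `this` is the current document.
// Assumptions (not from the original post): "|" never occurs in an xref entry,
// and the `count` field plus `finalize` are added so the average survives re-reduce.

function map() {
    // Skip documents that lack either coordinate.
    if (typeof this.lii === "number" && typeof this.bii === "number") {
        // Join the array into one string so the key is a legal _id.
        emit(this.xref.join("|"), { lii: this.lii, bii: this.bii, count: 1 });
    }
}

function reduce(key, values) {
    // Return the same shape that map emits, and do not include the key.
    var result = { lii: 0, bii: 0, count: 0 };
    values.forEach(function (value) {
        result.lii += value.lii;
        result.bii += value.bii;
        result.count += value.count;
    });
    return result;
}

function finalize(key, value) {
    // Runs once per key after all reducing is done; turn the sums into means.
    return { lii: value.lii / value.count, bii: value.bii / value.count };
}
```

In the shell this would be run with something like db.catalog.mapReduce(map, reduce, {out: {replace: "catalog2", db: "astro2"}, finalize: finalize}). If the original array is needed later, it can be recovered from the _id by splitting on the same separator.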