I have the following field mapping in Elasticsearch (definition in YAML):
    my_analyzer:
        type: custom
        tokenizer: keyword
        filter: lowercase

    products_filter:
        type: "nested"
        properties:
            filter_name: {"type" : "string", analyzer: "my_analyzer"}
            filter_value: {"type" : "string" , analyzer: "my_analyzer"}
Each document has many filters, and they look like this:

    "products_filter": [
      { "filter_name": "Rahmengröße", "filter_value": "33,5 cm" },
      { "filter_name": "color", "filter_value": "gelb" },
      { "filter_name": "Rahmengröße", "filter_value": "39,5 cm" },
      { "filter_name": "Rahmengröße", "filter_value": "45,5 cm" }
    ]
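I am trying to get a list of unique filter names and, for each filter name, a list of its unique filter values.
In other words, I would like to get a structure like this:

    Rahmengröße:
        39,5 cm
        45,5 cm
        33,5 cm
    color:
        gelb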
To get it, I tried several variants of aggregations, for example:

    {
      "aggs": {
        "bla": {
          "terms": { "field": "products_filter.filter_name" },
          "aggs": {
            "bla2": {
              "terms": { "field": "products_filter.filter_value" }
            }
          }
        }
      }
    }
This request is wrong.
It returns a list of unique filter names, but each filter name contains the list of all filter_values:

    "bla": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 103,
      "buckets": [
        {
          "key": "color",
          "doc_count": 9,
          "bla2": {
            "doc_count_error_upper_bound": 4,
            "sum_other_doc_count": 366,
            "buckets": [
              { "key": "100", "doc_count": 5 },
              { "key": "cm", "doc_count": 5 },
              { "key": "unisex", "doc_count": 5 },
              { "key": "11", "doc_count": 4 },
              { "key": "160", "doc_count": 4 },
              { "key": "22", "doc_count": 4 },
              { "key": "a", "doc_count": 4 },
              { "key": "alu", "doc_count": 4 },
              { "key": "aluminium", "doc_count": 4 },
              { "key": "aus", "doc_count": 4 }
            ]
          }
        },
I also tried a reverse nested aggregation, but it did not help.
So I assume there is a logical error in my attempts?
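As I said, your problem is that your text is analyzed, and Elasticsearch always aggregates at the token level. To fix this, your field values have to be indexed as single tokens. There are two options:
do not analyze them
index them with the keyword analyzer + lowercase (case-insensitive aggs)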
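To see why token-level aggregation produces keys such as "cm", here is a minimal Python sketch. It does not call Elasticsearch: `analyzed_tokens` is a hypothetical, simplified stand-in for an analyzed field (lowercase plus whitespace split), `keyword_token` stands in for a not_analyzed or keyword-tokenized field, and `terms_buckets` only imitates how a terms aggregation counts values per indexed token.

```python
from collections import Counter

# Filter values taken from the sample document in the question.
values = ["33,5 cm", "39,5 cm", "45,5 cm", "gelb"]


def analyzed_tokens(value):
    # Assumption: simplified stand-in for an analyzed field
    # (lowercase, then split on whitespace).
    return value.lower().split()


def keyword_token(value):
    # Stand-in for not_analyzed / keyword tokenizer: one token per value.
    return [value]


def terms_buckets(field_values, tokenize):
    # Count, for every token, how many values contain it at least once.
    counts = Counter()
    for value in field_values:
        counts.update(set(tokenize(value)))
    return dict(counts)


analyzed = terms_buckets(values, analyzed_tokens)
single = terms_buckets(values, keyword_token)

print(analyzed["cm"])   # the shared token "cm" becomes its own bucket
print(sorted(single))   # whole values survive as bucket keys
```

With the analyzed stand-in, "cm" becomes a bucket of its own and no bucket carries a full value like "33,5 cm"; with the single-token stand-in, each bucket key is a complete filter value.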
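So these would be the settings that create a custom keyword analyzer with a lowercase filter and folded accent characters (ö => o and ß => ss), together with additional sub-fields on your fields so they can be used for aggregations (raw and keyword):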
    PUT /test
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer_keyword": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": [ "asciifolding", "lowercase" ]
            }
          }
        }
      },
      "mappings": {
        "data": {
          "properties": {
            "products_filter": {
              "type": "nested",
              "properties": {
                "filter_name": {
                  "type": "string",
                  "analyzer": "standard",
                  "fields": {
                    "raw": { "type": "string", "index": "not_analyzed" },
                    "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" }
                  }
                },
                "filter_value": {
                  "type": "string",
                  "analyzer": "standard",
                  "fields": {
                    "raw": { "type": "string", "index": "not_analyzed" },
                    "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" }
                  }
                }
              }
            }
          }
        }
      }
    }
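To get a feel for what the keyword sub-field stores, the following Python sketch approximates the combined effect of the asciifolding and lowercase filters on a whole value. It is only an approximation under stated assumptions: `fold_keyword` is a hypothetical helper built on the standard library, not Elasticsearch's implementation, and it may differ from asciifolding for characters outside this example.

```python
import unicodedata


def fold_keyword(value):
    # casefold() lowercases the text and expands "ß" to "ss".
    folded = value.casefold()
    # NFKD splits "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Dropping the combining marks leaves the plain base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_keyword("Rahmengröße"))  # rahmengrosse
print(fold_keyword("33,5 cm"))      # 33,5 cm
```

Because the whole value stays one token, differently cased or accented spellings of the same filter name end up in the same bucket.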
The test document you gave us:
    PUT /test/data/1
    {
      "products_filter": [
        { "filter_name": "Rahmengröße", "filter_value": "33,5 cm" },
        { "filter_name": "color", "filter_value": "gelb" },
        { "filter_name": "Rahmengröße", "filter_value": "39,5 cm" },
        { "filter_name": "Rahmengröße", "filter_value": "45,5 cm" }
      ]
    }
This would be the query that aggregates on the raw fields:
    GET /test/_search
    {
      "size": 0,
      "aggs": {
        "Nesting": {
          "nested": { "path": "products_filter" },
          "aggs": {
            "raw_names": {
              "terms": { "field": "products_filter.filter_name.raw", "size": 0 },
              "aggs": {
                "raw_values": {
                  "terms": { "field": "products_filter.filter_value.raw", "size": 0 }
                }
              }
            }
          }
        }
      }
    }
It does bring the expected result (buckets with filter names and sub-buckets with their values):

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 1, "max_score": 0, "hits": [] },
      "aggregations": {
        "Nesting": {
          "doc_count": 4,
          "raw_names": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Rahmengröße",
                "doc_count": 3,
                "raw_values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    { "key": "33,5 cm", "doc_count": 1 },
                    { "key": "39,5 cm", "doc_count": 1 },
                    { "key": "45,5 cm", "doc_count": 1 }
                  ]
                }
              },
              {
                "key": "color",
                "doc_count": 1,
                "raw_values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    { "key": "gelb", "doc_count": 1 }
                  ]
                }
              }
            ]
          }
        }
      }
    }
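If you want the plain "name: values" structure from the question, the buckets can be flattened on the client side. The sketch below is one possible way, assuming the response has already been parsed into a Python dict; `names_to_values` is a hypothetical helper, and the `response` literal is a trimmed copy of the result above that keeps only the keys the helper reads.

```python
# Trimmed copy of the aggregation result (only the keys used below).
response = {
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "raw_names": {
                "buckets": [
                    {
                        "key": "Rahmengröße",
                        "doc_count": 3,
                        "raw_values": {
                            "buckets": [
                                {"key": "33,5 cm", "doc_count": 1},
                                {"key": "39,5 cm", "doc_count": 1},
                                {"key": "45,5 cm", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "color",
                        "doc_count": 1,
                        "raw_values": {
                            "buckets": [{"key": "gelb", "doc_count": 1}]
                        },
                    },
                ]
            },
        }
    }
}


def names_to_values(resp, outer="raw_names", inner="raw_values"):
    # Walk the name buckets and collect the keys of each value sub-bucket.
    name_buckets = resp["aggregations"]["Nesting"][outer]["buckets"]
    return {
        bucket["key"]: [value["key"] for value in bucket[inner]["buckets"]]
        for bucket in name_buckets
    }


filters = names_to_values(response)
for name, filter_values in filters.items():
    print(f"{name}:")
    for filter_value in filter_values:
        print(f"    {filter_value}")
```

Passing `outer="keyword_names"` and `inner="keyword_values"` would do the same for an aggregation that uses those names.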
Alternatively, you can use the sub-field with the keyword analyzer (and some normalization) to get more generic, case-insensitive results:
    GET /test/_search
    {
      "size": 0,
      "aggs": {
        "Nesting": {
          "nested": { "path": "products_filter" },
          "aggs": {
            "keyword_names": {
              "terms": { "field": "products_filter.filter_name.keyword", "size": 0 },
              "aggs": {
                "keyword_values": {
                  "terms": { "field": "products_filter.filter_value.keyword", "size": 0 }
                }
              }
            }
          }
        }
      }
    }
This is the result:

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": { "total": 1, "max_score": 0, "hits": [] },
      "aggregations": {
        "Nesting": {
          "doc_count": 4,
          "keyword_names": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "rahmengrosse",
                "doc_count": 3,
                "keyword_values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    { "key": "33,5 cm", "doc_count": 1 },
                    { "key": "39,5 cm", "doc_count": 1 },
                    { "key": "45,5 cm", "doc_count": 1 }
                  ]
                }
              },
              {
                "key": "color",
                "doc_count": 1,
                "keyword_values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    { "key": "gelb", "doc_count": 1 }
                  ]
                }
              }
            ]
          }
        }
      }
    }