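Elasticsearch 5.x introduced some (breaking) changes to the Suggester API (documentation). The most notable change is the following:
Completion suggester is document-oriented
Suggestions are aware of the document they belong to. Now, the associated documents (_source) are returned as part of completion suggestions.
In short, every completion query returns all matching documents instead of just the matched words. And herein lies the problem - autocompleted words are duplicated if they occur in more than one document.
Let's say we have this simple mapping: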
{
  "my-index": {
    "mappings": {
      "users": {
        "properties": {
          "firstName": { "type": "text" },
          "lastName": { "type": "text" },
          "suggest": { "type": "completion", "analyzer": "simple" }
        }
      }
    }
  }
}
With a few test documents:
{
  "_index": "my-index",
  "_type": "users",
  "_id": "1",
  "_source": {
    "firstName": "John",
    "lastName": "Doe",
    "suggest": [ { "input": [ "John", "Doe" ] } ]
  }
},
{
  "_index": "my-index",
  "_type": "users",
  "_id": "2",
  "_source": {
    "firstName": "John",
    "lastName": "Smith",
    "suggest": [ { "input": [ "John", "Smith" ] } ]
  }
}
And a by-the-book query:
POST /my-index/_suggest?pretty
{
  "my-suggest": {
    "text": "joh",
    "completion": { "field": "suggest" }
  }
}
Which yields the following results:
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "1",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Doe",
            "suggest": [ { "input": [ "John", "Doe" ] } ]
          }
        },
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "2",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Smith",
            "suggest": [ { "input": [ "John", "Smith" ] } ]
          }
        }
      ]
    }
  ]
}
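In short, for a completion suggestion on the text "joh", two (2) documents were returned - both Johns, and both have the same value for the text property.
However, I would like to receive one (1) word. Something as simple as this: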
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [ "John" ]
    }
  ]
}
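Until the suggester itself can return distinct words, one stopgap is to collapse the options on the client. The sketch below is only that: `unique_suggestions` is a hypothetical helper (not part of any Elasticsearch client), and the `response` dict is a trimmed copy of the result shown earlier.

```python
from typing import Any, Dict, List


def unique_suggestions(response: Dict[str, Any], name: str) -> List[str]:
    """Collapse document-oriented completion options into distinct texts."""
    seen = set()
    words: List[str] = []
    for entry in response.get(name, []):
        for option in entry.get("options", []):
            text = option["text"]
            if text not in seen:  # keep the first occurrence, preserve order
                seen.add(text)
                words.append(text)
    return words


# Trimmed copy of the suggest response shown earlier (assumption: only the
# fields needed for de-duplication are kept).
response = {
    "my-suggest": [
        {
            "text": "joh",
            "offset": 0,
            "length": 3,
            "options": [
                {"text": "John", "_id": "1", "_score": 1},
                {"text": "John", "_id": "2", "_score": 1},
            ],
        }
    ]
}

print(unique_suggestions(response, "my-suggest"))  # ['John']
```

This does not solve the underlying problem: the suggester's `size` limit is applied before the duplicates are removed, so a page full of identical "John" options can hide other matching words entirely.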
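Question: how can I implement a word-based completion suggester? There is no need to return any document-related data, since I don't need it at this point.
Is the "Completion Suggester" even appropriate for my scenario? Or should I use a completely different approach?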
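EDIT: As many of you pointed out, an additional completion-only index would be a viable solution. However, I can see multiple issues with this approach:
Keeping the new index in sync.
Auto-completing subsequent words would probably be global instead of narrowed down. For example, say you have the following words in the additional index: "John", "Doe", "David", "Smith". When querying for "John D", the result for the incomplete word should be "Doe" and not "Doe", "David".
To overcome the second point, indexing single words isn't enough, since you would also need to map every word to its documents in order to properly narrow down auto-completion of subsequent words. With that, you actually have the same problem as when querying the original index. Therefore, the additional index no longer makes sense.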
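The second point can be made concrete with a small in-memory sketch. Everything here is an assumption for illustration: `DOCUMENTS` is a hypothetical pair of documents that happens to produce the four words from the example, and the two functions only mimic what a words-only index and a word-to-document mapping could return.

```python
from typing import List

# Hypothetical documents that yield the four words from the example above.
DOCUMENTS = [["John", "Doe"], ["David", "Smith"]]


def complete_global(query: str) -> List[str]:
    """Words-only index: only the last, incomplete word is matched."""
    prefix = query.split()[-1].lower()
    words = {w for doc in DOCUMENTS for w in doc}
    return sorted(w for w in words if w.lower().startswith(prefix))


def complete_narrowed(query: str) -> List[str]:
    """Word-to-document mapping: the words already typed restrict the docs."""
    *typed, prefix = query.lower().split()
    hits = set()
    for doc in DOCUMENTS:
        lowered = [w.lower() for w in doc]
        if all(word in lowered for word in typed):
            hits.update(
                w for w in doc
                if w.lower().startswith(prefix) and w.lower() not in typed
            )
    return sorted(hits)


print(complete_global("John D"))    # ['David', 'Doe']
print(complete_narrowed("John D"))  # ['Doe']
```

The narrowed variant needs to know which words share a document, which is exactly the information the original index already holds.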
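As hinted at in the comments, another way of achieving this without getting duplicate documents is to create a sub-field containing edge n-grams of the field's value (the mapping below does this on a dedicated autocomplete field rather than on firstName itself). First, you define your mapping like this: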
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "completion_analyzer": {
          "type": "custom",
          "filter": [ "lowercase", "completion_filter" ],
          "tokenizer": "keyword"
        }
      },
      "filter": {
        "completion_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 24
        }
      }
    }
  },
  "mappings": {
    "users": {
      "properties": {
        "autocomplete": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" },
            "completion": {
              "type": "text",
              "analyzer": "completion_analyzer",
              "search_analyzer": "standard"
            }
          }
        },
        "firstName": { "type": "text" },
        "lastName": { "type": "text" }
      }
    }
  }
}
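To get a feel for what ends up in the autocomplete.completion sub-field, the analysis chain can be approximated in plain Python. This is a sketch under assumptions, not Elasticsearch's implementation: the function name is borrowed from the mapping for readability, and it only mimics the keyword tokenizer, the lowercase filter and the edge_ngram filter with the same min_gram and max_gram.

```python
from typing import List


def completion_analyzer(text: str, min_gram: int = 1, max_gram: int = 24) -> List[str]:
    """Approximate keyword tokenizer -> lowercase -> edge_ngram."""
    token = text.lower()  # the keyword tokenizer keeps the whole value as one token
    longest = min(len(token), max_gram)
    # Emit every prefix of the single token, from min_gram up to max_gram chars.
    return [token[:n] for n in range(min_gram, longest + 1)]


print(completion_analyzer("John Doe"))
# ['j', 'jo', 'joh', 'john', 'john ', 'john d', 'john do', 'john doe']
```

Because the whole value is one token, prefixes span the space between the names, which is what lets a query such as "john d" narrow the suggestions to a single person rather than to any word starting with "d".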
Then you index a few documents:
POST my-index/users/_bulk
{"index":{}}
{ "firstName": "John", "lastName": "Doe", "autocomplete": "John Doe" }
{"index":{}}
{ "firstName": "John", "lastName": "Deere", "autocomplete": "John Deere" }
{"index":{}}
{ "firstName": "Johnny", "lastName": "Cash", "autocomplete": "Johnny Cash" }
Then you can query for john d and get one result for John Doe and another for John Deere:
POST my-index/_search
{
  "size": 0,
  "query": {
    "term": { "autocomplete.completion": "john d" }
  },
  "aggs": {
    "suggestions": {
      "terms": { "field": "autocomplete.raw" }
    }
  }
}
Results:
{
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "John Doe", "doc_count": 1 },
        { "key": "John Deere", "doc_count": 1 }
      ]
    }
  }
}
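Why this yields one bucket per distinct name can be sketched in memory. The code below is an illustration under assumptions rather than how Elasticsearch executes the request: `VALUES` holds the autocomplete values from the bulk request, `edge_ngrams` stands in for the indexed sub-field, and `suggest` imitates the term query followed by the terms aggregation.

```python
from collections import Counter
from typing import List, Set, Tuple

# The `autocomplete` values from the bulk request above.
VALUES = ["John Doe", "John Deere", "Johnny Cash"]


def edge_ngrams(value: str, max_gram: int = 24) -> Set[str]:
    """Lowercased prefixes of the whole value, as the n-gram sub-field stores them."""
    token = value.lower()
    return {token[:n] for n in range(1, min(len(token), max_gram) + 1)}


def suggest(term: str) -> List[Tuple[str, int]]:
    """Exact term match on the n-grams, then group the hits by raw value."""
    matches = [value for value in VALUES if term in edge_ngrams(value)]
    # One bucket per distinct raw value, however many documents share it.
    return Counter(matches).most_common()


print(suggest("john d"))  # [('John Doe', 1), ('John Deere', 1)]
print(suggest("joh"))     # all three names match this shorter prefix
```

Note that a term query is not analyzed, so the client has to lowercase the typed text itself: in this sketch, as with the mapping above, "John D" matches nothing while "john d" does.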