I am sending this request:
curl -XGET 'host/process_test_3/14/_search' -d '{ "query" : { "query_string" : { "query" : "\"*cor interface*\"", "fields" : ["title", "obj_id"] } } }'
and I get the correct result:
{ "took": 12, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 5.421598, "hits": [ { "_index": "process_test_3", "_type": "14", "_id": "141_dashboard_14", "_score": 5.421598, "_source": { "obj_type": "dashboard", "obj_id": "141", "title": "Cor Interface Monitoring" } } ] } }
But when I search by part of a word, for example:
curl -XGET 'host/process_test_3/14/_search' -d ' { "query" : { "query_string" : { "query" : "\"*cor inter*\"", "fields" : ["title", "obj_id"] } } }'
I get no results:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [] } }
What am I doing wrong?
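This is because your title field has probably been analyzed by the standard analyzer (the default setting), so the title Cor Interface Monitoring has been tokenized into the three tokens cor, interface and monitoring. The phrase cor interface matches two consecutive tokens, but cor inter does not, because inter is not a token in the index.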
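To make the failure concrete, here is a minimal Python sketch of what happens at index and search time. It is only an approximation: standard_tokens and phrase_matches are hypothetical helpers written for this illustration, and the real standard analyzer and phrase matching in Elasticsearch are more involved.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def phrase_matches(indexed, phrase):
    """True if the phrase tokens appear consecutively, as whole tokens,
    somewhere in the indexed token list."""
    n = len(phrase)
    return any(indexed[i:i + n] == phrase for i in range(len(indexed) - n + 1))


title_tokens = standard_tokens("Cor Interface Monitoring")
print(title_tokens)

# The asterisks are dropped by this simplified tokenizer, leaving plain words.
print(phrase_matches(title_tokens, standard_tokens("*cor interface*")))
print(phrase_matches(title_tokens, standard_tokens("*cor inter*")))
```

The first check succeeds because cor and interface are both whole tokens; the second fails because only whole tokens are compared and inter was never indexed.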
To search for any substring of a word, you need to create a custom analyzer that uses the ngram token filter, so that all substrings of each token get indexed.
You can create your index like this:
curl -XPUT localhost:9200/process_test_3 -d '{ "settings": { "analysis": { "analyzer": { "substring_analyzer": { "tokenizer": "standard", "filter": ["lowercase", "substring"] } }, "filter": { "substring": { "type": "nGram", "min_gram": 2, "max_gram": 15 } } } }, "mappings": { "14": { "properties": { "title": { "type": "string", "analyzer": "substring_analyzer" } } } } }'
Then you can reindex your data. The title Cor Interface Monitoring will now be tokenized as:
co, cor, or
in, int, inte, inter, interf, etc.
mo, mon, moni, etc.
Your second search query will now return the document you expect, because the tokens cor and inter now match.
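As a rough illustration of what the ngram filter contributes, the Python sketch below generates the substrings for each token using the same min_gram and max_gram values as the index settings. The ngrams and substring_analyzer functions are hypothetical stand-ins written for this example, not Elasticsearch code, and the order in which the real filter emits tokens may differ.

```python
import re


def ngrams(token, min_gram=2, max_gram=15):
    """All substrings of token whose length is between min_gram and max_gram."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


def substring_analyzer(text):
    """Simplified imitation of the custom analyzer: split into words,
    lowercase them, then expand each word into its n-grams."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [gram for token in tokens for gram in ngrams(token)]


indexed = set(substring_analyzer("Cor Interface Monitoring"))

print(ngrams("cor"))
# Both search terms are now present in the index as tokens.
print({"cor", "inter"} <= indexed)
```

Keep in mind that indexing every substring makes the index noticeably larger, so the min_gram and max_gram bounds are worth tuning to the data.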