
Elasticsearch: filter parents by filtered child document count

Author: mobiledu2402851203 | 2023-09-08 14:25


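I'm trying to run some Elasticsearch queries against a data set I have. I have a user document that is the parent of many child page_view documents. I want to return all users who have viewed a specific page some arbitrary number of times (the number comes from a user input box). So far I have a has_child query that returns all users who have page views with certain IDs. However, that matches a parent as soon as it has any matching child, no matter how many. Next, I tried writing an aggregation on those query results, which essentially performs the same has_child query in aggregation form. Now I have the correct document count for the filtered child documents, and I need to use that count to go back and filter the parents. To put the query in words: "return all users who have viewed a specific page more than 4 times." I may need to restructure my data. Any ideas?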

Here is the query I have so far:

curl -XGET 'http://localhost:9200/development_users/_search?pretty=true' -d '
{
    "query" : { 
      "has_child" : {
        "type" : "page_view",
        "query" : {
          "terms" : {
            "viewed_id" : [175,180]
          }
        }
      }
    },
    "aggs" : {
      "to_page_view": {
        "children": {
          "type" : "page_view"
        },
        "aggs" : {
          "page_views_that_match" : {
            "filter" : { "terms": { "viewed_id" : [175,180] } }
          }
        }
      }
    }
}'

This gives me a response like:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "development_users",
      "_type" : "user",
      "_id" : "22548",
      "_score" : 1.0,
      "_source":{"id":22548,"account_id":1009}
    } ]
  },
  "aggregations" : {
    "to_page_view" : {
      "doc_count" : 53,
      "page_views_that_match" : {
        "doc_count" : 2
      }
    }
  }
}

The relevant mappings:

{
  "development_users" : {
    "mappings" : {
      "page_view" : {
        "dynamic" : "false",
        "_parent" : {
          "type" : "user"
        },
        "_routing" : {
          "required" : true
        },
        "properties" : {
          "created_at" : {
            "type" : "date",
            "format" : "date_time"
          },
          "id" : {
            "type" : "integer"
          },
          "viewed_id" : {
            "type" : "integer"
          },
          "time_on_page" : {
            "type" : "integer"
          },
          "title" : {
            "type" : "string"
          },
          "type" : {
            "type" : "string"
          },
          "updated_at" : {
            "type" : "date",
            "format" : "date_time"
          },
          "url" : {
            "type" : "string"
          }
        }
      },
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "account_id" : {
            "type" : "integer"
          },
          "id" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}

Answer from Sloan Ahrens (6 votes):

OK, so this one is kind of involved. I made a few simplifications to keep it straight in my head. First, I used this mapping:

PUT /test_index
{
    "mappings": {
        "page_view": {
            "_parent": {
               "type": "development_user"
            },
            "properties": {
                "viewed_id": {
                    "type": "string"
                }
            }
        },
        "development_user": {
            "properties": {
                "id": {
                    "type": "string"
                }
            }
        }
    }
}

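Then I added some data. In this little universe I have three users and two pages. I want to find the users who have viewed "page_a" at least twice, so if I construct the query correctly, only user 3 will be returned.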

POST /test_index/development_user/_bulk
{"index":{"_type":"development_user","_id":1}}
{"id":"user_1"}
{"index":{"_type":"page_view","_parent":1}}
{"viewed_id":"page_a"}
{"index":{"_type":"development_user","_id":2}}
{"id":"user_2"}
{"index":{"_type":"page_view","_parent":2}}
{"viewed_id":"page_b"}
{"index":{"_type":"development_user","_id":3}}
{"id":"user_3"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_b"}
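
As a sanity check on the expected result, the per-user counts for this sample data can be tallied in plain Python, with no Elasticsearch involved. This is only an illustrative sketch: the page_views list mirrors the bulk request above, and the helper name users_with_min_views is made up for this example.

```python
from collections import Counter

# (parent user _id, viewed_id) pairs, copied from the bulk request above.
page_views = [
    (1, "page_a"),
    (2, "page_b"),
    (3, "page_a"),
    (3, "page_a"),
    (3, "page_b"),
]


def users_with_min_views(views, page_id, min_count):
    """Return the sorted parent IDs that viewed page_id at least min_count times."""
    counts = Counter(parent for parent, viewed in views if viewed == page_id)
    return sorted(parent for parent, n in counts.items() if n >= min_count)


print(users_with_min_views(page_views, "page_a", 2))
```

Users 1 and 3 have both viewed "page_a" at least once, so a plain has_child query matches both of them; only the per-parent count separates user 3 from user 1.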

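To get the answer, we'll use aggregations. Note that I don't want documents returned the normal way (hence "size": 0), but I do want to filter down the documents we analyze, since that makes things more efficient. So I use the same basic has_child query you had before.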

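The aggregation tree starts with terms_parent_id, which just separates out the parent documents, one bucket per user. Inside that I have children_page_view, whose nested filter_page_ids filter narrows the child documents down to the ones I want ("page_a"). Next to it in the hierarchy is bucket_selector_page_id_term_count, which uses a bucket selector (you'll need ES 2.x) to keep only the parent buckets that meet the count criterion. Finally, a top hits aggregation shows us the documents that meet the requirement.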

POST /test_index/development_user/_search
{
   "size": 0,
   "query": {
      "has_child": {
         "type": "page_view",
         "query": {
            "terms": {
               "viewed_id": [
                  "page_a"
               ]
            }
         }
      }
   },
   "aggs": {
      "terms_parent_id": {
         "terms": {
            "field": "id"
         },
         "aggs": {
            "children_page_view": {
               "children": {
                  "type": "page_view"
               },
               "aggs": {
                  "filter_page_ids": {
                     "filter": {
                        "terms": {
                           "viewed_id": [
                              "page_a"
                           ]
                        }
                     }
                  }
               }
            },
            "bucket_selector_page_id_term_count": {
               "bucket_selector": {
                  "buckets_path": {
                     "children_count": "children_page_view>filter_page_ids._count"
                  },
                  "script": "children_count >= 2"
               }
            },
            "top_hits_users": {
               "top_hits": {
                  "_source": {
                     "include": [
                        "id"
                     ]
                  }
               }
            }
         }
      }
   }
}

{
   "took": 14,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "terms_parent_id": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "user_3",
               "doc_count": 1,
               "children_page_view": {
                  "doc_count": 3,
                  "filter_page_ids": {
                     "doc_count": 2
                  }
               },
               "top_hits_users": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test_index",
                           "_type": "development_user",
                           "_id": "3",
                           "_score": 1,
                           "_source": {
                              "id": "user_3"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}
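
To consume this response from application code, the surviving parent IDs can be read out of the terms buckets. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; the function name matching_user_ids is hypothetical, and sample_response is a trimmed copy of the response above.

```python
def matching_user_ids(response):
    """Collect the _id of every parent document left in the terms buckets."""
    buckets = response["aggregations"]["terms_parent_id"]["buckets"]
    ids = []
    for bucket in buckets:
        # Each surviving bucket carries its parent document in top_hits_users.
        for hit in bucket["top_hits_users"]["hits"]["hits"]:
            ids.append(hit["_id"])
    return ids


# Trimmed copy of the response above, keeping only the fields that are read.
sample_response = {
    "hits": {"total": 2, "hits": []},
    "aggregations": {
        "terms_parent_id": {
            "buckets": [
                {
                    "key": "user_3",
                    "doc_count": 1,
                    "children_page_view": {
                        "doc_count": 3,
                        "filter_page_ids": {"doc_count": 2},
                    },
                    "top_hits_users": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {"_id": "3", "_source": {"id": "user_3"}}
                            ],
                        }
                    },
                }
            ]
        }
    },
}

print(matching_user_ids(sample_response))
```

Note that hits.total is 2 because the has_child query matches both user 1 and user 3; the bucket selector then drops the user_1 bucket, so the filtered answer lives in the buckets, not in hits.total.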

Here is all the code I used:

http://sense.qbox.io/gist/43f24461448519dc884039db40ebd8e2f5b7304f
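
Since the question takes the page IDs and the threshold from user input, the search body can be generated instead of hard-coded. Below is a rough sketch that only builds the request body as a Python dict, using the same ES 2.x syntax as the query in this answer. The function name build_search_body and its parameters are assumptions made for illustration, and sending the request is left out.

```python
import json


def build_search_body(page_ids, min_views):
    """Build the search body from this answer for arbitrary page IDs and threshold."""

    def terms_filter():
        # A fresh dict each time, so the query and the filter share no state.
        return {"terms": {"viewed_id": list(page_ids)}}

    return {
        "size": 0,
        "query": {"has_child": {"type": "page_view", "query": terms_filter()}},
        "aggs": {
            "terms_parent_id": {
                "terms": {"field": "id"},
                "aggs": {
                    "children_page_view": {
                        "children": {"type": "page_view"},
                        "aggs": {"filter_page_ids": {"filter": terms_filter()}},
                    },
                    "bucket_selector_page_id_term_count": {
                        "bucket_selector": {
                            "buckets_path": {
                                "children_count": "children_page_view>filter_page_ids._count"
                            },
                            # int() keeps arbitrary user input out of the script.
                            "script": "children_count >= %d" % int(min_views),
                        }
                    },
                    "top_hits_users": {
                        "top_hits": {"_source": {"include": ["id"]}}
                    },
                },
            }
        },
    }


# "More than 4 times" from the question corresponds to a threshold of 5.
body = build_search_body([175, 180], 5)
print(json.dumps(body, indent=3))
```

Two caveats apply to this sketch. With several page IDs, the count is the total number of views across all listed pages, not a per-page count. Also, a terms aggregation returns only 10 buckets by default, so a larger data set would need its size raised.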

