当前位置:  开发笔记 > 编程语言 > 正文

给出真理表创建二叉树所需的帮助

如何解决《给出真理表创建二叉树所需的帮助》经验,为你挑选了2个好方法。

首先,为了提供完整的信息披露,我想指出这与机器学习课程中的家庭作业有关.这个问题不是作业问题,而是我需要弄清楚的是为了完成创建ID3决策树算法的更大问题.

在给出真值表时,我需要生成类似于以下的树

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

learnedTree是BinaryTree类型,我定义如下:

type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree

ID3算法考虑了各种方程式来确定在哪里拆分树,而且我已经弄明白了,我只是在从真值表创建学习树时遇到了麻烦.例如,如果我有下表

A1 | A2 | A3 | Class
1     0    0      1
0     1    0      1
0     0    0      0
1     0    1      0
0     0    0      0
1     1    0      1
0     1    1      0

我决定拆分属性A1,最后会得到以下内容:

              (A1 = 1)  A1   (A1 = 0)
   A2 | A3 | Class                A2 | A3 | Class
   0     0      1                1      0      1
   0     1      0                0      0      0
   1     0      1                0      0      0
                                 0      1      1

然后我会拆分左侧并拆分右侧,并继续递归模式,直到叶子节点是纯净的,我最终得到一个类似于以下基于拆分的树.

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

到目前为止,这就是我一直"被黑了",但我想我可能会离开:

let rec createTree (listToSplit : list>) index =
    let leftSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
    let rightSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
    if leftSideSplit.Length > 0 then
        let pureCheck = isListPure leftSideSplit
        if pureCheck = 0 then
            printfn "%s" "Pure left node class 0"
            createTree leftSideSplit (index + 1)
        else if pureCheck = 1 then
            printfn "%s" "Pure left node class 1"
            createTree leftSideSplit (index + 1)
        else
            printfn "%s - %A" "Recursing Left" leftSideSplit
            createTree leftSideSplit (index + 1)
    else printfn "%s" "Pure left node class 0"

我应该使用模式匹配吗?任何提示/想法/帮助?谢谢你!



1> Chris Smith..:

编辑:我已经在我的博客上发布了ID3的实现:http: //blogs.msdn.com/chrsmith

嘿Jim,我一直想写一篇在F#中实现ID3的博客文章一段时间 - 感谢给我一个执行.虽然此代码未完全(或正确)实现算法,但它应该足以让您入门.

一般来说,你有正确的方法 - 将每个分支代表一个有区别的联合案例是好的.和Brian说的一样,List.partition绝对是一个方便的功能.使这项工作正确的诀窍在于确定要拆分的最佳属性/值对 - 为此,您需要通过熵等计算信息增益.

type Attribute = string
type Value = string

type Record = 
    {
        Weather : string
        Temperature : string
        PlayTennis : bool 
    }
    override this.ToString() =
        sprintf
            "{Weather = %s, Temp = %s, PlayTennis = %b}" 
            this.Weather 
            this.Temperature 
            this.PlayTennis

type Decision = Attribute * Value

type DecisionTreeNode =
    | Branch of Decision * DecisionTreeNode * DecisionTreeNode
    | Leaf of Record list

// ------------------------------------

// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data = 
    // Just group by weather, then by temperature
    let uniqueWeathers = 
        List.fold 
            (fun acc item -> Set.add item.Weather acc) 
            Set.empty
            data

    let uniqueTemperatures = 
        List.fold
            (fun acc item -> Set.add item.Temperature acc) 
            Set.empty
            data

    if uniqueWeathers.Count = 1 then
        let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
        let left, right = 
            List.partition
                (fun item -> item.Temperature = uniqueTemperatures.MinimumElement) 
                data
        (bestSplit, left, right)
    else
        let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Weather = uniqueWeathers.MinimumElement)
                data
        (bestSplit, left, right)

let rec determineBranch data =
    if List.length data < 4 then
        Leaf(data)
    else
        // Use the entropy function to break the dataset on
        // the category / value that best splits the data
        let bestDecision, leftBranch, rightBranch = bestSplit data
        Branch(
            bestDecision, 
            determineBranch leftBranch, 
            determineBranch rightBranch)

// ------------------------------------    

let rec printID3Result indent branch =
    let padding = new System.String(' ', indent)
    match branch with
    | Leaf(data) ->
        data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
    | Branch(decision, lhs, rhs) ->
        printfn "%sBranch predicate [%A]" padding decision
        printfn "%sWhere predicate is true:" padding
        printID3Result (indent + 4) lhs
        printfn "%sWhere predicate is false:" padding
        printID3Result (indent + 4) rhs


// ------------------------------------    

let dataset =
    [
        { Weather = "windy"; Temperature = "hot";  PlayTennis = false }
        { Weather = "windy"; Temperature = "cool"; PlayTennis = false }
        { Weather = "nice";  Temperature = "cool"; PlayTennis = true }
        { Weather = "nice";  Temperature = "cold"; PlayTennis = true }
        { Weather = "humid"; Temperature = "hot";  PlayTennis = false }
    ]

printfn "Given input list:"
dataset |> List.iter (printfn "%A")

printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result



2> Brian..:

您可以使用List.partition而不是两个List.choose调用.

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html

(或现在http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx)

我不清楚模式匹配会在这里给你带来多少收获; 输入类型(列表列表)和处理(分区和'纯度'检查)并不真正适用于此.

当然,当你最终获得'结束'(一个纯粹的列表)时,你需要创建一个树,然后当输入只有一个'side'并且它是'pure'时,这个函数可能会创建一个Leaf,但是创建一个每个其他输入的左侧和右侧结构节点.也许.我没有完全理解算法.

希望这会有助于引导你一点点.绘制一些较小的样本输入和输出可能有助于解决函数体的各种情况.

推荐阅读
跟我搞对象吧
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有