This is my ARFF file:
    @relation hamspam

    @attribute text string
    @attribute class {ham,spam}

    @data
    'good',ham
    'very good',ham
    'bad',spam
    'very bad',spam
    'very bad, very bad',spam
I want to classify this data with a Weka classifier from my Java program, but I don't know how to use StringToWordVector as part of the classification.
Here is my code so far:
    Classifier j48tree = new J48();
    Instances train = new Instances(new BufferedReader(new FileReader("data.arff")));
    StringToWordVector filter = new StringToWordVector();
What comes next? I don't know how to proceed from here.
    // import required classes
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.stemmers.LovinsStemmer;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class ClassifierWithFilter {
        public static void main(String[] args) throws Exception {
            // load dataset
            DataSource source = new DataSource("/Users/amaryadav/Desktop/spamham.arff");
            Instances dataset = source.getDataSet();
            // set class index to the last attribute
            dataset.setClassIndex(dataset.numAttributes() - 1);

            // the base classifier
            J48 tree = new J48();

            // the filter: set all options first
            StringToWordVector filter = new StringToWordVector();
            filter.setIDFTransform(true);
            // setUseStoplist is the Weka 3.6 API; later releases may expect a stopwords handler instead
            filter.setUseStoplist(true);
            LovinsStemmer stemmer = new LovinsStemmer();
            filter.setStemmer(stemmer);
            filter.setLowerCaseTokens(true);
            // set the input format only after the options are in place
            filter.setInputFormat(dataset);

            // create the FilteredClassifier object
            FilteredClassifier fc = new FilteredClassifier();
            // specify filter
            fc.setFilter(filter);
            // specify base classifier
            fc.setClassifier(tree);
            // build the meta-classifier
            fc.buildClassifier(dataset);

            System.out.println(tree.graph());
            System.out.println(tree);
        }
    }
This code wraps a StringToWordVector filter and a J48 decision tree in a FilteredClassifier and trains it on spamham.arff. Hope this helps.
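For intuition about why the string attribute has to be filtered before J48 can use it, here is a minimal plain-Java sketch of a bag-of-words transform applied to the five rows of the ARFF file above. It is an illustration only, not Weka's implementation: the class `BagOfWordsSketch` and its methods are hypothetical names, the tokenizer simply splits on non-letter characters, and it leaves out the IDF weighting, stemming, and stop-word handling configured in the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: a hand-rolled bag-of-words transform, not Weka's StringToWordVector.
class BagOfWordsSketch {

    // Split a document into lower-case word tokens on any run of non-letter characters.
    static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect the distinct words of all documents, in sorted order.
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : docs) {
            words.addAll(tokenize(doc));
        }
        return new ArrayList<>(words);
    }

    // Count how often each vocabulary word occurs in one document.
    static int[] vectorize(String doc, List<String> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(doc)) {
            int i = vocab.indexOf(token);
            if (i >= 0) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The text values from the @data section of the ARFF file.
        List<String> docs = Arrays.asList(
            "good", "very good", "bad", "very bad", "very bad, very bad");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        for (String doc : docs) {
            System.out.println(Arrays.toString(vectorize(doc, vocab)) + "  <- '" + doc + "'");
        }
    }
}
```

Each text value becomes one numeric column per word, which is the kind of input a decision tree can split on. In the answer's code, the FilteredClassifier takes care of applying the real filter to the training data, so no manual conversion like this is needed there.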