Hadoop - constructor args for mapper

Author: 手机用户2402852307 | 2023-06-18 08:54

Is there any way to give constructor args to a Mapper in Hadoop? Possibly through some library that wraps the Job creation?

Here's my scenario:

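import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopTest {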

    // Extractor turns a line into a "feature"
    public static interface Extractor {
        public String extract(String s);
    }

    // A concrete Extractor, configurable with a constructor parameter
    public static class PrefixExtractor implements Extractor {
        private int endIndex;

        public PrefixExtractor(int endIndex) { this.endIndex = endIndex; }

        public String extract(String s) { return s.substring(0, this.endIndex); }
    }

    public static class Map extends Mapper<Object, Text, Text, Text> {
        private Extractor extractor;

        // Constructor configures the extractor
        public Map(Extractor extractor) { this.extractor = extractor; }

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String feature = extractor.extract(value.toString());
            context.write(new Text(feature), new Text(value.toString()));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            for (Text val : values) context.write(key, val);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "test");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

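It should be clear that, since the Mapper is only given to the Configuration as a class reference (Map.class), Hadoop has no way to pass a constructor argument and configure a specific Extractor.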
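
A minimal sketch of why this is so, in plain Java with no Hadoop dependency. It assumes the framework instantiates the mapper reflectively through a no-argument constructor, which is how Hadoop's ReflectionUtils.newInstance appears to behave; NoArgMapper, ArgMapper and ReflectiveFactory are hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins for mapper classes; not part of Hadoop.
class NoArgMapper {
    NoArgMapper() { }
}

class ArgMapper {
    final int endIndex;

    ArgMapper(int endIndex) { this.endIndex = endIndex; }
}

final class ReflectiveFactory {
    // Roughly what a framework can do when all it holds is a Class object:
    // look up the no-argument constructor and call it.
    static <T> T newInstance(Class<T> cls) throws ReflectiveOperationException {
        Constructor<T> ctor = cls.getDeclaredConstructor();
        return ctor.newInstance();
    }

    // True when the class can be built without arguments.
    static boolean canInstantiate(Class<?> cls) {
        try {
            newInstance(cls);
            return true;
        } catch (ReflectiveOperationException e) {
            // ArgMapper ends up here: it declares no no-argument constructor.
            return false;
        }
    }
}
```

With only ArgMapper.class in hand there is nowhere to supply endIndex, so any per-job parameter has to travel some other way, for example through the job Configuration.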

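There are Hadoop-wrapper frameworks out there like Scoobi, Crunch, Scrunch (and probably many more I don't know about) that seem to have this capability, but I don't know how they accomplish it. EDIT: After working some more with Scoobi, I discovered I was partially wrong about this. If you use an externally defined object in the "mapper", Scoobi requires that it be serializable, and will complain at runtime if it isn't. So maybe the right way is just to make my Extractor serializable and deserialize it in the Mapper's setup method...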
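
The serialize-and-restore idea could look roughly like the sketch below. This is plain Java serialization plus Base64, not how Scoobi itself works; ExtractorCodec and the configuration key "extractor.serialized" are hypothetical, and the Hadoop calls are mentioned only in comments so that the block runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Same shape as the Extractor in the question, but Serializable.
interface Extractor extends Serializable {
    String extract(String s);
}

class PrefixExtractor implements Extractor {
    private static final long serialVersionUID = 1L;
    private final int endIndex;

    PrefixExtractor(int endIndex) { this.endIndex = endIndex; }

    @Override
    public String extract(String s) { return s.substring(0, endIndex); }
}

final class ExtractorCodec {
    // Driver side: turn the configured object into a String, which could then
    // be stored with conf.set("extractor.serialized", encoded).
    static String encode(Extractor extractor) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(extractor);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Mapper side: rebuild the object in setup() from the String returned by
    // context.getConfiguration().get("extractor.serialized").
    static Extractor decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Extractor) in.readObject();
        }
    }
}
```

The encoded string carries the constructor argument along with the object, so the mapper itself can keep a no-argument constructor.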

Also, I actually work in Scala, so Scala-based solutions are definitely welcome (if not encouraged!)

1> wutz..:

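I would suggest telling your mapper which extractor to use via the Configuration object that you are creating. The mapper receives the configuration in its setup method (context.getConfiguration()). It seems that you cannot put objects in the configuration, as it is usually constructed from an XML file or the command line, but you can set an enum value and have the mapper construct its extractor itself. It is not very pretty to customize the mapper after its creation, but that's how I interpret the API.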
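
A rough sketch of the enum idea, assuming the extractor needs only one int parameter. ExtractorKind, SuffixExtractor and the keys "extractor.kind" and "extractor.length" are hypothetical; the Hadoop calls appear only in comments so the block stays runnable on its own.

```java
interface Extractor {
    String extract(String s);
}

class PrefixExtractor implements Extractor {
    private final int endIndex;

    PrefixExtractor(int endIndex) { this.endIndex = endIndex; }

    @Override
    public String extract(String s) { return s.substring(0, endIndex); }
}

// Hypothetical second implementation, to show the enum choosing between two.
class SuffixExtractor implements Extractor {
    private final int length;

    SuffixExtractor(int length) { this.length = length; }

    @Override
    public String extract(String s) { return s.substring(s.length() - length); }
}

// Each constant knows how to build its extractor from one int parameter.
enum ExtractorKind {
    PREFIX {
        @Override
        Extractor create(int n) { return new PrefixExtractor(n); }
    },
    SUFFIX {
        @Override
        Extractor create(int n) { return new SuffixExtractor(n); }
    };

    abstract Extractor create(int n);

    // Mapper side: in setup() the two arguments would come from
    // conf.get("extractor.kind") and conf.getInt("extractor.length", 0).
    // valueOf throws IllegalArgumentException for an unknown kind.
    static Extractor fromSettings(String kind, int n) {
        return valueOf(kind).create(n);
    }
}
```

On the driver side the matching calls would be conf.set("extractor.kind", "PREFIX") and conf.setInt("extractor.length", 3) before the Job is created.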

2> Praveen Srip..:

Set the implementation class name while submitting the job

Configuration conf = new Configuration();
conf.set("PrefixExtractorClass", "com.my.class.ThreePrefixExtractor");

Or set the PrefixExtractorClass option using the -D option on the command line.

Below is the implementation in the mapper

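Extractor extractor = null;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    try {
        Configuration conf = context.getConfiguration();
        String className = conf.get("PrefixExtractorClass");
        // Class.forName only returns a Class object, so instantiate it through
        // its no-argument constructor and check that it really is an Extractor.
        extractor = Class.forName(className)
                .asSubclass(Extractor.class)
                .getDeclaredConstructor()
                .newInstance();
    } catch (ReflectiveOperationException e) {
        // handle the exception; here it is rethrown so the task fails visibly
        throw new IOException("Cannot create extractor from PrefixExtractorClass", e);
    }
}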

Now use the extractor object as required in the map function.

The jar containing the com.my.class.ThreePrefixExtractor class should be distributed to all the nodes. Here is an article from Cloudera on the different ways of doing it.

In the above example, com.my.class.ThreePrefixExtractor should implement the Extractor interface.

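Using this approach the mapper implementation can be made generic. This is the approach (using Class.forName) that most frameworks take to support pluggable components implementing a particular interface.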
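
The same Class.forName idea can be stretched to cover the constructor argument from the question. The sketch below is one possible variation, not part of the answer above: ExtractorLoader and the key "PrefixExtractorEndIndex" are hypothetical, and it assumes every pluggable extractor exposes a constructor taking a single int.

```java
import java.lang.reflect.Constructor;

interface Extractor {
    String extract(String s);
}

class PrefixExtractor implements Extractor {
    private final int endIndex;

    public PrefixExtractor(int endIndex) { this.endIndex = endIndex; }

    @Override
    public String extract(String s) { return s.substring(0, endIndex); }
}

final class ExtractorLoader {
    // className and endIndex stand for values read in setup(), e.g. from
    // conf.get("PrefixExtractorClass") and conf.getInt("PrefixExtractorEndIndex", 0).
    static Extractor load(String className, int endIndex) throws ReflectiveOperationException {
        // asSubclass throws ClassCastException if the class is not an Extractor.
        Class<? extends Extractor> cls = Class.forName(className).asSubclass(Extractor.class);
        // Look up the (int) constructor by its parameter type and invoke it.
        Constructor<? extends Extractor> ctor = cls.getDeclaredConstructor(int.class);
        return ctor.newInstance(endIndex);
    }
}
```

The mapper stays generic: it keeps its no-argument constructor, and both the class name and the argument arrive as plain configuration values.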
