
A Basic Introduction to MapReduce (with Code)

Author: 惬听风吟jyy_802 | 2021-09-11 15:36

This article introduces the basics of MapReduce through the WordCount program, with the source code included. It is meant as a reference for readers who need one.

1. The WordCount program

1.1 WordCount source code

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options and keep only this job's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner because summing is associative and commutative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Map phase: emits (word, 1) for every whitespace-separated token of a line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sums all counts received for one word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
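
To make the data flow of the job easier to follow, here is a minimal sketch that imitates the map, shuffle, and reduce phases in plain Java, without Hadoop. The class LocalWordCount and its method names are hypothetical and exist only for illustration; a real job distributes these steps across the cluster, and the framework performs the shuffle itself.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Hadoop.
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(count(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

The map and reduce methods here mirror TokenizerMapper.map and IntSumReducer.reduce above; only the grouping step, which Hadoop does between the two phases, is written out by hand.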

1.2 Run the program: Run As -> Java Application

1.3 Compile and package the program to produce a JAR file

