Use MapReduce in Apache Hadoop on HDInsight

Article
01/02/2025

Learn how to run MapReduce jobs on Microsoft Azure HDInsight clusters.

Sample data

HDInsight provides various sample data sets, which are stored in the /example/data and /HdiSamples directories. These directories are in the default storage for your cluster. This document uses the /example/data/gutenberg/davinci.txt file, which contains the notebooks of Leonardo da Vinci.

Sample MapReduce

A sample MapReduce word count application is included with your Microsoft Azure HDInsight cluster. It is located at /example/jars/hadoop-mapreduce-examples.jar on the default storage for your cluster.

The following Java code is the source of the MapReduce application contained in the hadoop-mapreduce-examples.jar file:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into whitespace-delimited tokens
    // and emits a (token, 1) pair for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums all counts received for a given word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; the input and output paths remain.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Newer Hadoop releases deprecate this constructor in favor of
        // Job.getInstance(conf, "word count").
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner to pre-aggregate counts on the map side.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
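
To make the data flow of the word count application easier to follow, here is a minimal sketch that imitates its map and reduce steps in plain Java, without any Hadoop dependency. The class name WordCountLocal and the two sample lines are hypothetical and exist only for illustration; a real job runs distributed across the cluster and reads its input from cluster storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local imitation of the WordCount job; not part of the sample jar.
public class WordCountLocal {

    // Map step: emit one (word, 1) pair per whitespace-delimited token,
    // using the same StringTokenizer logic as TokenizerMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and reduce steps: group the pairs by word and sum the values,
    // which is what IntSumReducer does for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step over every line, then reduces all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the eye sees the light", "the light moves");
        // prints {eye=1, light=2, moves=1, sees=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

Because tokens are split on whitespace only, words that differ in case or carry attached punctuation (for example "Light" and "light,") are counted as separate keys, both here and in the original mapper.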

For instructions on writing your own MapReduce applications, see Develop Java MapReduce applications for Microsoft Azure HDInsight.

Run MapReduce

Microsoft Azure HDInsight can run MapReduce jobs by using various methods. Use the following table to decide which method is right for you, then follow the walkthrough for that method.

Use this...	...to do this	...from this client operating system
SSH	Use Hadoop commands through SSH	Linux, Unix, macOS, or Windows
Curl	Submit the job remotely by using REST	Linux, Unix, macOS, or Windows
Windows PowerShell	Submit the job remotely by using Windows PowerShell	Windows
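
As an illustration of the REST method in the table, the following sketch builds (but does not send) a request that submits the word count sample. It assumes the cluster exposes the WebHCat (Templeton) endpoint /templeton/v1/mapreduce/jar with the form parameters user.name, jar, class, and arg. The class name SubmitWordCount, the cluster name CLUSTERNAME, the admin login, the password, and the output path /example/data/davinciwordcount are all hypothetical placeholders. It requires Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

// Hypothetical helper that prepares a WebHCat job submission; nothing is sent.
public class SubmitWordCount {

    // Builds the URL-encoded form body: the jar on cluster storage, the class
    // (program) to run inside it, and one "arg" entry per job argument.
    static String formBody(String user, String jar, String className, List<String> jobArgs) {
        StringBuilder body = new StringBuilder();
        append(body, "user.name", user);
        append(body, "jar", jar);
        append(body, "class", className);
        for (String arg : jobArgs) {
            append(body, "arg", arg);
        }
        return body.toString();
    }

    private static void append(StringBuilder body, String name, String value) {
        if (body.length() > 0) {
            body.append('&');
        }
        body.append(URLEncoder.encode(name, StandardCharsets.UTF_8))
            .append('=')
            .append(URLEncoder.encode(value, StandardCharsets.UTF_8));
    }

    // Builds a POST request with HTTP basic authentication.
    // The endpoint path is an assumption based on the WebHCat REST API.
    static HttpRequest buildRequest(String clusterName, String user, String password, String body) {
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
            .uri(URI.create("https://" + clusterName
                + ".azurehdinsight.net/templeton/v1/mapreduce/jar"))
            .header("Authorization", "Basic " + credentials)
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        String body = formBody(
            "admin",                                        // placeholder login
            "/example/jars/hadoop-mapreduce-examples.jar",  // sample jar from this article
            "wordcount",
            List.of("/example/data/gutenberg/davinci.txt",  // input from this article
                    "/example/data/davinciwordcount"));     // hypothetical output path
        HttpRequest request = buildRequest("CLUSTERNAME", "admin", "PASSWORD", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

To actually submit the job, the request would be passed to java.net.http.HttpClient.send; the response handling is omitted here because its exact shape depends on the cluster.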

Next steps

To learn more about working with data in Microsoft Azure HDInsight, see the following documents:
