public class SortMergeJoinExample
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
This description compares two join examples, SortMergeJoinExample and
HashJoinExample. HashJoinExample requires that one dataset (the hashFile) be small
enough to fit into memory. SortMergeJoinExample does not load a dataset into
memory; instead, it sorts the output of both datasets before feeding them to
SortMergeJoinExample.SortMergeJoinProcessor, much like the sort phase that precedes
reduce in traditional MapReduce. Because both inputs are already sorted,
SortMergeJoinExample.SortMergeJoinProcessor can advance the iterators of the two
inputs in step to find the joined keys. The uniqueness requirements also differ:
HashJoinExample requires that keys in the hashFile be unique, while
SortMergeJoinExample requires that keys in both datasets be unique.

| Modifier and Type | Class and Description |
|---|---|
| static class | SortMergeJoinExample.SortMergeJoinProcessor: Joins two inputs that have already been sorted. |
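
The iterator-advancing merge described for SortMergeJoinExample.SortMergeJoinProcessor can be illustrated with a minimal stand-alone sketch. This is not the Tez processor itself: the class name `MergeJoinSketch` and its `join` method are hypothetical, plain `java.util.Iterator`s stand in for the Tez inputs, and the code assumes both inputs are sorted ascending with unique keys, as the class description requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-alone sketch; not the Tez SortMergeJoinProcessor. */
final class MergeJoinSketch {

    /**
     * Returns the keys present in both inputs.
     * Assumes each input is sorted ascending and contains no duplicate keys.
     */
    static List<String> join(Iterator<String> left, Iterator<String> right) {
        List<String> joined = new ArrayList<>();
        if (!left.hasNext() || !right.hasNext()) {
            return joined; // an empty input can never produce a match
        }
        String l = left.next();
        String r = right.next();
        while (true) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                joined.add(l);
                // Keys are unique, so both sides can move past the match.
                if (!left.hasNext() || !right.hasNext()) {
                    break;
                }
                l = left.next();
                r = right.next();
            } else if (cmp < 0) {
                // Left key is smaller: only the left iterator advances.
                if (!left.hasNext()) {
                    break;
                }
                l = left.next();
            } else {
                // Right key is smaller: only the right iterator advances.
                if (!right.hasNext()) {
                    break;
                }
                r = right.next();
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("apple", "banana", "cherry", "fig");
        List<String> b = Arrays.asList("banana", "date", "fig", "grape");
        System.out.println(join(a.iterator(), b.iterator())); // prints [banana, fig]
    }
}
```

Each key is read once and only the current key from each side is held, so neither input has to fit into memory. If a key could repeat within an input, advancing both iterators on a match would skip pairs, which is one way to see why the sketch depends on the uniqueness requirement.
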

| Constructor and Description |
|---|
| SortMergeJoinExample() |

| Modifier and Type | Method and Description |
|---|---|
| static void | main(String[] args) |
| int | run(org.apache.hadoop.conf.Configuration conf, String[] args, org.apache.tez.client.TezClient tezClient) |
| int | run(String[] args) |

Copyright © 2014 Apache Software Foundation. All rights reserved.