public class JoinBolt extends BaseWindowedBolt
Modifier and Type | Class and Description |
---|---|
protected static class |
JoinBolt.FieldSelector |
protected class |
JoinBolt.JoinAccumulator |
protected static class |
JoinBolt.JoinInfo
Describes how to join the other stream with the current stream.
|
protected static class |
JoinBolt.JoinType |
protected class |
JoinBolt.ResultRecord |
static class |
JoinBolt.Selector |
BaseWindowedBolt.Count, BaseWindowedBolt.Duration
Modifier and Type | Field and Description |
---|---|
protected LinkedHashMap<String,JoinBolt.JoinInfo> |
joinCriteria |
protected JoinBolt.FieldSelector[] |
outputFields |
protected String |
outputStreamName |
protected JoinBolt.Selector |
selectorType |
timestampExtractor, windowConfiguration
Constructor and Description |
---|
JoinBolt(JoinBolt.Selector type,
String srcOrStreamId,
String fieldName)
Introduces the first stream to start the join with.
|
JoinBolt(String sourceId,
String fieldName)
Calls JoinBolt(Selector.SOURCE, sourceId, fieldName)
|
Modifier and Type | Method and Description |
---|---|
void |
declareOutputFields(OutputFieldsDeclarer declarer)
Declare the output schema for all the streams of this topology.
|
protected JoinBolt.JoinAccumulator |
doInnerJoin(JoinBolt.JoinAccumulator probe,
Map<Object,ArrayList<Tuple>> buildInput,
JoinBolt.JoinInfo joinInfo,
boolean finalJoin) |
protected JoinBolt.JoinAccumulator |
doJoin(JoinBolt.JoinAccumulator probe,
HashMap<Object,ArrayList<Tuple>> buildInput,
JoinBolt.JoinInfo joinInfo,
boolean finalJoin) |
protected JoinBolt.JoinAccumulator |
doLeftJoin(JoinBolt.JoinAccumulator probe,
Map<Object,ArrayList<Tuple>> buildInput,
JoinBolt.JoinInfo joinInfo,
boolean finalJoin) |
protected ArrayList<Object> |
doProjection(ArrayList<Tuple> tuples,
JoinBolt.FieldSelector[] projectionFields) |
void |
execute(TupleWindow inputWindow)
Process the tuple window and optionally emit new tuples based on the tuples in the input window.
|
protected JoinBolt.JoinAccumulator |
hashJoin(List<Tuple> tuples) |
JoinBolt |
join(String newStream,
String field,
String priorStream)
Performs inner Join with the newStream.
|
JoinBolt |
leftJoin(String newStream,
String field,
String priorStream)
Performs left Join with the newStream.
|
protected Object |
lookupField(JoinBolt.FieldSelector fieldSelector,
Tuple tuple) |
void |
prepare(Map<String,Object> topoConf,
TopologyContext context,
OutputCollector collector)
This is similar to the
IBolt.prepare(Map, TopologyContext, OutputCollector) except that while emitting, the tuples are automatically anchored to the tuples in the inputWindow. |
JoinBolt |
select(String commaSeparatedKeys)
Specify projection fields.
|
BaseWindowedBolt |
withLag(BaseWindowedBolt.Duration duration)
Specify the maximum time lag of the tuple timestamp in milliseconds.
|
JoinBolt |
withLateTupleStream(String streamId)
Specify a stream id on which late tuples are going to be emitted.
|
JoinBolt |
withOutputStream(String streamName)
Optional.
|
JoinBolt |
withTimestampExtractor(TimestampExtractor timestampExtractor)
Specify the timestamp extractor implementation.
|
JoinBolt |
withTimestampField(String fieldName)
Specify a field in the tuple that represents the timestamp as a long value.
|
JoinBolt |
withTumblingWindow(BaseWindowedBolt.Count count)
A count based tumbling window.
|
JoinBolt |
withTumblingWindow(BaseWindowedBolt.Duration duration)
A time duration based tumbling window.
|
BaseWindowedBolt |
withWatermarkInterval(BaseWindowedBolt.Duration interval)
Specify the watermark event generation interval.
|
JoinBolt |
withWindow(BaseWindowedBolt.Count windowLength)
A tuple count based window that slides with every incoming tuple.
|
JoinBolt |
withWindow(BaseWindowedBolt.Count windowLength,
BaseWindowedBolt.Count slidingInterval)
Tuple count based sliding window configuration.
|
JoinBolt |
withWindow(BaseWindowedBolt.Count windowLength,
BaseWindowedBolt.Duration slidingInterval)
Tuple count and time duration based sliding window configuration.
|
JoinBolt |
withWindow(BaseWindowedBolt.Duration windowLength)
A time duration based window that slides with every incoming tuple.
|
JoinBolt |
withWindow(BaseWindowedBolt.Duration windowLength,
BaseWindowedBolt.Count slidingInterval)
Time duration and count based sliding window configuration.
|
JoinBolt |
withWindow(BaseWindowedBolt.Duration windowLength,
BaseWindowedBolt.Duration slidingInterval)
Time duration based sliding window configuration.
|
cleanup, getComponentConfiguration, getTimestampExtractor
protected final JoinBolt.Selector selectorType
protected LinkedHashMap<String,JoinBolt.JoinInfo> joinCriteria
protected JoinBolt.FieldSelector[] outputFields
protected String outputStreamName
public JoinBolt(String sourceId, String fieldName)
Calls JoinBolt(Selector.SOURCE, sourceId, fieldName)
sourceId
- Id of source component (spout/bolt) from which this bolt is receiving datafieldName
- the field to use for joining the stream (x.y.z format)public JoinBolt(JoinBolt.Selector type, String srcOrStreamId, String fieldName)
Introduces the first stream to start the join with. Equivalent SQL … select …. from srcOrStreamId …
type
- Specifies whether ‘srcOrStreamId’ refers to stream name/source componentsrcOrStreamId
- name of stream OR source componentfieldName
- the field to use for joining the stream (x.y.z format)public JoinBolt withOutputStream(String streamName)
Optional. Allows naming the output stream of this bolt. If not specified, the emits will happen on ‘default’ stream.
public JoinBolt join(String newStream, String field, String priorStream)
Performs inner Join with the newStream. SQL: from priorStream inner join newStream on newStream.field = priorStream.field1
same as: new WindowedQueryBolt(priorStream,field1). join(newStream, field, priorStream);
Note: priorStream must be previously joined. Valid ex: new WindowedQueryBolt(s1,k1). join(s2,k2, s1). join(s3,k3, s2);
Invalid ex: new WindowedQueryBolt(s1,k1). join(s3,k3, s2). join(s2,k2, s1);
newStream
- Either stream name or name of upstream componentfield
- the field on which to perform the joinpublic JoinBolt leftJoin(String newStream, String field, String priorStream)
Performs left Join with the newStream. SQL : from stream1 left join stream2 on stream2.field = stream1.field1 same as: new WindowedQueryBolt(stream1, field1). leftJoin(stream2, field, stream1);
Note: priorStream must be previously joined Valid ex: new WindowedQueryBolt(s1,k1). leftJoin(s2,k2, s1). leftJoin(s3,k3, s2); Invalid ex: new WindowedQueryBolt(s1,k1). leftJoin(s3,k3, s2). leftJoin(s2,k2, s1);
newStream
- Either a name of a stream or an upstream componentfield
- the field on which to perform the joinpublic JoinBolt select(String commaSeparatedKeys)
Specify projection fields. i.e. Specifies the fields to include in the output. e.g: .select(“field1, stream2:field2, field3”) Nested Key names are supported for nested types: e.g: .select(“outerKey1.innerKey1, outerKey1.innerKey2, stream3:outerKey2.innerKey3)” Inner types (non leaf) must be Map<> in order to support nested lookup using this dot notation This selected fields implicitly declare the output fieldNames for the bolt based.
public void declareOutputFields(OutputFieldsDeclarer declarer)
IComponent
Declare the output schema for all the streams of this topology.
declareOutputFields
in interface IComponent
declareOutputFields
in class BaseWindowedBolt
declarer
- this is used to declare output stream ids, output fields, and whether or not each output stream is a direct streampublic void prepare(Map<String,Object> topoConf, TopologyContext context, OutputCollector collector)
IWindowedBolt
This is similar to the IBolt.prepare(Map, TopologyContext, OutputCollector)
except that while emitting, the tuples are automatically anchored to the tuples in the inputWindow.
prepare
in interface IWindowedBolt
prepare
in class BaseWindowedBolt
public void execute(TupleWindow inputWindow)
IWindowedBolt
Process the tuple window and optionally emit new tuples based on the tuples in the input window.
protected JoinBolt.JoinAccumulator hashJoin(List<Tuple> tuples)
protected JoinBolt.JoinAccumulator doJoin(JoinBolt.JoinAccumulator probe, HashMap<Object,ArrayList<Tuple>> buildInput, JoinBolt.JoinInfo joinInfo, boolean finalJoin)
protected JoinBolt.JoinAccumulator doInnerJoin(JoinBolt.JoinAccumulator probe, Map<Object,ArrayList<Tuple>> buildInput, JoinBolt.JoinInfo joinInfo, boolean finalJoin)
protected JoinBolt.JoinAccumulator doLeftJoin(JoinBolt.JoinAccumulator probe, Map<Object,ArrayList<Tuple>> buildInput, JoinBolt.JoinInfo joinInfo, boolean finalJoin)
protected ArrayList<Object> doProjection(ArrayList<Tuple> tuples, JoinBolt.FieldSelector[] projectionFields)
protected Object lookupField(JoinBolt.FieldSelector fieldSelector, Tuple tuple)
public JoinBolt withWindow(BaseWindowedBolt.Count windowLength, BaseWindowedBolt.Count slidingInterval)
BaseWindowedBolt
Tuple count based sliding window configuration.
withWindow
in class BaseWindowedBolt
windowLength
- the number of tuples in the windowslidingInterval
- the number of tuples after which the window slidespublic JoinBolt withWindow(BaseWindowedBolt.Count windowLength, BaseWindowedBolt.Duration slidingInterval)
BaseWindowedBolt
Tuple count and time duration based sliding window configuration.
withWindow
in class BaseWindowedBolt
windowLength
- the number of tuples in the windowslidingInterval
- the time duration after which the window slidespublic JoinBolt withWindow(BaseWindowedBolt.Duration windowLength, BaseWindowedBolt.Count slidingInterval)
BaseWindowedBolt
Time duration and count based sliding window configuration.
withWindow
in class BaseWindowedBolt
windowLength
- the time duration of the windowslidingInterval
- the number of tuples after which the window slidespublic JoinBolt withWindow(BaseWindowedBolt.Duration windowLength, BaseWindowedBolt.Duration slidingInterval)
BaseWindowedBolt
Time duration based sliding window configuration.
withWindow
in class BaseWindowedBolt
windowLength
- the time duration of the windowslidingInterval
- the time duration after which the window slidespublic JoinBolt withWindow(BaseWindowedBolt.Count windowLength)
BaseWindowedBolt
A tuple count based window that slides with every incoming tuple.
withWindow
in class BaseWindowedBolt
windowLength
- the number of tuples in the windowpublic JoinBolt withWindow(BaseWindowedBolt.Duration windowLength)
BaseWindowedBolt
A time duration based window that slides with every incoming tuple.
withWindow
in class BaseWindowedBolt
windowLength
- the time duration of the windowpublic JoinBolt withTumblingWindow(BaseWindowedBolt.Count count)
BaseWindowedBolt
A count based tumbling window.
withTumblingWindow
in class BaseWindowedBolt
count
- the number of tuples after which the window tumblespublic JoinBolt withTumblingWindow(BaseWindowedBolt.Duration duration)
BaseWindowedBolt
A time duration based tumbling window.
withTumblingWindow
in class BaseWindowedBolt
duration
- the time duration after which the window tumblespublic JoinBolt withTimestampField(String fieldName)
BaseWindowedBolt
Specify a field in the tuple that represents the timestamp as a long value. If this field is not present in the incoming tuple, an IllegalArgumentException
will be thrown. The field MUST contain a timestamp in milliseconds
withTimestampField
in class BaseWindowedBolt
fieldName
- the name of the field that contains the timestamppublic JoinBolt withTimestampExtractor(TimestampExtractor timestampExtractor)
BaseWindowedBolt
Specify the timestamp extractor implementation.
withTimestampExtractor
in class BaseWindowedBolt
timestampExtractor
- the TimestampExtractor
implementationpublic JoinBolt withLateTupleStream(String streamId)
BaseWindowedBolt
Specify a stream id on which late tuples are going to be emitted. They are going to be accessible via the WindowedBoltExecutor.LATE_TUPLE_FIELD
field. It must be defined on a per-component basis, and in conjunction with the BaseWindowedBolt.withTimestampField(java.lang.String)
, otherwise IllegalArgumentException
will be thrown.
withLateTupleStream
in class BaseWindowedBolt
streamId
- the name of the stream used to emit late tuples onpublic BaseWindowedBolt withLag(BaseWindowedBolt.Duration duration)
BaseWindowedBolt
Specify the maximum time lag of the tuple timestamp in milliseconds. It means that the tuple timestamps cannot be out of order by more than this amount.
withLag
in class BaseWindowedBolt
duration
- the max lag durationpublic BaseWindowedBolt withWatermarkInterval(BaseWindowedBolt.Duration interval)
BaseWindowedBolt
Specify the watermark event generation interval. For tuple based timestamps, watermark events are used to track the progress of time
withWatermarkInterval
in class BaseWindowedBolt
interval
- the interval at which watermark events are generatedCopyright © 2022 The Apache Software Foundation. All rights reserved.