Storm PMML Bolt

Storm integration to load PMML models and compute predictive scores for the tuples flowing through a topology. A PMML (Predictive Model Markup Language) model describes the machine learning (predictive) model used to make predictions from raw input data. The model is typically loaded into a runtime environment, which scores the raw data carried in the incoming tuples.

Create Instance of PMML Bolt

To create an instance of the PMMLPredictorBolt, you must provide a ModelOutputs and a ModelRunnerFactory, which creates the ModelRunner. The ModelOutputs represents the streams and output fields declared by the PMMLPredictorBolt. The ModelRunner represents the runtime environment that executes the predictive scoring. It has only one method:

    Map<String, List<Object>> scoredTuplePerStream(Tuple input);

This method contains the logic to compute the scored tuples from the raw input tuple. The implementation decides which scored values are assigned to each stream. The keys of the returned map are the stream ids, and the values are the predicted scores for that stream.
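A minimal sketch of what an implementation of this method might look like. To keep it runnable without Storm on the classpath, it uses a hypothetical SimpleModelRunner interface that takes a Map<String, Object> in place of Storm's Tuple. ThresholdRunner, the input fields x and id, and the alerts stream are illustrative only. The point is that a runner may send different values to different streams, or leave a stream out entirely for a given input.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThresholdRunnerDemo {

    // Hypothetical stand-in for ModelRunner: a Map replaces Storm's Tuple so the sketch runs without Storm.
    interface SimpleModelRunner {
        Map<String, List<Object>> scoredTuplePerStream(Map<String, Object> input);
    }

    // Toy runner: score = weight * x goes to "default"; scores above the threshold also go to "alerts".
    static class ThresholdRunner implements SimpleModelRunner {
        private final double weight;
        private final double threshold;

        ThresholdRunner(double weight, double threshold) {
            this.weight = weight;
            this.threshold = threshold;
        }

        @Override
        public Map<String, List<Object>> scoredTuplePerStream(Map<String, Object> input) {
            double score = weight * ((Number) input.get("x")).doubleValue();
            Map<String, List<Object>> perStream = new HashMap<>();
            // Every input produces a scored tuple on the "default" stream.
            perStream.put("default", Arrays.<Object>asList(score));
            // Only high scores are routed to "alerts", together with the input id.
            if (score > threshold) {
                perStream.put("alerts", Arrays.<Object>asList(input.get("id"), score));
            }
            return perStream;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("id", "sensor-7");
        tuple.put("x", 6);
        // weight 2.0 * x 6 = 12.0 > 10.0, so both streams receive a tuple
        System.out.println(new ThresholdRunner(2.0, 10.0).scoredTuplePerStream(tuple));
    }
}
```

In a real bolt the values would be read from the Tuple (for example with Tuple.getValueByField), and the stream ids in the returned map would need to match the streams declared through ModelOutputs.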

The PmmlModelRunner is an extension of ModelRunner that represents the typical steps involved in predictive scoring: extracting the raw inputs from the tuple, preprocessing the raw inputs, and predicting the scores from the preprocessed data.
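The three steps compose naturally into a pipeline. The sketch below is a simplified, hypothetical stand-in rather than Storm's PmmlModelRunner: the step methods are named after the steps described above, toStreams is an extra illustrative hook that maps predictions to streams, and a Map again replaces Storm's Tuple. Check the PmmlModelRunner Javadoc for the actual type parameters and signatures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineRunnerDemo {

    // Hypothetical stand-in for ModelRunner; a Map replaces Storm's Tuple.
    interface SimpleModelRunner {
        Map<String, List<Object>> scoredTuplePerStream(Map<String, Object> input);
    }

    // Hypothetical runner split into steps. I = raw inputs, R = preprocessed inputs, P = predicted scores.
    abstract static class SimplePmmlModelRunner<I, R, P> implements SimpleModelRunner {
        abstract I extractRawInputs(Map<String, Object> tuple);
        abstract R preProcessInputs(I rawInputs);
        abstract P predictScores(R preProcessed);
        abstract Map<String, List<Object>> toStreams(P scores);

        // Chains the steps: extract -> preprocess -> predict -> assign to streams.
        @Override
        public final Map<String, List<Object>> scoredTuplePerStream(Map<String, Object> input) {
            return toStreams(predictScores(preProcessInputs(extractRawInputs(input))));
        }
    }

    // Toy model: reads "temp" as text, parses it to a number, and predicts 0.5 * temp.
    static class TempRunner extends SimplePmmlModelRunner<String, Double, Double> {
        @Override
        String extractRawInputs(Map<String, Object> tuple) {
            return String.valueOf(tuple.get("temp"));
        }

        @Override
        Double preProcessInputs(String raw) {
            return Double.parseDouble(raw.trim()); // " 40 " -> 40.0
        }

        @Override
        Double predictScores(Double temp) {
            return 0.5 * temp;
        }

        @Override
        Map<String, List<Object>> toStreams(Double score) {
            Map<String, List<Object>> out = new HashMap<>();
            out.put("default", Collections.<Object>singletonList(score));
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("temp", " 40 ");
        // Prints {default=[20.0]}
        System.out.println(new TempRunner().scoredTuplePerStream(tuple));
    }
}
```

Keeping scoredTuplePerStream final means a concrete runner only supplies the individual steps, so each step can be swapped or tested on its own.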

The JPmmlModelRunner is an implementation of PmmlModelRunner that uses JPMML as the runtime environment. It extracts the raw inputs from the tuple for all active fields, and builds a tuple with the predicted scores for the predicted fields and output fields. In this implementation, all declared streams receive the same scored tuple.
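The "same scored tuple on every stream" behavior can be sketched as below. This is not JPMML or Storm code: toValuesPerStream is a hypothetical helper, predictions are plain strings and numbers keyed by field name, and the layout (predicted fields first, then output fields) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SameTupleAllStreams {

    // Hypothetical helper: builds one scored tuple and assigns it to every declared stream.
    static Map<String, List<Object>> toValuesPerStream(List<String> predictedFields,
                                                       List<String> outputFields,
                                                       Map<String, ?> predictions,
                                                       List<String> declaredStreams) {
        // Assumed layout: predicted field values first, then output field values.
        List<Object> scored = new ArrayList<>();
        for (String field : predictedFields) scored.add(predictions.get(field));
        for (String field : outputFields) scored.add(predictions.get(field));

        // The list is shared by all streams, so expose it read-only.
        List<Object> shared = Collections.unmodifiableList(scored);
        Map<String, List<Object>> perStream = new LinkedHashMap<>();
        for (String stream : declaredStreams) perStream.put(stream, shared);
        return perStream;
    }

    public static void main(String[] args) {
        Map<String, Object> predictions = new HashMap<>();
        predictions.put("species", "setosa");
        predictions.put("probability_setosa", 0.97);
        Map<String, List<Object>> out = toValuesPerStream(
                Arrays.asList("species"),
                Arrays.asList("probability_setosa"),
                predictions,
                Arrays.asList("default", "audit"));
        // Prints {default=[setosa, 0.97], audit=[setosa, 0.97]}
        System.out.println(out);
    }
}
```

If different streams need different values, a custom ModelRunner has to build a separate list per stream instead.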

The predicted, active, and output fields are extracted from the PMML model.
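JPMML exposes these fields through its own evaluator API. To show where they live in a PMML document, the sketch below reads them directly with the JDK's DOM parser instead. It assumes a single-model document in which MiningField elements declare the model inputs (usageType defaults to "active" when absent; "predicted" or "target" marks the predicted field) and Output/OutputField declares the output fields. PmmlFieldScan is a hypothetical name, and the sample XML is a trimmed fragment, not a complete valid model.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PmmlFieldScan {

    final List<String> active = new ArrayList<>();
    final List<String> predicted = new ArrayList<>();
    final List<String> output = new ArrayList<>();

    // Hypothetical helper: scans MiningField and OutputField elements in a single-model PMML document.
    static PmmlFieldScan scan(String pmmlXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // PMML elements live in a versioned DMG namespace
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
        PmmlFieldScan result = new PmmlFieldScan();

        NodeList mining = doc.getElementsByTagNameNS("*", "MiningField");
        for (int i = 0; i < mining.getLength(); i++) {
            Element field = (Element) mining.item(i);
            String usage = field.getAttribute("usageType"); // "" when the attribute is absent
            if (usage.isEmpty() || usage.equals("active")) {
                result.active.add(field.getAttribute("name"));
            } else if (usage.equals("predicted") || usage.equals("target")) {
                result.predicted.add(field.getAttribute("name"));
            } // other usage types (e.g. supplementary) are ignored here
        }

        NodeList outputs = doc.getElementsByTagNameNS("*", "OutputField");
        for (int i = 0; i < outputs.getLength(); i++) {
            result.output.add(((Element) outputs.item(i)).getAttribute("name"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String pmml =
                "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\" version=\"4.2\">"
              + "<RegressionModel functionName=\"regression\">"
              + "<MiningSchema>"
              + "<MiningField name=\"x1\"/>"
              + "<MiningField name=\"x2\" usageType=\"active\"/>"
              + "<MiningField name=\"y\" usageType=\"predicted\"/>"
              + "</MiningSchema>"
              + "<Output><OutputField name=\"y_hat\"/></Output>"
              + "</RegressionModel></PMML>";
        PmmlFieldScan fields = scan(pmml);
        // Prints active=[x1, x2] predicted=[y] output=[y_hat]
        System.out.println("active=" + fields.active + " predicted=" + fields.predicted + " output=" + fields.output);
    }
}
```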

Run Bundled Examples

To run the examples, execute the following command:

 STORM-HOME/bin/storm jar STORM-HOME/examples/storm-pmml-examples/storm-pmml-examples-2.0.0-SNAPSHOT.jar \
   org.apache.storm.pmml.JpmmlRunnerTestTopology jpmmlTopology PMMLModel.xml RawInputData.csv