T
- the type of the aggregation resultACC
- the type of the aggregation accumulator. The accumulator is used to keep the
aggregated values which are needed to compute an aggregation result. AggregateFunction
represents its state using accumulator, thereby the state of the AggregateFunction must be
put into the accumulator.@PublicEvolving public abstract class AggregateFunction<T,ACC> extends UserDefinedAggregateFunction<T,ACC>
The behavior of an AggregateFunction
can be defined by implementing a series of custom
methods. An AggregateFunction
needs at least three methods:
There are a few other methods that can be optional to have:
All these methods must be declared publicly, not static, and named exactly as the names
mentioned above. The method UserDefinedAggregateFunction.createAccumulator()
is defined in the UserDefinedAggregateFunction
function, and method getValue(ACC)
is defined in the AggregateFunction
while other methods are explained below.
Processes the input values and update the provided accumulator instance. The method
accumulate can be overloaded with different custom types and arguments. An AggregateFunction
requires at least one accumulate() method.
param: accumulator the accumulator which contains the current aggregated results
param: [user defined inputs] the input value (usually obtained from a new arrived data).
public void accumulate(ACC accumulator, [user defined inputs])
Retracts the input values from the accumulator instance. The current design assumes the
inputs are the values that have been previously accumulated. The method retract can be
overloaded with different custom types and arguments. This function must be implemented for
data stream bounded OVER aggregates.
param: accumulator the accumulator which contains the current aggregated results
param: [user defined inputs] the input value (usually obtained from a new arrived data).
public void retract(ACC accumulator, [user defined inputs])
Merges a group of accumulator instances into one accumulator instance. This function must be
implemented for data stream session window grouping aggregates and data set grouping aggregates.
param: accumulator the accumulator which will keep the merged aggregate results. It should
be noted that the accumulator may contain the previous aggregated
results. Therefore user should not replace or clean this instance in the
custom merge method.
param: its an java.lang.Iterable pointed to a group of accumulators that will be
merged.
public void merge(ACC accumulator, java.lang.Iterable<ACC> iterable)
Resets the accumulator for this AggregateFunction. This function must be implemented for
data set grouping aggregates.
param: accumulator the accumulator which needs to be reset
public void resetAccumulator(ACC accumulator)
If this aggregate function can only be applied in an OVER window, this can be declared using
the requirement FunctionRequirement.OVER_WINDOW_ONLY
in getRequirements()
.
Constructor and Description |
---|
AggregateFunction() |
Modifier and Type | Method and Description |
---|---|
FunctionKind |
getKind()
Returns the kind of function this definition describes.
|
Set<FunctionRequirement> |
getRequirements()
Returns the set of requirements this definition demands.
|
TypeInference |
getTypeInference(DataTypeFactory typeFactory)
Returns the logic for performing type inference of a call to this function definition.
|
abstract T |
getValue(ACC accumulator)
Called every time when an aggregation result should be materialized.
|
boolean |
requiresOver()
Deprecated.
Use
getRequirements() instead. |
createAccumulator, getAccumulatorType, getResultType
close, functionIdentifier, open, toString
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
isDeterministic
public abstract T getValue(ACC accumulator)
accumulator
- the accumulator which contains the current aggregated results@Deprecated public boolean requiresOver()
getRequirements()
instead.true
if this AggregateFunction
can only be applied in an OVER
window.true
if the AggregateFunction
requires an OVER window,
false
otherwise.public final FunctionKind getKind()
FunctionDefinition
public TypeInference getTypeInference(DataTypeFactory typeFactory)
UserDefinedFunction
The type inference process is responsible for inferring unknown types of input arguments, validating input arguments, and producing result types. The type inference process happens independent of a function body. The output of the type inference is used to search for a corresponding runtime implementation.
Instances of type inference can be created by using TypeInference.newBuilder()
.
See BuiltInFunctionDefinitions
for concrete usage examples.
The type inference for user-defined functions is automatically extracted using reflection.
It does this by analyzing implementation methods such as eval() or accumulate()
and
the generic parameters of a function class if present. If the reflective information is not
sufficient, it can be supported and enriched with DataTypeHint
and FunctionHint
annotations.
Note: Overriding this method is only recommended for advanced users. If a custom type inference is specified, it is the responsibility of the implementer to make sure that the output of the type inference process matches with the implementation method:
The implementation method must comply with each DataType.getConversionClass()
returned by the type inference. For example, if DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class)
is an expected argument type, the
method must accept a call eval(java.sql.Timestamp)
.
Regular Java calling semantics (including type widening and autoboxing) are applied when
calling an implementation method which means that the signature can be eval(java.lang.Object)
.
The runtime will take care of converting the data to the data format specified by the
DataType.getConversionClass()
coming from the type inference logic.
getTypeInference
in interface FunctionDefinition
getTypeInference
in class UserDefinedFunction
public Set<FunctionRequirement> getRequirements()
FunctionDefinition
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.