Data Set
Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).
A ChecksumHashCode with the count and checksum of elements in the data set.
See also: org.apache.flink.api.java.Utils.ChecksumHashCodeHelper
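The count-plus-checksum idea can be sketched in plain Java without any Flink dependency. This is a minimal illustration, not the library's implementation: the class ChecksumSketch, its Result holder, and the choice to widen each 32-bit hash to a non-negative long before summing are all assumptions made for the example.

```java
import java.util.List;

final class ChecksumSketch {
    // Hypothetical result holder: number of elements plus the sum of their hashes.
    static final class Result {
        final long count;
        final long checksum;

        Result(long count, long checksum) {
            this.count = count;
            this.checksum = checksum;
        }
    }

    static <T> Result checksumHashCode(List<T> elements) {
        long count = 0;
        long checksum = 0;
        for (T element : elements) {
            count++;
            // Assumption: treat the 32-bit hash as an unsigned value before summing.
            checksum += element.hashCode() & 0xffffffffL;
        }
        return new Result(count, checksum);
    }
}
```

Because addition is commutative, the checksum does not depend on the order in which elements are visited, which is what makes it usable for comparing the contents of two unordered data sets.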
Iterates over all elements of each partition to count the number of elements in that partition.
A data set of 2-tuples, each mapping a subtask index to the number of elements in that partition.
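As a rough local illustration of per-partition counting, the sketch below models each partition as an in-memory list and returns one count per subtask index. The class and method names are hypothetical, and a real distributed job would compute each count inside its own parallel task.

```java
import java.util.List;

final class CountPerPartitionSketch {
    // counts[i] is the number of elements held by the partition with subtask index i.
    static <T> long[] countElementsPerPartition(List<List<T>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int subtask = 0; subtask < partitions.size(); subtask++) {
            // Walk the partition element by element, as a streaming task would.
            for (T ignored : partitions.get(subtask)) {
                counts[subtask]++;
            }
        }
        return counts;
    }
}
```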
Range-partitions a DataSet using the specified key selector function.
Range-partitions a DataSet on the specified fields.
Range-partitions a DataSet on the specified tuple field positions.
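The core of range partitioning is mapping a key to the partition whose key range contains it. The sketch below assumes the range boundaries are already known and strictly increasing; how a system chooses those boundaries is outside this example, and the names RangePartitionSketch and partitionFor are hypothetical.

```java
import java.util.Arrays;

final class RangePartitionSketch {
    // upperBounds must be sorted and strictly increasing.
    // Partition i holds keys <= upperBounds[i] (and greater than upperBounds[i - 1]);
    // the last partition, with index upperBounds.length, holds all larger keys.
    static int partitionFor(int key, int[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, key);
        // A non-negative result means the key equals a boundary; otherwise
        // binarySearch encodes the insertion point as -(insertionPoint) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }
}
```

With boundaries {10, 20} there are three partitions: keys up to 10, keys from 11 to 20, and keys above 20. Unlike hash partitioning, this keeps neighboring keys together, so partitions are ordered relative to one another.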
Generates a sample of the DataSet, choosing each element with the given probability fraction.
Whether an element can be selected more than once.
Probability that each element is chosen; must be in [0, 1] without replacement and in [0, ∞) with replacement. When fraction is larger than 1, each element is expected to be selected multiple times on average.
Random number generator seed.
The sampled DataSet
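The two sampling modes can be illustrated with a small plain-Java sketch. Without replacement, each element is kept with probability fraction (a Bernoulli trial). With replacement, this sketch emits each element a Poisson-distributed number of times with mean fraction, which is one common way to model sampling with replacement and is an assumption here, not a statement about the library's internals. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class FractionSampleSketch {
    static <T> List<T> sample(List<T> input, boolean withReplacement, double fraction, long seed) {
        if (fraction < 0 || (!withReplacement && fraction > 1)) {
            throw new IllegalArgumentException("fraction out of range: " + fraction);
        }
        Random rnd = new Random(seed); // same seed, same sample
        List<T> out = new ArrayList<>();
        for (T element : input) {
            int copies = withReplacement
                    ? poisson(fraction, rnd)                  // 0, 1, 2, ... copies
                    : (rnd.nextDouble() < fraction ? 1 : 0);  // keep or drop
            for (int i = 0; i < copies; i++) {
                out.add(element);
            }
        }
        return out;
    }

    // Knuth's multiplication method for Poisson variates; adequate for small means.
    static int poisson(double mean, Random rnd) {
        double limit = Math.exp(-mean);
        int k = 0;
        double product = rnd.nextDouble();
        while (product > limit) {
            k++;
            product *= rnd.nextDouble();
        }
        return k;
    }
}
```

The size of the result is random in both modes: only its expected value, fraction times the input size, is fixed.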
Generates a sample of the DataSet with a fixed sample size.
NOTE: Sampling with a fixed size is not as efficient as sampling with a fraction; use sampling with a fraction unless you need an exact sample size.
Whether an element can be selected more than once.
The expected sample size.
Random number generator seed.
The sampled DataSet
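One textbook way to draw a sample of exactly numSamples elements in a single pass is reservoir sampling (Algorithm R). The sketch below covers only the without-replacement case and is offered as an illustration of fixed-size sampling in general; it is not claimed to be the algorithm the library uses, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class FixedSizeSampleSketch {
    static <T> List<T> sampleWithSize(List<T> input, int numSamples, long seed) {
        if (numSamples < 0) {
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        Random rnd = new Random(seed);
        List<T> reservoir = new ArrayList<>(numSamples);
        for (int i = 0; i < input.size(); i++) {
            if (i < numSamples) {
                // Fill the reservoir with the first numSamples elements.
                reservoir.add(input.get(i));
            } else {
                // Element i replaces a random slot with probability numSamples / (i + 1).
                int j = rnd.nextInt(i + 1);
                if (j < numSamples) {
                    reservoir.set(j, input.get(i));
                }
            }
        }
        return reservoir;
    }
}
```

Every element has to be examined and the reservoir kept in memory until the input is exhausted, which hints at why a fixed-size sample costs more than a per-element coin flip.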
Takes a set of (subtask index, total number of elements) mappings and assigns ids to all elements of the input data set.
A data set of 2-tuples consisting of consecutive ids and initial values.
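Consecutive ids can be derived from per-partition counts with a prefix sum: each partition starts numbering at the total size of all partitions before it. The sketch below models partitions as in-memory lists and is an illustration under that assumption; ZipWithIndexSketch and its method are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ZipWithIndexSketch {
    static <T> List<Map.Entry<Long, T>> zipWithIndex(List<List<T>> partitions) {
        // Pass 1: turn per-partition counts into start offsets (prefix sum).
        long[] startOffset = new long[partitions.size()];
        long runningTotal = 0;
        for (int subtask = 0; subtask < partitions.size(); subtask++) {
            startOffset[subtask] = runningTotal;
            runningTotal += partitions.get(subtask).size();
        }
        // Pass 2: each partition numbers its own elements from its start offset.
        List<Map.Entry<Long, T>> result = new ArrayList<>();
        for (int subtask = 0; subtask < partitions.size(); subtask++) {
            long id = startOffset[subtask];
            for (T value : partitions.get(subtask)) {
                result.add(new SimpleEntry<Long, T>(id++, value));
            }
        }
        return result;
    }
}
```

The second pass needs no coordination between partitions once the offsets are known, but the counts must be gathered first, so the input is effectively traversed twice.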
Assigns a unique id to every element of the input data set.
A data set of 2-tuples consisting of ids and initial values.
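When ids only need to be unique rather than consecutive, a counting pass can be avoided. One possible scheme, shown here as an assumption and not as the library's confirmed method, combines a per-task local counter with the subtask index: the counter is shifted left by enough bits to hold any subtask index, and the index fills the low bits. All names are hypothetical.

```java
final class UniqueIdSketch {
    // Number of bits needed to represent value (0 needs 0 bits).
    static int bitSize(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // localCounter: 0, 1, 2, ... within one parallel task.
    // subtaskIndex: in [0, parallelism).
    static long uniqueId(long localCounter, int subtaskIndex, int parallelism) {
        int shift = bitSize(parallelism - 1L);
        // The low `shift` bits identify the task; the high bits identify the element.
        return (localCounter << shift) + subtaskIndex;
    }
}
```

With a parallelism of 4, subtask 1 produces 1, 5, 9, and so on, while subtask 0 produces 0, 4, 8. The ids never collide, but they leave gaps whenever partitions differ in size.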
This class provides simple utility methods for zipping elements in a data set with an index or with a unique identifier, and for sampling elements from a data set.