
implicit class DataSetUtils[T] extends AnyRef

This class provides simple utility methods for zipping the elements of a data set with an index or a unique identifier, and for sampling elements from a data set.

Annotations
@PublicEvolving()
Deprecated

All Flink Scala APIs are deprecated and will be removed in a future Flink major version. You can still build your application in Scala, but you should move to the Java version of the DataStream API and/or the Table API.

See also

FLIP-265 Deprecate and remove Scala API support

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new DataSetUtils(self: DataSet[T])(implicit arg0: TypeInformation[T], arg1: ClassTag[T])

    self

    The DataSet to wrap.

    Deprecated

    All Flink Scala APIs are deprecated and will be removed in a future Flink major version. You can still build your application in Scala, but you should move to the Java version of the DataStream API and/or the Table API.
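
    Because DataSetUtils is an implicit class, its methods become callable directly on any DataSet[T] once the enclosing package object is imported (in Flink, `org.apache.flink.api.scala.utils._`). The mechanism can be sketched without Flink; here `SeqUtils` and `zipWithLongIndex` are hypothetical stand-ins that wrap a plain Seq rather than a DataSet:

    ```scala
    object ImplicitWrapperSketch {
      // Hypothetical stand-in for DataSetUtils: wraps a Seq instead of a DataSet.
      implicit class SeqUtils[T](val self: Seq[T]) extends AnyRef {
        // Pairs each element with its position as a Long.
        def zipWithLongIndex: Seq[(Long, T)] =
          self.zipWithIndex.map { case (elem, i) => (i.toLong, elem) }
      }

      def main(args: Array[String]): Unit = {
        // The compiler rewrites this call to new SeqUtils(Seq(...)).zipWithLongIndex.
        val indexed = Seq("a", "b", "c").zipWithLongIndex
        println(indexed)
      }
    }
    ```

    The `self` member plays the same role as in DataSetUtils: it is the wrapped collection that every utility method operates on.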

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def checksumHashCode(): ChecksumHashCode

    Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).


    returns

    A ChecksumHashCode with the count and checksum of elements in the data set.

    See also

    org.apache.flink.api.java.Utils.ChecksumHashCodeHelper
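
    The description above (a count plus a sum over element hashes) can be illustrated with a minimal sketch over an in-memory Seq. The name `countAndChecksum` is hypothetical, and Flink's helper may treat hash codes differently (for example, as unsigned values), so the numbers need not match what Flink reports:

    ```scala
    object ChecksumSketch {
      // Returns (count, checksum), where checksum is the sum of element hash codes.
      def countAndChecksum[T](elements: Seq[T]): (Long, Long) =
        elements.foldLeft((0L, 0L)) { case ((count, sum), elem) =>
          (count + 1, sum + elem.hashCode().toLong)
        }
    }
    ```

    Because addition is commutative, the result does not depend on element order, which is what makes such a checksum usable for comparing the contents of two data sets.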

  6. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  7. def countElementsPerPartition: DataSet[(Int, Long)]

    Iterates over all elements in each partition and counts them, yielding the number of elements per partition.

    returns

    A data set of (subtask index, number of elements) tuples.
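
    As a rough illustration of the result shape, the sketch below models each partition as an inner Seq and uses its position as the subtask index. This modeling is an assumption made for the example, not how Flink distributes data:

    ```scala
    object CountPerPartitionSketch {
      // Each inner Seq stands in for the elements held by one parallel subtask.
      def countElementsPerPartition[T](partitions: Seq[Seq[T]]): Seq[(Int, Long)] =
        partitions.zipWithIndex.map { case (elems, subtask) =>
          (subtask, elems.size.toLong)
        }
    }
    ```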

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. def partitionByRange[K](distribution: DataDistribution, fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Range-partitions a DataSet using the specified key selector function.

  18. def partitionByRange(distribution: DataDistribution, firstField: String, otherFields: String*): DataSet[T]

    Range-partitions a DataSet on the specified fields.

  19. def partitionByRange(distribution: DataDistribution, fields: Int*): DataSet[T]

    Range-partitions a DataSet on the specified tuple field positions.
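
    The idea behind range partitioning can be sketched in plain Scala: a distribution supplies bucket boundaries, and each element goes to the first bucket whose boundary covers its key. In this sketch the boundaries are a plain `Seq[Int]` rather than a DataDistribution, and treating each boundary as an inclusive upper bound is an assumption made for the example:

    ```scala
    object RangePartitionSketch {
      // upperBounds(i) is the largest key sent to partition i; keys above the
      // last bound go to the final partition (index upperBounds.length).
      def partitionFor(key: Int, upperBounds: Seq[Int]): Int = {
        val idx = upperBounds.indexWhere(bound => key <= bound)
        if (idx >= 0) idx else upperBounds.length
      }

      // Groups elements by the partition their extracted key falls into.
      def partitionByRange[T](elements: Seq[T], upperBounds: Seq[Int])(key: T => Int): Map[Int, Seq[T]] =
        elements.groupBy(e => partitionFor(key(e), upperBounds))
    }
    ```

    With boundaries `Seq(10, 20)`, keys up to 10 land in partition 0, keys from 11 to 20 in partition 1, and everything larger in partition 2.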

  20. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generates a sample of the DataSet in which each element is selected with probability fraction.

    withReplacement

    Whether an element can be selected more than once.

    fraction

    Probability that each element is chosen. Must be in [0, 1] without replacement and in [0, ∞) with replacement. When fraction is greater than 1, each element is expected to appear in the sample more than once on average.

    seed

    Random number generator seed.

    returns

    The sampled DataSet
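
    The two sampling modes can be sketched over an in-memory Seq. This is a minimal illustration, assuming Bernoulli trials for sampling without replacement and Poisson-distributed repeat counts for sampling with replacement; the names `bernoulliSample`, `poissonSample` and `poisson` are hypothetical, and the sketch does not reproduce Flink's sampler implementations:

    ```scala
    import scala.util.Random

    object SampleSketch {
      // Without replacement: keep each element independently with probability `fraction`.
      def bernoulliSample[T](elements: Seq[T], fraction: Double, seed: Long): Seq[T] = {
        require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
        val rng = new Random(seed)
        elements.filter(_ => rng.nextDouble() < fraction)
      }

      // With replacement: emit each element k times, with k drawn from Poisson(fraction).
      def poissonSample[T](elements: Seq[T], fraction: Double, seed: Long): Seq[T] = {
        require(fraction >= 0.0, "fraction must be non-negative")
        val rng = new Random(seed)
        elements.flatMap(e => Seq.fill(poisson(fraction, rng))(e))
      }

      // Knuth's multiplication method; adequate for small means.
      private def poisson(mean: Double, rng: Random): Int = {
        val limit = math.exp(-mean)
        var k = 0
        var p = rng.nextDouble()
        while (p > limit) {
          k += 1
          p *= rng.nextDouble()
        }
        k
      }
    }
    ```

    Passing the same seed twice yields the same sample, which is why the seed parameter is useful for reproducible runs.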

  21. def sampleWithSize(withReplacement: Boolean, numSamples: Int, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generates a sample of the DataSet with a fixed sample size.

    NOTE: Sampling with a fixed size is less efficient than sampling with a fraction; use sample with a fraction unless you need an exact sample size.

    withReplacement

    Whether an element can be selected more than once.

    numSamples

    The expected sample size.

    seed

    Random number generator seed.

    returns

    The sampled DataSet
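
    One standard way to draw a fixed-size sample without replacement in a single pass is reservoir sampling. The sketch below shows the classic variant (Algorithm R) over an Iterator; it is offered as an illustration of the technique, with the hypothetical name `ReservoirSketch`, and is not claimed to match Flink's distributed implementation:

    ```scala
    import scala.collection.mutable.ArrayBuffer
    import scala.util.Random

    object ReservoirSketch {
      def sampleWithSize[T](elements: Iterator[T], numSamples: Int, seed: Long): Vector[T] = {
        require(numSamples >= 0, "numSamples must be non-negative")
        val rng = new Random(seed)
        val reservoir = ArrayBuffer.empty[T]
        var seen = 0L
        elements.foreach { elem =>
          seen += 1
          if (reservoir.size < numSamples) {
            // Fill the reservoir with the first numSamples elements.
            reservoir += elem
          } else if (numSamples > 0) {
            // Replace a random slot with probability numSamples / seen.
            val j = (rng.nextDouble() * seen).toLong
            if (j < numSamples) reservoir(j.toInt) = elem
          }
        }
        reservoir.toVector
      }
    }
    ```

    If the input holds fewer than numSamples elements, this sketch simply returns all of them.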

  22. val self: DataSet[T]
  23. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  24. def toString(): String
    Definition Classes
    AnyRef → Any
  25. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  28. def zipWithIndex: DataSet[(Long, T)]

    Assigns consecutive ids to all elements of the input data set, based on a set of (subtask index, number of elements) mappings.

    returns

    A data set of (id, element) tuples, where the ids are consecutive.
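
    The role of the per-subtask counts can be sketched as a two-pass procedure over in-memory partitions: first count the elements per partition, then give each partition a starting id equal to the number of elements in all earlier partitions. Modeling partitions as inner Seqs is an assumption of this sketch:

    ```scala
    object ZipWithIndexSketch {
      def zipWithIndex[T](partitions: Seq[Seq[T]]): Seq[(Long, T)] = {
        // Pass 1: count the elements held by each partition.
        val counts = partitions.map(_.size.toLong)
        // Starting id of each partition = number of elements in earlier partitions.
        val offsets = counts.scanLeft(0L)(_ + _)
        // Pass 2: number elements locally, shifted by the partition's offset.
        partitions.zip(offsets).flatMap { case (elems, start) =>
          elems.zipWithIndex.map { case (elem, i) => (start + i, elem) }
        }
      }
    }
    ```

    The extra counting pass is the price of consecutive ids: no partition can number its elements until the sizes of the partitions before it are known.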

  29. def zipWithUniqueId: DataSet[(Long, T)]

    Assigns a unique id to each element of the input data set.

    returns

    A data set of (id, element) tuples.
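
    Unique (but not consecutive) ids can be produced without a counting pass by combining a per-subtask counter with the subtask index. The sketch below places the local counter in the high bits and the subtask index in the low bits; this particular bit layout and the helper `bitSize` are assumptions made for illustration:

    ```scala
    object ZipWithUniqueIdSketch {
      // Number of bits needed to represent the value n (0 for n == 0).
      def bitSize(n: Long): Int = 64 - java.lang.Long.numberOfLeadingZeros(n)

      def zipWithUniqueId[T](partitions: Seq[Seq[T]]): Seq[(Long, T)] = {
        // Reserve enough low bits to hold the largest subtask index.
        val shift = bitSize(partitions.size - 1L)
        partitions.zipWithIndex.flatMap { case (elems, subtask) =>
          elems.zipWithIndex.map { case (elem, local) =>
            // High bits: per-subtask counter; low bits: subtask index.
            ((local.toLong << shift) + subtask, elem)
          }
        }
      }
    }
    ```

    Each subtask can assign ids independently under this scheme, at the cost of gaps in the id sequence when partitions differ in size.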
