org.apache.flink.api.scala.utils

DataSetUtils

implicit class DataSetUtils[T] extends AnyRef

This class provides simple utility methods for zipping elements in a data set with an index or with a unique identifier, sampling elements from a data set.

Annotations
@PublicEvolving()
Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By inheritance
Inherited
  1. DataSetUtils
  2. AnyRef
  3. Any
  1. Hide All
  2. Show all
Learn more about member selection
Visibility
  1. Public
  2. All

Instance Constructors

  1. new DataSetUtils(self: DataSet[T])(implicit arg0: TypeInformation[T], arg1: ClassTag[T])

    self

    Data Set

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def checksumHashCode(): ChecksumHashCode

    Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).

    Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).

    returns

    A ChecksumHashCode with the count and checksum of elements in the data set.

    See also

    org.apache.flink.api.java.Utils.ChecksumHashCodeHelper

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def countElementsPerPartition: DataSet[(Int, Long)]

    Method that goes over all the elements in each partition in order to retrieve the total number of elements.

    Method that goes over all the elements in each partition in order to retrieve the total number of elements.

    returns

    a data set of tuple2 consisting of (subtask index, number of elements mappings)

  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. def partitionByRange[K](distribution: DataDistribution, fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Range-partitions a DataSet using the specified key selector function.

  20. def partitionByRange(distribution: DataDistribution, firstField: String, otherFields: String*): DataSet[T]

    Range-partitions a DataSet on the specified fields.

  21. def partitionByRange(distribution: DataDistribution, fields: Int*): DataSet[T]

    Range-partitions a DataSet on the specified tuple field positions.

  22. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generate a sample of DataSet by the probability fraction of each element.

    Generate a sample of DataSet by the probability fraction of each element.

    withReplacement

    Whether element can be selected more than once.

    fraction

    Probability that each element is chosen, should be [0,1] without replacement, and [0, ∞) with replacement. While fraction is larger than 1, the elements are expected to be selected multi times into sample on average.

    seed

    Random number generator seed.

    returns

    The sampled DataSet

  23. def sampleWithSize(withReplacement: Boolean, numSamples: Int, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generate a sample of DataSet with fixed sample size.

    Generate a sample of DataSet with fixed sample size.

    NOTE: Sample with fixed size is not as efficient as sample with fraction, use sample with fraction unless you need exact precision.

    withReplacement

    Whether element can be selected more than once.

    numSamples

    The expected sample size.

    seed

    Random number generator seed.

    returns

    The sampled DataSet

  24. val self: DataSet[T]

    Data Set

  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. def zipWithIndex: DataSet[(Long, T)]

    Method that takes a set of subtask index, total number of elements mappings and assigns ids to all the elements from the input data set.

    Method that takes a set of subtask index, total number of elements mappings and assigns ids to all the elements from the input data set.

    returns

    a data set of tuple 2 consisting of consecutive ids and initial values.

  31. def zipWithUniqueId: DataSet[(Long, T)]

    Method that assigns a unique id to all the elements of the input data set.

    Method that assigns a unique id to all the elements of the input data set.

    returns

    a data set of tuple 2 consisting of ids and initial values.

Inherited from AnyRef

Inherited from Any

Ungrouped