Class

org.apache.flink.api.scala.utils

DataSetUtils

Related Doc: package utils

implicit class DataSetUtils[T] extends AnyRef

This class provides simple utility methods for zipping the elements of a data set with an index or with a unique identifier, and for sampling elements from a data set.

Annotations
@PublicEvolving()
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new DataSetUtils(self: DataSet[T])(implicit arg0: TypeInformation[T], arg1: ClassTag[T])

    self

    Data Set

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def checksumHashCode(): ChecksumHashCode

    Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).

    returns

    A ChecksumHashCode with the count and checksum of elements in the data set.

    See also

    org.apache.flink.api.java.Utils.ChecksumHashCodeHelper
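
The description above (element count plus a sum over element hashes) can be illustrated on a plain local collection. This is a minimal sketch, not Flink's implementation: `ChecksumSketch`, `CountAndChecksum`, and `countAndChecksum` are hypothetical names, and the sketch assumes each hash is widened to a signed `Long` before summing, which may differ from how Flink treats the hash value.

```scala
// Sketch only: mimics the documented result (count + sum of element hashes)
// on a local collection. All names here are hypothetical, not Flink API.
object ChecksumSketch {
  final case class CountAndChecksum(count: Long, checksum: Long)

  def countAndChecksum[T](elements: Iterable[T]): CountAndChecksum =
    elements.foldLeft(CountAndChecksum(0L, 0L)) { (acc, e) =>
      // Assumption: widen the Int hash to a signed Long before adding.
      CountAndChecksum(acc.count + 1, acc.checksum + e.hashCode().toLong)
    }
}
```

Because addition is commutative, the result does not depend on element order, which is what makes such a checksum usable for comparing two data sets whose elements arrive in different orders.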

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def countElementsPerPartition: DataSet[(Int, Long)]

    Iterates over all the elements in each partition to count the elements that partition holds.

    returns

    a data set of Tuple2 entries, each mapping a subtask index to the number of elements in that partition
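
The shape of the result can be pictured with a local stand-in for a partitioned data set. The following is a rough sketch that assumes each inner `Seq` represents the elements held by one parallel subtask; `PartitionCountSketch` and `countPerPartition` are hypothetical names, not part of Flink.

```scala
// Sketch only: a nested Seq stands in for a partitioned DataSet.
object PartitionCountSketch {
  // Returns one (subtask index, element count) pair per partition.
  def countPerPartition[T](partitions: Seq[Seq[T]]): Seq[(Int, Long)] =
    partitions.zipWithIndex.map { case (part, subtaskIndex) =>
      (subtaskIndex, part.size.toLong)
    }
}
```

Empty partitions still produce an entry with a count of zero in this sketch.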

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. def partitionByRange[K](distribution: DataDistribution, fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Range-partitions a DataSet using the specified key selector function.

  18. def partitionByRange(distribution: DataDistribution, firstField: String, otherFields: String*): DataSet[T]

    Range-partitions a DataSet on the specified fields.

  19. def partitionByRange(distribution: DataDistribution, fields: Int*): DataSet[T]

    Range-partitions a DataSet on the specified tuple field positions.

  20. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generates a sample of the DataSet in which each element is selected with probability fraction.

    withReplacement

    Whether an element can be selected more than once.

    fraction

    Probability that each element is chosen. Must be in [0, 1] without replacement and in [0, ∞) with replacement. When fraction is larger than 1, each element is expected to be selected multiple times on average.

    seed

    Random number generator seed.

    returns

    The sampled DataSet
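
The two modes of fraction-based sampling can be sketched on a local collection. This is an illustration under stated assumptions, not Flink's code: without replacement it keeps each element independently with probability `fraction` (Bernoulli sampling), and with replacement it repeats each element a Poisson-distributed number of times with mean `fraction`. `FractionSampleSketch` and its helper are hypothetical names, and the explicit `seed` parameter has no default here.

```scala
import scala.util.Random

// Sketch only: local illustration of fraction-based sampling.
object FractionSampleSketch {
  // Knuth's method for drawing a Poisson-distributed count with the given mean.
  // Adequate for small means; not tuned for large ones.
  private def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  def sample[T](data: Seq[T], withReplacement: Boolean, fraction: Double, seed: Long): Seq[T] = {
    require(fraction >= 0.0, "fraction must be non-negative")
    require(withReplacement || fraction <= 1.0, "fraction must be in [0, 1] without replacement")
    val rng = new Random(seed)
    if (withReplacement) {
      // Each element appears a Poisson(fraction)-distributed number of times.
      data.flatMap(e => Seq.fill(poisson(fraction, rng))(e))
    } else {
      // Each element is kept independently with probability `fraction`.
      data.filter(_ => rng.nextDouble() < fraction)
    }
  }
}
```

In both modes the sample size is random; only its expected value is `fraction` times the input size. Passing the same seed reproduces the same sample in this sketch.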

  21. def sampleWithSize(withReplacement: Boolean, numSamples: Int, seed: Long = Utils.RNG.nextLong()): DataSet[T]

    Generates a sample of the DataSet with a fixed sample size.

    NOTE: Sampling with a fixed size is less efficient than sampling with a fraction; use sample with a fraction unless you need an exact sample size.

    withReplacement

    Whether an element can be selected more than once.

    numSamples

    The expected sample size.

    seed

    Random number generator seed.

    returns

    The sampled DataSet
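
To show why a fixed sample size costs more than a per-element coin flip, here is a minimal local sketch. It assumes classic reservoir sampling (Algorithm R) for the case without replacement and independent index draws for the case with replacement; a distributed implementation such as Flink's may work differently. `FixedSizeSampleSketch` is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Sketch only: local illustration of fixed-size sampling.
object FixedSizeSampleSketch {
  def sampleWithSize[T](data: IndexedSeq[T], withReplacement: Boolean, numSamples: Int, seed: Long): Seq[T] = {
    require(numSamples >= 0, "numSamples must be non-negative")
    val rng = new Random(seed)
    if (withReplacement) {
      // Draw numSamples independent positions; an element may repeat.
      if (data.isEmpty) Seq.empty
      else Seq.fill(numSamples)(data(rng.nextInt(data.size)))
    } else {
      // Algorithm R: keep the first numSamples elements, then overwrite a
      // random slot with decreasing probability as more elements are seen.
      val reservoir = ArrayBuffer.empty[T]
      data.iterator.zipWithIndex.foreach { case (e, i) =>
        if (i < numSamples) reservoir += e
        else {
          val j = rng.nextInt(i + 1)
          if (j < numSamples) reservoir(j) = e
        }
      }
      reservoir.toList
    }
  }
}
```

Without replacement, the sketch returns every element when the input holds fewer than `numSamples` elements. The reservoir has to be maintained across the whole input, which is the extra bookkeeping that fraction-based sampling avoids.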

  22. val self: DataSet[T]

    Data Set

  23. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  24. def toString(): String

    Definition Classes
    AnyRef → Any
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. def zipWithIndex: DataSet[(Long, T)]

    Assigns consecutive ids to all the elements of the input data set, using a set of (subtask index, total number of elements) mappings.

    returns

    a data set of Tuple2 entries consisting of consecutive ids and the initial values.
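
The role of the (subtask index, number of elements) mappings can be sketched locally: count each partition, turn the counts into starting offsets, then number each partition's elements from its offset. This is a sketch under the assumption that partitions are numbered in subtask order; `ZipWithIndexSketch` is a hypothetical name and the nested `Seq` stands in for a partitioned data set.

```scala
// Sketch only: local illustration of consecutive id assignment.
object ZipWithIndexSketch {
  def zipWithIndex[T](partitions: Seq[Seq[T]]): Seq[Seq[(Long, T)]] = {
    // Step 1: count the elements in each partition (cf. countElementsPerPartition).
    val counts = partitions.map(_.size.toLong)
    // Step 2: the starting offset of a partition is the total count of all earlier ones.
    val offsets = counts.scanLeft(0L)(_ + _)
    // Step 3: number the elements of each partition starting from its offset.
    partitions.zip(offsets).map { case (part, offset) =>
      part.zipWithIndex.map { case (e, i) => (offset + i, e) }
    }
  }
}
```

The counting step requires a full pass over the data before any id can be assigned, which is the price of ids that are consecutive across partitions.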

  29. def zipWithUniqueId: DataSet[(Long, T)]

    Assigns a unique id to every element of the input data set.

    returns

    a data set of Tuple2 entries consisting of ids and the initial values.
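
Unique ids that need not be consecutive can be generated without a counting pass. One possible scheme, shown here as an assumption rather than as Flink's documented behavior, reserves the low bits of each id for the subtask index and puts a per-partition counter in the remaining bits. `ZipWithUniqueIdSketch` is a hypothetical name and the nested `Seq` stands in for a partitioned data set.

```scala
// Sketch only: one way to build ids that are unique across partitions.
object ZipWithUniqueIdSketch {
  def zipWithUniqueId[T](partitions: Seq[Seq[T]]): Seq[Seq[(Long, T)]] = {
    val parallelism = partitions.size
    // Number of low bits needed to hold any subtask index in [0, parallelism).
    val shift = 64 - java.lang.Long.numberOfLeadingZeros(math.max(parallelism - 1, 0).toLong)
    partitions.zipWithIndex.map { case (part, subtaskIndex) =>
      part.zipWithIndex.map { case (e, localCounter) =>
        // High bits: position within the partition. Low bits: subtask index.
        ((localCounter.toLong << shift) | subtaskIndex.toLong, e)
      }
    }
  }
}
```

With three partitions the sketch uses two low bits, so the first partition receives ids 0, 4, 8 and the second receives 1, 5, 9. The ids are unique but leave gaps, in contrast to the consecutive ids of zipWithIndex.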
