public class Splitter extends Object
Modifier and Type | Class and Description |
---|---|
static class | Splitter.TrainTestDataSet<T> |
static class | Splitter.TrainTestDataSet$ |
static class | Splitter.TrainTestHoldoutDataSet<T> |
static class | Splitter.TrainTestHoldoutDataSet$ |
Constructor and Description |
---|
Splitter() |
Modifier and Type | Method and Description |
---|---|
static <T> Splitter.TrainTestDataSet<T>[] | kFoldSplit(DataSet<T> input, int kFolds, long seed, TypeInformation<T> evidence$9, scala.reflect.ClassTag<T> evidence$10) - Split a DataSet into an array of TrainTest DataSets. |
static <T> DataSet<T>[] | multiRandomSplit(DataSet<T> input, double[] fracArray, long seed, TypeInformation<T> evidence$7, scala.reflect.ClassTag<T> evidence$8) - Split a DataSet by the probability fraction of each element of a vector. |
static <T> DataSet<T>[] | randomSplit(DataSet<T> input, double fraction, boolean precise, long seed, TypeInformation<T> evidence$5, scala.reflect.ClassTag<T> evidence$6) - Split a DataSet by the probability fraction of each element. |
static <T> Splitter.TrainTestHoldoutDataSet<T> | trainTestHoldoutSplit(DataSet<T> input, scala.Tuple3<Object,Object,Object> fracTuple, long seed, TypeInformation<T> evidence$13, scala.reflect.ClassTag<T> evidence$14) - A wrapper for multiRandomSplit that yields a TrainTestHoldoutDataSet. |
static <T> Splitter.TrainTestDataSet<T> | trainTestSplit(DataSet<T> input, double fraction, boolean precise, long seed, TypeInformation<T> evidence$11, scala.reflect.ClassTag<T> evidence$12) - A wrapper for randomSplit that yields a TrainTestDataSet. |
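
The splitting semantics summarized above (proportions that are normalized before use, and k folds in which no observation appears in more than one test set) can be sketched without Flink. The class below is only an illustration on plain `java.util.List` values: `SplitSketch`, `normalize`, `proportionalSplit`, and `kFold` are hypothetical names, not part of the Splitter API, and the shuffle-then-round-robin fold assignment is a stand-in that does not claim to mirror how Flink samples a DataSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; it does not call or reproduce Flink's Splitter.
public class SplitSketch {

    /** Rescales proportions so they sum to 1, e.g. {1.0, 2.0} becomes {1/3, 2/3}. */
    public static double[] normalize(double[] proportions) {
        double sum = Arrays.stream(proportions).sum();
        if (sum <= 0.0) {
            throw new IllegalArgumentException("proportions must sum to a positive value");
        }
        return Arrays.stream(proportions).map(p -> p / sum).toArray();
    }

    /** Randomly assigns every element to exactly one bucket, weighted by the normalized proportions. */
    public static <T> List<List<T>> proportionalSplit(List<T> input, double[] proportions, long seed) {
        double[] weights = normalize(proportions);
        Random rng = new Random(seed);
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (T element : input) {
            double draw = rng.nextDouble();
            double cumulative = 0.0;
            int chosen = weights.length - 1; // fallback guards against floating-point rounding
            for (int i = 0; i < weights.length; i++) {
                cumulative += weights[i];
                if (draw < cumulative) {
                    chosen = i;
                    break;
                }
            }
            buckets.get(chosen).add(element);
        }
        return buckets;
    }

    /** Returns k pairs of {training, testing}; each element lands in exactly one testing fold. */
    public static <T> List<List<List<T>>> kFold(List<T> input, int k, long seed) {
        List<T> shuffled = new ArrayList<>(input);
        Collections.shuffle(shuffled, new Random(seed));
        // Split into k disjoint folds first (round-robin over the shuffled copy).
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            folds.get(i % k).add(shuffled.get(i));
        }
        // Fold i is the testing set; the remaining folds together form the training set.
        List<List<List<T>>> pairs = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            List<T> train = new ArrayList<>();
            for (int j = 0; j < k; j++) {
                if (j != i) {
                    train.addAll(folds.get(j));
                }
            }
            pairs.add(Arrays.asList(train, folds.get(i)));
        }
        return pairs;
    }
}
```

Because the proportions are normalized, `normalize(new double[]{1.0, 2.0})` and `normalize(new double[]{10.0, 20.0})` describe the same split. Passing the same seed reproduces the same assignment in this sketch.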
public static <T> DataSet<T>[] randomSplit(DataSet<T> input, double fraction, boolean precise, long seed, TypeInformation<T> evidence$5, scala.reflect.ClassTag<T> evidence$6)
input - DataSet to be split.
fraction - Probability that each element is chosen; should be in [0, 1].
This fraction refers to the first element in the resulting array.
precise - By default, sampling is random and can result in slightly lopsided
sample sets. When precise is true, equal sample set sizes are forced;
however, this is somewhat less efficient.
seed - Random number generator seed.

public static <T> DataSet<T>[] multiRandomSplit(DataSet<T> input, double[] fracArray, long seed, TypeInformation<T> evidence$7, scala.reflect.ClassTag<T> evidence$8)
input - DataSet to be split.
fracArray - An array of PROPORTIONS for splitting the DataSet. Unlike the
randomSplit function, numbers greater than 1 do not lead to oversampling.
The number of splits is dictated by the length of this array.
The numbers are normalized, e.g. Array(1.0, 2.0) would yield
two data sets with a roughly 33/67% split.
seed - Random number generator seed.

public static <T> Splitter.TrainTestDataSet<T>[] kFoldSplit(DataSet<T> input, int kFolds, long seed, TypeInformation<T> evidence$9, scala.reflect.ClassTag<T> evidence$10)
input - DataSet to be split.
kFolds - The number of TrainTest DataSets to be returned. Each 'testing' set
will be 1/k of the dataset, randomly sampled; the training set will be the
remainder of the dataset. The DataSet is split into kFolds first, so that no
observation will occur in multiple folds.
seed - Random number generator seed.

public static <T> Splitter.TrainTestDataSet<T> trainTestSplit(DataSet<T> input, double fraction, boolean precise, long seed, TypeInformation<T> evidence$11, scala.reflect.ClassTag<T> evidence$12)
input - DataSet to be split.
fraction - Probability that each element is chosen; should be in [0, 1].
This fraction refers to the training element of the resulting TrainTestDataSet.
precise - By default, sampling is random and can result in slightly lopsided
sample sets. When precise is true, equal sample set sizes are forced;
however, this is somewhat less efficient.
seed - Random number generator seed.

public static <T> Splitter.TrainTestHoldoutDataSet<T> trainTestHoldoutSplit(DataSet<T> input, scala.Tuple3<Object,Object,Object> fracTuple, long seed, TypeInformation<T> evidence$13, scala.reflect.ClassTag<T> evidence$14)
input - DataSet to be split.
fracTuple - A tuple of three doubles, where the first element specifies the
size of the training set, the second element the testing set, and
the third element the holdout set. These are proportional and
will be normalized internally.
seed - Random number generator seed.

Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.