public class StochasticOutlierSelection extends Object implements Transformer<StochasticOutlierSelection>
For more information about SOS, see https://github.com/jeroenjanssens/sos and the following technical report:
J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik. Stochastic Outlier Selection. Technical Report TiCC TR 2012-001, Tilburg University, Tilburg, the Netherlands, 2012.
Modifier and Type | Class and Description |
---|---|
static class | StochasticOutlierSelection.BreezeLabeledVector |
static class | StochasticOutlierSelection.BreezeLabeledVector$ |
static class | StochasticOutlierSelection.ErrorTolerance$ |
static class | StochasticOutlierSelection.MaxIterations$ |
static class | StochasticOutlierSelection.Perplexity$ |
Constructor and Description |
---|
StochasticOutlierSelection() |
Modifier and Type | Method and Description |
---|---|
static StochasticOutlierSelection | apply() |
static breeze.linalg.Vector<Object> | binarySearch(breeze.linalg.Vector<Object> dissimilarityVector, double logPerplexity, int maxIterations, double tolerance, double beta, double betaMin, double betaMax, int iteration) - Performs a binary search to find affinities such that each conditional Gaussian has the same perplexity. |
static <P extends Predictor<P>> ChainedPredictor<Self,P> | chainPredictor(P predictor) |
static <T extends Transformer<T>> ChainedTransformer<Self,T> | chainTransformer(T transformer) |
static DataSet<StochasticOutlierSelection.BreezeLabeledVector> | computeAffinity(DataSet<StochasticOutlierSelection.BreezeLabeledVector> dissimilarityVectors, ParameterMap resultingParameters) - Approximates the affinity by fitting a Gaussian-like function. |
static DataSet<StochasticOutlierSelection.BreezeLabeledVector> | computeBindingProbabilities(DataSet<StochasticOutlierSelection.BreezeLabeledVector> affinityVectors) - Normalizes the input vectors so that each row sums to one. |
static DataSet<StochasticOutlierSelection.BreezeLabeledVector> | computeDissimilarityVectors(DataSet<StochasticOutlierSelection.BreezeLabeledVector> inputVectors) - Computes the pairwise distance from each vector to all other vectors. |
static DataSet<scala.Tuple2<Object,Object>> | computeOutlierProbability(DataSet<StochasticOutlierSelection.BreezeLabeledVector> bindingProbabilityVectors) - Computes the final outlier probability by taking the product of each column. |
static <Training> void | fit(DataSet<Training> training, ParameterMap fitParameters, FitOperation<Self,Training> fitOperation) |
static <Training> ParameterMap | fit$default$2() |
static ParameterMap | parameters() |
StochasticOutlierSelection | setErrorTolerance(double errorTolerance) - Sets the accepted error tolerance, which reduces computational time when approximating the affinity. |
StochasticOutlierSelection | setMaxIterations(int maxIterations) - Sets the maximum number of iterations used to approximate the affinity. |
StochasticOutlierSelection | setPerplexity(double perplexity) - Sets the perplexity of the outlier selection algorithm, which can be seen as the k of kNN. For more information, see the Stochastic Outlier Selection technical report. |
static <Input,Output> DataSet<Output> | transform(DataSet<Input> input, ParameterMap transformParameters, TransformDataSetOperation<Self,Input,Output> transformOperation) |
static <Input,Output> ParameterMap | transform$default$2() |
static Object | transformLabeledVectors() |
static <T extends Vector> Object | transformVectors(BreezeVectorConverter<T> evidence$1, TypeInformation<T> evidence$2, scala.reflect.ClassTag<T> evidence$3) - TransformDataSetOperation that applies the stochastic outlier selection algorithm on a Vector, transforming the high-dimensional input into a single Double output. |
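The non-iterative steps listed in the table above (pairwise dissimilarity, binding probabilities, outlier probability) can be illustrated on plain arrays. This is a minimal sketch, not the Flink ML implementation: the class `SosStepsSketch` and its method names are hypothetical, Euclidean distance is an assumption (the documentation only mentions "a distance method"), and the outlier probability follows the definition in the SOS technical report, where the probability for point j is the product over all other points i of (1 - b[i][j]).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Flink ML: SOS steps on plain arrays. */
final class SosStepsSketch {

    /** Pairwise Euclidean distances (assumed metric); row i holds the distances from point i. */
    static double[][] dissimilarity(double[][] points) {
        int n = points.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < points[i].length; k++) {
                    double diff = points[i][k] - points[j][k];
                    sum += diff * diff;
                }
                d[i][j] = Math.sqrt(sum);
                d[j][i] = d[i][j]; // the distance is symmetric
            }
        }
        return d;
    }

    /** Normalizes each affinity row so that it sums to one (assumes a positive row sum). */
    static double[][] bindingProbabilities(double[][] affinity) {
        double[][] b = new double[affinity.length][];
        for (int i = 0; i < affinity.length; i++) {
            double rowSum = Arrays.stream(affinity[i]).sum();
            b[i] = new double[affinity[i].length];
            for (int j = 0; j < affinity[i].length; j++) {
                b[i][j] = affinity[i][j] / rowSum;
            }
        }
        return b;
    }

    /** Outlier probability of point j: product over all other points i of (1 - b[i][j]). */
    static double[] outlierProbabilities(double[][] binding) {
        double[] result = new double[binding.length];
        Arrays.fill(result, 1.0);
        for (int i = 0; i < binding.length; i++) {
            for (int j = 0; j < binding[i].length; j++) {
                if (i != j) {
                    result[j] *= 1.0 - binding[i][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {3.0, 4.0}};
        System.out.println(dissimilarity(points)[0][1]);

        // Illustrative affinity matrix; in SOS it would come from the affinity step.
        double[][] affinity = {{0.0, 1.0, 1.0}, {3.0, 0.0, 1.0}, {1.0, 1.0, 0.0}};
        double[][] binding = bindingProbabilities(affinity);
        System.out.println(Arrays.toString(outlierProbabilities(binding)));
    }
}
```

A point that few other points bind to keeps a product close to one, which is why a high value indicates an outlier.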
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
chainPredictor, chainTransformer, transform
parameters
public static StochasticOutlierSelection apply()
public static Object transformLabeledVectors()
public static <T extends Vector> Object transformVectors(BreezeVectorConverter<T> evidence$1, TypeInformation<T> evidence$2, scala.reflect.ClassTag<T> evidence$3)
TransformDataSetOperation applies the stochastic outlier selection algorithm on a Vector, transforming the high-dimensional input into a single Double output.
evidence$1
- (undocumented)
evidence$2
- (undocumented)
evidence$3
- (undocumented)
Returns a TransformDataSetOperation that yields a single double representing the outlierness of the input vectors, where the output is in [0, 1].
public static DataSet<StochasticOutlierSelection.BreezeLabeledVector> computeDissimilarityVectors(DataSet<StochasticOutlierSelection.BreezeLabeledVector> inputVectors)
inputVectors
- The input vectors; each vector is compared to all other vectors using a distance measure.
Returns StochasticOutlierSelection.BreezeLabeledVector instances, each with its dissimilarity vector.
public static DataSet<StochasticOutlierSelection.BreezeLabeledVector> computeAffinity(DataSet<StochasticOutlierSelection.BreezeLabeledVector> dissimilarityVectors, ParameterMap resultingParameters)
dissimilarityVectors
- The dissimilarity vectors, which represent the distances to the other vectors in the data set.
resultingParameters
- The user-defined parameters of the algorithm.
Returns StochasticOutlierSelection.BreezeLabeledVector instances, each with its affinity vector.
public static DataSet<StochasticOutlierSelection.BreezeLabeledVector> computeBindingProbabilities(DataSet<StochasticOutlierSelection.BreezeLabeledVector> affinityVectors)
affinityVectors
- The affinity vectors, which quantify the relationship between the original vectors.
Returns StochasticOutlierSelection.BreezeLabeledVector instances that represent the binding probabilities, that is, the affinities normalized so that each row sums to one.
public static DataSet<scala.Tuple2<Object,Object>> computeOutlierProbability(DataSet<StochasticOutlierSelection.BreezeLabeledVector> bindingProbabilityVectors)
bindingProbabilityVectors
- The binding probability vectors, where the binding probability is based on the affinity and represents the probability of a vector binding with another vector.
public static breeze.linalg.Vector<Object> binarySearch(breeze.linalg.Vector<Object> dissimilarityVector, double logPerplexity, int maxIterations, double tolerance, double beta, double betaMin, double betaMax, int iteration)
dissimilarityVector
- The input dissimilarity vector, which represents the distances from the current vector to the other vectors in the data set.
logPerplexity
- The log of the perplexity, which represents the probability of having affinity with another vector.
maxIterations
- The maximum number of iterations, which limits the computational time.
tolerance
- The allowed tolerance, which sacrifices precision for decreased computational time.
beta
- The current beta.
betaMin
- The lower bound of beta.
betaMax
- The upper bound of beta.
iteration
- The current iteration.
public static ParameterMap parameters()
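The binary search documented for binarySearch above can be sketched as a bisection on beta. This is an illustrative sketch under stated assumptions, not the Flink ML code: the class `AffinitySearchSketch` is hypothetical, the affinity is assumed to be `exp(-beta * d)` as in the SOS technical report, the dissimilarity vector is assumed to exclude the point's own zero distance, and `beta`, `betaMin`, `betaMax`, and `iteration` are kept as local loop state rather than passed as arguments.

```java
/** Hypothetical sketch, not the Flink ML implementation. */
final class AffinitySearchSketch {

    /**
     * Searches for a beta such that the normalized affinities exp(-beta * d_j)
     * have an entropy within `tolerance` of `logPerplexity`, then returns the
     * (unnormalized) affinities for that beta.
     */
    static double[] binarySearch(double[] dissimilarity, double logPerplexity,
                                 int maxIterations, double tolerance) {
        double beta = 1.0;
        double betaMin = Double.NEGATIVE_INFINITY;
        double betaMax = Double.POSITIVE_INFINITY;
        double[] affinity = affinities(dissimilarity, beta);

        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double diff = entropy(dissimilarity, affinity, beta) - logPerplexity;
            if (Math.abs(diff) < tolerance) {
                break; // close enough to the requested perplexity
            }
            if (diff > 0) {
                // Entropy too high: the distribution is too flat, so raise beta.
                betaMin = beta;
                beta = Double.isInfinite(betaMax) ? beta * 2.0 : (beta + betaMax) / 2.0;
            } else {
                // Entropy too low: the distribution is too peaked, so lower beta.
                betaMax = beta;
                beta = Double.isInfinite(betaMin) ? beta / 2.0 : (beta + betaMin) / 2.0;
            }
            affinity = affinities(dissimilarity, beta);
        }
        return affinity;
    }

    /** Gaussian-like affinity for every distance. */
    static double[] affinities(double[] d, double beta) {
        double[] a = new double[d.length];
        for (int j = 0; j < d.length; j++) {
            a[j] = Math.exp(-beta * d[j]);
        }
        return a;
    }

    /** Shannon entropy (natural log) of the normalized affinities: log(S) + beta * sum(d * a) / S. */
    static double entropy(double[] d, double[] a, double beta) {
        double sumA = 0.0;
        double sumDA = 0.0;
        for (int j = 0; j < d.length; j++) {
            sumA += a[j];
            sumDA += d[j] * a[j];
        }
        return Math.log(sumA) + beta * sumDA / sumA;
    }

    public static void main(String[] args) {
        double[] d = {1.0, 2.0, 3.0, 4.0};
        double[] a = binarySearch(d, Math.log(2.0), 1000, 1e-6);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

A looser tolerance or a smaller iteration limit ends the loop earlier at the cost of a less exact perplexity match, which is the trade-off the tolerance and maxIterations parameters describe.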
public static <Training> void fit(DataSet<Training> training, ParameterMap fitParameters, FitOperation<Self,Training> fitOperation)
public static <Training> ParameterMap fit$default$2()
public static <Input,Output> DataSet<Output> transform(DataSet<Input> input, ParameterMap transformParameters, TransformDataSetOperation<Self,Input,Output> transformOperation)
public static <T extends Transformer<T>> ChainedTransformer<Self,T> chainTransformer(T transformer)
public static <P extends Predictor<P>> ChainedPredictor<Self,P> chainPredictor(P predictor)
public static <Input,Output> ParameterMap transform$default$2()
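The perplexity accepted by setPerplexity is described as comparable to the k of kNN. One way to see this, using the standard definition of perplexity as the exponential of the Shannon entropy, is that a distribution spread uniformly over k neighbors has a perplexity of exactly k. The class `PerplexitySketch` below is a hypothetical illustration and not part of Flink ML.

```java
/** Hypothetical illustration, not part of Flink ML. */
final class PerplexitySketch {

    /** Perplexity of a discrete distribution: exp of its Shannon entropy (natural log). */
    static double perplexity(double[] probabilities) {
        double entropy = 0.0;
        for (double p : probabilities) {
            if (p > 0.0) { // zero-probability entries contribute nothing
                entropy -= p * Math.log(p);
            }
        }
        return Math.exp(entropy);
    }

    public static void main(String[] args) {
        // Uniform over 5 neighbors: the effective number of neighbors is about 5.
        System.out.println(perplexity(new double[] {0.2, 0.2, 0.2, 0.2, 0.2}));
        // All mass on one neighbor: the effective number of neighbors is 1.
        System.out.println(perplexity(new double[] {1.0, 0.0, 0.0}));
    }
}
```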
public StochasticOutlierSelection setPerplexity(double perplexity)
perplexity
- The perplexity of the affinity fit.
public StochasticOutlierSelection setErrorTolerance(double errorTolerance)
errorTolerance
- The accepted error tolerance with respect to the affinity.
public StochasticOutlierSelection setMaxIterations(int maxIterations)
maxIterations
- The maximum number of iterations.
Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.