public class ALS extends Object implements Predictor<ALS>
Given a matrix $R$, ALS calculates two matrices $U$ and $V$ such that $R \approx U^T V$. The unknown row dimension of $U$ and $V$ is given by the number of latent factors. Since matrix factorization is often used in the context of recommendation, we'll call the first matrix the user matrix and the second matrix the item matrix. The $i$-th column of the user matrix is $u_i$ and the $i$-th column of the item matrix is $v_i$. The matrix $R$ is called the ratings matrix, with $(R)_{i,j} = r_{i,j}$.
In order to find the user and item matrix, the following problem is solved:

$$\arg\min_{U,V} \sum_{(i,j)\,:\,r_{i,j} \neq 0} \left(r_{i,j} - u_i^T v_j\right)^2 + \lambda \left(\sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{v_j} \|v_j\|^2\right)$$

with $\lambda$ being the regularization factor, $n_{u_i}$ being the number of items the user $i$ has rated and $n_{v_j}$ being the number of times the item $j$ has been rated. This regularization scheme to avoid overfitting is called weighted-lambda-regularization. Details can be found in the work of Zhou et al.
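The objective above can be evaluated directly on a small example. The following is a minimal sketch, not the library's implementation: the class `WeightedLambdaCost`, its nested `Entry` type and the method names are hypothetical, and every stored entry is assumed to be an observed (non-zero) rating.

```java
/** Sketch: weighted-lambda-regularized squared error over observed ratings. */
final class WeightedLambdaCost {

    /** One observed rating r at position (user, item). Hypothetical helper type. */
    static final class Entry {
        final int user;
        final int item;
        final double rating;

        Entry(int user, int item, double rating) {
            this.user = user;
            this.item = item;
            this.rating = rating;
        }
    }

    /**
     * @param ratings observed entries of the ratings matrix
     * @param u       u[i] is the latent vector of user i
     * @param v       v[j] is the latent vector of item j
     * @param lambda  regularization factor
     */
    static double cost(Entry[] ratings, double[][] u, double[][] v, double lambda) {
        int[] ratingsPerUser = new int[u.length]; // n_{u_i}
        int[] ratingsPerItem = new int[v.length]; // n_{v_j}
        double squaredError = 0.0;
        for (Entry e : ratings) {
            double residual = e.rating - dot(u[e.user], v[e.item]);
            squaredError += residual * residual;
            ratingsPerUser[e.user]++;
            ratingsPerItem[e.item]++;
        }
        double penalty = 0.0;
        for (int i = 0; i < u.length; i++) {
            penalty += ratingsPerUser[i] * dot(u[i], u[i]);
        }
        for (int j = 0; j < v.length; j++) {
            penalty += ratingsPerItem[j] * dot(v[j], v[j]);
        }
        return squaredError + lambda * penalty;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += a[k] * b[k];
        }
        return sum;
    }
}
```

Because each norm is weighted by the number of ratings, users and items with many ratings are penalized more strongly than sparsely rated ones.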
By fixing one of the matrices $U$ or $V$, one obtains a quadratic problem in the other matrix which can be solved directly. The solution of the modified problem is guaranteed not to increase the overall cost function. By applying this step alternately to the matrices $U$ and $V$, we can iteratively improve the matrix factorization.
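To make the half step concrete: with the item matrix fixed, setting the gradient of the cost with respect to a single user vector $u_i$ to zero gives the linear system $\left(\sum_j v_j v_j^T + \lambda n_{u_i} I\right) u_i = \sum_j r_{i,j} v_j$, where the sums run over the items rated by user $i$. The sketch below solves that system for one user with plain Gaussian elimination. The class and method names are hypothetical, at least one rating and $\lambda > 0$ are assumed, and the library's blocked, distributed implementation is organized differently.

```java
/** Sketch: closed-form update of one user vector with the item vectors held fixed. */
final class AlsHalfStepSketch {

    /**
     * @param ratedItemFactors latent vectors v_j of the items this user rated
     * @param ratings          the corresponding ratings r_{i,j}
     * @param lambda           regularization factor (assumed positive)
     * @return the user vector minimizing the regularized cost for this user
     */
    static double[] solveUser(double[][] ratedItemFactors, double[] ratings, double lambda) {
        int k = ratedItemFactors[0].length;
        int n = ratings.length; // n_{u_i}: number of items the user rated
        double[][] a = new double[k][k + 1]; // augmented matrix [A | b]
        for (int idx = 0; idx < n; idx++) {
            double[] v = ratedItemFactors[idx];
            for (int p = 0; p < k; p++) {
                for (int q = 0; q < k; q++) {
                    a[p][q] += v[p] * v[q]; // accumulate v_j v_j^T
                }
                a[p][k] += ratings[idx] * v[p]; // accumulate r_{i,j} v_j
            }
        }
        for (int p = 0; p < k; p++) {
            a[p][p] += lambda * n; // weighted-lambda regularization term
        }
        return gaussianElimination(a);
    }

    /** Solves the augmented system in place using partial pivoting. */
    static double[] gaussianElimination(double[][] a) {
        int k = a.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int row = col + 1; row < k; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int row = col + 1; row < k; row++) {
                double factor = a[row][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[row][c] -= factor * a[col][c];
                }
            }
        }
        double[] x = new double[k];
        for (int row = k - 1; row >= 0; row--) {
            double sum = a[row][k];
            for (int c = row + 1; c < k; c++) {
                sum -= a[row][c] * x[c];
            }
            x[row] = sum / a[row][row];
        }
        return x;
    }
}
```

Running the same update for every user, then swapping the roles of users and items, yields one full ALS iteration.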
The matrix $R$ is given in its sparse representation as tuples $(i, j, r)$, where $i$ is the row index, $j$ is the column index and $r$ is the matrix value at position $(i, j)$.
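As an illustration of this representation, the sketch below converts a small dense matrix into (i, j, r) triples, treating zero entries as unrated. The `SparseRatingsSketch` class and its `Triple` type are hypothetical and only stand in for whatever tuple type the caller uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turning a dense ratings matrix into sparse (i, j, r) triples. */
final class SparseRatingsSketch {

    /** Row index i, column index j and matrix value r. Hypothetical helper type. */
    static final class Triple {
        final int i;
        final int j;
        final double r;

        Triple(int i, int j, double r) {
            this.i = i;
            this.j = j;
            this.r = r;
        }
    }

    /** Emits one triple per non-zero entry, in row-major order. */
    static List<Triple> toTriples(double[][] dense) {
        List<Triple> triples = new ArrayList<>();
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    triples.add(new Triple(i, j, dense[i][j]));
                }
            }
        }
        return triples;
    }
}
```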
Modifier and Type | Class and Description |
---|---|
static class | ALS.BlockedFactorization |
static class | ALS.BlockedFactorization$ |
static class | ALS.BlockIDGenerator |
static class | ALS.BlockIDPartitioner |
static class | ALS.BlockRating |
static class | ALS.BlockRating$ |
static class | ALS.Blocks$ |
static class | ALS.Factorization |
static class | ALS.Factorization$ |
static class | ALS.Factors - Latent factor model vector |
static class | ALS.Factors$ |
static class | ALS.InBlockInformation |
static class | ALS.InBlockInformation$ |
static class | ALS.Iterations$ |
static class | ALS.Lambda$ |
static class | ALS.NumFactors$ |
static class | ALS.OutBlockInformation |
static class | ALS.OutBlockInformation$ |
static class | ALS.OutLinks |
static class | ALS.Rating - Representation of a user-item rating |
static class | ALS.Rating$ |
static class | ALS.Seed$ |
static class | ALS.TemporaryPath$ |
Constructor and Description |
---|
ALS() |
Modifier and Type | Method and Description |
---|---|
static ALS | apply() |
static scala.Tuple2<DataSet<scala.Tuple2<Object,ALS.InBlockInformation>>,DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>>> | createBlockInformation(int userBlocks, int itemBlocks, DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, ALS.BlockIDPartitioner blockIDPartitioner) - Creates the meta information needed to route the item and user vectors to the respective user and item blocks. |
static DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> | createInBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, ALS.BlockIDGenerator blockIDGenerator) - Creates the incoming block information. |
static DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> | createOutBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, int itemBlocks, ALS.BlockIDGenerator blockIDGenerator) - Creates the outgoing block information. |
static DataSet<scala.Tuple2<Object,int[]>> | createUsersPerBlock(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings) - Calculates the user IDs of each user block in ascending order. |
DataSet<Object> | empiricalRisk(DataSet<scala.Tuple3<Object,Object,Object>> labeledData, ParameterMap riskParameters) - Empirical risk of the trained model (matrix factorization). |
scala.Option<scala.Tuple2<DataSet<ALS.Factors>,DataSet<ALS.Factors>>> | factorsOption() |
static Object | fitALS() - Calculates the matrix factorization for the given ratings. |
static void | generateFullMatrix(double[] triangularMatrix, double[] fullMatrix, int factors) |
static DataSet<ALS.Factors> | generateRandomMatrix(DataSet<Object> users, int factors, long seed) |
static String | ITEM_FACTORS_FILE() |
static void | outerProduct(double[] vector, double[] matrix, int factors) |
static Object | predictRating() - Predict operation which calculates the matrix entry for the given indices. |
static double[] | randomFactors(int factors, scala.util.Random random) |
ALS | setBlocks(int blocks) - Sets the number of blocks into which the user and item matrix shall be partitioned. |
ALS | setIterations(int iterations) - Sets the number of iterations of the ALS algorithm. |
ALS | setLambda(double lambda) - Sets the regularization coefficient lambda. |
ALS | setNumFactors(int numFactors) - Sets the number of latent factors/row dimension of the latent model. |
ALS | setSeed(long seed) - Sets the random seed for the initial item matrix initialization. |
ALS | setTemporaryPath(String temporaryPath) - Sets the temporary path into which intermediate results are written in order to increase performance. |
static DataSet<ALS.Factors> | unblock(DataSet<scala.Tuple2<Object,double[][]>> users, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> outInfo, ALS.BlockIDPartitioner blockIDPartitioner) - Unblocks the blocked user and item matrix representation so that it is a DataSet of column vectors. |
static DataSet<scala.Tuple2<Object,double[][]>> | updateFactors(int numUserBlocks, DataSet<scala.Tuple2<Object,double[][]>> items, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> itemOut, DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> userIn, int factors, double lambda, Partitioner<Object> blockIDPartitioner) - Calculates a single half step of the ALS optimization. |
static String | USER_FACTORS_FILE() |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
parameters
public static String USER_FACTORS_FILE()
public static String ITEM_FACTORS_FILE()
public static ALS apply()
public static Object predictRating()
public static Object fitALS()
public static DataSet<scala.Tuple2<Object,double[][]>> updateFactors(int numUserBlocks, DataSet<scala.Tuple2<Object,double[][]>> items, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> itemOut, DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> userIn, int factors, double lambda, Partitioner<Object> blockIDPartitioner)
Parameters:
numUserBlocks - Number of blocks in the respective dimension
items - Fixed matrix value for the half step
itemOut - Out information to know where to send the vectors
userIn - In information for the cogroup step
factors - Number of latent factors
lambda - Regularization constant
blockIDPartitioner - Custom Flink partitioner
public static scala.Tuple2<DataSet<scala.Tuple2<Object,ALS.InBlockInformation>>,DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>>> createBlockInformation(int userBlocks, int itemBlocks, DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, ALS.BlockIDPartitioner blockIDPartitioner)
Parameters: itemBlocks, ratings, blockIDPartitioner
public static DataSet<scala.Tuple2<Object,int[]>> createUsersPerBlock(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings)
Parameters: ratings
public static DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> createOutBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, int itemBlocks, ALS.BlockIDGenerator blockIDGenerator)
Creates for every user block the outgoing block information. The out block information contains for every item block a BitSet which indicates which user vector has to be sent to this block. If a vector v has to be sent to a block b, then bit v of bitsets(b) is set to 1, otherwise to 0. Additionally, the user IDs are replaced by the user vector's index value.
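The bit layout described here can be sketched for a single user block as follows. This is an illustration only: the class `OutBlockSketch`, the method `outLinks` and the assignment of an item to block `item % itemBlocks` are assumptions made for the example, not the behavior of ALS.BlockIDGenerator.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Sketch: per-item-block BitSets marking which user vectors must be sent there. */
final class OutBlockSketch {

    /**
     * @param sortedUserIds user IDs of this user block in ascending order
     * @param userItemPairs rated (userId, itemId) pairs
     * @param itemBlocks    number of item blocks
     * @return bitsets[b] has bit v set if user vector v must be sent to item block b
     */
    static BitSet[] outLinks(int[] sortedUserIds, int[][] userItemPairs, int itemBlocks) {
        BitSet[] bitsets = new BitSet[itemBlocks];
        for (int b = 0; b < itemBlocks; b++) {
            bitsets[b] = new BitSet(sortedUserIds.length);
        }
        for (int[] pair : userItemPairs) {
            // Replace the user ID by the index of the user's vector within the block.
            int vectorIndex = Arrays.binarySearch(sortedUserIds, pair[0]);
            if (vectorIndex < 0) {
                continue; // the user belongs to a different user block
            }
            int itemBlock = pair[1] % itemBlocks; // assumed block assignment
            bitsets[itemBlock].set(vectorIndex);
        }
        return bitsets;
    }
}
```

A user vector is marked at most once per item block, so it only needs to be shipped once to each block no matter how many of that block's items the user rated.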
Parameters: ratings, usersPerBlock, itemBlocks, blockIDGenerator
public static DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> createInBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, ALS.BlockIDGenerator blockIDGenerator)
Creates for every user block the incoming block information. The incoming block information contains the userIDs of the users in the respective block and for every item block a BlockRating instance. The BlockRating instance describes for every incoming set of item vectors of an item block, which user rated these items and what the rating was. For that purpose it contains for every incoming item vector a tuple of an id array us and a rating array rs. The array us contains the indices of the users having rated the respective item vector with the ratings in rs.
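The pairing of the arrays us and rs can be illustrated for one user block and the items of one item block. In this sketch the class `InBlockSketch`, the `ItemRatings` type and the ordering of items by ascending ID are assumptions chosen for the example. The actual ALS.BlockRating layout may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: grouping ratings per item into parallel arrays of user indices and ratings. */
final class InBlockSketch {

    /** For one item: us holds user vector indices, rs the ratings they gave. */
    static final class ItemRatings {
        final int[] us;
        final double[] rs;

        ItemRatings(int[] us, double[] rs) {
            this.us = us;
            this.rs = rs;
        }
    }

    /** Inputs are parallel arrays: users[p] rated items[p] with ratings[p]. */
    static List<ItemRatings> group(int[] sortedUserIds, int[] users, int[] items, double[] ratings) {
        // Collect the positions of all ratings per item, items in ascending ID order.
        Map<Integer, List<Integer>> positionsByItem = new TreeMap<>();
        for (int p = 0; p < ratings.length; p++) {
            if (Arrays.binarySearch(sortedUserIds, users[p]) < 0) {
                continue; // the user is not part of this user block
            }
            positionsByItem.computeIfAbsent(items[p], key -> new ArrayList<>()).add(p);
        }
        List<ItemRatings> result = new ArrayList<>();
        for (List<Integer> positions : positionsByItem.values()) {
            int[] us = new int[positions.size()];
            double[] rs = new double[positions.size()];
            for (int n = 0; n < us.length; n++) {
                int p = positions.get(n);
                us[n] = Arrays.binarySearch(sortedUserIds, users[p]);
                rs[n] = ratings[p];
            }
            result.add(new ItemRatings(us, rs));
        }
        return result;
    }
}
```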
Parameters: ratings, usersPerBlock, blockIDGenerator
public static DataSet<ALS.Factors> unblock(DataSet<scala.Tuple2<Object,double[][]>> users, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> outInfo, ALS.BlockIDPartitioner blockIDPartitioner)
Parameters: users, outInfo, blockIDPartitioner
public static void outerProduct(double[] vector, double[] matrix, int factors)
public static void generateFullMatrix(double[] triangularMatrix, double[] fullMatrix, int factors)
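The two signatures above suggest that the symmetric matrix $v v^T$ is stored in packed triangular form and expanded only when the full matrix is needed. The sketch below shows one way this could work. The row-by-row lower-triangle packing and the row-major full matrix are assumptions inferred from the parameter names, and the class `TriangularSketch` is hypothetical, so the library's actual layout may differ.

```java
/** Sketch: packed triangular storage of a symmetric outer product. */
final class TriangularSketch {

    /** Writes the lower triangle of vector * vector^T, row by row, into packed. */
    static void outerProduct(double[] vector, double[] packed, int factors) {
        int pos = 0;
        for (int row = 0; row < factors; row++) {
            for (int col = 0; col <= row; col++) {
                packed[pos++] = vector[row] * vector[col];
            }
        }
    }

    /** Expands the packed triangle into a full symmetric matrix in row-major order. */
    static void generateFullMatrix(double[] packed, double[] full, int factors) {
        int pos = 0;
        for (int row = 0; row < factors; row++) {
            for (int col = 0; col <= row; col++) {
                full[row * factors + col] = packed[pos];
                full[col * factors + row] = packed[pos]; // mirror across the diagonal
                pos++;
            }
        }
    }
}
```

Packed storage needs only factors * (factors + 1) / 2 entries instead of factors * factors, which matters when one such matrix is accumulated per user or item.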
public static DataSet<ALS.Factors> generateRandomMatrix(DataSet<Object> users, int factors, long seed)
public static double[] randomFactors(int factors, scala.util.Random random)
public scala.Option<scala.Tuple2<DataSet<ALS.Factors>,DataSet<ALS.Factors>>> factorsOption()
public ALS setNumFactors(int numFactors)
Parameters: numFactors
public ALS setLambda(double lambda)
Parameters: lambda
public ALS setIterations(int iterations)
Parameters: iterations
public ALS setBlocks(int blocks)
Parameters: blocks
public ALS setSeed(long seed)
Parameters: seed
public ALS setTemporaryPath(String temporaryPath)
Parameters: temporaryPath
public DataSet<Object> empiricalRisk(DataSet<scala.Tuple3<Object,Object,Object>> labeledData, ParameterMap riskParameters)
Parameters:
labeledData - Reference data
riskParameters - Additional parameters for the empirical risk calculation

Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.