public class CheckpointCoordinator extends Object
Depending on the configured RecoveryMode
, the behaviour of the CompletedCheckpointStore
and CheckpointIDCounter
change. The default standalone
implementations don't support any recovery.
Modifier and Type | Field and Description |
---|---|
protected CheckpointIDCounter |
checkpointIdCounter
Checkpoint ID counter to ensure ascending IDs.
|
protected Object |
lock
Coordinator-wide lock to safeguard the checkpoint updates
|
protected int |
numberKeyGroups |
Constructor and Description |
---|
CheckpointCoordinator(JobID job,
long baseInterval,
long checkpointTimeout,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode,
Executor executor) |
CheckpointCoordinator(JobID job,
long baseInterval,
long checkpointTimeout,
long minPauseBetweenCheckpoints,
int maxConcurrentCheckpointAttempts,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode,
CheckpointStatsTracker statsTracker,
Executor executor) |
Modifier and Type | Method and Description |
---|---|
ActorGateway |
createActivatorDeactivator(akka.actor.ActorSystem actorSystem,
UUID leaderSessionID) |
protected List<Set<Integer>> |
createKeyGroupPartitions(int numberKeyGroups,
int parallelism)
Groups the available set of key groups into key group partitions.
|
protected long |
getAndIncrementCheckpointId() |
protected ActorGateway |
getJobStatusListener() |
int |
getNumberOfPendingCheckpoints() |
int |
getNumberOfRetainedSuccessfulCheckpoints() |
Map<Long,PendingCheckpoint> |
getPendingCheckpoints() |
List<CompletedCheckpoint> |
getSuccessfulCheckpoints() |
boolean |
isShutdown() |
protected void |
onCancelCheckpoint(long canceledCheckpointId)
Callback on cancellation of a checkpoint.
|
protected void |
onFullyAcknowledgedCheckpoint(CompletedCheckpoint checkpoint)
Callback on full acknowledgement of a checkpoint.
|
protected void |
onShutdown()
Callback on shutdown of the coordinator.
|
boolean |
receiveAcknowledgeMessage(AcknowledgeCheckpoint message)
Receives an AcknowledgeCheckpoint message and returns whether the
message was associated with a pending checkpoint.
|
boolean |
receiveDeclineMessage(DeclineCheckpoint message)
Receives a
DeclineCheckpoint message and returns whether the
message was associated with a pending checkpoint. |
boolean |
restoreLatestCheckpointedState(Map<JobVertexID,ExecutionJobVertex> tasks,
boolean errorIfNoCheckpoint,
boolean allOrNothingState) |
protected void |
setJobStatusListener(ActorGateway jobStatusListener) |
void |
shutdown()
Shuts down the checkpoint coordinator.
|
void |
startCheckpointScheduler() |
void |
stopCheckpointScheduler() |
void |
suspend()
Suspends the checkpoint coordinator.
|
boolean |
triggerCheckpoint(long timestamp)
Triggers a new checkpoint and uses the given timestamp as the checkpoint
timestamp.
|
boolean |
triggerCheckpoint(long timestamp,
long nextCheckpointId)
Triggers a new checkpoint and uses the given timestamp as the checkpoint
timestamp.
|
protected final Object lock
protected final CheckpointIDCounter checkpointIdCounter
protected final int numberKeyGroups
public CheckpointCoordinator(JobID job, long baseInterval, long checkpointTimeout, int numberKeyGroups, ExecutionVertex[] tasksToTrigger, ExecutionVertex[] tasksToWaitFor, ExecutionVertex[] tasksToCommitTo, ClassLoader userClassLoader, CheckpointIDCounter checkpointIDCounter, CompletedCheckpointStore completedCheckpointStore, RecoveryMode recoveryMode, Executor executor)
public CheckpointCoordinator(JobID job, long baseInterval, long checkpointTimeout, long minPauseBetweenCheckpoints, int maxConcurrentCheckpointAttempts, int numberKeyGroups, ExecutionVertex[] tasksToTrigger, ExecutionVertex[] tasksToWaitFor, ExecutionVertex[] tasksToCommitTo, ClassLoader userClassLoader, CheckpointIDCounter checkpointIDCounter, CompletedCheckpointStore completedCheckpointStore, RecoveryMode recoveryMode, CheckpointStatsTracker statsTracker, Executor executor)
protected void onShutdown()
protected void onCancelCheckpoint(long canceledCheckpointId)
protected void onFullyAcknowledgedCheckpoint(CompletedCheckpoint checkpoint)
public void shutdown() throws Exception
After this method has been called, the coordinator does not accept and further messages and cannot trigger any further checkpoints. All checkpoint state is discarded.
Exception
public void suspend() throws Exception
After this method has been called, the coordinator does not accept and further messages and cannot trigger any further checkpoints.
The difference to shutdown is that checkpoint state in the store and counter is kept around if possible to recover later.
Exception
public boolean isShutdown()
public boolean triggerCheckpoint(long timestamp) throws Exception
timestamp
- The timestamp for the checkpoint.Exception
public boolean triggerCheckpoint(long timestamp, long nextCheckpointId)
timestamp
- The timestamp for the checkpoint.nextCheckpointId
- The checkpoint ID to use for this checkpoint or -1
if
the checkpoint ID counter should be queried.public boolean receiveDeclineMessage(DeclineCheckpoint message)
DeclineCheckpoint
message and returns whether the
message was associated with a pending checkpoint.message
- Checkpoint decline from the task managerpublic boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws CheckpointException
message
- Checkpoint ack from the task managerException
- If the checkpoint cannot be added to the completed checkpoint store.CheckpointException
public boolean restoreLatestCheckpointedState(Map<JobVertexID,ExecutionJobVertex> tasks, boolean errorIfNoCheckpoint, boolean allOrNothingState) throws Exception
Exception
protected List<Set<Integer>> createKeyGroupPartitions(int numberKeyGroups, int parallelism)
numberKeyGroups
- Number of available key groups (indexed from 0 to numberKeyGroups - 1)parallelism
- Parallelism to generate the key group partitioning forpublic int getNumberOfPendingCheckpoints()
public int getNumberOfRetainedSuccessfulCheckpoints()
public Map<Long,PendingCheckpoint> getPendingCheckpoints()
public List<CompletedCheckpoint> getSuccessfulCheckpoints() throws Exception
Exception
protected long getAndIncrementCheckpointId()
protected ActorGateway getJobStatusListener()
protected void setJobStatusListener(ActorGateway jobStatusListener)
public ActorGateway createActivatorDeactivator(akka.actor.ActorSystem actorSystem, UUID leaderSessionID)
Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.