public class KubernetesHaServices extends AbstractHaServices
AbstractHaServices
using Kubernetes.
All the HA information relevant for a specific component will be stored in a single ConfigMap. For example, the Dispatcher's ConfigMap would then contain the current leader, the running jobs and the pointers to the persisted JobGraphs. The JobManager's ConfigMap would then contain the current leader, the pointers to the checkpoints and the checkpoint ID counter.
The ConfigMap name will be created with the pattern "{clusterId}-{componentName}-leader". Given that the cluster id is configured to "k8s-ha-app1", then we could get the following ConfigMap names. e.g. k8s-ha-app1-restserver-leader, k8s-ha-app1-00000000000000000000000000000000-jobmanager-leader
Note that underline("_") is not allowed in Kubernetes ConfigMap name.
configuration, ioExecutor, logger
DEFAULT_JOB_ID, DEFAULT_LEADER_ID
Modifier and Type | Method and Description |
---|---|
CheckpointRecoveryFactory |
createCheckpointRecoveryFactory()
Create the checkpoint recovery factory for the job manager.
|
JobGraphStore |
createJobGraphStore()
Create the submitted job graph store for the job manager.
|
LeaderElectionService |
createLeaderElectionService(String leaderName)
Create leader election service with specified leaderName.
|
LeaderRetrievalService |
createLeaderRetrievalService(String leaderName)
Create leader retrieval service with specified leaderName.
|
RunningJobsRegistry |
createRunningJobsRegistry()
Create the registry that holds information about whether jobs are currently running.
|
protected String |
getLeaderNameForDispatcher()
Get the leader name for Dispatcher.
|
String |
getLeaderNameForJobManager(JobID jobID)
Get the leader name for specific JobManager.
|
protected String |
getLeaderNameForResourceManager()
Get the leader name for ResourceManager.
|
protected String |
getLeaderNameForRestServer()
Get the leader name for RestServer.
|
void |
internalCleanup()
Clean up the meta data in the distributed system(e.g.
|
void |
internalCleanupJobData(JobID jobID)
Clean up the meta data in the distributed system(e.g.
|
void |
internalClose()
Closes the components which is used for external operations(e.g.
|
cleanupJobData, close, closeAndCleanupAllData, createBlobStore, getCheckpointRecoveryFactory, getClusterRestEndpointLeaderElectionService, getClusterRestEndpointLeaderRetriever, getDispatcherLeaderElectionService, getDispatcherLeaderRetriever, getJobGraphStore, getJobManagerLeaderElectionService, getJobManagerLeaderRetriever, getJobManagerLeaderRetriever, getResourceManagerLeaderElectionService, getResourceManagerLeaderRetriever, getRunningJobsRegistry
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getWebMonitorLeaderElectionService, getWebMonitorLeaderRetriever
public LeaderElectionService createLeaderElectionService(String leaderName)
AbstractHaServices
createLeaderElectionService
in class AbstractHaServices
leaderName
- ConfigMap name in Kubernetes or child node path in Zookeeper.public LeaderRetrievalService createLeaderRetrievalService(String leaderName)
AbstractHaServices
createLeaderRetrievalService
in class AbstractHaServices
leaderName
- ConfigMap name in Kubernetes or child node path in Zookeeper.public CheckpointRecoveryFactory createCheckpointRecoveryFactory()
AbstractHaServices
createCheckpointRecoveryFactory
in class AbstractHaServices
public JobGraphStore createJobGraphStore() throws Exception
AbstractHaServices
createJobGraphStore
in class AbstractHaServices
Exception
- if the submitted job graph store could not be createdpublic RunningJobsRegistry createRunningJobsRegistry()
AbstractHaServices
createRunningJobsRegistry
in class AbstractHaServices
public void internalClose()
AbstractHaServices
internalClose
in class AbstractHaServices
public void internalCleanup() throws Exception
AbstractHaServices
If an exception occurs during internal cleanup, we will continue the cleanup in AbstractHaServices.closeAndCleanupAllData()
and report exceptions only after all cleanup steps have been
attempted.
internalCleanup
in class AbstractHaServices
Exception
- when do the cleanup operation on external storage.public void internalCleanupJobData(JobID jobID) throws Exception
AbstractHaServices
internalCleanupJobData
in class AbstractHaServices
jobID
- The identifier of the job to cleanup.Exception
- when do the cleanup operation on external storage.protected String getLeaderNameForResourceManager()
AbstractHaServices
getLeaderNameForResourceManager
in class AbstractHaServices
protected String getLeaderNameForDispatcher()
AbstractHaServices
getLeaderNameForDispatcher
in class AbstractHaServices
public String getLeaderNameForJobManager(JobID jobID)
AbstractHaServices
getLeaderNameForJobManager
in class AbstractHaServices
jobID
- job idprotected String getLeaderNameForRestServer()
AbstractHaServices
getLeaderNameForRestServer
in class AbstractHaServices
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.