This page describes how to deploy a Flink session cluster natively on Kubernetes.
~/.kube/config. You can verify permissions by running
kubectl auth can-i <list|create|edit|delete> pods.
Follow these instructions to start a Flink Session within your Kubernetes cluster.
A session will start all required Flink services (JobManager and TaskManagers) so that you can submit programs to the cluster. Note that you can run multiple programs per session.
All the Kubernetes configuration options can be found in our configuration guide.
Example: Issue the following command to start a session cluster with 4 GB of memory and 2 CPUs with 4 slots per TaskManager:
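A minimal sketch of such a command, assuming it is run from the root of the Flink distribution. The cluster id my-first-flink-cluster is a hypothetical name, the one-hour timeout is an illustrative value, and the RUN wrapper is a convenience added here so that the command is only printed unless RUN is set to an empty string.

```shell
# Dry run by default: the command is printed, not executed.
# Export RUN="" (empty) to really run it against your cluster.
RUN="${RUN-echo}"

# Hypothetical cluster id, used only for illustration.
CLUSTER_ID="my-first-flink-cluster"

# 4 GB process memory, 2 CPUs and 4 slots per TaskManager.
# The timeout is given in milliseconds (3600000 ms = 1 hour, an illustrative value).
start_session() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    -Dresourcemanager.taskmanager-timeout=3600000
}

start_session
```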
In this example we override the
resourcemanager.taskmanager-timeout setting so that
the TaskManager pods remain for longer than the default of 30 seconds.
Although this setting may increase cloud costs, it can make starting new jobs faster in some scenarios,
and during development you have more time to inspect the log files of your job.
The system will use the configuration in
Please follow our configuration guide if you want to change something.
If you do not specify a particular name for your session via
kubernetes.cluster-id, the Flink client will generate a UUID name.
Note: A Docker image with Python and PyFlink installed is required if you are going to start a session cluster for Python Flink jobs. Please refer to the following section.
If you want to use a custom Docker image to deploy Flink containers, check the Flink Docker image documentation,
its tags, how to customize the Flink Docker image and enable plugins.
If you created a custom Docker image you can provide it by setting the
kubernetes.container.image configuration option:
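A minimal sketch of passing a custom image to the session script. The image name my-registry/my-flink:latest and the cluster id are hypothetical; the RUN wrapper only prints the command unless RUN is set to an empty string.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical names, used only for illustration.
CLUSTER_ID="my-first-flink-cluster"
CUSTOM_IMAGE="my-registry/my-flink:latest"

start_session_with_image() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.container.image=${CUSTOM_IMAGE}"
}

start_session_with_image
```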
To build a custom image with Python and PyFlink installed, you can refer to the following Dockerfile:
Build the image and name it pyflink:latest:
Then you can start a PyFlink session cluster by setting the
kubernetes.container.image configuration option to the name of the custom image:
Use the following command to submit a Flink Job to the Kubernetes cluster.
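A minimal sketch of a submission, assuming a Flink release in which the deployment target is selected with -t, and using the WindowJoin example jar from the distribution. The cluster id is hypothetical and must match the id of the running session.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id; it must match the running session.
CLUSTER_ID="my-first-flink-cluster"

# -d submits in detached mode, so the client returns after submission.
submit_job() {
  $RUN ./bin/flink run -d -t kubernetes-session \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    examples/streaming/WindowJoin.jar
}

submit_job
```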
Use the following command to submit a PyFlink Job to the Kubernetes cluster.
There are several ways to expose a Service onto an external (outside of your cluster) IP address.
This can be configured using
ClusterIP: Exposes the service on a cluster-internal IP. The Service is only reachable within the cluster. If you want to access the JobManager UI or submit a job to an existing session, you need to start a local proxy. You can then use
localhost:8081 to submit a Flink job to the session or view the dashboard.
NodePort: Exposes the service on each Node’s IP at a static port (the NodePort).
<NodeIP>:<NodePort> can be used to contact the JobManager Service.
NodeIP can also be replaced with the Kubernetes ApiServer address,
which you can find in your kube config file.
LoadBalancer: Exposes the service externally using a cloud provider’s load balancer.
Since the cloud provider and Kubernetes need some time to prepare the load balancer, you may get a
NodePort JobManager Web Interface in the client log.
You can use
kubectl get services/<ClusterId> to get the EXTERNAL-IP and then construct the load balancer JobManager Web Interface manually.
Warning! Your JobManager (which can run arbitrary jar files) might be exposed to the public internet, without authentication.
ExternalName: Maps a service to a DNS name. Not supported in the current version.
Please refer to the official documentation on publishing services in Kubernetes for more information.
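For the ClusterIP case, the local proxy can be sketched with kubectl port-forward. The service name below is an assumption (a REST service named after the cluster id with a -rest suffix); check the output of kubectl get services for the actual name in your Flink version.

```shell
# Dry run by default: the commands are printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id and an ASSUMED service name; verify with "kubectl get services".
CLUSTER_ID="my-first-flink-cluster"
SERVICE_NAME="${CLUSTER_ID}-rest"

# Forward local port 8081 to the JobManager REST service.
open_proxy() {
  $RUN kubectl port-forward "service/${SERVICE_NAME}" 8081
}

# Inspect the service, for example to read the EXTERNAL-IP of a LoadBalancer.
show_service() {
  $RUN kubectl get "services/${SERVICE_NAME}"
}

open_proxy
```

While the proxy is running, the dashboard is reachable at localhost:8081.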
The Kubernetes session is started in detached mode by default, meaning the Flink client will exit after submitting all the resources to the Kubernetes cluster. Use the following command to attach to an existing session.
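A minimal sketch of attaching, using the execution.attached option. The cluster id is hypothetical and must match the existing session.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id of the existing session.
CLUSTER_ID="my-first-flink-cluster"

attach_session() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dexecution.attached=true
}

attach_session
```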
To stop a Flink Kubernetes session, attach the Flink client to the cluster and type
Flink uses Kubernetes OwnerReferences to clean up all cluster components.
All the Flink-created resources, including Pods, have their OwnerReference set to the JobManager deployment.
When the deployment is deleted, all other resources will be deleted automatically.
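Stopping and manual cleanup can be sketched as follows. The first function pipes stop into an attached client; the second deletes the deployment directly, assuming the deployment is named after the cluster id. The cluster id itself is hypothetical.

```shell
# Dry run by default: the commands are printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id of the session to shut down.
CLUSTER_ID="my-first-flink-cluster"

# Attach to the session and send "stop" on standard input.
stop_session() {
  echo 'stop' | $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dexecution.attached=true
}

# Fallback: delete the deployment (ASSUMED to be named after the cluster id);
# the owner references then remove the remaining resources.
delete_session() {
  $RUN kubectl delete "deployment/${CLUSTER_ID}"
}

stop_session
```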
Application mode allows users to create a single image containing their Job and the Flink runtime, which will automatically create and destroy cluster components as needed. The Flink community provides base Docker images that can be customized for any use case.
Use the following command to start a Flink application.
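A minimal sketch of starting an application. The cluster id, the image name and the jar path inside the image are hypothetical, and the parallelism of 8 is an illustrative value.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical names, used only for illustration.
CLUSTER_ID="my-first-application-cluster"
APP_IMAGE="my-registry/my-flink-app:latest"
JOB_JAR="local:///opt/flink/usrlib/my-flink-job.jar"

# -p sets the default parallelism; the jar is resolved inside the image.
start_application() {
  $RUN ./bin/flink run-application -p 8 -t kubernetes-application \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    "-Dkubernetes.container.image=${APP_IMAGE}" \
    "${JOB_JAR}"
}

start_application
```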
Use the following command to start a PyFlink application, assuming the application image name is my-pyflink-app:latest.
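A minimal sketch of such a command. The cluster id and the entry script path /opt/flink/usrlib/my_job.py are hypothetical; the script is assumed to have been copied into the image when it was built.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id and entry script; the image name comes from the text above.
CLUSTER_ID="my-first-pyflink-application"
PY_IMAGE="my-pyflink-app:latest"
PY_ENTRY="/opt/flink/usrlib/my_job.py"

start_pyflink_application() {
  $RUN ./bin/flink run-application -p 8 -t kubernetes-application \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.container.image=${PY_IMAGE}" \
    -py "${PY_ENTRY}" \
    -pyfs /opt/flink/usrlib
}

start_pyflink_application
```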
You can specify the Python main entry script path with
-py or the main entry module name with
-pym, the path of the Python code in the image with
-pyfs, and some other options.
Note: Only “local” is supported as the scheme in application mode. This assumes that the jar is located in the image, not on the Flink client.
Note: All the jars in the “$FLINK_HOME/usrlib” directory in the image will be added to the user classpath.
When an application is stopped, all Flink cluster resources are automatically destroyed. As always, Jobs may stop when manually canceled or, in the case of bounded Jobs, complete.
By default, the JobManager and TaskManager will output the logs to the console and
/opt/flink/log in each pod simultaneously.
The STDOUT and STDERR will only be redirected to the console. You can access them via
kubectl logs <PodName>.
If the pod is running, you can also use
kubectl exec -it <PodName> bash to tunnel in and view the logs or debug the process.
In order to use plugins, they must be copied to the correct location in the Flink JobManager/TaskManager pod for them to work. You can use the built-in plugins without mounting a volume or building a custom Docker image. For example, use the following command to pass the environment variable to enable the S3 plugin for your Flink application.
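A minimal sketch of enabling the S3 plugin through environment variables, assuming the official Flink Docker image, which reads ENABLE_BUILT_IN_PLUGINS at startup. The plugin jar version (1.12.0 here) is an assumption and must match the Flink version in your image; the cluster id, image name and jar path are hypothetical.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical names; the plugin jar version is an ASSUMPTION and must match your image.
CLUSTER_ID="my-first-application-cluster"
APP_IMAGE="my-registry/my-flink-app:latest"
S3_PLUGIN_JAR="flink-s3-fs-hadoop-1.12.0.jar"

# The variable is set for both the JobManager (master) and the TaskManager containers.
start_application_with_s3() {
  $RUN ./bin/flink run-application -t kubernetes-application \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.container.image=${APP_IMAGE}" \
    "-Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=${S3_PLUGIN_JAR}" \
    "-Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=${S3_PLUGIN_JAR}" \
    local:///opt/flink/usrlib/my-flink-job.jar
}

start_application_with_s3
```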
A Kubernetes Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a Pod specification or in an image. Flink on Kubernetes can use Secrets in two ways:
Using Secrets as files from a pod;
Using Secrets as environment variables;
Here is an example of a Pod that mounts a Secret in a volume:
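Such a Pod might look like the manifest below, written to a file from the shell. The file name, the volume name foo-volume and the image name foo are placeholders chosen for illustration; the Secret is assumed to be named foo and is mounted under /opt/foo.

```shell
# Dry run by default: kubectl is printed, not executed.
RUN="${RUN-echo}"

# Write the Pod manifest to a file (hypothetical file name).
cat > secret-volume-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
    - name: foo
      image: foo
      volumeMounts:
        - name: foo-volume
          mountPath: "/opt/foo"
  volumes:
    - name: foo-volume
      secret:
        secretName: foo
EOF

$RUN kubectl apply -f secret-volume-pod.yaml
```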
By applying this yaml, each key in the foo Secret becomes a filename under the
/opt/foo path. Flink on Kubernetes can enable this feature by the following command:
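A minimal sketch using the kubernetes.secrets option, which takes pairs of Secret name and mount path. The cluster id is hypothetical, and a Secret named foo is assumed to already exist in the namespace.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id; a Secret named "foo" is ASSUMED to exist already.
CLUSTER_ID="my-first-flink-cluster"

# Mount the Secret "foo" under /opt/foo in the Flink pods.
start_session_with_secret_files() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dkubernetes.secrets=foo:/opt/foo
}

start_session_with_secret_files
```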
For more details see the official Kubernetes documentation.
Here is an example of a Pod that uses secrets from environment variables:
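Such a Pod might look like the manifest below, written to a file from the shell. The file name, the image name foo and the Secret name foo-secret are placeholders chosen for illustration; FOO_ENV and foo_key follow the description in the text.

```shell
# Dry run by default: kubectl is printed, not executed.
RUN="${RUN-echo}"

# Write the Pod manifest to a file (hypothetical file name and Secret name).
cat > secret-env-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
    - name: foo
      image: foo
      env:
        - name: FOO_ENV
          valueFrom:
            secretKeyRef:
              name: foo-secret
              key: foo_key
EOF

$RUN kubectl apply -f secret-env-pod.yaml
```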
By applying this yaml, an environment variable named
FOO_ENV is added to the
foo container, and
FOO_ENV takes the value of
foo_key, which is defined in the Secret.
Flink on Kubernetes can enable this feature by the following command:
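A minimal sketch using the kubernetes.env.secretKeyRef option. The cluster id and the Secret name foo-secret are hypothetical; FOO_ENV and foo_key follow the description above.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id and Secret name.
CLUSTER_ID="my-first-flink-cluster"
SECRET_NAME="foo-secret"

# Expose the key "foo_key" of the Secret as the environment variable FOO_ENV.
start_session_with_secret_env() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.env.secretKeyRef=env:FOO_ENV,secret:${SECRET_NAME},key:foo_key"
}

start_session_with_secret_env
```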
For more details see the official Kubernetes documentation.
Namespaces in Kubernetes are a way to divide cluster resources between multiple users (via resource quota).
They are similar to the queue concept in a YARN cluster. Flink on Kubernetes can use namespaces to launch Flink clusters.
The namespace can be specified using the
-Dkubernetes.namespace=default argument when starting a Flink cluster.
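A minimal sketch of passing the namespace when starting a session. The cluster id is hypothetical; replace default with the namespace you want to use.

```shell
# Dry run by default: the command is printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id; change NAMESPACE to target another namespace.
CLUSTER_ID="my-first-flink-cluster"
NAMESPACE="default"

start_session_in_namespace() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.namespace=${NAMESPACE}"
}

start_session_in_namespace
```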
ResourceQuota provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
Role-based access control (RBAC) is a method of regulating access to compute or network resources based on the roles of individual users within an enterprise. Users can configure RBAC roles and service accounts used by JobManager to access the Kubernetes API server within the Kubernetes cluster.
Every namespace has a default service account. However, the
default service account may not have the permission to create or delete pods within the Kubernetes cluster.
Users may need to update the permissions of the
default service account or specify another service account that has the right role bound.
If you do not want to use the
default service account, use the following command to create a new
flink service account and set the role binding.
Then use the config option
-Dkubernetes.jobmanager.service-account=flink to make the JobManager pod use the
flink service account to create and delete TaskManager pods.
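These steps can be sketched as follows. The role binding name flink-role-binding-flink and the cluster id are hypothetical, the service account is created in the default namespace, and the built-in edit ClusterRole is assumed to be sufficient for your setup.

```shell
# Dry run by default: the commands are printed, not executed.
RUN="${RUN-echo}"

# Hypothetical cluster id and role binding name.
CLUSTER_ID="my-first-flink-cluster"
ROLE_BINDING="flink-role-binding-flink"

# Create the service account and bind the "edit" ClusterRole to it.
create_flink_service_account() {
  $RUN kubectl create serviceaccount flink
  $RUN kubectl create clusterrolebinding "${ROLE_BINDING}" \
    --clusterrole=edit \
    --serviceaccount=default:flink
}

# Start a session whose JobManager pod uses the new service account.
start_session_with_service_account() {
  $RUN ./bin/kubernetes-session.sh \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    -Dkubernetes.jobmanager.service-account=flink
}

create_flink_service_account
start_session_with_service_account
```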
Please refer to the official Kubernetes documentation on RBAC Authorization for more information.
This section briefly explains how Flink and Kubernetes interact.
When creating a Flink Kubernetes session cluster, the Flink client will first connect to the Kubernetes ApiServer to submit the cluster description, including ConfigMap spec, Job Manager Service spec, Job Manager Deployment spec and Owner Reference. Kubernetes will then create the JobManager deployment, during which time the Kubelet will pull the image, prepare and mount the volume, and then execute the start command. After the JobManager pod has launched, the Dispatcher and KubernetesResourceManager are available and the cluster is ready to accept one or more jobs.
When users submit jobs through the Flink client, the job graph will be generated by the client and uploaded along with the user's jars to the Dispatcher.
The JobManager requests resources, known as slots, from the KubernetesResourceManager. If no slots are available, the resource manager will bring up TaskManager pods and register them with the cluster.