Watch & Learn

Debugwar Blog

Step in or Step over, this is a problem ...

在K8s上构建Flink环境(Session高可用模式)

2023-05-23 04:55:29

随着各种IT基础设施的发展与完善,各种管理、编排、计算框架被大量应用在日常的IT运维与开发中。本篇文章涉及的,是其中两位佼佼者:Kubernetes和Flink。Kubernetes(下文简称k8s)作为容器编排、管理的框架,发展到今天已经非常成熟,这也方便了本文的另一位主角Flink与k8s的集成。

截止到本文发布为止,Flink稳定版(1.17)的官方提供了部署在Kubernetes上的方法参考1。但是如果直接使用官方Appendix中提供的文件,则其高可用配置模式的集群无法正常运行。一开始笔者根据官方指导文档尝试搭建一个Session模式的高可用(HA)集群,但是一直提示错误,于是笔者详细阅读了一下整篇帮助文档,可以说官方帮助文档中还是有几个坑的——虽然都能找到对应的说明,但是对于“拿来主义”相当不友好。

本文旨在提供一套拿来即用的k8s配置、部署文件,方便各位快速搭建Session模式的高可用Flink集群。

配置文件

注意:如果你创建的文件名不是按照本文的数字_文件名.yaml的话,在执行kubectl apply -f <文件>命令时,请务必按照本文中数字由小到大的顺序执行。

01_namespace.yaml

此文件用于创建服务的命名空间。

  1. apiVersion: v1  
  2. kind: Namespace  
  3. metadata:  
  4.   name: ns-public-flink  

02_rbac.yaml

此文件用于创建服务的基本角色,并赋予角色权限。这里其实涉及到官方文档参考1中的下面这段话:

Moreover, you have to start the JobManager and TaskManager pods with a service account which has the permissions to create, edit, delete ConfigMaps. See how to configure service accounts for pods for more information.

这段话点明了需要给create/edit/delete权限,但根据笔者实际操作下来,应该还漏掉了一个lock权限,而且官方文档中, 并没有给出权限的配置样例文件,但是在HA的配置文件中,却直接使用了serviceAccountName: flink-service-account的账户配置。

  1. apiVersion: v1  
  2. kind: ServiceAccount  
  3. metadata:  
  4.   name: flink-service-account  
  5.   namespace: ns-public-flink  
  6. ---  
  7. kind: Role  
  8. apiVersion: rbac.authorization.k8s.io/v1  
  9. metadata:  
  10.   name: create-edit-delete-privileges  
  11.   namespace: ns-public-flink  
  12. rules:  
  13.   - apiGroups: [""]  
  14.     resources: ["endpoints"]  
  15.     verbs: ["create""edit""delete""lock"]  
  16. ---  
  17. kind: RoleBinding  
  18. apiVersion: rbac.authorization.k8s.io/v1  
  19. metadata:  
  20.   name: bind-role-privileges  
  21.   namespace: ns-public-rabbitmq  
  22. subjects:  
  23.   - kind: ServiceAccount  
  24.     name: flink-service-account  
  25.     namespace: ns-public-flink  
  26. roleRef:  
  27.   kind: Role  
  28.   name: create-edit-delete-privileges  
  29.   apiGroup: rbac.authorization.k8s.io  

03_configmap.yaml

此节中kubernetes.cluster-idhigh-availability.storageDir两个字段与官方文档中不同。

在官方文档中kubernetes.cluster-idhdfs:///flink/recovery,但由于我们是最小化部署,并没有hdfs文件系统可用,因此本文修改成了本地文件系统,即/tmp/flink/recovery

此外,官方文档中kubernetes.cluster-id<cluster-id>,这里应该是需要修改成自己的Flink集群id的,但是官方文档没有明确指出。而且这个字段还有一个限制,即只能由小写字母与-.字符构成。

注意:复制本配置的时候,特别是log4j的.name配置项目,请务必保证不要以空格结尾,不然会出现Unable to locate plugin type for xxx类型的错误(例如Unable to locate plugin type for RollingFile <--注意最后多了一个空格),错误原因是找的带空格的插件,那么自然会找不到。

  1. apiVersion: v1  
  2. kind: ConfigMap  
  3. metadata:  
  4.   name: flink-config  
  5.   namespace: ns-public-flink  
  6.   labels:  
  7.     app: flink-cluster  
  8. data:  
  9.   flink-conf.yaml: |+  
  10.     jobmanager.rpc.address: flink-jobmanager  
  11.     taskmanager.numberOfTaskSlots: 2  
  12.     blob.server.port: 6124  
  13.     jobmanager.rpc.port: 6123  
  14.     taskmanager.rpc.port: 6122  
  15.     queryable-state.proxy.ports: 6125  
  16.     jobmanager.memory.process.size: 1600m  
  17.     taskmanager.memory.process.size: 1728m  
  18.     parallelism.default: 2      
  19.     kubernetes.cluster-id: bytehero  
  20.     high-availability: kubernetes  
  21.     high-availability.storageDir: /tmp/flink/recovery  
  22.     restart-strategy: fixed-delay  
  23.     restart-strategy.fixed-delay.attempts: 10  
  24.   log4j-console.properties: |+  
  25.     # This affects logging for both user code and Flink  
  26.     rootLogger.level = INFO  
  27.     rootLogger.appenderRef.console.ref = ConsoleAppender  
  28.     rootLogger.appenderRef.rolling.ref = RollingFileAppender  
  29.   
  30.     # Uncomment this if you want to _only_ change Flink's logging  
  31.     #logger.flink.name = org.apache.flink  
  32.     #logger.flink.level = INFO  
  33.   
  34.     # The following lines keep the log level of common libraries/connectors on  
  35.     # log level INFO. The root logger does not override this. You have to manually  
  36.     # change the log levels here.  
  37.     logger.akka.name = akka  
  38.     logger.akka.level = INFO  
  39.     logger.kafka.name= org.apache.kafka  
  40.     logger.kafka.level = INFO  
  41.     logger.hadoop.name = org.apache.hadoop  
  42.     logger.hadoop.level = INFO  
  43.     logger.zookeeper.name = org.apache.zookeeper  
  44.     logger.zookeeper.level = INFO  
  45.   
  46.     # Log all infos to the console  
  47.     appender.console.name = ConsoleAppender  
  48.     appender.console.type = CONSOLE  
  49.     appender.console.layout.type = PatternLayout  
  50.     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n  
  51.   
  52.     # Log all infos in the given rolling file  
  53.     appender.rolling.name = RollingFileAppender  
  54.     appender.rolling.type = RollingFile  
  55.     appender.rolling.append = false  
  56.     appender.rolling.fileName = ${sys:log.file}  
  57.     appender.rolling.filePattern = ${sys:log.file}.%i  
  58.     appender.rolling.layout.type = PatternLayout  
  59.     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n  
  60.     appender.rolling.policies.type = Policies  
  61.     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy  
  62.     appender.rolling.policies.size.size=100MB  
  63.     appender.rolling.strategy.type = DefaultRolloverStrategy  
  64.     appender.rolling.strategy.max = 10  
  65.   
  66.     # Suppress the irrelevant (wrong) warnings from the Netty channel handler  
  67.     logger.netty.name = org.jboss.netty.channel.DefaultChannelPipeline  
  68.     logger.netty.level = OFF  

04_service.yaml

注意,此节的配置中,会暴露31081端口为Flink Web UI的端口,由于Flink Web UI本身不提供鉴权能力,因此实际生产环境中不应暴露此端口,而是应该使用反向代理加某种权限控制手段(例如nginx+basicauth)来代理访问内部的8081端口。

  1. apiVersion: v1  
  2. kind: Service  
  3. metadata:  
  4.   name: flink-jobmanager  
  5.   namespace: ns-public-flink  
  6. spec:  
  7.   type: ClusterIP  
  8.   ports:  
  9.   - name: rpc  
  10.     port: 6123  
  11.   - name: blob-server  
  12.     port: 6124  
  13.   - name: webui  
  14.     port: 8081  
  15.   selector:  
  16.     app: flink-cluster  
  17.     component: jobmanager  
  18. ---  
  19. kind: Service  
  20. apiVersion: v1  
  21. metadata:  
  22.   name: flink-webui  
  23.   namespace: ns-public-flink  
  24. spec:  
  25.   ports:  
  26.   - name: http  
  27.     port: 8081  
  28.     protocol: TCP  
  29.     targetPort: 8081  
  30.     nodePort: 31081  
  31.   selector:  
  32.     app: flink-cluster  
  33.     component: jobmanager  
  34.   type: NodePort  

05_jobmanager-session.yaml

Flink集群的JobManager配置,适用的为Session高可用模式。

  1. apiVersion: apps/v1  
  2. kind: Deployment  
  3. metadata:  
  4.   name: flink-jobmanager  
  5.   namespace: ns-public-flink  
  6. spec:  
  7.   replicas: 1 # Set the value to greater than 1 to start standby JobManagers  
  8.   selector:  
  9.     matchLabels:  
  10.       app: flink-cluster  
  11.       component: jobmanager  
  12.   template:  
  13.     metadata:  
  14.       labels:  
  15.         app: flink-cluster  
  16.         component: jobmanager  
  17.     spec:  
  18.       containers:  
  19.       - name: jobmanager  
  20.         image: apache/flink:1.16.0-scala_2.12  
  21.         env:  
  22.         - name: POD_IP  
  23.           valueFrom:  
  24.             fieldRef:  
  25.               apiVersion: v1  
  26.               fieldPath: status.podIP  
  27.         args: ["jobmanager""$(POD_IP)"]  
  28.         ports:  
  29.         - containerPort: 6123  
  30.           name: rpc  
  31.         - containerPort: 6124  
  32.           name: blob-server  
  33.         - containerPort: 8081  
  34.           name: webui  
  35.         livenessProbe:  
  36.           tcpSocket:  
  37.             port: 6123  
  38.           initialDelaySeconds: 30  
  39.           periodSeconds: 60  
  40.         volumeMounts:  
  41.         - name: flink-config-volume  
  42.           mountPath: /opt/flink/conf  
  43.         securityContext:  
  44.           runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary  
  45.       serviceAccountName: flink-service-account  
  46.       volumes:  
  47.       - name: flink-config-volume  
  48.         configMap:  
  49.           name: flink-config  
  50.           items:  
  51.           - key: flink-conf.yaml  
  52.             path: flink-conf.yaml  
  53.           - key: log4j-console.properties  
  54.             path: log4j-console.properties  

06_taskmanager-session.yaml

TaskManager配置文件。

  1. apiVersion: apps/v1  
  2. kind: Deployment  
  3. metadata:  
  4.   name: flink-taskmanager  
  5.   namespace: ns-public-flink  
  6. spec:  
  7.   replicas: 2  
  8.   selector:  
  9.     matchLabels:  
  10.       app: flink-cluster  
  11.       component: taskmanager  
  12.   template:  
  13.     metadata:  
  14.       labels:  
  15.         app: flink-cluster  
  16.         component: taskmanager  
  17.     spec:  
  18.       containers:  
  19.       - name: taskmanager  
  20.         image: apache/flink:1.16.0-scala_2.12  
  21.         args: ["taskmanager"]  
  22.         ports:  
  23.         - containerPort: 6122  
  24.           name: rpc  
  25.         - containerPort: 6125  
  26.           name: query-state  
  27.         livenessProbe:  
  28.           tcpSocket:  
  29.             port: 6122  
  30.           initialDelaySeconds: 30  
  31.           periodSeconds: 60  
  32.         volumeMounts:  
  33.         - name: flink-config-volume  
  34.           mountPath: /opt/flink/conf/  
  35.       volumes:  
  36.       - name: flink-config-volume  
  37.         configMap:  
  38.           name: flink-config  
  39.           items:  
  40.           - key: flink-conf.yaml  
  41.             path: flink-conf.yaml  
  42.           - key: log4j-console.properties  
  43.             path: log4j-console.properties  

如果你的文件名是按照笔者上述数字_文件名.yaml命名的话,那么可以使用如下命令将整个Flink集群部署到k8s:
  1. ls | sort | xargs -i kubectl apply -f {}  
如果命名格式不是上述格式,那么请务必保证命令执行顺序根据数字由小到大执行。

结果

集群部署完毕后,状态如下图:


通过暴露的31081端口,我们可以访问Flink Web UI,可以在此Web页面提交任务:


错误处理

上文中我们建立了一个基于k8s的高可用模式Flink集群, 配置过程中有如下配置:

  1. kubernetes.cluster-id: bytehero    
  2. high-availability: kubernetes    

此时Flink会利用k8s的ConfigMaps存储一些持久化配置, 由于是高可用模式的, 因此job的相关数据就存储在这里面(笔者的测试环境,中间有哈希值形式的是之前建立的非持久化Flink集群创建的ConfigMap):


如果你删除了整个Flink集群,但是没有删除这些配置的话,在重新创建Flink集群时,jobmanager会有如下错误:

  1. 2023-05-24 04:22:30,753 INFO  org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Retrieved job ids [00cb29fb2141a94b930dbe8ec8df0c3d, 1a7db87959722c7dfe49f18dadc07946, 3957c74dd3e6ac62989cbc001cda07be, 48e5da2d36e234fa67145da8d9d35d29, 648e72f5be8c820382479f01d5d1f1dc, 97aa9b6af5ba8ae7d44cc16e2646ba15, b9f891b343322417cd4e07392f5b75b9, dd3f75365a881e8dfae1fd58eb44222d] from KubernetesStateHandleStore{configMapName='bytehero-cluster-config-map'}  
  2. 2023-05-24 04:22:30,753 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Trying to recover job with job id 00cb29fb2141a94b930dbe8ec8df0c3d.  
  3. 2023-05-24 04:22:30,764 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Stopping SessionDispatcherLeaderProcess.  
  4. 2023-05-24 04:22:30,767 INFO  org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Stopping DefaultJobGraphStore.  
  5. 2023-05-24 04:22:30,769 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Starting the resource manager.  
  6. 2023-05-24 04:22:30,768 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.  
  7. java.util.concurrent.CompletionException: org.apache.flink.util.FlinkRuntimeException: Could not recover job with job id 00cb29fb2141a94b930dbe8ec8df0c3d.  
  8.     at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) [?:?]  
  9.     at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) [?:?]  
  10.     at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) [?:?]  
  11.     at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) [?:?]  
  12.     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]  
  13.     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]  
  14.     at java.lang.Thread.run(Unknown Source) [?:?]  
  15. Caused by: org.apache.flink.util.FlinkRuntimeException: Could not recover job with job id 00cb29fb2141a94b930dbe8ec8df0c3d.  
  16.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:183) ~[flink-dist-1.16.0.jar:1.16.0]  
  17.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:150) ~[flink-dist-1.16.0.jar:1.16.0]  
  18.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$recoverJobsIfRunning$2(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  19.     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[flink-dist-1.16.0.jar:1.16.0]  
  20.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  21.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherBasedOnRecoveredJobGraphsAndRecoveredDirtyJobResults$1(SessionDispatcherLeaderProcess.java:129) ~[flink-dist-1.16.0.jar:1.16.0]  
  22.     ... 5 more  
  23. Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under jobGraph-00cb29fb2141a94b930dbe8ec8df0c3d. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.  
  24.     at org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:175) ~[flink-dist-1.16.0.jar:1.16.0]  
  25.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:174) ~[flink-dist-1.16.0.jar:1.16.0]  
  26.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:150) ~[flink-dist-1.16.0.jar:1.16.0]  
  27.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$recoverJobsIfRunning$2(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  28.     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[flink-dist-1.16.0.jar:1.16.0]  
  29.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  30.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherBasedOnRecoveredJobGraphsAndRecoveredDirtyJobResults$1(SessionDispatcherLeaderProcess.java:129) ~[flink-dist-1.16.0.jar:1.16.0]  
  31.     ... 5 more  
  32. Caused by: java.io.FileNotFoundException: /tmp/flink/recovery/default/submittedJobGrapheeca97687a07 (No such file or directory)  
  33.     at java.io.FileInputStream.open0(Native Method) ~[?:?]  
  34.     at java.io.FileInputStream.open(Unknown Source) ~[?:?]  
  35.     at java.io.FileInputStream.<init>(Unknown Source) ~[?:?]  
  36.     at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50) ~[flink-dist-1.16.0.jar:1.16.0]  
  37.     at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:134) ~[flink-dist-1.16.0.jar:1.16.0]  
  38.     at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:69) ~[flink-dist-1.16.0.jar:1.16.0]  
  39.     at org.apache.flink.runtime.state.RetrievableStreamStateHandle.openInputStream(RetrievableStreamStateHandle.java:66) ~[flink-dist-1.16.0.jar:1.16.0]  
  40.     at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58) ~[flink-dist-1.16.0.jar:1.16.0]  
  41.     at org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:166) ~[flink-dist-1.16.0.jar:1.16.0]  
  42.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:174) ~[flink-dist-1.16.0.jar:1.16.0]  
  43.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:150) ~[flink-dist-1.16.0.jar:1.16.0]  
  44.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$recoverJobsIfRunning$2(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  45.     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[flink-dist-1.16.0.jar:1.16.0]  
  46.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:139) ~[flink-dist-1.16.0.jar:1.16.0]  
  47.     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherBasedOnRecoveredJobGraphsAndRecoveredDirtyJobResults$1(SessionDispatcherLeaderProcess.java:129) ~[flink-dist-1.16.0.jar:1.16.0]  
  48.     ... 5 more  
  49. 2023-05-24 04:22:30,797 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting StandaloneSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..  
  50. 2023-05-24 04:22:30,811 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124  
  51. 2023-05-24 04:22:31,145 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.  
  52. 2023-05-24 04:22:31,147 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.  
  53. 2023-05-24 04:22:31,147 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.  
  54. 2023-05-24 04:22:31,152 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.  
  55. WARNING: An illegal reflective access operation has occurred  
  56. WARNING: Illegal reflective access by org.jboss.netty.util.internal.ByteBufferUtil (file:/tmp/flink-rpc-akka_6212fea2-2c9d-45d6-b762-0ebb7f7d60d5.jar) to method java.nio.DirectByteBuffer.cleaner()  
  57. WARNING: Please consider reporting this to the maintainers of org.jboss.netty.util.internal.ByteBufferUtil  
  58. WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations  
  59. WARNING: All illegal access operations will be denied in a future release  
  60. 2023-05-24 04:22:31,177 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.  
  61. 2023-05-24 04:22:31,179 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.  

重点是这行:

  1. 2023-05-24 04:22:30,753 INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [00cb29fb2141a94b930dbe8ec8df0c3d, 1a7db87959722c7dfe49f18dadc07946, 3957c74dd3e6ac62989cbc001cda07be, 48e5da2d36e234fa67145da8d9d35d29, 648e72f5be8c820382479f01d5d1f1dc, 97aa9b6af5ba8ae7d44cc16e2646ba15, b9f891b343322417cd4e07392f5b75b9, dd3f75365a881e8dfae1fd58eb44222d] from KubernetesStateHandleStore{configMapName='bytehero-cluster-config-map'}

configMapName='bytehero-cluster-config-map'说明从bytehero-cluster-config-map中获取了持久化的job id,但是由于我们的集群是新创建的,无法重新恢复上一个集群的job,因此产生错误。

解决方法是删除这个ConfigMap即可:

  1. kubectl delete configmaps bytehero-config-map  

注意删除的ConfigMap名称要和报错中的相同。

参考

  1. Flink 1.17 Kubernetes部署文档
Catalog
  • 配置文件
  • 01_namespace.yaml
  • 02_rbac.yaml
  • 03_configmap.yaml
  • 04_service.yaml
  • 05_jobmanager-session.yaml
  • 06_taskmanager-session.yaml
  • 结果
  • 错误处理
  • 参考
  • CopyRight(c) 2020 - 2025 Debugwar.com

    Designed by Hacksign