The port-ocean Helm chart hardcodes `backoffLimit: 0` in the CronJob job template (cron.yaml L28) and does not expose it as a configurable Helm value.
In Kubernetes environments with node autoscalers (Karpenter, Cluster Autoscaler), pods can be evicted at any time due to node consolidation or scale-down events. When a resync pod is evicted mid-execution, it exits non-zero, and with backoffLimit: 0 the entire Job is marked as permanently failed — no retry is attempted.
This makes the self-hosted CronJob integration inherently fragile on autoscaled clusters, which represent the majority of production EKS/GKE/AKS deployments.
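For context, the relevant part of the chart's job template currently looks roughly like this (surrounding fields and names are illustrative, not an exact copy of cron.yaml; only the hardcoded `backoffLimit` is the point):

```yaml
apiVersion: batch/v1
kind: CronJob
spec:
  jobTemplate:
    spec:
      backoffLimit: 0        # hardcoded: any pod failure permanently fails the Job
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: ocean-resync   # illustrative container name
```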
Requested change:
Expose `backoffLimit` as a configurable Helm value under `workload.cron`, e.g.:
```yaml
workload:
  cron:
    backoffLimit: 3  # default: 0 (current behavior, for backward compat)
```
And in cron.yaml:

```yaml
backoffLimit: {{ .Values.workload.cron.backoffLimit | default 0 }}
```
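With that in place, users on autoscaled clusters could opt in per release without forking the chart, e.g. with a values override (the `workload.cron` path assumes the layout proposed above):

```yaml
# my-values.yaml
workload:
  cron:
    backoffLimit: 3
```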
Why this matters:
- `backoffLimit: 0` means any transient pod failure (node eviction, OOM kill, spot interruption, network blip during image pull) permanently fails the Job.
- Users currently cannot work around this without either (a) adding `karpenter.sh/do-not-disrupt` annotations, which only cover Karpenter and don't help with spot interruptions or OOM kills, or (b) forking the chart.
- A modest default like 3 would allow Kubernetes to retry the resync pod automatically while still bounding runaway retries; the existing `activeDeadlineSeconds` already provides a time-based safety net.
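As a rough sense of scale: the Job controller retries failed pods with exponential back-off (10s, doubled each failure, capped at six minutes per the Kubernetes docs), so a `backoffLimit` of 3 adds only a small, bounded delay before the Job is declared failed. A quick illustrative calculation:

```python
def total_backoff_delay(backoff_limit: int, base: int = 10, cap: int = 360) -> int:
    """Cumulative back-off delay in seconds across all retries, using
    Kubernetes' documented 10s-doubling schedule capped at six minutes."""
    return sum(min(base * 2 ** i, cap) for i in range(backoff_limit))

print(total_backoff_delay(3))  # 10 + 20 + 40 = 70 seconds of added delay at most
```

So even the worst case for `backoffLimit: 3` is roughly a minute of extra delay, well inside a typical `activeDeadlineSeconds` budget.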