fix: configure kubedog rate limiter to prevent context cancellation
Fixes #2445

The default Kubernetes client rate limiter settings were too restrictive, causing context cancellation errors when kubedog's reflector infrastructure tried to watch multiple resources simultaneously. When the deployment becomes ready before the rate limiter releases the request, the context gets canceled.

This fix:
- Increases default QPS from 5 to 100 and Burst from 10 to 200
- Makes QPS and Burst configurable per release via kubedogQPS and kubedogBurst
- Uses direct client-go configuration instead of kubedog's kube.Init
- Adds comprehensive documentation and examples

Users can now tune these settings based on their cluster size and requirements:
- Small clusters: QPS=50, Burst=100
- Medium clusters: QPS=100, Burst=200 (default)
- Large clusters: QPS=200, Burst=400

Signed-off-by: yxxhero <aiopsclub@163.com>
parent 6e21671228
commit b75c61b2e6
@ -0,0 +1,208 @@
|
|||
# Kubedog Configuration
|
||||
|
||||
This document describes how to configure kubedog resource tracking in Helmfile.
|
||||
|
||||
## Overview
|
||||
|
||||
Kubedog is a library for tracking Kubernetes resources during deployments. When `trackMode: kubedog` is set, Helmfile uses kubedog to monitor the rollout of resources such as Deployments, StatefulSets, DaemonSets, and Jobs.
|
||||
|
||||
## Configuration Options
|
||||
|
||||
### Release-level Configuration
|
||||
|
||||
You can configure kubedog settings per release:
|
||||
|
||||
```yaml
releases:
  - name: my-app
    namespace: default
    chart: my-chart
    trackMode: kubedog
    kubedogQPS: 100    # Queries per second (default: 100)
    kubedogBurst: 200  # Burst capacity (default: 200)
    trackLogs: true
    trackKinds:
      - Deployment
```
|
||||
|
||||
### Global Default Configuration
|
||||
|
||||
You can also set defaults in `helmDefaults`:
|
||||
|
||||
```yaml
helmDefaults:
  trackMode: kubedog
  # Note: QPS and Burst can only be configured at release level
```
|
||||
|
||||
## Parameters
|
||||
|
||||
### kubedogQPS
|
||||
|
||||
- **Type**: `float32`
|
||||
- **Default**: `100`
|
||||
- **Description**: Sets the maximum number of queries per second to the Kubernetes API server from the kubedog client. This controls the rate of API requests when tracking resources.
|
||||
|
||||
**When to increase**:
|
||||
- Large clusters with many resources
|
||||
- When tracking multiple releases simultaneously
|
||||
- When you see rate limiting errors like "client rate limiter Wait returned an error: context canceled"
|
||||
|
||||
**When to decrease**:
|
||||
- Small clusters or development environments
|
||||
- When you want to reduce load on the API server
|
||||
|
||||
### kubedogBurst
|
||||
|
||||
- **Type**: `int`
|
||||
- **Default**: `200`
|
||||
- **Description**: Sets the maximum burst of requests that can be made to the Kubernetes API server. This allows temporary spikes above the QPS limit.
|
||||
|
||||
**When to increase**:
|
||||
- When tracking releases with many resources
|
||||
- When you see connection timeout errors
|
||||
- In production environments with high throughput needs
|
||||
|
||||
**When to decrease**:
|
||||
- In resource-constrained environments
|
||||
- When API server is under heavy load
|
||||
|
||||
## Tuning Guidelines
|
||||
|
||||
### For Small Clusters (< 50 resources)
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    kubedogQPS: 50
    kubedogBurst: 100
```
|
||||
|
||||
### For Medium Clusters (50-200 resources)
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    kubedogQPS: 100    # default
    kubedogBurst: 200  # default
```
|
||||
|
||||
### For Large Clusters (> 200 resources)
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    kubedogQPS: 200
    kubedogBurst: 400
```
|
||||
|
||||
### For Multiple Concurrent Releases
|
||||
|
||||
When using the `--concurrency` flag with multiple releases that use kubedog tracking:
|
||||
|
||||
```yaml
releases:
  - name: app1
    trackMode: kubedog
    kubedogQPS: 50
    kubedogBurst: 100

  - name: app2
    trackMode: kubedog
    kubedogQPS: 50
    kubedogBurst: 100
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Rate Limiting Errors
|
||||
|
||||
**Error**:
|
||||
```
E0302 19:38:41.812322 91 reflector.go:204] "Failed to watch" err="client rate limiter Wait returned an error: context canceled"
```
|
||||
|
||||
**Solution**: Increase `kubedogQPS` and `kubedogBurst` values.
|
||||
|
||||
### Connection Timeouts
|
||||
|
||||
**Error**:
|
||||
```
context canceled while waiting for API server response
```
|
||||
|
||||
**Solution**:
|
||||
1. Check network connectivity to the API server
|
||||
2. Increase `kubedogBurst` to allow more concurrent requests
|
||||
3. Decrease the number of concurrent releases if using the `--concurrency` flag
|
||||
|
||||
### Slow Tracking
|
||||
|
||||
**Symptom**: Resource tracking takes a long time to complete.
|
||||
|
||||
**Solution**:
|
||||
1. Use `trackKinds` to limit which resource types are tracked
|
||||
2. Use `skipKinds` to exclude unnecessary resource types
|
||||
3. Increase `kubedogQPS` to speed up API queries
|
||||
|
||||
## Related Configuration
|
||||
|
||||
### trackTimeout
|
||||
|
||||
Sets the timeout for kubedog tracking (in seconds):
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    trackTimeout: 600  # 10 minutes
```
|
||||
|
||||
### trackLogs
|
||||
|
||||
Enable/disable log streaming from tracked resources:
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    trackLogs: true  # Show pod logs during tracking
```
|
||||
|
||||
### trackKinds / skipKinds
|
||||
|
||||
Control which resource types to track:
|
||||
|
||||
```yaml
releases:
  - name: my-app
    trackMode: kubedog
    trackKinds:
      - Deployment
      - StatefulSet
    skipKinds:
      - ConfigMap
      - Secret
```
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The kubedog client configuration uses:
|
||||
- `k8s.io/client-go` for Kubernetes API communication
|
||||
- Custom rate limiting via `rest.Config.QPS` and `rest.Config.Burst`
|
||||
- Separate client cache per unique (kubeContext, kubeconfig, QPS, Burst) combination
|
||||
|
||||
The default values (QPS=100, Burst=200) were chosen to:
|
||||
- Prevent rate limiting errors in most common scenarios
|
||||
- Support tracking of multiple resource types simultaneously
|
||||
- Allow reasonable burst capacity for initial resource discovery
|
||||
- Balance between tracking speed and API server load
|
||||
|
||||
## See Also
|
||||
|
||||
- [Issue #2445](https://github.com/helmfile/helmfile/issues/2445) - Original issue that led to configurable QPS/Burst
|
||||
- [Kubedog Documentation](https://github.com/werf/kubedog)
|
||||
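- [Kubernetes EventRateLimit Configuration](https://kubernetes.io/docs/reference/config-api/apiserver-eventratelimit.v1alpha1/) - Server-side event rate limiting; the client-side limits described here are set via `rest.Config.QPS` and `rest.Config.Burst`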
|
||||
|
|
@ -0,0 +1,186 @@
|
|||
# Example: Kubedog Resource Tracking Configuration
|
||||
|
||||
This example demonstrates various ways to configure kubedog resource tracking.
|
||||
|
||||
## Basic Example
|
||||
|
||||
```yaml
releases:
  - name: simple-app
    namespace: default
    chart: ./charts/simple-app
    trackMode: kubedog
```
|
||||
|
||||
Uses default QPS (100) and Burst (200).
|
||||
|
||||
## Customized Rate Limiting
|
||||
|
||||
```yaml
releases:
  - name: high-throughput-app
    namespace: production
    chart: ./charts/app
    trackMode: kubedog
    # Increased limits for large-scale deployments
    kubedogQPS: 200
    kubedogBurst: 400
    trackTimeout: 600
    trackLogs: true
    trackKinds:
      - Deployment
      - StatefulSet
```
|
||||
|
||||
## Multiple Releases with Different Settings
|
||||
|
||||
```yaml
releases:
  # Small app - conservative limits
  - name: frontend
    namespace: web
    chart: ./charts/frontend
    trackMode: kubedog
    kubedogQPS: 50
    kubedogBurst: 100

  # Medium app - default limits
  - name: backend
    namespace: api
    chart: ./charts/backend
    trackMode: kubedog

  # Large app - increased limits
  - name: data-processor
    namespace: data
    chart: ./charts/processor
    trackMode: kubedog
    kubedogQPS: 150
    kubedogBurst: 300
    trackKinds:
      - Deployment
      - StatefulSet
      - Job
```
|
||||
|
||||
## Environment-Specific Configuration
|
||||
|
||||
```yaml
environments:
  development:
    values:
      - kubedogQPS: 50
      - kubedogBurst: 100
  staging:
    values:
      - kubedogQPS: 100
      - kubedogBurst: 200
  production:
    values:
      - kubedogQPS: 200
      - kubedogBurst: 400

releases:
  - name: myapp
    namespace: {{ .Environment.Name }}
    chart: ./charts/myapp
    trackMode: kubedog
    kubedogQPS: {{ .Values.kubedogQPS }}
    kubedogBurst: {{ .Values.kubedogBurst }}
```
|
||||
|
||||
## With Global Defaults
|
||||
|
||||
```yaml
helmDefaults:
  createNamespace: true
  timeout: 300

releases:
  - name: app1
    namespace: default
    chart: ./charts/app
    trackMode: kubedog
    # Uses release-specific settings
    kubedogQPS: 150
    kubedogBurst: 300

  - name: app2
    namespace: default
    chart: ./charts/app
    trackMode: kubedog
    # Uses default QPS=100, Burst=200
```
|
||||
|
||||
## Selective Tracking
|
||||
|
||||
```yaml
releases:
  - name: complex-app
    namespace: default
    chart: ./charts/complex-app
    trackMode: kubedog
    kubedogQPS: 120
    kubedogBurst: 250
    # Only track deployments and jobs
    trackKinds:
      - Deployment
      - Job
    # Skip these resource types
    skipKinds:
      - ConfigMap
      - Secret
      - Ingress
    # Track specific resources only
    trackResources:
      - kind: Deployment
        name: main-app
      - kind: Job
        name: migration-job
        namespace: default
```
|
||||
|
||||
## Testing the Configuration
|
||||
|
||||
To test your kubedog configuration:
|
||||
|
||||
```bash
# Apply with kubedog tracking
helmfile apply -n my-namespace -l app=myapp

# With debug logging
helmfile apply -n my-namespace -l app=myapp --log-level debug

# With specific environment
helmfile apply -e production -l app=myapp
```
|
||||
|
||||
## Expected Output
|
||||
|
||||
When kubedog tracking is working correctly, you should see:
|
||||
|
||||
```
Tracking 5 resources from release myapp with kubedog
Tracking 5 resources with kubedog (filtered from 5 total)
┌ Status progress
│ DEPLOYMENT    REPLICAS    AVAILABLE    UP-TO-DATE
│ myapp-main    1/1         1            1
└ Status progress
All resources tracked successfully
UPDATED RELEASES:
NAME     NAMESPACE    CHART           VERSION    DURATION
myapp    default      ./charts/app    1.0.0      1m32s
```
|
||||
|
||||
## Troubleshooting Commands
|
||||
|
||||
```bash
# Check current kubedog settings
helmfile build -n my-namespace -l app=myapp | grep -A 5 "kubedog"

# Test with increased verbosity
helmfile apply -n my-namespace -l app=myapp --log-level debug 2>&1 | grep -i kubedog

# Monitor API server requests (requires cluster access).
# apiserver_request_total replaces the removed apiserver_request_count metric.
kubectl get --raw /metrics | grep apiserver_request_total
```
|
||||
|
|
@ -18,12 +18,16 @@ type TrackOptions struct {
|
|||
Logs bool
|
||||
LogsSince time.Duration
|
||||
Filter *resource.FilterConfig
|
||||
QPS float32
|
||||
Burst int
|
||||
}
|
||||
|
||||
func NewTrackOptions() *TrackOptions {
|
||||
return &TrackOptions{
|
||||
Timeout: 5 * time.Minute,
|
||||
LogsSince: 10 * time.Minute,
|
||||
QPS: 100,
|
||||
Burst: 200,
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -41,3 +45,13 @@ func (o *TrackOptions) WithFilterConfig(config *resource.FilterConfig) *TrackOpt
|
|||
o.Filter = config
|
||||
return o
|
||||
}
|
||||
|
||||
func (o *TrackOptions) WithQPS(qps float32) *TrackOptions {
|
||||
o.QPS = qps
|
||||
return o
|
||||
}
|
||||
|
||||
func (o *TrackOptions) WithBurst(burst int) *TrackOptions {
|
||||
o.Burst = burst
|
||||
return o
|
||||
}
|
||||
|
|
|
|||
|
|
@ -8,11 +8,11 @@ import (
|
 	"sync"
 	"time"

-	"github.com/werf/kubedog/pkg/kube"
 	"github.com/werf/kubedog/pkg/tracker"
 	"github.com/werf/kubedog/pkg/trackers/rollout/multitrack"
 	"go.uber.org/zap"
 	"k8s.io/client-go/kubernetes"
+	"k8s.io/client-go/tools/clientcmd"

 	"github.com/helmfile/helmfile/pkg/resource"
 )
|
||||
|
|
@ -20,6 +20,8 @@ import (
|
|||
type cacheKey struct {
|
||||
kubeContext string
|
||||
kubeconfig string
|
||||
qps float32
|
||||
burst int
|
||||
}
|
||||
|
||||
var (
|
||||
|
|
@ -41,6 +43,8 @@ type TrackerConfig struct {
|
|||
KubeContext string
|
||||
Kubeconfig string
|
||||
TrackOptions *TrackOptions
|
||||
KubedogQPS *float32
|
||||
KubedogBurst *int
|
||||
}
|
||||
|
||||
func NewTracker(config *TrackerConfig) (*Tracker, error) {
|
||||
|
|
@ -54,16 +58,26 @@ func NewTracker(config *TrackerConfig) (*Tracker, error) {
|
|||
 		kubeconfig = os.Getenv("KUBECONFIG")
 	}

-	clientSet, err := getOrCreateClient(config.KubeContext, kubeconfig)
-	if err != nil {
-		return nil, fmt.Errorf("failed to initialize kubernetes client: %w", err)
-	}
-
 	options := config.TrackOptions
 	if options == nil {
 		options = NewTrackOptions()
 	}

+	qps := options.QPS
+	if config.KubedogQPS != nil {
+		qps = *config.KubedogQPS
+	}
+
+	burst := options.Burst
+	if config.KubedogBurst != nil {
+		burst = *config.KubedogBurst
+	}
+
+	clientSet, err := getOrCreateClient(config.KubeContext, kubeconfig, qps, burst)
+	if err != nil {
+		return nil, fmt.Errorf("failed to initialize kubernetes client: %w", err)
+	}
+
 	var filter *resource.ResourceFilter
 	if options.Filter != nil {
 		filter = resource.NewResourceFilter(options.Filter, logger)
|
||||
|
|
@ -78,10 +92,12 @@ func NewTracker(config *TrackerConfig) (*Tracker, error) {
|
|||
 	}, nil
 }

-func getOrCreateClient(kubeContext, kubeconfig string) (kubernetes.Interface, error) {
+func getOrCreateClient(kubeContext, kubeconfig string, qps float32, burst int) (kubernetes.Interface, error) {
 	key := cacheKey{
 		kubeContext: kubeContext,
 		kubeconfig:  kubeconfig,
+		qps:         qps,
+		burst:       burst,
 	}

 	kubeInitMu.Lock()
|
||||
|
|
@ -91,18 +107,33 @@ func getOrCreateClient(kubeContext, kubeconfig string) (kubernetes.Interface, er
|
|||
 		return client, nil
 	}

-	initOpts := kube.InitOptions{
-		KubeConfigOptions: kube.KubeConfigOptions{
-			Context:    kubeContext,
-			ConfigPath: kubeconfig,
-		},
+	var explicitPath string
+	if kubeconfig != "" {
+		explicitPath = kubeconfig
+	}
+	loadingRules := &clientcmd.ClientConfigLoadingRules{
+		ExplicitPath: explicitPath,
 	}

-	if err := kube.Init(initOpts); err != nil {
-		return nil, err
+	overrides := &clientcmd.ConfigOverrides{}
+	if kubeContext != "" {
+		overrides.CurrentContext = kubeContext
 	}
+
+	cc := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, overrides)
+	restConfig, err := cc.ClientConfig()
+	if err != nil {
+		return nil, fmt.Errorf("failed to load kubeconfig: %w", err)
+	}
+
+	restConfig.QPS = qps
+	restConfig.Burst = burst
+
+	client, err := kubernetes.NewForConfig(restConfig)
+	if err != nil {
+		return nil, fmt.Errorf("failed to create kubernetes client: %w", err)
+	}

-	client := kube.Kubernetes
 	clientCache[key] = client

 	return client, nil
|
||||
|
|
|
|||
|
|
@ -86,3 +86,45 @@ func TestTrackOptions_WithFilterConfig(t *testing.T) {
|
|||
assert.Equal(t, []string{"Deployment", "StatefulSet"}, opts.Filter.TrackKinds)
|
||||
assert.Equal(t, []string{"ConfigMap"}, opts.Filter.SkipKinds)
|
||||
}
|
||||
|
||||
func TestTrackOptions_WithQPS(t *testing.T) {
|
||||
opts := NewTrackOptions()
|
||||
opts = opts.WithQPS(50.0)
|
||||
|
||||
assert.Equal(t, float32(50.0), opts.QPS)
|
||||
}
|
||||
|
||||
func TestTrackOptions_WithBurst(t *testing.T) {
|
||||
opts := NewTrackOptions()
|
||||
opts = opts.WithBurst(100)
|
||||
|
||||
assert.Equal(t, 100, opts.Burst)
|
||||
}
|
||||
|
||||
func TestTrackOptions_DefaultQPSBurst(t *testing.T) {
|
||||
opts := NewTrackOptions()
|
||||
|
||||
assert.Equal(t, float32(100), opts.QPS)
|
||||
assert.Equal(t, 200, opts.Burst)
|
||||
}
|
||||
|
||||
func TestTrackerConfig_WithQPSBurst(t *testing.T) {
|
||||
qps := float32(50.0)
|
||||
burst := 100
|
||||
config := &TrackerConfig{
|
||||
Logger: nil,
|
||||
Namespace: "test-ns",
|
||||
KubeContext: "test-ctx",
|
||||
Kubeconfig: "/test/kubeconfig",
|
||||
TrackOptions: NewTrackOptions(),
|
||||
KubedogQPS: &qps,
|
||||
KubedogBurst: &burst,
|
||||
}
|
||||
|
||||
assert.NotNil(t, config)
|
||||
assert.Equal(t, "test-ns", config.Namespace)
|
||||
assert.Equal(t, &qps, config.KubedogQPS)
|
||||
assert.Equal(t, &burst, config.KubedogBurst)
|
||||
assert.Equal(t, float32(50.0), *config.KubedogQPS)
|
||||
assert.Equal(t, 100, *config.KubedogBurst)
|
||||
}
|
||||
|
|
|
|||
|
|
@ -480,6 +480,8 @@ func (st *HelmState) trackWithKubedog(ctx context.Context, release *ReleaseSpec,
|
|||
KubeContext: kubeContext,
|
||||
Kubeconfig: st.kubeconfig,
|
||||
TrackOptions: trackOpts,
|
||||
KubedogQPS: release.KubedogQPS,
|
||||
KubedogBurst: release.KubedogBurst,
|
||||
})
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create kubedog tracker: %w", err)
|
||||
|
|
|
|||
|
|
@ -466,6 +466,10 @@ type ReleaseSpec struct {
|
|||
SkipKinds []string `yaml:"skipKinds,omitempty"`
|
||||
// TrackResources is a whitelist of specific resources to track
|
||||
TrackResources []TrackResourceSpec `yaml:"trackResources,omitempty"`
|
||||
// KubedogQPS specifies the QPS (queries per second) for kubedog kubernetes client
|
||||
KubedogQPS *float32 `yaml:"kubedogQPS,omitempty"`
|
||||
// KubedogBurst specifies the burst for kubedog kubernetes client
|
||||
KubedogBurst *int `yaml:"kubedogBurst,omitempty"`
|
||||
}
|
||||
|
||||
// TrackResourceSpec specifies a resource to track
|
||||
|
|
|
|||