Commit Graph

249 Commits

Author SHA1 Message Date
Cody Lee dabfeffe66
fix(prometheus): serve scrapes from cached background poll (#1013)
Decouples Prometheus scrape cadence from upstream UniFi API calls so a
429 backoff loop on the controller side no longer stalls /metrics. The
output plugin now owns a 60s background poller (configurable) whose
result is served from an in-memory cache. Concurrent /scrape requests
for the same target are coalesced via singleflight to prevent a noisy
scraper from multiplying upstream load.

Adds two new metrics so operators can detect cache staleness and
refresh failures independently:
- unpoller_prometheus_cache_age_seconds
- unpoller_prometheus_refresh_failures_total

Background goroutine recovers from panics so a malformed input payload
no longer silently kills refreshes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 17:43:44 -05:00
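
The cached-scrape pattern described in dabfeffe66 can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the cache type, its fetch field, and the refresh/run/snapshot names are all hypothetical, and the singleflight coalescing mentioned in the commit is left out to keep the sketch standard-library only.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// cache holds the most recent successful poll result (hypothetical type).
type cache struct {
	mu       sync.RWMutex
	body     string
	updated  time.Time
	failures int
	fetch    func() (string, error) // hypothetical stand-in for the upstream poll
}

// refresh polls upstream once. Errors and panics are counted as failures
// and leave the previously cached body in place.
func (c *cache) refresh() {
	defer func() {
		if r := recover(); r != nil {
			c.mu.Lock()
			c.failures++
			c.mu.Unlock()
		}
	}()

	body, err := c.fetch()

	c.mu.Lock()
	defer c.mu.Unlock()

	if err != nil {
		c.failures++
		return
	}

	c.body, c.updated = body, time.Now()
}

// run refreshes immediately, then on every tick until ctx is cancelled.
func (c *cache) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		c.refresh()

		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// snapshot is what a scrape handler would read; it never calls upstream.
func (c *cache) snapshot() (body string, age time.Duration, failures int) {
	c.mu.RLock()
	defer c.mu.RUnlock()

	return c.body, time.Since(c.updated), c.failures
}

func main() {
	c := &cache{fetch: func() (string, error) { return "unifi_up 1\n", nil }}

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	// Blocks until the timeout here; a long-running plugin would start this in a goroutine.
	c.run(ctx, 20*time.Millisecond)

	body, age, failures := c.snapshot()
	fmt.Printf("body=%q age=%s failures=%d\n", body, age.Round(time.Millisecond), failures)
}
```

In this sketch the age and failure count returned by snapshot play the role of the two new metrics: one reports staleness, the other reports refresh errors, and neither requires an upstream call at scrape time.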
Cody Lee fef3ae74f2
test: update integration expectations for new UAP uplink fields
Influx and Datadog integration tests assert that the captured field/gauge
sets exactly match the YAML. Add the new uap uplink_* entries so the
TestInfluxV1Integration and Datadog integration tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:27:30 -05:00
Cody Lee b1a8d60460
feat: add UAP uplink metrics and Prometheus parity for USW/UBB/UDB (closes #988)
Exposes the uplink medium (wire vs wireless) and link speed for UniFi access
points so users can detect when an AP downgrades from gigabit to fast ethernet,
which was the original ask in #988. UAPs previously had zero uplink coverage
in any output plugin; now influxunifi, datadogunifi, and promunifi all report
uplink_type, uplink_speed, uplink_max_speed, and related fields.

Also brings Prometheus to parity with Influx/Datadog by emitting uplink
metrics for USW, UBB, and UDB devices (previously only USG/UDM/UXG had them
in promunifi). A new exportDeviceUplink helper in promunifi/usg.go reuses
the existing unpoller_device_uplink_* descriptors to avoid descriptor
collision (per c48b9917).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:23:41 -05:00
Cody Lee 511c524e6e
feat(influxunifi): add global tags applied to every measurement
Closes #1001. Mirrors the DataDog plugin's global tags feature for
InfluxDB. Per-metric tags take precedence on key collision so
site/device identifiers can never be overwritten by a misconfigured
global. Configurable via TOML/JSON/YAML under influxdb.tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 08:59:16 -05:00
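
The precedence rule from 511c524e6e (per-metric tags win over global tags on key collision) can be illustrated with a small merge helper. The function name mergeTags and the sample tag keys are assumptions made for this sketch, not identifiers taken from influxunifi.

```go
package main

import "fmt"

// mergeTags returns a new map with the global tags applied first and the
// per-metric tags applied second, so per-metric values win on collision.
// The name and signature are hypothetical.
func mergeTags(global, metric map[string]string) map[string]string {
	merged := make(map[string]string, len(global)+len(metric))

	for k, v := range global {
		merged[k] = v
	}

	for k, v := range metric {
		merged[k] = v
	}

	return merged
}

func main() {
	global := map[string]string{"env": "prod", "site_name": "misconfigured"}
	metric := map[string]string{"site_name": "home", "name": "ap-1"}

	// site_name keeps the per-metric value; env is added from the globals.
	fmt.Println(mergeTags(global, metric))
}
```

Copying into a fresh map rather than mutating either input keeps the global tag set reusable across measurements.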
Jim Strang c48b9917b0 fix(promunifi): avoid descriptor collision on unpoller_device_uptime_seconds
descIntegrationDevice was registered with namespace prefix
"unpoller_device_", producing "unpoller_device_uptime_seconds" with
labels {device_id} and a different help string than the existing
descDevice() metric of the same FQDN (labels {type, site_name, name,
source, tag}, help "Device Uptime"). Prometheus MustRegister panics on
inconsistent descriptors for the same fully-qualified name, causing
v3.0.0 to crashloop on startup whenever the Prometheus output was
enabled.

Move the Integration/v1 device metrics under a dedicated
"integration_device_" name prefix, matching the convention used by the
other Integration/v1 collectors added in the same release (e.g.
wifi_broadcast_*, acl_rule_*, mclag_domain_*, pending_device_*), where
the bare namespace prefix is passed in and the type prefix is baked
into each metric name string.

Affected metric renames:
  unpoller_device_uptime_seconds              -> unpoller_integration_device_uptime_seconds
  unpoller_device_cpu_utilization_pct         -> unpoller_integration_device_cpu_utilization_pct
  unpoller_device_memory_utilization_pct      -> unpoller_integration_device_memory_utilization_pct
  unpoller_device_load_average_{1,5,15}min    -> unpoller_integration_device_load_average_{1,5,15}min
  unpoller_device_radio_tx_retries_pct        -> unpoller_integration_device_radio_tx_retries_pct
  unpoller_device_uplink_{rx,tx}_rate_bps     -> unpoller_integration_device_uplink_{rx,tx}_rate_bps

Fixes #1002
Fixes #1004
2026-05-09 08:19:44 -04:00
Cody Lee d2948b8bd0
feat: upgrade unifi to v5.26.0 and add Integration/v1 + new legacy metrics
Adds 21 new data types from unifi v5.26.0 across all metric output plugins
(InfluxDB, Prometheus, DataDog). Per-site Integration/v1 calls are gated on
API key configuration and only run for user-configured sites; ErrEndpointNotFound
is handled gracefully so older firmware continues to work without log spam.

Also migrates events collection (collectAlarms, collectAnomalies, collectEvents,
collectIDs, collectProtectLogs) to handle Network 10.x+ endpoint removals via
ErrEndpointNotFound, with debug-level logging to avoid per-poll noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:45:47 -05:00
Cody Lee c596e82cf2
fix: use v2 traffic API as DPI fallback for Network 9.1+ firmware (#985)
The legacy /stat/stadpi and /stat/sitedpi endpoints return empty data
on UniFi Network 9.1+ (issue #834). The v2 /traffic endpoint already
existed in the unifi library and in the collector, but was only called
when both SaveTraffic and SaveDPI were enabled — most users only set
SaveDPI=true and never saw any data.

- Remove the SaveTraffic gate on GetClientTraffic; call it whenever
  SaveDPI is enabled, treating it as a DPI data source
- Downgrade GetClientTraffic errors to debug-log so old firmware that
  lacks the v2 endpoint continues to use the legacy API without error
- Add convertToSiteDPI to aggregate per-client v2 data into per-site
  DPITable entries, filling SitesDPI when the legacy endpoint is empty
- Legacy API results are preserved; v2 data only supplements sites not
  already covered, so old-firmware users are unaffected

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:42:35 -05:00
Cody Lee 2f1e28c7d3
chore: apply linter auto-fixes (wsl_v5, nlreturn, tagalign) (#984)
golangci-lint auto-fixes across multiple packages:
- wsl_v5: blank lines between logical blocks
- nlreturn: newlines before return statements
- tagalign: struct field tag alignment

No logic changes.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 21:10:52 -05:00
Cody Lee 18c6e66a8e
feat: add Site Magic site-to-site VPN metrics (closes #926) (#983)
* feat: add Site Magic site-to-site VPN metrics (closes #926)

Bump github.com/unpoller/unifi/v5 to v5.25.0 which adds:
- GetMagicSiteToSiteVPN / GetMagicSiteToSiteVPNSite API methods
- MagicSiteToSiteVPN types with mesh, connection, device, and status structs
- Missing VPN health fields on Site.Health (SiteToSiteNumActive/Inactive,
  SiteToSiteRxBytes/TxBytes/RxPackets/TxPackets)

Implement VPN metrics collection across all output plugins:
- Collect Site Magic VPN mesh data per-site in inputunifi pollController
- Propagate VPNMeshes through poller.Metrics / AppendMetrics
- Apply DefaultSiteNameOverride for VPN meshes in augmentMetrics /
  applySiteNameOverride
- influxunifi: vpn_mesh, vpn_mesh_connection, vpn_mesh_status tables
- promunifi: vpn_mesh_*, vpn_tunnel_*, vpn_mesh_status_* gauges
- datadogunifi: unifi.vpn_mesh.*, unifi.vpn_tunnel.*, unifi.vpn_mesh_status.*

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(otelunifi): add Site Magic VPN metrics to OpenTelemetry output

Adds exportVPNMeshes to the otel output plugin, emitting the same
unifi_vpn_mesh_*, unifi_vpn_tunnel_*, and unifi_vpn_mesh_status_*
gauges as the other output plugins.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 21:08:09 -05:00
Cody Lee a81a6e6e16
feat: add port anomaly metrics (closes #929) (#982)
Collect port anomalies from the UniFi v2 API endpoint
/proxy/network/v2/api/site/{site}/ports/port-anomalies and export
them to all output plugins (Prometheus, InfluxDB, DataDog, OpenTelemetry).

Metrics exported per port:
- port_anomaly_count     – number of anomaly events
- port_anomaly_last_seen – unix timestamp of last event

Labels: site_name, source, device_mac, port_idx, anomaly_type

Bumps github.com/unpoller/unifi/v5 to v5.24.0 which adds GetPortAnomalies.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:56:37 -05:00
Cody Lee 643c108674
feat: add network topology metrics (closes #931) (#981)
Bumps github.com/unpoller/unifi/v5 to v5.23.0 which adds
GetTopology() fetching vertices (devices/clients) and edges
(wired/wireless connections) from /proxy/network/v2/api/site/{site}/topology.

Changes across the stack:
- poller.Metrics: add Topologies []any field + AppendMetrics support
- inputunifi: collect topology per-site (non-fatal on older controllers),
  pass through augmentMetrics with site name override support
- promunifi: new topology.go with summary, connection-type, link-quality,
  and band-distribution gauges
- influxunifi: new topology.go with topology_summary and topology_edge
  measurements
- datadogunifi: new topology.go with equivalent Datadog gauges
- otelunifi: new topology.go with OpenTelemetry gauge observations

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:44:51 -05:00
Cody Lee 6b33b6b97b
feat: firewall policy metrics across all output plugins (closes #928) (#979)
* feat(promunifi): add firewall policy metrics (closes #928)

Bump unifi client to v5.22.0 and wire up firewall policy data end-to-end:

- poller.Metrics: add FirewallPolicies []any slice
- inputunifi: collect GetFirewallPolicies() per poll cycle; apply
  DefaultSiteNameOverride; augment into poller.Metrics
- promunifi: export per-rule (rule_enabled, rule_index) and per-site
  aggregate metrics (rules_total, rules_enabled, rules_disabled,
  rules_by_action, rules_predefined, rules_custom, rules_logging_enabled)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat: export firewall policies to influx, datadog, and otel outputs

Extends firewall policy support (PR #979) to all remaining output plugins:

- influxunifi: batchFirewallPolicy() writes measurement "firewall_policy"
  with tags (rule_name, action, protocol, ip_version, source/dest zone,
  site_name, source) and fields (enabled, index, predefined, logging)
- datadogunifi: batchFirewallPolicy() emits the same data as Datadog gauges
  under the "firewall_policy.*" namespace
- otelunifi: exportFirewallPolicies() emits per-rule gauges
  (unifi_firewall_rule_enabled, unifi_firewall_rule_index) and per-site
  aggregates (rules_total, rules_enabled, rules_disabled, rules_by_action,
  rules_predefined, rules_custom, rules_logging_enabled)

Also rebases onto master to pick up the otelunifi plugin (PR #978).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:26:27 -05:00
Cody Lee 521c2f88bc
feat(otelunifi): add OpenTelemetry output plugin (#978)
* feat(otelunifi): add OpenTelemetry output plugin

Adds a new push-based output plugin that exports UniFi metrics to any
OTLP-compatible backend (Grafana Alloy/Mimir, Honeycomb, Datadog via
OTel, New Relic, etc.) using the Go OpenTelemetry SDK v1.42.

Config (default disabled):
  [otel]
  url      = "http://localhost:4318"
  protocol = "http"   # or "grpc"
  interval = "30s"
  timeout  = "10s"
  disable  = false
  api_key  = ""       # optional Bearer auth

Env var prefix: UP_OTEL_*

Exported metrics:
- Sites:   user/guest/IoT counts, AP/GW/SW counts, latency, uptime,
           tx/rx rates per subsystem
- Clients: uptime, rx/tx bytes & rates; signal/noise/RSSI for wireless
- UAP:     up, uptime, CPU/mem, load, per-radio channel/power,
           per-VAP station count/satisfaction/bytes
- USW:     up, uptime, CPU/mem, load, aggregate rx/tx bytes,
           per-port up/speed/bytes/packets/errors/dropped/PoE
- USG:     up, uptime, CPU/mem, load, per-WAN rx/tx bytes/packets/errors
- UDM/UXG: up, uptime, CPU/mem, load averages

Closes #933

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(otelunifi): rename unused ctx parameter to _ in recordGauge

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(otelunifi): replace Disable with Enable (default false)

Plugin is opt-in: set enable=true / UP_OTEL_ENABLE=true to activate.
Closes part of #933.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:19:18 -05:00
Cody Lee 4c34180047
feat(clients): add MIMO spatial stream metrics for WiFi clients (#977)
* feat(clients): add MIMO spatial stream metrics for WiFi clients

Add tx_nss, rx_nss (spatial stream count) and tx_mcs, rx_mcs (MCS
index) metrics for WiFi clients, sourced from UniFi controller API
fields. These fields are only populated for wireless clients.

- promunifi: adds unifi_client_radio_transmit_spatial_streams,
  unifi_client_radio_receive_spatial_streams,
  unifi_client_radio_transmit_mcs_index, and
  unifi_client_radio_receive_mcs_index gauges
- influxunifi: adds tx_nss, rx_nss, tx_mcs, rx_mcs fields to the
  clients measurement
- go.mod: replace directive to use local unifi library with new fields

Closes #535

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: use published unifi commit for MIMO fields instead of local replace

Remove the local path replace directive for github.com/unpoller/unifi/v5
and pin to the published pseudo-version at commit f363f61cdbe3a863db5fb3176ef1c0fc282c5674
which contains the RxMcs, RxNSS, TxMcs, TxNSS MIMO fields.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 17:56:16 -05:00
Cody Lee cedc52fc89
feat(lokiunifi): add richer low-cardinality stream labels (#932) (#975)
- Add job=unpoller to every Loki stream (alarm, anomaly, event, ids,
  system_log, protect_log, protect_thumbnail) for standard Grafana/Loki
  source filtering with {job="unpoller"}
- Add event_type and inner_alert_action labels to IDS streams using
  EventType and InnerAlertAction fields
- Add event_type and inner_alert_action labels to Alarm streams using
  Key and InnerAlertAction fields
- Skip severity/category on Anomaly: the unifi.Anomaly struct has no
  such fields

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 15:41:20 -05:00
Cody Lee 117392dd8c
feat: export site_to_site_enabled VPN metric (#926) (#976)
Add the site_to_site_enabled FlexBool field from the vpn subsystem
health entry to both InfluxDB and Prometheus outputs. The field was
present in the unifi.Health struct but never exported.

- influxunifi: add site_to_site_enabled to subsystems fields map
- promunifi: add SiteToSiteEnabled gauge descriptor and emit it in
  the vpn case of exportSite
- Update integration_test_expectations.yaml to include the new field

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 15:39:19 -05:00
Cody Lee a95804743d
feat(lokiunifi): add extra_labels config for custom Loki stream labels (#691) (#973)
Add an ExtraLabels map[string]string field to the Loki Config struct so
users can define static key=value labels that are merged into the stream
labels of every log line sent to Loki. This allows users to distinguish
streams (e.g., by environment or datacenter) without hardcoding values.

Built-in dynamic labels (application, site_name, source, etc.) always
take precedence over extra labels to preserve existing behavior.

Example config (TOML):
  [loki.extra_labels]
  environment = "production"
  datacenter  = "us-east-1"

Closes #691

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 15:25:32 -05:00
Cody Lee 6c5ff5482d
feat(promunifi): add unifi_controller_up gauge metric (closes #356) (#974)
Add a per-controller `<namespace>_controller_up` Prometheus GaugeVec with
a `source` label (controller URL or configured ID). The gauge is set to 1
after each successful poll and 0 on failure, giving operators a standard
metric to alert on controller connectivity issues.

Changes:
- pkg/poller/config.go: add ControllerStatus type and ControllerStatuses
  field to Metrics so any output plugin can consume per-controller health.
- pkg/poller/inputs.go: merge ControllerStatuses when AppendMetrics is
  called (multiple input sources).
- pkg/inputunifi/interface.go: populate ControllerStatuses with Up=true
  on success and Up=false (while still continuing) on per-controller error.
- pkg/promunifi/collector.go: declare and register a prometheus.GaugeVec
  `<namespace>_controller_up`; set the gauge for each controller status
  after every Collect cycle.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 15:25:00 -05:00
Cody Lee 8c7f1cb854
fix: remove age==0 guard that silently dropped all rogue AP metrics (#972)
save_rogue = true collected data from the controller but never wrote
any of it to the output backends. All three exporters (InfluxDB, Datadog,
Prometheus) had the same guard:

    if s.Age.Val == 0 { return }

The intent was to drop stale entries, but the logic is inverted: Age==0
means brand-new or (more commonly) that the UniFi controller did not
include an "age" field in the JSON response, causing FlexInt to default
to 0. This silently discarded every rogue AP record.

Remove the guard entirely. The data was just fetched on-demand from the
controller; if the user opted in to save_rogue, they want all of it.

Fixes #405

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 14:53:07 -05:00
Cody Lee dcdbef6687
fix(inputunifi): gracefully handle 404s from remote API event endpoints (#971)
* fix(inputunifi): gracefully handle 404s from remote API event endpoints

The UniFi remote API (api.ui.com) does not support legacy event endpoints
such as /stat/event, causing repeated [ERROR] log lines for users who have
save_events = true with a remote controller.

When a remote controller returns an invalid HTTP status code (e.g. 404),
log a warning and continue to the next event collector instead of
propagating the error. This keeps metrics collection working and stops
the noisy error loop.

Fixes #966

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(inputunifi): log unsupported remote API event endpoints at Info not Error

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 14:30:49 -05:00
Cody Lee 873202ab5b
fix(inputunifi): recover from GetActiveDHCPLeasesWithAssociations panic (#969)
Wrap the GetActiveDHCPLeasesWithAssociations call in a deferred recover
so a nil-pointer panic in the unifi library (triggered when 401 errors
cause GetDevices to return nil, which was then dereferenced without a
guard in v5.18.0) can no longer crash the poller process.

Fixes #965

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:07:03 -05:00
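
The deferred-recover wrapper described in 873202ab5b follows a common Go pattern, sketched below. The names safeCall, deviceList, and leaseCount are hypothetical; the real call being wrapped is GetActiveDHCPLeasesWithAssociations, which is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

var errRecovered = errors.New("recovered from panic")

// safeCall runs fn and converts a panic into an ordinary error so the
// caller can log it and keep polling.
func safeCall(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%w: %v", errRecovered, r)
		}
	}()

	return fn()
}

// deviceList is a hypothetical stand-in for a library result that may be nil.
type deviceList struct{ Names []string }

// leaseCount dereferences d without a nil guard, like the bug being worked around.
func leaseCount(d *deviceList) int { return len(d.Names) }

func main() {
	err := safeCall(func() error {
		var devices *deviceList // nil, as after a failed upstream call
		fmt.Println(leaseCount(devices))

		return nil
	})

	fmt.Println("poller still running, err:", err)
}
```

The named return value is what lets the deferred function replace the result after the panic has unwound fn.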
Cody Lee 54bb3bfe8e
feat(devices): add UDB (UniFi Device Bridge) support (#968)
Adds metrics export for UDB devices (UDB-Switch, UDB-Pro, UDB-Pro-Sector)
to all output backends. UDB-Switch is a hybrid device combining PoE switch
ports with WiFi 7 wireless bridge capability (5GHz + 6GHz radios).

- pkg/promunifi/udb.go: Prometheus metrics exporter for UDB
- pkg/influxunifi/udb.go: InfluxDB batch exporter for UDB
- pkg/datadogunifi/udb.go: Datadog batch exporter for UDB
- Wire UDB into switchExport in all three output plugins
- Add UDB to inputunifi device collection and site name override
- Update integration test expectations for InfluxDB and Datadog
- Fix addUBB() bug: was incorrectly incrementing UCI counter

Resolves #947

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:00:18 -05:00
Brian Gates 074595c0a9
Fix remote API (Fabric/API key): 429 handling, NVR filter, updateWeb nil panic (#958) 2026-02-18 06:34:04 -05:00
Brian Gates 40e2a7703f
Fix panic when remote discovery fails and no controllers configured (fixes #953) (#957)
* Fix panic when remote discovery fails and no controllers are configured

Call setDefaults(&u.Default) before logController(&u.Default) when
len(u.Controllers) == 0 so HashPII, DropPII, etc. are initialized
and logController does not dereference nil pointers.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: trigger CI re-run

Co-authored-by: Cursor <cursoragent@cursor.com>

* ci: use golangci-lint v2.9 for Go 1.26-compatible deps

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 18:13:25 -06:00
Brian Gates b4fa16b2fd
fix(influxunifi): use CelsiusSafe() for temp fields to fix InfluxDB type conflict (#944) (#945)
* fix(influxunifi): use CelsiusSafe() for temp fields to fix InfluxDB type conflict

Write temp_* fields as float64 instead of int64 so InfluxDB does not
report 'field type conflict' when the measurement already has float.

Requires github.com/unpoller/unifi/v5 with CelsiusSafe() (unpoller/unifi#195).
Fixes #944.

Co-authored-by: Cursor <cursoragent@cursor.com>

* deps: unifi v5.17.0; nil guards and 429 retry (unpoller#943)

- Bump github.com/unpoller/unifi/v5 to v5.17.0 (CelsiusSafe, ErrNilUnifi, RateLimitError)
- inputunifi: guard pollController for nil c.Unifi; controllerID(c) in formatSites/Clients/Devices
- inputunifi: getUnifi retry with backoff on 429 (up to 5 attempts, Retry-After or exponential backoff)

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(influxunifi): expect temp_* as float after CelsiusSafe() (fix #944)

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-03 20:12:26 -06:00
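
The 429 handling described in b4fa16b2fd (up to 5 attempts, honoring Retry-After or falling back to exponential backoff) might look roughly like the sketch below. The rateLimitError type is a hypothetical stand-in for the library's RateLimitError, and retryOn429, its parameters, and the doubling schedule are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// rateLimitError is a hypothetical stand-in for a 429 response.
// RetryAfter is zero when the server sent no Retry-After header.
type rateLimitError struct{ RetryAfter time.Duration }

func (e *rateLimitError) Error() string { return "rate limited (429)" }

// retryOn429 calls fn up to attempts times. It retries only on
// rateLimitError, waiting RetryAfter when set, else base doubled per attempt.
// A nil sleep falls back to time.Sleep.
func retryOn429(attempts int, base time.Duration, sleep func(time.Duration), fn func() error) error {
	if sleep == nil {
		sleep = time.Sleep
	}

	var err error

	for i := 0; i < attempts; i++ {
		err = fn()

		var rl *rateLimitError
		if !errors.As(err, &rl) {
			return err // success, or an error that retrying will not fix
		}

		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}

		wait := rl.RetryAfter
		if wait <= 0 {
			wait = base << i // base, 2*base, 4*base, ...
		}

		sleep(wait)
	}

	return err
}

func main() {
	calls := 0

	err := retryOn429(5, 10*time.Millisecond, nil, func() error {
		calls++
		if calls < 3 {
			return &rateLimitError{}
		}

		return nil
	})

	fmt.Println("calls:", calls, "err:", err)
}
```

Passing the sleep function as a parameter keeps the retry loop testable without real delays.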
Brian Gates 5ea7fcf736
feat: UPS battery metrics, example Prometheus/Loki alerts (unpoller#930) (#941) 2026-01-31 20:25:58 -06:00
brngates98 ca568384d1 feat: add controller sysinfo metrics (unpoller#927)
- Add Sysinfo collection from stat/sysinfo endpoint
- Export controller_info, uptime, update_available, data retention, ports
- Hostname fallback: name, then site_name when API omits hostname
- Apply site name override to Sysinfo for remote/cloud
- Add Discover/Discoverer for endpoint discovery
- Require unpoller/unifi v5.15.0

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-31 20:25:56 -05:00
brngates98 9cfb732c11 Replace Python endpoint-discovery with --discover flag (replaces #936)
- Add --discover and --discover-output to unpoller; uses first unifi
  controller from config to probe known API endpoints and write a
  shareable markdown report.
- Add Discoverer interface and RunDiscover(); inputunifi implements
  Discoverer via unifi.DiscoverEndpoints.
- Remove tools/endpoint-discovery/ (Python/Playwright).
- Add docs/PR_936_REPLACEMENT.md. .gitignore: test config and report.

Requires unpoller/unifi with DiscoverEndpoints (replace in go.mod until
unifi release).
2026-01-30 20:17:00 -05:00
brngates98 b96606128d chore: Update go.sum for unifi v5.11.0 and fix formatting
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-29 17:37:33 -05:00
brngates98 b8519ca058 feat: Add WAN metrics to InfluxDB and Datadog exporters
Add comprehensive WAN metrics support to InfluxDB and Datadog exporters:

InfluxDB Metrics (measurement: wan):
- Configuration: failover_priority, load_balance_weight, provider_download_kbps,
  provider_upload_kbps, smartq_enabled, magic_enabled, vlan_enabled
- Statistics: uptime_percentage, peak_download_percent, peak_upload_percent,
  max_rx_bytes_rate, max_tx_bytes_rate
- Service Provider: service_provider_asn
- Metadata: creation_timestamp

Tags: wan_id, wan_name, wan_networkgroup, wan_type, wan_load_balance_type,
      isp_name, isp_city

Datadog Metrics (namespace: unpoller.wan.*):
- Same metrics as InfluxDB with gauge type
- All metrics tagged with WAN and ISP information

Changes:
- pkg/influxunifi/wan.go: New WAN exporter for InfluxDB
- pkg/influxunifi/influxdb.go: Add WAN to loopPoints and switchExport
- pkg/datadogunifi/wan.go: New WAN exporter for Datadog
- pkg/datadogunifi/datadog.go: Add WAN to loopPoints and switchExport

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-29 17:27:28 -05:00
brngates98 aac4917da7 feat: Add WAN metrics export to Prometheus
Add comprehensive WAN metrics support to unpoller:

WAN Configuration Metrics:
- wan_failover_priority: WAN failover priority
- wan_load_balance_weight: Load balancing weight
- wan_provider_download_kbps: Configured ISP download speed
- wan_provider_upload_kbps: Configured ISP upload speed
- wan_smartq_enabled: SmartQueue QoS status
- wan_magic_enabled: Magic WAN status
- wan_vlan_enabled: VLAN configuration status

WAN Statistics Metrics:
- wan_uptime_percentage: WAN uptime percentage
- wan_peak_download_percent: Peak download utilization
- wan_peak_upload_percent: Peak upload utilization
- wan_max_rx_bytes_rate: Maximum receive rate
- wan_max_tx_bytes_rate: Maximum transmit rate

WAN Service Provider Metrics:
- wan_service_provider_asn: ISP autonomous system number

Labels include:
- wan_id, wan_name, wan_networkgroup
- wan_type (dhcp, static, pppoe)
- wan_load_balance_type (weighted, failover-only)
- isp_name, isp_city (service provider metrics)
- site_name, source

Changes:
- pkg/poller/config.go: Add WANConfigs field to Metrics struct
- pkg/poller/inputs.go: Append WAN configs in metric aggregation
- pkg/inputunifi/input.go: Add WANConfigs field to Metrics struct
- pkg/inputunifi/collector.go: Fetch WAN enriched configuration
- pkg/promunifi/wan.go: New WAN metrics exporter
- pkg/promunifi/collector.go: Initialize and export WAN metrics

Depends on: unpoller/unifi PR (WAN API support)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-29 17:24:12 -05:00
brngates98 86bc1c9d6d fix: rename unused exportWithTags param to _ to satisfy revive
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-29 17:07:43 -05:00
brngates98 74c30eabe6 feat: Add DHCP lease metrics export to Prometheus
- Add DHCP lease fetching in inputunifi collector
- Create promunifi/dhcp_leases.go with network-level and per-lease metrics
- Network-level metrics: pool_size, active_leases, utilization_percent, free_percent, available_ips
- Per-lease metrics: is_static, lease_end, lease_start, lease_time
- Separate network-level pool metrics from per-lease metrics
2026-01-28 21:42:44 -05:00
brngates98 6d85ea76ab Add device tag support to Prometheus metrics
- Add 'tag' label to all device metric descriptors
- Update exportWithTags helper to create separate metric series per tag
- Update all device export functions (UAP, USW, UDM, USG, UXG, PDU, UBB, UCI) to include tags
- Update all label arrays (VAP, Radio, Port, etc.) to include tag label
- Devices with multiple tags create multiple metric series (one per tag)
- Devices without tags export with tag=""

Requires unpoller/unifi#92
2026-01-28 20:48:10 -05:00
Cody Lee 97d3f995b1
Enrich alarms with device names for Loki logs
Added device name enrichment to alarms so that Loki logs show
human-readable device names instead of just MAC addresses.

Changes:
- Modified collectAlarms to fetch devices and build MAC-to-name lookup
- Added extractDeviceNameFromAlarm helper to extract MAC addresses from
  alarm messages and look up the corresponding device names
- Device names are extracted from messages like "AP[fc:ec:da:89:a6:91]"
  or from SrcMAC/DstMAC fields
- Added go.mod replace directive to use local unifi library with new
  DeviceName field

The device_name field will now be included in the JSON output sent to
Loki, making it easier to identify which device triggered an alarm.

Fixes #415

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 12:17:12 -06:00
Cody Lee ae1ab40386
Populate num_user field for VPN subsystem metrics
Fixes #417

UniFi controllers populate RemoteUserNumActive for VPN connections but
leave NumUser at 0 for the VPN subsystem. This caused dashboard queries
looking for num_user in the VPN subsystem to always show 0 active users,
even when VPN connections were active.

Root Cause:
For most subsystems (wlan, lan, www), the controller populates NumUser
directly. However, for the VPN subsystem, the controller uses the
RemoteUserNumActive field instead, leaving NumUser at 0.

The Prometheus exporter had special handling for VPN (lines 148-156 in
pkg/promunifi/site.go) and exported RemoteUserNumActive, but did not
export NumUser. The InfluxDB and Datadog exporters exported all fields
for all subsystems without special handling, resulting in num_user
always being 0 for VPN.

Existing Grafana dashboards query:
  SELECT "num_user" FROM "subsystems" WHERE subsystem='vpn'

This always returned 0 even with active VPN users.

Solution:
For all three exporters (InfluxDB, Datadog, Prometheus), when the
subsystem is 'vpn' and NumUser is 0 but RemoteUserNumActive has a
value, populate num_user with RemoteUserNumActive.

Changes:
- pkg/influxunifi/site.go: Add VPN-specific num_user fallback logic
- pkg/datadogunifi/site.go: Add VPN-specific num_user fallback logic
- pkg/promunifi/site.go: Add NumUser metric to VPN case with fallback

This maintains backward compatibility - existing queries for num_user
will now work correctly, and the remote_user_num_active field is still
available for those who updated their dashboards.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 12:09:01 -06:00
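
The fallback rule from ae1ab40386 reduces to one conditional. The sketch below uses a hypothetical health struct with plain int64 fields rather than the library's types, and the helper name numUser is an assumption.

```go
package main

import "fmt"

// health is a simplified, hypothetical stand-in for a subsystem health entry.
type health struct {
	Subsystem           string
	NumUser             int64
	RemoteUserNumActive int64
}

// numUser returns the value to export as num_user. For the vpn subsystem it
// falls back to RemoteUserNumActive when NumUser is zero.
func numUser(h health) int64 {
	if h.Subsystem == "vpn" && h.NumUser == 0 && h.RemoteUserNumActive > 0 {
		return h.RemoteUserNumActive
	}

	return h.NumUser
}

func main() {
	fmt.Println(numUser(health{Subsystem: "vpn", RemoteUserNumActive: 4}))
	fmt.Println(numUser(health{Subsystem: "wlan", NumUser: 12}))
}
```

Other subsystems are untouched, so existing queries against wlan, lan, or www keep their current values.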
Cody Lee f51a0c7202
Allow polling to continue when individual controllers fail
Fixes #425

When polling multiple controllers, if one controller was down or
unreachable, unpoller would stop collecting data from ALL controllers.
This caused complete data loss across all sites when just one was down.

Root Cause:
Both Metrics() and Events() methods would immediately return an error
when any controller failed, skipping all remaining controllers in the
loop.

Changes:
- Log errors from failed controllers but continue to next controller
- Track collection errors separately from successful data collection
- Only return error if ALL controllers failed and no data was collected
- Return success if at least one controller provided data

This allows unpoller to continue monitoring healthy controllers even
when some are temporarily unreachable due to network issues, timeouts,
or maintenance.

Example behavior:
- Controller 1: Down (timeout) - logs error, continues
- Controller 2: Up - collects data successfully
- Controller 3: Up - collects data successfully
- Result: Returns data from controllers 2 and 3

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 12:01:37 -06:00
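
The loop behavior described in f51a0c7202 (log and continue on failure, return an error only when nothing was collected) can be sketched as below. The controller type, collectAll, and errTimeout are hypothetical names; treating "no data and at least one error" as the failure condition is this sketch's reading of the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errTimeout = errors.New("timeout")

// controller is a hypothetical stand-in for one configured UniFi controller.
type controller struct {
	name string
	poll func() ([]string, error)
}

// collectAll polls every controller. A failure is logged and remembered,
// but the loop moves on. An error is returned only if no data was
// collected and at least one controller failed.
func collectAll(controllers []controller) ([]string, error) {
	var (
		data []string
		errs []error
	)

	for _, c := range controllers {
		points, err := c.poll()
		if err != nil {
			log.Printf("controller %s failed, continuing: %v", c.name, err)
			errs = append(errs, fmt.Errorf("controller %s: %w", c.name, err))

			continue
		}

		data = append(data, points...)
	}

	if len(data) == 0 && len(errs) > 0 {
		return nil, errors.Join(errs...)
	}

	return data, nil
}

func main() {
	controllers := []controller{
		{name: "c1", poll: func() ([]string, error) { return nil, errTimeout }},
		{name: "c2", poll: func() ([]string, error) { return []string{"c2-metric"}, nil }},
		{name: "c3", poll: func() ([]string, error) { return []string{"c3-metric"}, nil }},
	}

	data, err := collectAll(controllers)
	fmt.Println(data, err)
}
```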
Cody Lee a1a8963159
Fix authentication retry to prevent data gaps after re-auth
Fixes #904

When a poll fails (typically with 401 Unauthorized after ~2 hour token
expiration), the code would re-authenticate but then return the original
poll error without retrying. This caused a one-minute data gap every
2 hours.

Changes:
- After successful re-authentication, retry the poll operation
- Add 500ms delay before retry to allow controller to process new auth
- Rename error variable to avoid shadowing during re-auth attempt

This ensures that transient authentication failures during the re-auth
window don't cause data gaps.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 11:53:31 -06:00
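
The retry-after-re-auth flow from a1a8963159 is sketched below under simplifying assumptions: poll and login are passed in as plain functions, the helper name pollWithReauth is hypothetical, and any poll error triggers a re-auth attempt rather than only a 401.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errUnauthorized = errors.New("401 unauthorized")

// pollWithReauth polls once. On failure it re-authenticates, waits for
// delay, and polls a second time instead of returning the original error.
func pollWithReauth(poll, login func() error, delay time.Duration) error {
	pollErr := poll()
	if pollErr == nil {
		return nil
	}

	// A separate variable name avoids shadowing the original poll error.
	if authErr := login(); authErr != nil {
		return fmt.Errorf("re-auth after poll error (%v) failed: %w", pollErr, authErr)
	}

	time.Sleep(delay) // give the controller a moment to accept the new session

	return poll()
}

func main() {
	calls := 0
	poll := func() error {
		calls++
		if calls == 1 {
			return errUnauthorized // simulated expired token
		}

		return nil
	}
	login := func() error { return nil }

	err := pollWithReauth(poll, login, 500*time.Millisecond)
	fmt.Println("polls:", calls, "err:", err)
}
```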
Cody Lee 9e3debd58a
Allow PoE-providing ports to be scraped even when disabled
Ports providing PoE power are no longer considered "dead" even when
disabled or down. This allows users to collect PoE metrics from ports
that are disabled for security reasons but still providing power.

Fixes #910

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 11:31:39 -06:00
Cody Lee 07781214c3
Add config option to suppress unknown device type messages
Adds log_unknown_types config option (default: false) to control logging
of unknown UniFi device types. When disabled (default), unknown devices
are silently ignored to reduce log volume. When enabled, they are logged
as DEBUG messages instead of ERROR. Addresses issue #912.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 11:24:33 -06:00
brngates98 1235430478 Update to unifi library v5.6.0 and fix linter errors
- Update go.mod to use unifi library v5.6.0 (includes remote API support)
- Remove temporary replace directive now that v5.6.0 is published
- Fix empty-block linter errors in input.go by removing empty if blocks
2026-01-25 10:58:08 -05:00
brngates98 e17d8bf62e move remote.go to use unifi library functions 2026-01-25 08:59:11 -05:00
brngates98 0cb331a745 Fix golangci-lint empty-block errors in input.go
Remove empty if blocks by inverting conditions:
- Line 289: Invert Remote check for URL default
- Line 303: Invert APIKey check in Remote mode
- Line 401: Invert Remote check for URL default in setControllerDefaults
2026-01-25 08:34:06 -05:00
brngates98 28e77d1ac5 Fix site name override for DPI clients, anomalies, and site metrics
- Apply site name override to DPI clients (ClientsDPI) in augmentMetrics
- Apply site name override to client anomalies when collecting events
- Apply site name override to sites (both Name and SiteName fields) when adding to metrics
- Apply site name override to DPI sites, speed tests, and country traffic
- Move applySiteNameOverride call to end of augmentMetrics to ensure all metrics are processed
- This ensures all Prometheus metrics use console names instead of 'Default (default)' for Cloud Gateways
2026-01-24 22:26:49 -05:00
brngates98 3996fd8683 Format code with gofmt 2026-01-24 18:22:40 -05:00
brngates98 d0abba6ddb Improve site name override to handle all default site name variations
- Add isDefaultSiteName helper to match any site name containing 'default' (case-insensitive)
- Handles variations like 'Default', 'default', 'Default (default)', etc.
- Ensures site_name in metrics shows console names instead of generic 'Default' values
- Makes metrics more compatible with existing dashboards that expect meaningful site names
- Also checks SiteName field on sites in addition to Name field
2026-01-24 18:22:34 -05:00
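
The matching rule described in d0abba6ddb (any site name containing "default", case-insensitive) could be written as below. The helper name isDefaultSiteName comes from the commit message; the body is an assumed implementation, not a copy of the repository code.

```go
package main

import (
	"fmt"
	"strings"
)

// isDefaultSiteName reports whether name contains "default" in any letter
// case, covering "Default", "default", "Default (default)", and similar.
func isDefaultSiteName(name string) bool {
	return strings.Contains(strings.ToLower(name), "default")
}

func main() {
	for _, name := range []string{"Default", "default", "Default (default)", "Home Office"} {
		fmt.Printf("%q -> %v\n", name, isDefaultSiteName(name))
	}
}
```

A substring match is deliberately broad, so a site that a user intentionally named something like "Default Lab" would also be treated as a default name under this sketch.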
brngates98 1440f1426e Fix site name override for remote API Cloud Gateways
- Keep actual site name 'default' for API calls to prevent 404 errors
- Apply site name override only in metrics for display purposes
- Fixes issue where console names were used in API paths causing 404s
- Site name override now correctly applied to devices, clients, sites, and rogue APs in metrics only
2026-01-24 17:46:32 -05:00
brngates98 5f76c59fa2 fix duplicate controllers due to cloud gateways site being default 2026-01-24 17:42:54 -05:00
brngates98 28eae6ab22 Add remote API support for UniFi Site Manager
- Add remote API mode with automatic controller discovery
- Discover consoles via /v1/hosts endpoint
- Auto-discover sites for each console via integration API
- Use console name from hosts response as site name override for Cloud Gateways
- Support both config-level and per-controller remote mode
- Add example configs for YAML, JSON, and TOML formats
- Remote API uses api.ui.com with X-API-Key authentication
- Automatically discovers all consoles when remote=true and remote_api_key is set

This enables monitoring multiple UniFi Cloud Gateways through a single
API key without requiring direct network access to each controller.
2026-01-24 17:32:36 -05:00
aharper343 25ba0bd14a Fix incorrect initialization of SaveTraffic 2025-12-24 14:08:47 -05:00