1.4.3. Performance and Usage Diagnosis enabler

1.4.3.1. Introduction

The Performance and Usage Diagnosis (PUD) enabler collects performance metrics from monitored targets by scraping their metrics HTTP endpoints and highlights potential problems in the ASSIST-IoT platform, so that the platform can act on them autonomously or notify the platform administrator to fine-tune machine resources. For this purpose it uses Prometheus, open-source software that collects metrics from targets by “scraping” metrics HTTP endpoints. Supported “targets” include kube-state-metrics, for monitoring every Kubernetes cluster used in the project; node-exporter, for hardware and OS metrics exposed by *NIX kernels; and other important metrics from the rest of the enablers used in the architecture. Together with its companion Alertmanager service, Prometheus is a flexible metrics collection and alerting tool.

1.4.3.2. Features

1.4.3.2.1. Performance and Usage Diagnosis (PUD) enabler’s features

  • Prometheus is an open-source monitoring framework. It provides out-of-the-box monitoring capabilities for the Kubernetes container orchestration platform. Its main features are:

    1. Metric Collection: Prometheus uses the pull model to retrieve metrics over HTTP. For use cases where Prometheus cannot scrape the metrics, they can be pushed to Prometheus through Pushgateway.

    2. Metric Endpoint: The systems that you want to monitor with Prometheus should expose their metrics on a /metrics endpoint. Prometheus pulls the metrics from this endpoint at regular intervals.

    3. PromQL: Prometheus comes with PromQL, a very flexible query language for querying the metrics in the Prometheus dashboard. The Prometheus UI and Grafana also use PromQL queries to visualize metrics.

    4. Prometheus Exporters: Exporters are libraries that convert existing metrics from third-party apps to the Prometheus metrics format. There are many official and community Prometheus exporters. One example is Kube State metrics, a service that talks to the Kubernetes API server to get the details of all API objects such as deployments, pods, daemonsets, etc.

    5. TSDB (time-series database): Prometheus uses TSDB for storing all the data. By default, all the data gets stored locally. However, there are options to integrate remote storage for Prometheus TSDB.

  • Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration.

  • Prometheus-es-adapter is a read and write adapter for integrating LTSE’s Elasticsearch as Prometheus’ persistent storage.

  • Grafana is a multi-platform, open-source analytics and interactive visualization web application. It is used to create and view dashboards with graphs generated from Prometheus metrics, for a more user-friendly monitoring experience.

  • Kube state metrics is a service that listens to the Kubernetes API and generates metrics about the state of Kubernetes objects.

  • Node_exporter is a Prometheus exporter, written in Go, for hardware and OS metrics exposed by *NIX kernels. It is installed separately on every GWEN and Ubuntu device. Because node_exporter is designed to monitor the host system and requires access to it, deploying it as a Docker container is not recommended.

1.4.3.3. Place in architecture

The Performance and Usage Diagnosis (PUD) enabler is located in the Application and Service layer of the ASSIST-IoT architecture. This layer provides application logic, including data visualisation and user interaction services, data analytics capabilities, various kinds of data protection support, and data management logic. Within it, the PUD enabler is responsible for collecting performance metrics from monitored targets.

https://user-images.githubusercontent.com/100563908/156375733-78f4f855-139f-4c55-8241-d6052d15f783.PNG

Here is the high-level architecture of PUD’s Prometheus.

https://user-images.githubusercontent.com/100563908/227181875-4a234213-7797-4eb9-84a2-bae69485dacb.png

Prometheus scrapes metrics from instrumented jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts.

Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring as well as monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus is designed for reliability, to be the system you go to during an outage to allow you to quickly diagnose problems. Each Prometheus server is standalone, not depending on network storage or other remote services. You can rely on it when other parts of your infrastructure are broken, and you do not need to set up extensive infrastructure to use it.

1.4.3.4. User guide

Prometheus provides a web UI for running basic queries, located at http://<your_server_IP>:9090/. This is what it looks like in a web browser:

https://user-images.githubusercontent.com/100563908/222110319-fa6212cb-6eaf-460b-8a09-1ba310f69eeb.PNG

The “Table” tab is used to view the results of a query, while the “Graph” tab is used to create graphs based on a query.

https://user-images.githubusercontent.com/100563908/156175560-b75810c9-ae49-45f6-80ff-6b5a59504f35.PNG

If you want to see a list of metrics sources, go to the Status > Targets page. Here, you will find a list of all services that are being monitored, including the path at which the metrics are available. In this case, the default path /metrics is used.

https://user-images.githubusercontent.com/100563908/222110555-a19fd69e-a58b-4c5c-ba4e-8e734498d043.PNG

To see what a metrics page looks like, click one of the endpoint URLs.

https://user-images.githubusercontent.com/100563908/222110668-aa978e2c-db76-4595-b288-c92c59b39ec2.PNG

The Prometheus server collects metrics and stores them in a time series database. Individual metrics are identified by names such as kube_pod_container_resource_requests. A metric may have a number of “labels” attached to it, which distinguish it from other similar sources of metrics. As an example, suppose kube_pod_container_resource_requests reports the resources requested by a container. It may have a label such as resource, which lets you inspect an individual system resource by naming it in the query.

https://user-images.githubusercontent.com/100563908/156173870-734063b3-4ab8-41cc-b511-7c65fa5eb0a9.PNG
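How a metric name, its labels and its value fit together can be illustrated by parsing one line of the plaintext format that a /metrics endpoint returns. The following is a minimal sketch rather than a full parser: it assumes label values contain no escaped quotes and that the line carries no trailing timestamp, and the sample line with its label values is made up for illustration.

```python
import re

# One sample line: metric name, optional {label="value",...} block, numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition-format line into (name, labels, value).

    Simplification: escaped quotes in label values and trailing
    timestamps are not handled.
    """
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


# Hypothetical sample; the pod name and value are invented.
line = 'kube_pod_container_resource_requests{pod="demo",resource="cpu"} 0.25'
name, labels, value = parse_sample(line)
print(name, labels["resource"], value)
```

Selecting on labels["resource"] here corresponds to writing kube_pod_container_resource_requests{resource="cpu"} in a query.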

In PromQL, an expression or subexpression should always evaluate to one of the following data types:

  • Instant vector: a set of time series containing a single sample for each time series, all sharing the same timestamp.

  • Range vector: a set of time series containing a range of data points over time for each time series.

  • Scalar: a simple numeric floating point value.

  • String: a string value. String literals can be enclosed in single quotes, double quotes or backticks (`). Escape sequences such as \n are processed inside single and double quotes, but not inside backticks.

For more about querying, refer to the Prometheus documentation.
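The same data types appear in the resultType field when a query is sent to the Prometheus HTTP API (/api/v1/query) instead of the web UI. The sketch below builds the request URL and summarises a response body. The base URL and the response shown are illustrative assumptions, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def describe_result(response_text):
    """Summarise the data type found in a query response body."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    data = body["data"]
    kind = data["resultType"]  # "vector", "matrix", "scalar" or "string"
    if kind in ("vector", "matrix"):
        return f"{kind} with {len(data['result'])} series"
    # Scalars and strings arrive as [timestamp, value].
    return f"{kind}: {data['result'][1]}"


# Hand-written response for illustration; a real one comes from the server.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"__name__": "up", "job": "node"},
                    "value": [1700000000, "1"]}],
    },
})
print(instant_query_url("http://localhost:9090", 'up{job="node"}'))
print(describe_result(sample))
```

An instant vector is reported as "vector" and a range vector as "matrix".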

Grafana also provides a web UI, located at http://<your_server_IP>:3000/. The user first needs to log in:

https://user-images.githubusercontent.com/100563908/222115506-ec86a444-5528-45bf-9f88-eb379157573a.PNG

After logging in, the user should add a Prometheus data source to PUD’s Grafana.

https://user-images.githubusercontent.com/100563908/222114194-991a1898-34bd-4868-bdb3-bbdb6c11bc51.PNG

To do so, go to Settings > Add Data Source > Prometheus.

https://user-images.githubusercontent.com/100563908/222114686-98433e40-8bb5-4285-8810-787b33fed86c.PNG

After choosing the data source, the user should import new dashboards into PUD’s Grafana.

https://user-images.githubusercontent.com/100563908/222116609-cb3aebe3-d4e7-4d46-a234-1f2f85b3fa8b.PNG

Dashboards for Kube state metrics and Node_exporter can be found in the grafana-dashboards directory of PUD’s repository.

https://user-images.githubusercontent.com/100563908/222117715-e297f520-15bc-4ac7-8d25-54b1fac71270.PNG

From the Dashboards page, the user can access and manage all of their dashboards.

https://user-images.githubusercontent.com/100563908/222118360-a47c1f43-c8d8-4031-a520-9b1b674c2862.PNG

1.4.3.5. Prerequisites

  • Kubernetes 1.16+

  • Helm 3+

1.4.3.6. Installation

Helm must be installed to use the charts. Please refer to Helm’s documentation to get started.

To install the chart with the release name pude:

Clone the repository to your machine.

NOTE: Edit the extraScrapeConfigs.yaml file so that it contains the configurations and targets that you want PUD to scrape.

Install the Performance and Usage Diagnosis enabler:

helm install pude --set-file extraScrapeConfigs=extraScrapeConfigs.yaml ./performance-and-usage-diagnosis
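The extraScrapeConfigs.yaml file passed to helm install holds a list of Prometheus scrape jobs. As a rough sketch, the helper below renders one job with static targets as YAML text. The job name and the target address are hypothetical, and the actual file in the repository may use other options.

```python
def scrape_job(job_name, targets, metrics_path="/metrics"):
    """Render one scrape job as a YAML list item for extraScrapeConfigs.yaml."""
    lines = [
        f"- job_name: {job_name}",
        f"  metrics_path: {metrics_path}",
        "  static_configs:",
        "    - targets:",
    ]
    # Quote each host:port so YAML always reads it as a string.
    lines += [f"        - '{target}'" for target in targets]
    return "\n".join(lines) + "\n"


# Hypothetical node_exporter target on a device in the local network.
print(scrape_job("node-exporter", ["192.168.1.10:9100"]), end="")
```

Writing the returned text to extraScrapeConfigs.yaml before running helm install makes the listed targets appear on the Status > Targets page.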

To check if the installation was successful run:

kubectl get pods

The result should show something like:

NAME                                                              READY   STATUS    RESTARTS   AGE
prometheus-es-adapter-85cd499bd8-dskkv                            1/1     Running   0          112s
pude-grafana-6986754ffd-7gr62                                     1/1     Running   0          112s
pude-kube-state-metrics-6f78cf594b-dg25z                          1/1     Running   0          112s
pude-performance-and-usage-diagnosis-alertmanager-cc8dfbb5ks27s   2/2     Running   0          112s
pude-performance-and-usage-diagnosis-server-76ff877d66-8z6zd      2/2     Running   0          112s
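The same check can be scripted. This sketch takes the text printed by kubectl get pods and returns the pods that are not Running or whose READY count is incomplete. It assumes the default column layout shown above, and the sample text is shortened for illustration.

```python
def unready_pods(kubectl_output):
    """Return names of pods that are not Running or not fully ready."""
    problems = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


# Shortened, partly invented sample: the second pod is still starting.
sample = """\
NAME                              READY   STATUS              RESTARTS   AGE
pude-grafana-6986754ffd-7gr62     1/1     Running             0          112s
pude-kube-state-metrics-example   0/1     ContainerCreating   0          5s
"""
print(unready_pods(sample))
```

An empty list means every listed pod is up.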

To access PUD’s Grafana Dashboard UI:

Port forward grafana’s pod to port 3000:

kubectl port-forward pude-grafana-6986754ffd-7gr62 3000

In PUD’s Grafana login page use:

Username: admin

To find the current password enter:

kubectl get secret pude-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
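Kubernetes stores Secret values base64-encoded, which is why the command pipes the field through base64 --decode. The equivalent step in Python is sketched below for a Secret exported as JSON (for example with -o json). The secret content is invented for illustration.

```python
import base64
import json


def decode_secret_field(secret_json, key):
    """Decode one base64-encoded field of a Kubernetes Secret given as JSON."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Invented Secret; a real one would come from kubectl get secret ... -o json.
example = json.dumps({
    "kind": "Secret",
    "metadata": {"name": "pude-grafana"},
    "data": {"admin-password": base64.b64encode(b"example-password").decode("ascii")},
})
print(decode_secret_field(example, "admin-password"))
```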

To list the Kubernetes secrets and find the name of Grafana’s secret, which in our case is pude-grafana, enter:

kubectl get secrets

To change Grafana’s admin password, enter:

kubectl exec -it <grafana pod name> -- grafana-cli admin reset-admin-password <your new password>

Add the Prometheus data source to PUD’s Grafana:

  • Go to Settings > Add Data Source > Prometheus.

  • To set Prometheus’ URL under HTTP settings, first find the cluster IP of the performance-and-usage-diagnosis-server service:

kubectl get services

  • Copy and paste the IP into the URL field.

  • Click Save & Test.
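The URL entered in the form is the cluster IP combined with the service port (80 by default, per server.service.servicePort in the configuration options). The sketch below assembles the same information as the JSON body of a Prometheus data source. The field names follow Grafana's data source HTTP API and should be checked against the documentation of the installed Grafana version. The IP address is a placeholder.

```python
import json


def prometheus_datasource(cluster_ip, port=80, name="Prometheus"):
    """Build a JSON-serialisable description of a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": f"http://{cluster_ip}:{port}",
        "access": "proxy",  # Grafana's backend, not the browser, queries Prometheus
        "isDefault": True,
    }


# Placeholder cluster IP; use the one reported by kubectl get services.
print(json.dumps(prometheus_datasource("10.96.0.15"), indent=2))
```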

Import new dashboards into PUD’s Grafana:

  • Go to Dashboards > + Import.

  • Upload the dashboard’s JSON file or choose one from grafana.com.

  • Click Load.

Node_exporter Installation:

  • Create a node_exporter user to run the node exporter service.

sudo useradd -rs /bin/false node_exporter

  • Create a node_exporter service file under systemd.

sudo vi /etc/systemd/system/node_exporter.service

  • Add the following content to the service file and save it. The unit expects the node_exporter binary at /usr/local/bin/node_exporter.

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

  • Reload the systemd daemon and start the node exporter service.

sudo systemctl daemon-reload
sudo systemctl start node_exporter

  • Check the node exporter status to make sure it is active and running.

sudo systemctl status node_exporter

  • Enable the node exporter service at system startup.

sudo systemctl enable node_exporter

The node exporter now exposes metrics on port 9100.
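A quick way to confirm this is to fetch the /metrics page from the device and look for metric names with the node_ prefix. The sketch below separates the network call from the check, so the check can be tried on sample text. The host passed to fetch_metrics is whatever address the device has, and the sample lines are invented.

```python
from urllib.request import urlopen


def fetch_metrics(host, port=9100, timeout=5):
    """Download the plaintext metrics page of a node exporter."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as response:
        return response.read().decode("utf-8")


def node_metric_names(text):
    """Collect the distinct metric names that start with node_."""
    names = set()
    for line in text.splitlines():
        if line.startswith("node_"):
            # The name ends at the first '{' (labels) or space (value).
            names.add(line.split("{")[0].split(" ")[0])
    return names


# Invented excerpt of a metrics page.
sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 56.7
"""
print(sorted(node_metric_names(sample)))
```

On a device, node_metric_names(fetch_metrics("<device IP>")) should return a non-empty set once the service is running.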

1.4.3.7. Configuration options

The following table lists the configurable parameters of the Prometheus chart and their default values.

| Parameter | Description | Default |
|---|---|---|
| alertmanager.enabled | If true, create alertmanager | true |
| alertmanager.name | alertmanager container name | alertmanager |
| alertmanager.useClusterRole | Use a ClusterRole (and ClusterRoleBinding). If set to false, a Role and RoleBinding are defined in the specified namespaces only. This lets alertmanager work for users who do not have ClusterAdmin privileges but want alertmanager to operate on their own namespaces instead of cluster-wide. | alertmanager |
| alertmanager.useExistingRole | Set to a role name to use an existing role. Role creation is skipped, but the service account and the role binding to the role name set here are still created. | alertmanager |
| alertmanager.image.repository | alertmanager container image repository | prom/alertmanager |
| alertmanager.image.tag | alertmanager container image tag | v0.21.0 |
| alertmanager.image.pullPolicy | alertmanager container image pull policy | IfNotPresent |
| alertmanager.prefixURL | The prefix slug at which the server can be accessed | `` |
| alertmanager.baseURL | The external url at which the server can be accessed | "http://localhost:9093" |
| alertmanager.extraArgs | Additional alertmanager container arguments | {} |
| alertmanager.extraSecretMounts | Additional alertmanager Secret mounts | [] |
| alertmanager.configMapOverrideName | Prometheus alertmanager ConfigMap override where full-name is {{.Release.Name}}-{{.Values.alertmanager.configMapOverrideName}} and setting this value will prevent the default alertmanager ConfigMap from being generated | "" |
| alertmanager.configFromSecret | The name of a secret in the same kubernetes namespace which contains the Alertmanager config, setting this value will prevent the default alertmanager ConfigMap from being generated | "" |
| alertmanager.configFileName | The configuration file name to be loaded to alertmanager. Must match the key within configuration loaded from ConfigMap/Secret. | alertmanager.yml |
| alertmanager.ingress.enabled | If true, alertmanager Ingress will be created | false |
| alertmanager.ingress.annotations | alertmanager Ingress annotations | {} |
| alertmanager.ingress.extraLabels | alertmanager Ingress additional labels | {} |
| alertmanager.ingress.hosts | alertmanager Ingress hostnames | [] |
| alertmanager.ingress.extraPaths | Ingress extra paths to prepend to every alertmanager host configuration. Useful when configuring custom actions with AWS ALB Ingress Controller | [] |
| alertmanager.ingress.tls | alertmanager Ingress TLS configuration (YAML) | [] |
| alertmanager.nodeSelector | node labels for alertmanager pod assignment | {} |
| alertmanager.tolerations | node taints to tolerate (requires Kubernetes >=1.6) | [] |
| alertmanager.affinity | pod affinity | {} |
| alertmanager.podDisruptionBudget.enabled | If true, create a PodDisruptionBudget | false |
| alertmanager.podDisruptionBudget.maxUnavailable | Maximum unavailable instances in PDB | 1 |
| alertmanager.schedulerName | alertmanager alternate scheduler name | nil |
| alertmanager.persistentVolume.enabled | If true, alertmanager will create a Persistent Volume Claim | true |
| alertmanager.persistentVolume.accessModes | alertmanager data Persistent Volume access modes | [ReadWriteOnce] |
| alertmanager.persistentVolume.annotations | Annotations for alertmanager Persistent Volume Claim | {} |
| alertmanager.persistentVolume.existingClaim | alertmanager data Persistent Volume existing claim name | "" |
| alertmanager.persistentVolume.mountPath | alertmanager data Persistent Volume mount root path | /data |
| alertmanager.persistentVolume.size | alertmanager data Persistent Volume size | 2Gi |
| alertmanager.persistentVolume.storageClass | alertmanager data Persistent Volume Storage Class | unset |
| alertmanager.persistentVolume.volumeBindingMode | alertmanager data Persistent Volume Binding Mode | unset |
| alertmanager.persistentVolume.subPath | Subdirectory of alertmanager data Persistent Volume to mount | "" |
| alertmanager.podAnnotations | annotations to be added to alertmanager pods | {} |
| alertmanager.podLabels | labels to be added to Prometheus AlertManager pods | {} |
| alertmanager.podSecurityPolicy.annotations | Specify pod annotations in the pod security policy | {} |
| alertmanager.replicaCount | desired number of alertmanager pods | 1 |
| alertmanager.statefulSet.enabled | If true, use a statefulset instead of a deployment for pod management | false |
| alertmanager.statefulSet.podManagementPolicy | podManagementPolicy of alertmanager pods | OrderedReady |
| alertmanager.statefulSet.headless.annotations | annotations for alertmanager headless service | {} |
| alertmanager.statefulSet.headless.labels | labels for alertmanager headless service | {} |
| alertmanager.statefulSet.headless.enableMeshPeer | If true, enable the mesh peer endpoint for the headless service | false |
| alertmanager.statefulSet.headless.servicePort | alertmanager headless service port | 80 |
| alertmanager.priorityClassName | alertmanager priorityClassName | nil |
| alertmanager.resources | alertmanager pod resource requests & limits | {} |
| alertmanager.securityContext | Custom security context for Alert Manager containers | {} |
| alertmanager.service.annotations | annotations for alertmanager service | {} |
| alertmanager.service.clusterIP | internal alertmanager cluster service IP | "" |
| alertmanager.service.externalIPs | alertmanager service external IP addresses | [] |
| alertmanager.service.loadBalancerIP | IP address to assign to load balancer (if supported) | "" |
| alertmanager.service.loadBalancerSourceRanges | list of IP CIDRs allowed access to load balancer (if supported) | [] |
| alertmanager.service.servicePort | alertmanager service port | 80 |
| alertmanager.service.sessionAffinity | Session Affinity for alertmanager service, can be None or ClientIP | None |
| alertmanager.service.type | type of alertmanager service to create | ClusterIP |
| alertmanager.strategy | Deployment strategy | { "type": "RollingUpdate" } |
| alertmanagerFiles.alertmanager.yml | Prometheus alertmanager configuration | example configuration |
| configmapReload.prometheus.enabled | If false, the configmap-reload container for Prometheus will not be deployed | true |
| configmapReload.prometheus.name | configmap-reload container name | configmap-reload |
| configmapReload.prometheus.image.repository | configmap-reload container image repository | jimmidyson/configmap-reload |
| configmapReload.prometheus.image.tag | configmap-reload container image tag | v0.4.0 |
| configmapReload.prometheus.image.pullPolicy | configmap-reload container image pull policy | IfNotPresent |
| configmapReload.prometheus.extraArgs | Additional configmap-reload container arguments | {} |
| configmapReload.prometheus.extraVolumeDirs | Additional configmap-reload volume directories | {} |
| configmapReload.prometheus.extraConfigmapMounts | Additional configmap-reload configMap mounts | [] |
| configmapReload.prometheus.resources | configmap-reload pod resource requests & limits | {} |
| configmapReload.alertmanager.enabled | If false, the configmap-reload container for AlertManager will not be deployed | true |
| configmapReload.alertmanager.name | configmap-reload container name | configmap-reload |
| configmapReload.alertmanager.image.repository | configmap-reload container image repository | jimmidyson/configmap-reload |
| configmapReload.alertmanager.image.tag | configmap-reload container image tag | v0.4.0 |
| configmapReload.alertmanager.image.pullPolicy | configmap-reload container image pull policy | IfNotPresent |
| configmapReload.alertmanager.extraArgs | Additional configmap-reload container arguments | {} |
| configmapReload.alertmanager.extraVolumeDirs | Additional configmap-reload volume directories | {} |
| configmapReload.alertmanager.extraConfigmapMounts | Additional configmap-reload configMap mounts | [] |
| configmapReload.alertmanager.resources | configmap-reload pod resource requests & limits | {} |
| initChownData.enabled | If false, don’t reset data ownership at startup | true |
| initChownData.name | init-chown-data container name | init-chown-data |
| initChownData.image.repository | init-chown-data container image repository | busybox |
| initChownData.image.tag | init-chown-data container image tag | latest |
| initChownData.image.pullPolicy | init-chown-data container image pull policy | IfNotPresent |
| initChownData.resources | init-chown-data pod resource requests & limits | {} |
| kubeStateMetrics.enabled | If true, create kube-state-metrics sub-chart | true |
| kube-state-metrics | kube-state-metrics configuration options | Same as sub-chart's |
| rbac.create | If true, create & use RBAC resources | true |
| server.enabled | If false, Prometheus server will not be created | true |
| server.name | Prometheus server container name | server |
| server.image.repository | Prometheus server container image repository | prom/prometheus |
| server.image.tag | Prometheus server container image tag | v2.20.1 |
| server.image.pullPolicy | Prometheus server container image pull policy | IfNotPresent |
| server.configPath | Path to a prometheus server config file on the container FS | /etc/config/prometheus.yml |
| server.global.scrape_interval | How frequently to scrape targets by default | 1m |
| server.global.scrape_timeout | How long until a scrape request times out | 10s |
| server.global.evaluation_interval | How frequently to evaluate rules | 1m |
| server.remoteWrite | The remote write feature of Prometheus allows samples to be sent transparently to a remote endpoint. | [] |
| server.remoteRead | The remote read feature of Prometheus allows samples to be received transparently from a remote endpoint. | [] |
| server.extraArgs | Additional Prometheus server container arguments | {} |
| server.extraFlags | Additional Prometheus server container flags | ["web.enable-lifecycle"] |
| server.extraInitContainers | Init containers to launch alongside the server | [] |
| server.prefixURL | The prefix slug at which the server can be accessed | `` |
| server.baseURL | The external url at which the server can be accessed | `` |
| server.env | Prometheus server environment variables | [] |
| server.extraHostPathMounts | Additional Prometheus server hostPath mounts | [] |
| server.extraConfigmapMounts | Additional Prometheus server configMap mounts | [] |
| server.extraSecretMounts | Additional Prometheus server Secret mounts | [] |
| server.extraVolumeMounts | Additional Prometheus server Volume mounts | [] |
| server.extraVolumes | Additional Prometheus server Volumes | [] |
| server.configMapOverrideName | Prometheus server ConfigMap override where full-name is {{.Release.Name}}-{{.Values.server.configMapOverrideName}} and setting this value will prevent the default server ConfigMap from being generated | "" |
| server.ingress.enabled | If true, Prometheus server Ingress will be created | false |
| server.ingress.annotations | Prometheus server Ingress annotations | [] |
| server.ingress.extraLabels | Prometheus server Ingress additional labels | {} |
| server.ingress.hosts | Prometheus server Ingress hostnames | [] |
| server.ingress.extraPaths | Ingress extra paths to prepend to every Prometheus server host configuration. Useful when configuring custom actions with AWS ALB Ingress Controller | [] |
| server.ingress.tls | Prometheus server Ingress TLS configuration (YAML) | [] |
| server.nodeSelector | node labels for Prometheus server pod assignment | {} |
| server.tolerations | node taints to tolerate (requires Kubernetes >=1.6) | [] |
| server.affinity | pod affinity | {} |
| server.podDisruptionBudget.enabled | If true, create a PodDisruptionBudget | false |
| server.podDisruptionBudget.maxUnavailable | Maximum unavailable instances in PDB | 1 |
| server.priorityClassName | Prometheus server priorityClassName | nil |
| server.enableServiceLinks | Set service environment variables in Prometheus server pods | true |
| server.schedulerName | Prometheus server alternate scheduler name | nil |
| server.persistentVolume.enabled | If true, Prometheus server will create a Persistent Volume Claim | true |
| server.persistentVolume.accessModes | Prometheus server data Persistent Volume access modes | [ReadWriteOnce] |
| server.persistentVolume.annotations | Prometheus server data Persistent Volume annotations | {} |
| server.persistentVolume.existingClaim | Prometheus server data Persistent Volume existing claim name | "" |
| server.persistentVolume.mountPath | Prometheus server data Persistent Volume mount root path | /data |
| server.persistentVolume.size | Prometheus server data Persistent Volume size | 8Gi |
| server.persistentVolume.storageClass | Prometheus server data Persistent Volume Storage Class | unset |
| server.persistentVolume.volumeBindingMode | Prometheus server data Persistent Volume Binding Mode | unset |
| server.persistentVolume.subPath | Subdirectory of Prometheus server data Persistent Volume to mount | "" |
| server.emptyDir.sizeLimit | emptyDir sizeLimit if a Persistent Volume is not used | "" |
| server.podAnnotations | annotations to be added to Prometheus server pods | {} |
| server.podLabels | labels to be added to Prometheus server pods | {} |
| server.alertmanagers | Prometheus AlertManager configuration for the Prometheus server | {} |
| server.deploymentAnnotations | annotations to be added to Prometheus server deployment | {} |
| server.podSecurityPolicy.annotations | Specify pod annotations in the pod security policy | {} |
| server.replicaCount | desired number of Prometheus server pods | 1 |
| server.statefulSet.enabled | If true, use a statefulset instead of a deployment for pod management | false |
| server.statefulSet.annotations | annotations to be added to Prometheus server stateful set | {} |
| server.statefulSet.labels | labels to be added to Prometheus server stateful set | {} |
| server.statefulSet.podManagementPolicy | podManagementPolicy of server pods | OrderedReady |
| server.statefulSet.headless.annotations | annotations for Prometheus server headless service | {} |
| server.statefulSet.headless.labels | labels for Prometheus server headless service | {} |
| server.statefulSet.headless.servicePort | Prometheus server headless service port | 80 |
| server.statefulSet.headless.gRPC.enabled | If true, open a second port on the service for gRPC | false |
| server.statefulSet.headless.gRPC.servicePort | Prometheus service gRPC port (ignored if server.statefulSet.headless.gRPC.enabled is not true) | 10901 |
| server.statefulSet.headless.gRPC.nodePort | Port to be used as gRPC nodePort in the prometheus service | 0 |
| server.readinessProbeInitialDelay | the initial delay for the Prometheus server readiness probe | 30 |
| server.readinessProbePeriodSeconds | how often (in seconds) to perform the Prometheus server readiness probe | 5 |
| server.readinessProbeTimeout | the timeout for the Prometheus server readiness probe | 30 |
| server.readinessProbeFailureThreshold | the failure threshold for the Prometheus server readiness probe | 3 |
| server.readinessProbeSuccessThreshold | the success threshold for the Prometheus server readiness probe | 1 |
| server.livenessProbeInitialDelay | the initial delay for the Prometheus server liveness probe | 30 |
| server.livenessProbePeriodSeconds | how often (in seconds) to perform the Prometheus server liveness probe | 15 |
| server.livenessProbeTimeout | the timeout for the Prometheus server liveness probe | 30 |
| server.livenessProbeFailureThreshold | the failure threshold for the Prometheus server liveness probe | 3 |
| server.livenessProbeSuccessThreshold | the success threshold for the Prometheus server liveness probe | 1 |
| server.resources | Prometheus server resource requests and limits | {} |
| server.verticalAutoscaler.enabled | If true, a VPA object will be created for the controller (either StatefulSet or Deployment, based on the above configs) | false |
| server.securityContext | Custom security context for server containers | {} |
| server.service.annotations | annotations for Prometheus server service | {} |
| server.service.clusterIP | internal Prometheus server cluster service IP | "" |
| server.service.externalIPs | Prometheus server service external IP addresses | [] |
| server.service.loadBalancerIP | IP address to assign to load balancer (if supported) | "" |
| server.service.loadBalancerSourceRanges | list of IP CIDRs allowed access to load balancer (if supported) | [] |
| server.service.nodePort | Port to be used as the service NodePort (ignored if server.service.type is not NodePort) | 0 |
| server.service.servicePort | Prometheus server service port | 80 |
| server.service.sessionAffinity | Session Affinity for server service, can be None or ClientIP | None |
| server.service.type | type of Prometheus server service to create | ClusterIP |
| server.service.gRPC.enabled | If true, open a second port on the service for gRPC | false |
| server.service.gRPC.servicePort | Prometheus service gRPC port (ignored if server.service.gRPC.enabled is not true) | 10901 |
| server.service.gRPC.nodePort | Port to be used as gRPC nodePort in the prometheus service | 0 |
| server.service.statefulsetReplica.enabled | If true, send the traffic from the service to only one replica of the replicaset | false |
| server.service.statefulsetReplica.replica | Which replica to send the traffic to | 0 |
| server.hostAliases | /etc/hosts-entries in container(s) | [] |
| server.sidecarContainers | array of snippets with your sidecar containers for prometheus server | "" |
| server.strategy | Deployment strategy | { "type": "RollingUpdate" } |
| serviceAccounts.alertmanager.create | If true, create the alertmanager service account | true |
| serviceAccounts.alertmanager.name | name of the alertmanager service account to use or create | {{ prometheus.alertmanager.fullname }} |
| serviceAccounts.alertmanager.annotations | annotations for the alertmanager service account | {} |
| serviceAccounts.server.create | If true, create the server service account | true |
| serviceAccounts.server.name | name of the server service account to use or create | {{ prometheus.server.fullname }} |
| serviceAccounts.server.annotations | annotations for the server service account | {} |
| server.terminationGracePeriodSeconds | Prometheus server Pod termination grace period | 300 |
| server.retention | (optional) Prometheus data retention | "15d" |
| serverFiles.alerting_rules.yml | Prometheus server alerts configuration | {} |
| serverFiles.recording_rules.yml | Prometheus server rules configuration | {} |
| serverFiles.prometheus.yml | Prometheus server scrape configuration | example configuration |
| extraScrapeConfigs | Prometheus server additional scrape configuration | "" |
| alertRelabelConfigs | Prometheus server alert relabeling configs for H/A prometheus | "" |
| networkPolicy.enabled | Enable NetworkPolicy | false |
| forceNamespace | Force resources to be namespaced | null |

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example:

helm install my-release PUD/prometheus --set server.terminationGracePeriodSeconds=360

Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example:

helm install my-release PUD/prometheus -f values.yaml
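Both forms set the same values: a dotted --set key names a path into the nested structure of values.yaml. The sketch below expands such keys into a nested dictionary to make the correspondence visible. It is a simplification that keeps every value as a string and ignores Helm's handling of commas, lists and escaping.

```python
def set_flags_to_values(assignments):
    """Expand 'a.b.c=value' strings into the nested mapping of a values file."""
    values = {}
    for assignment in assignments:
        dotted_key, _, raw_value = assignment.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = values
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = raw_value  # kept as text; Helm itself infers types
    return values


print(set_flags_to_values([
    "server.terminationGracePeriodSeconds=360",
    "server.persistentVolume.size=16Gi",
]))
```

In values.yaml the first entry corresponds to a terminationGracePeriodSeconds key nested under server.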

The following table lists the configurable parameters of the Prometheus-elasticsearch-adapter chart and their default values.

| Environment variable | Description | Default |
|---|---|---|
| ES_URL | Elasticsearch URL | http://localhost:9200 |
| ES_USER | Elasticsearch User | |
| ES_PASSWORD | Elasticsearch User Password | |
| ES_WORKERS | Number of batch workers | 1 |
| ES_BATCH_MAX_AGE | Max period in seconds between bulk Elasticsearch insert operations | 10 |
| ES_BATCH_MAX_DOCS | Max items for bulk Elasticsearch insert operation | 1000 |
| ES_BATCH_MAX_SIZE | Max size in bytes for bulk Elasticsearch insert operation | 4096 |
| ES_ALIAS | Elasticsearch alias pointing to active write index | prom-metrics |
| ES_INDEX_DAILY | Create daily indexes and disable index rollover | false |
| ES_INDEX_SHARDS | Number of Elasticsearch shards to create per index | 5 |
| ES_INDEX_REPLICAS | Number of Elasticsearch replicas to create per index | 1 |
| ES_INDEX_MAX_AGE | Max age of Elasticsearch index before rollover | 7d |
| ES_INDEX_MAX_DOCS | Max number of docs in Elasticsearch index before rollover | 1000000 |
| ES_INDEX_MAX_SIZE | Max size of index before rollover, e.g. 5gb | |
| ES_SEARCH_MAX_DOCS | Max number of docs returned for Elasticsearch search operation | 1000 |
| ES_SNIFF | Enable Elasticsearch sniffing | false |
| STATS | Expose Prometheus metrics endpoint | true |
| DEBUG | Display extra debug logs | false |

1.4.3.8. Developer guide

1.4.3.8.1. PUD’s Prometheus Metrics & Exporters

The Performance and Usage Diagnosis (PUD) enabler follows an HTTP pull model: it periodically scrapes performance metrics from endpoints. Typically, the abstraction layer between the application and PUD is an exporter, which takes application-formatted metrics and converts them to Prometheus metrics for consumption. Because PUD uses an HTTP pull model, the exporter typically provides a /metrics endpoint from which the performance metrics can be scraped.

The relationship between Prometheus, the exporter, and the application in a Kubernetes environment can be visualized like this:

https://trstringer.com/images/prometheus-exporter.png

Metrics are served as plaintext. They are designed to be consumed either by PUD itself or by any scraper that is compatible with a Prometheus client endpoint. The raw metrics can also be viewed in a browser by opening the /metrics endpoint. Note that the metrics exposed on the /metrics endpoint reflect the current state of the monitored application.

The Prometheus metrics format is so widely adopted that it became an independent project: OpenMetrics, striving to make this metric format specification an industry standard.

1.4.3.8.2. Prometheus metrics naming

Generally, metric names should allow someone who is familiar with Prometheus but not with a particular system to make a good guess as to what a metric means. A metric named http_requests_total is not very useful: are the requests measured as they come in, in some filter, or when they reach the user’s code? A name such as requests_total is even worse: what type of requests?

Metric names for applications should generally be prefixed by the exporter name, e.g. haproxy_up.

Metrics must use base units (e.g. seconds, bytes) and leave converting them to something more readable to graphing tools. No matter what units you end up using, the units in the metric name must match the units in use.

Prometheus metric and label names are written in snake_case. Only the characters [a-zA-Z0-9:_] are valid in metric names, and a name must not begin with a digit.
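The naming rules above can be checked mechanically. The following is a minimal sketch, assuming the classic Prometheus data model in which a metric name matches [a-zA-Z_:][a-zA-Z0-9_:]* and a label name matches [a-zA-Z_][a-zA-Z0-9_]*. The helper names is_valid_metric_name and is_valid_label_name are hypothetical and not part of any Prometheus library.

```python
import re

# Assumed pattern for metric names in the classic Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Label names follow the same idea but do not allow the colon.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole string is a syntactically valid metric name."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name: str) -> bool:
    """Return True if the string is a valid, non-reserved label name."""
    # Names starting with a double underscore are reserved for internal use.
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")
```

A check like this is cheap to run in an exporter's unit tests, so that an invalid name is caught before Prometheus rejects the scrape.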

The _sum, _count, _bucket and _total suffixes are used by Summaries, Histograms and Counters. Unless you’re producing one of those, avoid these suffixes. _total is a convention for counters; use it if you’re using the COUNTER type.

In the Prometheus metric format, a metric name is combined with a set of labels (also called tags):

<metric name>{<label name>=<label value>, ...}

A time series with the metric name http_requests_total and the labels service="service", server="pod50" and env="production" could be written like this:

http_requests_total{service="service", server="pod50", env="production"}
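As an illustration of this notation, the sketch below renders a metric name, a label dictionary, and a value as one sample line. The function names format_sample and escape_label_value are hypothetical helpers written for this example; the escaping of backslash, double quote, and line feed in label values follows the text exposition format as commonly documented, which is an assumption not stated in this guide.

```python
def escape_label_value(value) -> str:
    """Escape backslash, double quote, and line feed inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels=None, value=0):
    """Render one sample as: name{label="value",...} value."""
    if not labels:
        # A metric without labels is written without the braces.
        return f"{name} {value}"
    pairs = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels.items())
    return f"{name}{{{pairs}}} {value}"


line = format_sample(
    "http_requests_total",
    {"service": "service", "server": "pod50", "env": "production"},
    1027,  # hypothetical sample value
)
```

With the inputs above, line holds the series shown earlier followed by the sample value 1027.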

You can associate any number of context-specific labels with every metric you submit. Imagine a typical metric like http_requests_per_second that every one of your web servers emits. You can then attach labels (or dimensions) such as:

  • Web server software (Nginx, Apache)

  • Environment (production, staging)

  • HTTP method (POST, GET)

  • Error code (404, 503)

  • HTTP response code (number)

  • Endpoint (/webapp1, /webapp2)

  • Datacenter zone (east, west)

The Prometheus text-based metrics format is line oriented. Lines are separated by a line feed character (\n). The last line must end with a line feed character. Empty lines are ignored. A metric is composed of several fields:

  • Metric name

  • Any number of labels (can be 0), represented as a key-value array

  • Current metric value

  • Optional metric timestamp

A Prometheus metric can be as simple as: http_requests 2

Or, including all the mentioned components: http_requests_total{method="post",code="400"}  3   1395066363000

Metric output is typically preceded with # HELP and # TYPE metadata lines.

The HELP string identifies the metric name and a brief description of it. The TYPE string identifies the type of metric. If there’s no TYPE before a metric, the metric is set to untyped. Everything else that starts with a # is parsed as a comment.

# HELP metric_name Description of the metric
# TYPE metric_name type
# Comment that's not parsed by prometheus
http_requests_total{method="post",code="400"}  3   1395066363000
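To make the line-oriented structure concrete, here is a simplified parsing sketch. It is not the parser Prometheus uses: it assumes label values contain no closing brace, it does not unescape label values, and the names parse_exposition, SAMPLE_RE, LABEL_RE, and SAMPLE are hypothetical. It collects the declared TYPE of each metric and returns every sample as a (name, labels, value, timestamp) tuple.

```python
import re

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>[^}]*)\})?"            # optional {label="value",...}
    r"\s+(?P<value>\S+)"                     # current value
    r"(?:\s+(?P<ts>-?\d+))?\s*$"             # optional timestamp
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (types, samples) parsed from text in the exposition format."""
    types, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # empty lines are ignored
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue  # HELP lines and plain comments carry no samples
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"malformed sample line: {line!r}")
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        timestamp = int(match.group("ts")) if match.group("ts") else None
        samples.append((match.group("name"), labels, float(match.group("value")), timestamp))
    return types, samples


SAMPLE = (
    "# HELP http_requests_total Description of the metric\n"
    "# TYPE http_requests_total counter\n"
    "# Comment that's not parsed by prometheus\n"
    'http_requests_total{method="post",code="400"}  3   1395066363000\n'
    "http_requests 2\n"
)
types, samples = parse_exposition(SAMPLE)
```

Here http_requests has no preceding TYPE line, so it does not appear in types, which corresponds to the untyped case described above.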

1.4.3.8.3. Prometheus metrics client libraries

The Prometheus project maintains 4 official Prometheus metrics libraries written in Go, Java / Scala, Python, and Ruby. The Prometheus community has created many third-party libraries that you can use to instrument other languages (or just alternative implementations for the same language):

  • Bash

  • C++

  • Common Lisp

  • Elixir

  • Erlang

  • Haskell

  • Lua for Nginx

  • Lua for Tarantool

  • .NET / C#

  • Node.js

  • Perl

  • PHP

  • Rust

1.4.3.8.4. Prometheus metrics / OpenMetrics types

The metric type to use depends on the kind of information you want to collect and expose. The four core metric types are:

Counter

This represents a cumulative metric that only increases over time, like the number of requests to an endpoint. Note: instead of using Counter to instrument decreasing values, use Gauges.

# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.7156890216e+10

Gauge

Gauges are instantaneous measurements of a value. A gauge represents an arbitrary value that can go up and down over time, such as the load of your system.

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 73
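The difference between the two types described so far can be sketched with two small classes. These are hypothetical illustrations of the semantics only, not the classes of an official client library: the counter rejects negative increments, while the gauge can be set, incremented, or decremented freely.

```python
class Counter:
    """Cumulative value that can only go up (it restarts from zero with the process)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase; use a gauge instead")
        self.value += amount


class Gauge:
    """Instantaneous value that may go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


requests_total = Counter()   # hypothetical counter of handled requests
requests_total.inc()
requests_total.inc(2)

in_flight = Gauge()          # hypothetical gauge of requests in progress
in_flight.inc(5)
in_flight.dec(2)
```

Both objects end with a value of 3, but only the gauge could ever report a smaller number on the next scrape.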

Histogram

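A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. A histogram with a base metric name of <basename> exposes multiple time series during a scrape: cumulative bucket counters as <basename>_bucket{le="<upper inclusive bound>"}, the sum of all observed values as <basename>_sum, and the count of observations as <basename>_count. For example: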

# HELP http_request_duration_seconds request duration histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.5"} 0
http_request_duration_seconds_bucket{le="1"} 1
http_request_duration_seconds_bucket{le="2"} 2
http_request_duration_seconds_bucket{le="3"} 3
http_request_duration_seconds_bucket{le="5"} 3
http_request_duration_seconds_bucket{le="+Inf"} 3
http_request_duration_seconds_sum 6
http_request_duration_seconds_count 3
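The bucket values in this example are cumulative: each bucket counts every observation that is less than or equal to its upper bound. The sketch below shows how such lines could be derived. The function name histogram_lines is hypothetical, and the three observations of 1, 2, and 3 seconds are an assumption chosen because they are consistent with the sample lines above.

```python
import math


def histogram_lines(name, observations, bounds):
    """Render cumulative bucket, sum, and count lines for a histogram."""
    lines = []
    for upper in sorted(bounds) + [math.inf]:
        # Cumulative: count every observation at or below this upper bound.
        count = sum(1 for value in observations if value <= upper)
        le = "+Inf" if math.isinf(upper) else f"{upper:g}"
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {sum(observations):g}")
    lines.append(f"{name}_count {len(observations)}")
    return lines


lines = histogram_lines(
    "http_request_duration_seconds",
    observations=[1, 2, 3],        # assumed request durations in seconds
    bounds=[0.5, 1, 2, 3, 5],
)
```

Because the buckets are cumulative, the +Inf bucket always equals the _count value.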

Summary

Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. A summary with a base metric name of <basename> also exposes multiple time series during a scrape: the quantiles as <basename>{quantile="<φ>"}, the sum of all observed values as <basename>_sum, and the count of observations as <basename>_count.
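The following sketch illustrates what a summary could expose. It is a deliberate simplification: it computes exact nearest-rank quantiles over a full list of observations, whereas real client libraries estimate quantiles over a sliding time window. The function name summary_lines, the metric name rpc_duration_seconds, and the observations are all hypothetical.

```python
import math


def summary_lines(name, observations, quantiles=(0.5, 0.9, 0.99)):
    """Render quantile, sum, and count lines for a summary."""
    ordered = sorted(observations)
    lines = []
    for q in quantiles:
        if ordered:
            # Nearest rank: the smallest value with at least q * N observations at or below it.
            rank = max(1, math.ceil(q * len(ordered)))
            value = ordered[rank - 1]
        else:
            value = math.nan  # no observations yet
        lines.append(f'{name}{{quantile="{q:g}"}} {value:g}')
    lines.append(f"{name}_sum {sum(ordered):g}")
    lines.append(f"{name}_count {len(ordered)}")
    return lines


summary = summary_lines(
    "rpc_duration_seconds",
    observations=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    quantiles=(0.5, 0.9),
)
```

Unlike histogram buckets, quantiles computed on each instance cannot be meaningfully aggregated across instances, which is one reason to prefer histograms when metrics from several pods need to be combined.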

More regarding OpenMetrics types

1.4.3.8.5. Prometheus exporters

Many popular server applications, such as Nginx or PostgreSQL, are much older than the Prometheus metrics / OpenMetrics formats. They usually have their own metrics formats and exposition methods. To work around this hurdle, the Prometheus community creates and maintains a vast collection of Prometheus exporters. An exporter is a “translator” or “adapter” program that collects the server’s native metrics and republishes them in the Prometheus metrics format over HTTP. These small binaries can be co-located in the same container or pod that runs the monitored server, or isolated in their own sidecar container. The service metrics are then collected by scraping the exporter, which exposes them as Prometheus metrics.

A number of exporters are maintained as part of the official Prometheus GitHub organization.

You might need to write your own exporter if…

  • You’re using 3rd party software that doesn’t have an existing exporter already

  • You want to generate Prometheus metrics from software that you have written
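For either case, the core of an exporter is small. The sketch below uses only the Python standard library rather than an official client library, so it should be read as an illustration of the pull model and not as a recommended implementation. The metric name myapp_requests_served_total, the APP_STATE dictionary, and the helper functions are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that the exporter translates into metrics.
APP_STATE = {"requests_served": 0}


def render_metrics():
    """Render the current application state in the text exposition format."""
    return (
        "# HELP myapp_requests_served_total Requests served by the application.\n"
        "# TYPE myapp_requests_served_total counter\n"
        f"myapp_requests_served_total {APP_STATE['requests_served']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_exporter(port=0):
    """Serve /metrics in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url):
    """Fetch the metrics text the same way a scraper would, with one HTTP GET."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

Calling start_exporter() and then scrape() against the returned server's /metrics path returns the three lines produced by render_metrics(). Each scrape reflects the state at that moment, so updating APP_STATE between scrapes changes the reported value.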

1.4.3.8.6. Example

Building a generic HTTP server metrics exporter in Python. By Nancy Chauhan: https://levelup.gitconnected.com/building-a-prometheus-exporter-8a4bbc3825f5

1.4.3.9. Version control and release

Prometheus v2.31.1

Prometheus-es-adapter v3.3

Grafana v9.1.1

kube-state-metrics v2.8.1

node_exporter v0.18.1

1.4.3.10. License

Apache License 2.0