Prometheus Integration Guide
This guide provides comprehensive instructions for integrating your Prometheus metrics with slaOS. This integration enables slaOS to collect and analyze metrics from your Prometheus instances, helping you establish and monitor Service Level Indicators (SLIs).
Prerequisites
Required Components
A running Prometheus instance
Your slaOS account credentials
(Optional) Access to configure authentication methods
Important Note: Currently, the integration requires an existing Prometheus server. If you only have applications exposing /metrics endpoints that need to be scraped, this is not yet supported but is coming soon!
Please contact our support team if this is your use case - we're happy to help find alternative solutions and work with you to ensure a smooth onboarding experience when this feature becomes available.
Integration Steps
Step 1: Authentication Setup
If your Prometheus instance requires authentication or runs with TLS enabled, you'll need to configure the appropriate authentication method. This step is crucial for securing access to your metrics while ensuring slaOS can reliably collect them.
Choose one of the following authentication methods based on your Prometheus setup:
Basic Authentication
Token Authentication
Certificate Authentication (mTLS)
Google Cloud
Important setup steps:
Create a service account with these IAM roles:
roles/monitoring.viewer
roles/iam.serviceAccountTokenCreator
roles/iam.serviceAccountUser
Generate and download the service account key file (JSON)
Replace:
[PROJECT_ID] with your actual GCP project ID
/path/to/service-account.json with the actual path to your downloaded key file
Ensure the service account has the required OAuth scopes enabled in your GCP project.
For more details, see the GitHub templates at prometheus/auth.md.
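For Basic and Token authentication, the credentials typically reach Prometheus as a standard HTTP Authorization header. The sketch below shows how those two header values are formed, which can help when testing credentials outside slaOS. It uses only the Python standard library. The function names are illustrative, and treating "token" authentication as a Bearer token is an assumption; the slaOS configuration format itself is described in prometheus/auth.md, not here.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_auth_header(token: str) -> dict:
    """Build a Bearer Authorization header (assumed form of token auth)."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder credentials for illustration only.
basic = basic_auth_header("user", "pass")
bearer = bearer_auth_header("example-token")
```

Only one of the two headers would be sent with a given request, in line with choosing a single authentication method.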
Step 2: Set Up Your PromQL Queries
Define the PromQL queries that slaOS will use to collect metrics. slaOS validates each query during onboarding to ensure reliable data collection. In self-hosted deployments, an invalid query prevents the indexer from starting.
Query Configuration
You can use the full power of PromQL to build your queries. For a comprehensive guide on writing PromQL queries, refer to the official Prometheus documentation.
Here are some common query patterns for monitoring service health:
Monitor the rate of incoming requests:
This query:
Calculates request rate over 5-minute windows
Groups results by customer_id
Returns data points every minute (step)
Maps to "request_rate" metric in slaOS
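To try a query like the one described above directly against Prometheus before onboarding it, you can issue a range query through the Prometheus HTTP API. The following is a minimal sketch that only builds the request URL. The metric name http_requests_total, the host localhost:9090, and the timestamps are assumptions for illustration; this is not the slaOS configuration format, and the mapping to the "request_rate" metric happens in slaOS, not in this request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical PromQL matching the description: a 5-minute request rate
# grouped by customer_id. The metric name is an assumption.
QUERY = "sum by (customer_id) (rate(http_requests_total[5m]))"


def build_query_range_url(base_url: str, query: str,
                          start: int, end: int, step_seconds: int) -> str:
    """Build a URL for Prometheus's /api/v1/query_range endpoint."""
    params = urlencode({
        "query": query,
        "start": start,             # Unix timestamp, seconds
        "end": end,                 # Unix timestamp, seconds
        "step": f"{step_seconds}s"  # resolution step, e.g. "60s"
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# One data point per minute over a 5-minute span (placeholder timestamps).
url = build_query_range_url("http://localhost:9090", QUERY,
                            1700000000, 1700000300, 60)
```

Opening the resulting URL (with any required authentication) returns one series per customer_id value, which makes it easy to confirm that the grouping label is present before handing the query to slaOS.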
Query Validation
slaOS performs several validations on your queries:
Syntax correctness
Label presence (especially for organization_identifier)
Appropriate use of aggregation operators
Correct histogram usage
Valid time windows and steps
If validation fails:
In cloud slaOS: The onboarding interface will show specific error messages
In self-hosted slaOS: The indexer will log errors and fail to start
Best Practices
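Time Windows: Use a rate window that spans several scrape intervals, so that each window contains at least two samples (for example, the 5-minute window in the request-rate query above)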
Step Selection
When querying metrics, the step interval determines how frequently data points are sampled. Here are the key points about step configuration:
We poll Prometheus integrations every 60 seconds (1 minute)
Step sizes must be ≤ 60 seconds
Step intervals should evenly divide into 60 seconds to ensure consistent metric sampling
For example, valid step intervals include: 1s, 2s, 3s, 4s, 5s, 6s, 10s, 12s, 15s, 20s, 30s, and 60s.
Aggregation: Keep the labels you need, such as the organization identifier label, in your aggregation clauses so they are not dropped from the results
For more complex queries or specific use cases, consult our support team or refer to the Prometheus querying documentation.
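The step rules listed under Step Selection (at most 60 seconds, and dividing 60 evenly) can be checked with a few lines of Python. This is an illustrative sketch, not the validation code slaOS runs; the function and constant names are made up for the example.

```python
POLL_INTERVAL_SECONDS = 60  # slaOS polls Prometheus integrations every 60 seconds


def is_valid_step(step_seconds: int) -> bool:
    """Return True if the step is positive, <= 60s, and divides 60 evenly."""
    return (
        0 < step_seconds <= POLL_INTERVAL_SECONDS
        and POLL_INTERVAL_SECONDS % step_seconds == 0
    )


# Enumerate every whole-second step that satisfies both rules.
valid_steps = [s for s in range(1, POLL_INTERVAL_SECONDS + 1) if is_valid_step(s)]
print(valid_steps)
```

The printed list matches the valid intervals given above; a value such as 7 or 45 is rejected because it does not divide 60 evenly, and 120 is rejected because it exceeds the polling interval.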
Step 3: Configuration Setup
Combine the outcomes from Step 1 (Authentication) and Step 2 (Queries) into your main configuration file. Here's an example:
Tip: For the latest configuration examples and templates, check our GitHub repository. We regularly update these templates with best practices and new features.
Frequently Asked Questions (FAQ)
Authentication
Q: Can I use multiple authentication methods simultaneously? A: No, authentication methods are mutually exclusive. Choose one that best fits your security requirements.
Q: How often should I rotate credentials? A: Best practice is to rotate credentials every 90 days or immediately if compromised.
Organization Identification
Q: What happens if the organization identifier is missing? A: The integration will:
Use the fallback_org_id if configured
Stop with an error if no fallback_org_id is provided
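The fallback behavior described in this answer can be sketched as follows. This is a hypothetical illustration, not slaOS's implementation: it assumes organization_identifier names a label on each returned series, and the function name resolve_org_id is invented for the example.

```python
from typing import Mapping, Optional


def resolve_org_id(labels: Mapping[str, str],
                   organization_identifier: str,
                   fallback_org_id: Optional[str] = None) -> str:
    """Pick the organization ID for one series, falling back if needed."""
    value = labels.get(organization_identifier)
    if value:  # label present and non-empty
        return value
    if fallback_org_id is not None:
        return fallback_org_id
    # No label and no fallback: stop with an error, as the answer describes.
    raise ValueError(
        f"label {organization_identifier!r} is missing and no fallback_org_id is set"
    )


# Label present: the label value is used.
from_label = resolve_org_id({"customer_id": "acme"}, "customer_id", "default-org")
# Label missing: the configured fallback is used.
from_fallback = resolve_org_id({"job": "api"}, "customer_id", "default-org")
```

Calling resolve_org_id({"job": "api"}, "customer_id") with no fallback raises ValueError, mirroring the "stop with an error" case.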
Q: Can I use different organization identifiers for different queries? A: Yes, each query can specify its own organization_identifier and fallback_org_id.
Metrics and Queries
Q: How often does slaOS collect metrics? A: After initial backfilling of historical data, slaOS queries the data source every 60 seconds. The frequency of data points within each 60-second window is determined by the step parameter in your query configuration.
Q: Can I query logs through Prometheus? A: No, the Prometheus integration only supports metric queries. For log analysis, please use another supported integration, such as CloudWatch. We plan to integrate PromQL-compatible log systems soon.
For any additional questions or issues, please contact the slaOS support team on Slack.