Handling organization_id

Multi-tenant API service example

Let's explore common scenarios using a real-world example of a multi-tenant API service. In this example, you're running an API that serves multiple organizations, and you want to monitor various metrics. Some metrics are naturally split by organization (like latency and request volume), while others are collected globally (like error rates).

The organization_id in these examples could represent different identifiers depending on your use case:

  • customer_id: When serving multiple end customers (e.g., SaaS platform)

    • Email service monitoring delivery rates per business account

    • DEX monitoring liquidity provider positions

    • NFT marketplace tracking collection trading volume

  • vendor_id: When aggregating metrics across different suppliers or partners

    • Marketplace measuring seller performance metrics

    • RPC node provider tracking request volumes

    • Oracle service monitoring price feed updates

  • service_id: When monitoring multiple internal services or microservices

    • E-commerce tracking checkout service reliability

    • Bridge monitoring cross-chain transfers

    • Smart contract monitoring function calls

  • integration_id: When tracking metrics for different third-party integrations

    • Payment platform monitoring gateway success rates

    • Multi-chain wallet tracking transaction status

    • DEX aggregator monitoring swap routes

Scenario 1: Organization-Specific Metrics (Per-Organization Latency)

Context: Your API tracks request latency per organization, which is essential for:

  • Monitoring individual organization experience

  • Meeting specific SLAs per organization

  • Identifying organization-specific performance issues

# Prometheus metrics
# Each request is tagged with organization_id
api_request_latency_seconds{organization_id="org123", endpoint="/api/v1/users"} 0.45
api_request_latency_seconds{organization_id="org456", endpoint="/api/v1/users"} 0.32

# slaOS configuration
queries:
  - query: 'histogram_quantile(0.95, sum by (le, organization_id) (rate(api_request_latency_seconds_bucket[5m])))'
    step: 
      value: 60
      unit: "s"
    slaos_metric_name: "p95_latency"
    organization_identifier: "organization_id"  # Each organization gets its own latency metrics

Use Case Examples:

  • SaaS Platform: Track response times for each customer's API usage

  • Marketplace: Monitor transaction processing times for different vendors

  • Microservices: Measure inter-service communication latency

  • Integration Platform: Track external API call latencies per integration
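The per-organization grouping that `sum by (le, organization_id)` performs can be sketched in plain Python. This is only an illustration of the aggregation idea, not slaOS internals; the `p95_by_org` helper and the sample data are hypothetical, and a simple nearest-rank percentile stands in for `histogram_quantile`:

```python
from collections import defaultdict

def p95_by_org(samples):
    """Nearest-rank 95th-percentile latency per organization_id.

    samples: iterable of (organization_id, latency_seconds) pairs,
    mirroring how each latency observation carries an org label."""
    by_org = defaultdict(list)
    for org_id, latency in samples:
        by_org[org_id].append(latency)
    result = {}
    for org, lats in by_org.items():
        lats.sort()
        result[org] = lats[min(len(lats) - 1, int(0.95 * len(lats)))]
    return result

# Hypothetical scrape: org123 has three observations, org456 has one
samples = [("org123", 0.45), ("org123", 0.32), ("org123", 0.50),
           ("org456", 0.28)]
```

Because the grouping key is the `organization_id` label, each organization ends up with its own p95 series, which is exactly what `organization_identifier: "organization_id"` tells slaOS to key results on.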

Scenario 2: Service-Wide Metrics (Global Error Rates)

Context: Your API tracks error counts globally due to:

  • Infrastructure limitations

  • Metric collection setup

  • No business need to track errors per organization

# Prometheus metrics
# Error counts are only tagged with status code
http_errors_total{status="500"} 10
http_errors_total{status="400"} 25
http_errors_total{status="200"} 1000

# slaOS configuration
queries:
  - query: 'sum(rate(http_errors_total{status=~"5.."}[5m])) / sum(rate(http_errors_total[5m]))'
    step: 
      value: 60
      unit: "s"
    slaos_metric_name: "error_rate"
    fallback_org_id: "global_service"  # All error metrics go to a default organization

In this case:

  • Error metrics don't have organization identification

  • Using fallback_org_id assigns all error rates to a default organization

  • Useful for service-wide SLAs or general monitoring

  • All organizations reference the same error rate metrics
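The arithmetic behind the error-rate query, and how the fallback is attached, can be shown in a rough pure-Python sketch (the `error_rate` helper and `record` dict are hypothetical names; the real exporter evaluates the PromQL query instead):

```python
def error_rate(counts_by_status):
    """Mirror of the PromQL query: 5xx counts divided by all counts.

    counts_by_status maps an HTTP status string (e.g. "500")
    to its counter value."""
    total = sum(counts_by_status.values())
    errors = sum(v for status, v in counts_by_status.items()
                 if status.startswith("5"))
    return errors / total if total else 0.0

# The counter values from the example metrics above
counts = {"500": 10, "400": 25, "200": 1000}

# Since the query result carries no organization_id label,
# slaOS attributes it to the configured fallback_org_id
record = {
    "organization_id": "global_service",  # from fallback_org_id
    "slaos_metric_name": "error_rate",
    "value": error_rate(counts),
}
```

Every organization looking at this SLA sees the same `global_service` series, which is the intended behavior for service-wide error budgets.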

Scenario 3: Mixed Metrics (Combined Approach)

# Organization-specific requests
api_requests_total{organization_id="org123", endpoint="/api/v1/users"} 150
api_requests_total{organization_id="org456", endpoint="/api/v1/orders"} 75

# Public endpoint requests (no organization_id)
api_requests_total{endpoint="/public/status"} 50
api_requests_total{endpoint="/health"} 25

# slaOS configuration
queries:
  - query: 'sum by (organization_id) (rate(api_requests_total[5m]))'
    step: 
      value: 60
      unit: "s"
    slaos_metric_name: "request_rate"
    organization_identifier: "organization_id"
    fallback_org_id: "public_endpoints"  # For requests without organization_id
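The resolution rule for this mixed configuration is: use the `organization_identifier` label when a sample has it, otherwise fall back to `fallback_org_id`. A minimal sketch of that rule (the `resolve_org` helper is hypothetical, not part of slaOS):

```python
def resolve_org(labels, identifier="organization_id", fallback=None):
    """Pick the organization for one metric sample: the configured
    identifier label wins; otherwise use the fallback, if any."""
    org = labels.get(identifier)
    if org:
        return org
    if fallback is not None:
        return fallback
    raise ValueError(f"no {identifier!r} label and no fallback_org_id set")

# Label sets from the example metrics above
samples = [
    {"organization_id": "org123", "endpoint": "/api/v1/users"},
    {"organization_id": "org456", "endpoint": "/api/v1/orders"},
    {"endpoint": "/public/status"},  # no organization_id
    {"endpoint": "/health"},         # no organization_id
]
orgs = [resolve_org(s, fallback="public_endpoints") for s in samples]
```

Requests from known organizations keep their own series, while the two public endpoints are pooled under `public_endpoints`.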

Best Practices for Mixed Environments

Consistent Labeling:

# Good - consistent organization identification
api_latency_seconds{organization_id="org123", ...}
api_requests_total{organization_id="org123", ...}

# Avoid - inconsistent labeling
api_latency_seconds{organization_id="org123", ...}
api_requests_total{client="org123", ...}  # Different label name
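If you cannot fix inconsistent label names at the source, one option is to canonicalize them before export. A hypothetical normalizer (the `ALIASES` tuple and `canonical_labels` function are illustrative, not a slaOS feature):

```python
# Label names that various services use for the same concept,
# in priority order; all get folded into organization_id
ALIASES = ("organization_id", "client", "customer_id", "tenant")

def canonical_labels(labels):
    """Return a copy of labels with any known org alias renamed
    to the canonical organization_id label."""
    out = {k: v for k, v in labels.items() if k not in ALIASES}
    for alias in ALIASES:
        if alias in labels:
            out["organization_id"] = labels[alias]
            break
    return out
```

With this in place, `api_requests_total{client="org123"}` and `api_latency_seconds{organization_id="org123"}` resolve to the same organization.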

Clear Separation:

queries:
  # Organization-specific latency
  - query: 'histogram_quantile(0.95, sum by (le, organization_id) (rate(api_latency_seconds_bucket[5m])))'
    organization_identifier: "organization_id"
    slaos_metric_name: "org_latency"

  # Global error rates
  - query: 'sum(rate(http_errors_total{status=~"5.."}[5m])) / sum(rate(http_errors_total[5m]))'
    fallback_org_id: "global_service"
    slaos_metric_name: "global_error_rate"

Meaningful Fallback IDs:

# Descriptive fallback IDs
fallback_org_id: "public_api_endpoints"    # Clear purpose
fallback_org_id: "unauthenticated_users"   # Clear purpose

# Avoid generic fallbacks
fallback_org_id: "default"                 # Too generic
fallback_org_id: "other"                   # Not descriptive

Remember:

  • Choose the appropriate organization identifier based on your use case

  • Not all metrics need to be split by organization

  • Use fallback IDs thoughtfully and consistently

  • Document your choices for future reference

  • Consider future changes in metric collection

  • Balance granularity with system complexity
