# Configuration YAML

This guide provides a detailed explanation of the configuration YAML file used for the slaOS self-hosted indexer solution. Understanding this configuration is crucial for setting up and customizing your self-hosted indexer.

{% embed url="<https://github.com/rated-network/rated-log-indexer>" %}

## Configuration File Structure

The configuration is organized into three main sections:

1. `inputs`: A list of input configurations
2. `output`: Configuration for the output destination
3. `secrets`: Configuration for secrets management

{% hint style="warning" %}
Our [GitHub repository](https://github.com/rated-network/rated-log-indexer) maintains an extensive collection of up-to-date and thoroughly tested configuration templates. These templates cover all sections of the indexer configuration and include the latest supported integrations.
{% endhint %}
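
As a rough orientation, the three top-level sections could sit together in one file as sketched below. This is a minimal outline rather than a tested template: the `slaos_key` value is an illustrative placeholder, the per-input blocks are omitted, and `use_secrets_manager: false` is assumed to be an accepted way of disabling the secrets manager.

```yaml
# Top-level layout only; every key is described in its own section of this guide.
inputs:                    # Section 1: a list, one entry per data source
  - integration: cloudwatch
    slaos_key: prod_api_cloudwatch   # illustrative value
    type: logs
    # cloudwatch, filters and offset blocks omitted for brevity
output:                    # Section 2: where processed data is sent
  type: console
  console:
    verbose: true
secrets:                   # Section 3: secrets management
  use_secrets_manager: false         # assumption: false disables secret resolution
```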

Let's explore each section in detail.

### Section 1: Input

The `inputs` section defines the data sources for your indexer. It is a list of integration objects, so you can run more than one input (integration) concurrently, all managed by the indexer.

Each entry specifies which integration to use and the configuration that integration requires.

```yaml
inputs:
  - integration: <integration_type>
    slaos_key: <unique_identifier>
    type: <logs_or_metrics>
    <integration_specific_config>
    filters: <optional_filter_config>
    offset: <offset_config>
```

<details>

<summary>Example configuration for CloudWatch logs</summary>

```yaml
inputs:
  - integration: cloudwatch
    slaos_key: "cloudwatch_logs_test"
    type: logs
    cloudwatch:
      region: us-east-1
      aws_access_key_id: AKIAXXXXX
      aws_secret_access_key: X/XX+XXXX
      logs_config:
        log_group_name: "/aws/apprunner/prod-rated-api/32cf02da3ba8495f87ad79806b0521e5/application"
        filter_pattern: '{ $.event = "request_finished" }'
    filters:
      version: 1
      log_format: json_dict
      log_example: { }
      fields:
        - key: "status_code"
          value: "22"
          field_type: "integer"
          path: "status_code"
        - key: "organization_id"
          value: "e6bd1f68367b4eee993f247e7301107a"
          field_type: "string"
          path: "user.id"
        - key: "path"
          value: "operators"
          field_type: "string"
          path: "request_route_name"
    offset:
      type: redis
      override_start_from: true
      start_from: 1724803200000
      start_from_type: bigint
      redis:
        host: redis
        port: 6379
        db: 0
```

Extracted from the [input template examples](https://github.com/rated-network/rated-log-indexer/tree/main/templates/inputs) in the GitHub repository.

</details>

**Key components**

* **`integration`**: Specifies the data source (e.g., `cloudwatch`, `datadog`). This determines which integration-specific configuration is required.
* **`slaos_key`**: A unique identifier for the input. This is used to differentiate data submitted to slaOS when multiple integrations are running.
  * **Example**: If `slaos_key` is set to "*prod\_api\_cloudwatch*", a data point with key "*status\_code*" will be submitted to slaOS as "*prod\_api\_cloudwatch\_status\_code*".
  * **Validation**: Each `slaos_key` must be unique across all inputs to avoid conflicts.
  * **Context**: The `slaos_key` is mandatory when using more than one integration. It prevents conflicts in data submitted to slaOS by prefixing every data point from this input with the key.
* **`type`**: Specifies "logs" or "metrics". This determines how the input data is processed and which additional configurations (like filters) are required.
* **`filters`**: Configuration for data filtering. This is only applicable and required for log-type inputs. It defines how log data should be parsed and transformed.
* **`offset`**: Configuration for tracking the last processed position in the data stream. This ensures idempotent operation and allows for efficient data processing, especially after interruptions or for backfills.
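
To illustrate how these components combine when two inputs run side by side, here is a minimal sketch. The `slaos_key` values are illustrative, the commented-out blocks stand in for configuration that depends on your setup, and the `datadog:` block name is an assumption based on the pattern used by the `cloudwatch` integration.

```yaml
inputs:
  - integration: cloudwatch
    slaos_key: prod_api_cloudwatch   # data points are submitted as prod_api_cloudwatch_<key>
    type: logs
    # cloudwatch: ...                # integration-specific block
    # filters: ...                   # required because type is logs
    # offset: ...
  - integration: datadog
    slaos_key: prod_api_datadog      # must differ from every other slaos_key
    type: metrics
    # datadog: ...                   # integration-specific block (assumed name)
    # offset: ...                    # no filters block, since type is metrics
```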

{% hint style="info" %}
Tested input examples can be found in the [inputs template directory](https://github.com/rated-network/rated-log-indexer/tree/main/templates/inputs) on our GitHub repository.
{% endhint %}

#### Filters section

The `filters` section defines how the indexer processes and transforms input data. This is where you specify the log format and define the fields you want to extract. It is only applicable for log-type inputs and is not needed for metrics.

**Structure**

```yaml
filters:
  version: <version_number>
  log_format: <format_type>
  log_example: <example_log_entry>
  fields:
    - key: <field_name>
      path: <json_path>
      field_type: <data_type>
```

**Example**

```yaml
filters:
  version: 1
  log_format: json_dict
  log_example: { "timestamp": "2023-01-01T00:00:00Z", "level": "INFO", "message": "Example log" }
  fields:
    - key: "timestamp"
      path: "timestamp"
      field_type: "timestamp"
    - key: "level"
      path: "level"
      field_type: "string"
      hash: true
    - key: "message"
      path: "message"
      field_type: "string"
```
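
Fields can also come from nested log entries. The sketch below mirrors the `user.id` path used in the CloudWatch example earlier on this page. It assumes that a dotted `path` walks into nested objects, and the values in `log_example` are made up for illustration.

```yaml
filters:
  version: 1
  log_format: json_dict
  log_example: { "status_code": 200, "user": { "id": "abc123" } }
  fields:
    - key: "status_code"
      path: "status_code"        # top-level key in the log entry
      field_type: "integer"
    - key: "organization_id"
      path: "user.id"            # assumed to read the nested value user -> id
      field_type: "string"
```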

For a more detailed explanation of how filters work, please refer to:

{% content-ref url="/pages/CjwxFTBSg5t2WiXhkqpB" %}
[Filters](/onboarding-your-data/filters.md)
{% endcontent-ref %}

#### Offset section

The `offset` section is responsible for tracking the last processed position in the input data stream. This ensures idempotent operation and allows for efficient data processing.

**Structure**

```yaml
offset:
  type: <storage_type>
  override_start_from: <boolean>
  start_from: <start_position>
  start_from_type: <data_type>
  <storage_specific_config>
```

* The `override_start_from` option is particularly useful for backfills, allowing you to specify a starting point for data processing.

**Examples**

{% tabs %}
{% tab title="Redis offset" %}

```yaml
offset:
  type: redis
  override_start_from: true
  start_from: 1724803200000
  start_from_type: bigint
  redis:
    host: redis
    port: 6379
    db: 0
```

{% endtab %}

{% tab title="Postgres offset" %}

```yaml
offset:
  type: postgres
  override_start_from: true
  start_from: 123456789
  start_from_type: bigint
  postgres:
    table_name: offset_tracking
    host: localhost
    port: 5432
    database: postgres
    user: postgres
    password: postgres
```

{% endtab %}

{% tab title="slaOS offset" %}

```yaml
offset:
  type: slaos
  override_start_from: true
  start_from: 123456789
  start_from_type: bigint
  slaos:
    ingestion_id: ingestion-id
    ingestion_key: ingestion-key
    ingestion_url: https://api.rated.co/v1/ingest
    datastream_filter:
      key: datastream_key
      organization_id: customer_one
```

{% hint style="info" %}
The `datastream_filter` is used to identify the offset related to this specific instance of the indexer. The `key` is a required field and corresponds to the `slaos_key` associated with this instance.

We also provide an optional filter parameter on `organization_id`. This should only be used if you have multiple instances of the indexer using the same `key`. This typically happens when indexing metrics for a resource used for a particular customer or organization.

For values that have been hashed in the `filters` config, prefix with `hash:` (e.g., `hash:value`).
{% endhint %}
{% endtab %}
{% endtabs %}
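
For the slaOS offset, the `hash:` prefix described in the hint could look like the fragment below. It shows only the `datastream_filter` block. Both values are illustrative, and the example assumes the `organization_id` field was marked with `hash: true` in the `filters` config.

```yaml
datastream_filter:
  key: prod_api_cloudwatch              # the slaos_key of this indexer instance
  organization_id: hash:customer_one    # hash: prefix because the value is hashed in filters
```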

{% embed url="<https://github.com/rated-network/rated-log-indexer/tree/main/templates/offset>" %}

### Section 2: Output

The `output` section defines where the processed data should be sent. slaOS supports two output types: `rated` (for sending data to the direct ingestion API) and `console` (for debugging purposes).

```yaml
output:
  type: rated
  rated:
    ingestion_id: your_ingestion_id
    ingestion_key: your_ingestion_key
    ingestion_url: https://rated.live/v1/ingest
```

To obtain the `ingestion_id` and `ingestion_key`, you need to create an account on the slaOS platform. Once logged in, navigate to the API management section where you can generate and manage your ingestion credentials.

To use console output for debugging, you can configure it like this:

```yaml
output:
  type: console
  console:
    verbose: true
```

### Section 3: Secrets

The `secrets` section allows you to use a secrets manager for sensitive configuration values.

```yaml
secrets:
  use_secrets_manager: true
```

If `use_secrets_manager` is set to `true`, any value in the YAML that starts with `secret:` is resolved using the configured secrets manager. For example, to use AWS as the secrets provider:

```yaml
secrets:
  use_secrets_manager: true
  provider: aws
  aws:
    region: us-west-2
    aws_access_key_id: fake_access_key
    aws_secret_access_key: fake_secret_key
```

Let's break down each part of this configuration:

* `use_secrets_manager: true`: This enables the use of the secrets manager.
* `provider: aws`: This specifies that we're using AWS as our secrets provider.
* `aws`: This section contains the configuration specific to AWS:
  * `region: us-west-2`: The AWS region where your secrets are stored.
  * `aws_access_key_id`: Your AWS access key ID for accessing the secrets manager.
  * `aws_secret_access_key`: Your AWS secret access key for accessing the secrets manager.
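
With the secrets manager enabled, a sensitive value elsewhere in the file could be replaced by a `secret:` reference, roughly as sketched below. Only the `secret:` prefix comes from this guide. The secret name after the prefix is hypothetical, and the exact form the indexer expects there is an assumption, so check the repository templates before relying on it.

```yaml
output:
  type: rated
  rated:
    ingestion_id: your_ingestion_id
    # Hypothetical secret name; the text after "secret:" is an assumption.
    ingestion_key: "secret:slaos_ingestion_key"
    ingestion_url: https://rated.live/v1/ingest
```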

<details>

<summary>IAM Policy - using AWS Secrets Manager</summary>

If you are self-hosting slaOS and using AWS Secrets Manager to store sensitive information like access keys, you need to configure additional IAM permissions.

1. **Create a Secrets Manager Policy:** Use the following JSON document to create a policy.

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "SecretsManagerAccess",
         "Effect": "Allow",
         "Action": [
           "secretsmanager:GetSecretValue",
           "secretsmanager:DescribeSecret"
         ],
         "Resource": "arn:aws:secretsmanager:*:*:secret:*"
       }
     ]
   }
   ```
2. **Attach the Policy:** Name the policy `RatedSecretsManagerAccessPolicy` and attach it to the IAM user created for slaOS.

</details>

## Conclusion

Understanding and properly configuring your self-hosted slaOS indexer is key to effectively processing your data. Always refer to the most up-to-date documentation on our GitHub repository for detailed setup instructions and best practices.

If you encounter any issues or have questions about your configuration, don't hesitate to reach out to our support team at [hello@rated.network](mailto:hello@rated.network) or consult the community forums.


