# Data hashing & transformation

## Overview

The `rated-parser` library processes your data field by field and includes built-in privacy features (encryption and hashing) to help you handle sensitive data responsibly. This guide explains how to use these features, along with value transformations, in support of GDPR compliance.

## Field Processing Options

### Basic Field Definition

Every field in your metrics is defined by a `key` that maps to the corresponding value in your data. For example, in the record below the keys are `user_email`, `request_count`, and `response_time_ms`:

```json
{
  "user_email": "john@example.com",
  "request_count": 150,
  "response_time_ms": 250
}
```

## Privacy Protection Options

### **1. Encryption**

Use encryption when you need to retrieve the original value later (e.g., for debugging or customer support).

**Example Use Cases:**

* User identifiers
* Email addresses
* IP addresses
* Session IDs

```json
{
  "version": 1,
  "fields": [
    {
      "key": "user_email",
      "encryption": true
    }
  ]
}
```

When processed, the email becomes an encrypted string that can only be decrypted with your encryption key:

```json
{
  "user_email": "AES256.cbc.f7d9a1b2..."
}
```

### **2. Hashing**

Use hashing when you need to track metrics without storing the original value. Hashed values cannot be reversed.

Our implementation uses:

* Algorithm: SHA-256
* Encoding: UTF-8
* Output Format: Hexadecimal digest (64 characters)

These specifications ensure consistent hash generation across different systems. The code implementation is:

```python
from hashlib import sha256

def hash_value(value):
    return sha256(str(value).encode()).hexdigest()  # UTF-8 bytes -> 64-char hex digest
```

**Example Use Cases:**

* Organization IDs for analytics
* Device IDs for unique user counting
* Transaction IDs for deduplication

```json
{
  "version": 1,
  "fields": [
    {
      "key": "organization_id",
      "hash": true
    }
  ]
}
```

Results in:

```json
{
  "organization_id": "sha256.8f4e8d9c..."
}
```
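As a quick check of the properties listed above, the following sketch reuses the documented `hash_value` logic and confirms that the digest is deterministic and 64 hexadecimal characters long. It returns the bare digest only; the `sha256.` prefix in the example output is presumably added elsewhere by the library and is not reproduced here.

```python
from hashlib import sha256


def hash_value(value):
    # Same logic as the documented implementation: str() -> UTF-8 -> hex digest.
    return sha256(str(value).encode()).hexdigest()


digest_a = hash_value("org_456")
digest_b = hash_value("org_456")

# Identical inputs always give identical digests, which is what makes
# hashed IDs usable for counting and deduplication.
print(digest_a == digest_b)                  # True
print(len(digest_a))                         # 64
# Non-string values are stringified first, so 150 and "150" hash the same.
print(hash_value(150) == hash_value("150"))  # True
```

Because values are stringified before hashing, make sure the same identifier is always sent in the same textual form (for example, not `"00150"` in one system and `150` in another), or the digests will not match.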

## Data Transformations

### **1. Expression Transformations**

Use expressions when you need to modify values using simple mathematical or string operations.

**Example Use Cases:**

* Converting units (bytes to MB, seconds to milliseconds)
* Normalizing string formats
* Basic calculations

```json
{
  "version": 1,
  "fields": [
    {
      "key": "memory_usage",
      "transformation": "value / (1024 * 1024)",
      "transformation_type": "expression"
    }
  ]
}
```

This converts memory usage from bytes to megabytes (taking 1 MB as 1024 × 1024 bytes):

```json
Input:  { "memory_usage": 1048576 }
Output: { "memory_usage": 1.0 }
```
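To illustrate how an expression such as `value / (1024 * 1024)` can be evaluated without resorting to `eval`, here is a minimal sketch built on Python's `ast` module. The `evaluate_expression` helper and its operator whitelist are assumptions made for illustration, not the actual `rated-parser` implementation, and the sketch covers arithmetic only (no string methods).

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(expression, value):
    """Evaluate a simple arithmetic expression in which `value` is the field value."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Calls, attribute access, other names, etc. all end up here.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(evaluate_expression("value / (1024 * 1024)", 1048576))  # 1.0
```

Walking the syntax tree and rejecting every node type that is not explicitly allowed means an expression like `__import__('os')` raises `ValueError` instead of being executed.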

### **2. Function Transformations**

Use predefined functions for more complex transformations.

**Example Use Cases:**

* Duration string parsing
* HTTP status code categorization
* String normalization

```json
{
  "version": 1,
  "fields": [
    {
      "key": "duration",
      "transformation": "duration_to_ms",
      "transformation_type": "function"
    }
  ]
}
```

This converts duration strings to milliseconds:

```json
Input:  { "duration": "1.5s" }
Output: { "duration": 1500.0 }
```
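The conversion above could be implemented along the following lines. This is a hypothetical stand-in for `duration_to_ms`: the accepted units (`ms`, `s`, `m`, `h`) and the error handling are assumptions, and the function shipped with `rated-parser` may accept other formats.

```python
import re

# Assumed unit table: milliseconds per unit. The units rated-parser accepts may differ.
_MS_PER_UNIT = {"ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}

# A number (optionally with a decimal part) followed by one of the assumed units.
_DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h)\s*$")


def duration_to_ms(text):
    """Convert a duration string such as "1.5s" to milliseconds as a float."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"Unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return float(amount) * _MS_PER_UNIT[unit]


print(duration_to_ms("1.5s"))  # 1500.0
```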

## Built-in Safety Features

1. **Field Protection:**
   * Cannot combine encryption and hashing on the same field
   * Automatic validation of transformation expressions
   * Protection against injection attacks
2. **Transformation Safety:**
   * Restricted to safe mathematical operations
   * Limited to approved string methods
   * No access to system functions or dangerous operations
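The first rule above (no encryption and hashing on the same field) can be checked before any data is processed. The sketch below shows one way to do that; `validate_field` is a hypothetical helper written for this example, not a function exported by `rated-parser`.

```python
def validate_field(field):
    """Raise ValueError if a field definition enables both encryption and hashing."""
    if "key" not in field:
        raise ValueError("Field definition is missing 'key'")
    if field.get("encryption") and field.get("hash"):
        raise ValueError(
            f"Field {field['key']!r} cannot be both encrypted and hashed"
        )


config = {
    "version": 1,
    "fields": [
        {"key": "user_id", "encryption": True},
        {"key": "organization_id", "hash": True},
    ],
}

for field in config["fields"]:
    validate_field(field)  # passes silently for valid definitions
```

Rejecting an invalid definition up front avoids partially processed records, since the conflict is caught when the configuration is loaded rather than when the first record arrives.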

## Example Implementation

Here's a complete example showing different types of field processing:

```json
{
  "version": 1,
  "fields": [
    {
      "key": "user_id",
      "encryption": true
    },
    {
      "key": "organization_id",
      "hash": true
    },
    {
      "key": "response_time",
      "transformation": "value * 1000",
      "transformation_type": "expression"
    },
    {
      "key": "status_code",
      "transformation": "status_class",
      "transformation_type": "function"
    }
  ]
}
```

Input data:

```json
{
  "user_id": "user_123",
  "organization_id": "org_456",
  "response_time": 0.45,
  "status_code": 404
}
```

Output data:

```json
{
  "user_id": "AES256.cbc.a1b2c3...",
  "organization_id": "sha256.d4e5f6...",
  "response_time": 450.0,
  "status_code": "4xx"
}
```

The processed data can now be stored or analyzed without exposing the original user and organization identifiers in plain text.
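For orientation, the non-encrypted parts of this example can be approximated in a few lines of plain Python. This is a sketch under stated assumptions: `status_class` is a hypothetical reimplementation of the built-in function, the hash is the bare digest without the `sha256.` prefix, and encryption is left out because it depends on your key setup.

```python
from hashlib import sha256


def status_class(code):
    # Assumed behaviour of the built-in function: 404 -> "4xx", 200 -> "2xx".
    return f"{int(code) // 100}xx"


record = {"organization_id": "org_456", "response_time": 0.45, "status_code": 404}

processed = {
    # "hash": true, using the documented SHA-256 / UTF-8 / hex digest scheme.
    "organization_id": sha256(str(record["organization_id"]).encode()).hexdigest(),
    # Expression "value * 1000": seconds to milliseconds.
    "response_time": record["response_time"] * 1000,
    # Function "status_class".
    "status_code": status_class(record["status_code"]),
}

print(processed["status_code"])  # 4xx
```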
