# Data hashing & transformation

## Overview

The `rated-parser` library can encrypt, hash, and transform individual fields as it processes your data, so sensitive values are protected before they are stored. This guide explains how to configure these options to support GDPR compliance.

## Field Processing Options

### Basic Field Definition

Every field in your metrics is defined by a `key` that maps to the corresponding value in your data. For example, the record below has the keys `user_email`, `request_count`, and `response_time_ms`:

```json
{
  "user_email": "john@example.com",
  "request_count": 150,
  "response_time_ms": 250
}
```

## Privacy Protection Options

### **1. Encryption**

Use encryption when you need to retrieve the original value later (e.g., for debugging or customer support).

**Example Use Cases:**

* User identifiers
* Email addresses
* IP addresses
* Session IDs

```json
{
  "version": 1,
  "fields": [
    {
      "key": "user_email",
      "encryption": true
    }
  ]
}
```

When processed, the email becomes an encrypted string that can only be decrypted with your encryption key:

```json
{
  "user_email": "AES256.cbc.f7d9a1b2..."
}
```

### **2. Hashing**

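Use hashing when you need to track metrics without storing the original value. Hashing is one-way: the original value cannot be recovered from the digest. Equal inputs always produce the same digest, so hashed values can still be counted, grouped, and deduplicated.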

Our implementation uses:

* Algorithm: SHA-256
* Encoding: UTF-8
* Output Format: Hexadecimal digest (64 characters)

Fixing these parameters means the same input produces the same digest on every system. The implementation is:

```python
from hashlib import sha256

def hash_value(value):
    return sha256(str(value).encode("utf-8")).hexdigest()
```

**Example Use Cases:**

* Organization IDs for analytics
* Device IDs for unique user counting
* Transaction IDs for deduplication

```json
{
  "version": 1,
  "fields": [
    {
      "key": "organization_id",
      "hash": true
    }
  ]
}
```

Results in:

```json
{
  "organization_id": "sha256.8f4e8d9c..."
}
```
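To make the configuration concrete, here is a minimal sketch of how a `"hash": true` entry could be applied to incoming records. The `apply_hashing` helper is hypothetical and not part of the `rated-parser` API, and it returns the bare hexadecimal digest without the `sha256.` prefix shown in the sample output above.

```python
from hashlib import sha256


def hash_value(value):
    # Stringify first, so the integer 42 and the string "42" hash identically.
    return sha256(str(value).encode("utf-8")).hexdigest()


def apply_hashing(record, config):
    """Return a copy of record with every `"hash": true` field hashed.

    Hypothetical helper for illustration; not the library's own API.
    """
    hashed_keys = {f["key"] for f in config["fields"] if f.get("hash")}
    return {
        key: hash_value(val) if key in hashed_keys else val
        for key, val in record.items()
    }


config = {"version": 1, "fields": [{"key": "organization_id", "hash": True}]}
events = [
    {"organization_id": "org_456", "request_count": 150},
    {"organization_id": "org_456", "request_count": 20},
    {"organization_id": "org_789", "request_count": 5},
]
processed = [apply_hashing(e, config) for e in events]

# Equal inputs give equal digests, so distinct organizations can still be counted.
unique_orgs = {e["organization_id"] for e in processed}
print(len(unique_orgs))  # 2
```

Because the digest depends only on the input, analytics such as unique counts and deduplication work on the hashed values exactly as they would on the originals.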

## Data Transformations

### **1. Expression Transformations**

Use expressions when you need to modify values using simple mathematical or string operations.

**Example Use Cases:**

* Converting units (bytes to MB, seconds to milliseconds)
* Normalizing string formats
* Basic calculations

```json
{
  "version": 1,
  "fields": [
    {
      "key": "memory_usage",
      "transformation": "value / (1024 * 1024)",
      "transformation_type": "expression"
    }
  ]
}
```

This converts memory usage from bytes to MB by dividing by 1024 × 1024 = 1,048,576:

```text
Input:  { "memory_usage": 1048576 }
Output: { "memory_usage": 1.0 }
```
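One way an expression like this could be evaluated without calling `eval` is to parse it and walk the syntax tree, allowing only a whitelist of node types. The sketch below is an assumption about the general approach, not the library's actual evaluator. `evaluate_expression` is a hypothetical name, and the sketch covers arithmetic only, not the string operations mentioned above.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(expression, value):
    """Evaluate an arithmetic expression in which `value` is the only name."""

    def _eval(node):
        if isinstance(node, ast.Constant):
            # Only plain numbers are allowed as literals (no strings, no bools).
            if type(node.value) in (int, float):
                return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Calls, attribute access, other names, and so on all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(evaluate_expression("value / (1024 * 1024)", 1048576))  # 1.0
```

An expression such as `__import__('os').system('ls')` parses to a call node, which is not on the whitelist, so the sketch raises `ValueError` instead of executing it.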

### **2. Function Transformations**

Use predefined functions for more complex transformations.

**Example Use Cases:**

* Duration string parsing
* HTTP status code categorization
* String normalization

```json
{
  "version": 1,
  "fields": [
    {
      "key": "duration",
      "transformation": "duration_to_ms",
      "transformation_type": "function"
    }
  ]
}
```

This converts duration strings to milliseconds:

```text
Input:  { "duration": "1.5s" }
Output: { "duration": 1500.0 }
```
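For intuition, a function with the behavior shown above might look like the following sketch. The set of supported units (`ms`, `s`, `m`, `h`) and the error handling are assumptions made for illustration. The source only confirms that `"1.5s"` becomes `1500.0`, so the library's own `duration_to_ms` may accept different formats.

```python
import re

# Milliseconds per unit. This unit set is an assumption for illustration.
_MS_PER_UNIT = {"ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def duration_to_ms(text):
    """Convert a duration string such as "1.5s" to milliseconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _MS_PER_UNIT[unit]


print(duration_to_ms("1.5s"))   # 1500.0
print(duration_to_ms("250ms"))  # 250.0
```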

## Built-in Safety Features

1. **Field Protection:**
   * Encryption and hashing cannot be combined on the same field
   * Transformation expressions are validated automatically
   * Expressions are guarded against injection attacks
2. **Transformation Safety:**
   * Expressions are restricted to safe mathematical operations
   * String handling is limited to approved string methods
   * System functions and other dangerous operations are not accessible
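The field-level rules above can be pictured as a validation pass over each field definition. The sketch below is hypothetical: `validate_field` and its messages are not part of `rated-parser`, and the real library may report errors differently. It only illustrates the rule that encryption and hashing are mutually exclusive, plus a check on `transformation_type`.

```python
# The two transformation types used in this guide.
ALLOWED_TRANSFORMATION_TYPES = {"expression", "function"}


def validate_field(field):
    """Return a list of problems found in one field definition."""
    problems = []
    if "key" not in field:
        problems.append("field is missing 'key'")
    if field.get("encryption") and field.get("hash"):
        problems.append("encryption and hash cannot be combined on one field")
    if "transformation" in field:
        kind = field.get("transformation_type")
        if kind not in ALLOWED_TRANSFORMATION_TYPES:
            problems.append(f"unknown transformation_type: {kind!r}")
    return problems


print(validate_field({"key": "user_id", "encryption": True, "hash": True}))
# ['encryption and hash cannot be combined on one field']
print(validate_field({"key": "organization_id", "hash": True}))
# []
```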

## Example Implementation

Here's a complete example showing different types of field processing:

```json
{
  "version": 1,
  "fields": [
    {
      "key": "user_id",
      "encryption": true
    },
    {
      "key": "organization_id",
      "hash": true
    },
    {
      "key": "response_time",
      "transformation": "value * 1000",
      "transformation_type": "expression"
    },
    {
      "key": "status_code",
      "transformation": "status_class",
      "transformation_type": "function"
    }
  ]
}
```

Input data:

```json
{
  "user_id": "user_123",
  "organization_id": "org_456",
  "response_time": 0.45,
  "status_code": 404
}
```

Output data:

```json
{
  "user_id": "AES256.cbc.a1b2c3...",
  "organization_id": "sha256.d4e5f6...",
  "response_time": 450.0,
  "status_code": "4xx"
}
```

The processed data can now be stored or analyzed without exposing the original user and organization identifiers.
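To illustrate how a `"transformation_type": "function"` entry such as `status_class` might be resolved, the sketch below looks the function name up in a registry and applies it to the matching field. The `FUNCTIONS` registry, the `apply_functions` helper, and this implementation of `status_class` are all assumptions for illustration. The source only confirms that `404` maps to `"4xx"`.

```python
def status_class(code):
    """Map an HTTP status code to its class, e.g. 404 -> "4xx"."""
    code = int(code)
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return f"{code // 100}xx"


# Hypothetical registry mapping configured function names to callables.
FUNCTIONS = {"status_class": status_class}


def apply_functions(record, config):
    """Apply every function-type transformation to a copy of record."""
    result = dict(record)
    for field in config["fields"]:
        if field.get("transformation_type") == "function" and field["key"] in result:
            func = FUNCTIONS[field["transformation"]]
            result[field["key"]] = func(result[field["key"]])
    return result


config = {
    "version": 1,
    "fields": [
        {
            "key": "status_code",
            "transformation": "status_class",
            "transformation_type": "function",
        }
    ],
}
print(apply_functions({"status_code": 404, "response_time": 0.45}, config))
# {'status_code': '4xx', 'response_time': 0.45}
```

Resolving names through a fixed registry means a configuration can only invoke functions that were explicitly registered, which fits the restriction to predefined functions described earlier.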
