Data hashing & transformation

Overview

The rated-parser library processes metric data field by field. It includes built-in privacy features (encryption and hashing) and value transformations to help you handle sensitive data responsibly. This guide explains how to use these features to support GDPR compliance.

Field Processing Options

Basic Field Definition

Every field in your metrics is defined by a key that maps to the corresponding value in your data. For example:

{
  "user_email": "user@example.com",
  "request_count": 150,
  "response_time_ms": 250
}

Privacy Protection Options

1. Encryption

Use encryption when you need to retrieve the original value later (e.g., for debugging or customer support).

Example Use Cases:

User identifiers
Email addresses
IP addresses
Session IDs

{
  "version": 1,
  "fields": [
    {
      "key": "user_email",
      "encryption": true
    }
  ]
}

When processed, the email becomes an encrypted string that can only be decrypted with your encryption key:

{
  "user_email": "AES256.cbc.f7d9a1b2..."
}
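You can load the configuration format above into typed objects before processing any records. Below is a minimal Python sketch. It assumes that the keys used in this guide (`key`, `encryption`, `hash`, `transformation`, `transformation_type`) are the complete set. `FieldSpec` and `load_config` are hypothetical names, not part of the rated-parser API:

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical in-memory form of one entry in "fields"; attribute names
# mirror the JSON keys used in this guide.
@dataclass
class FieldSpec:
    key: str
    encryption: bool = False
    hash: bool = False
    transformation: Optional[str] = None
    transformation_type: Optional[str] = None


def load_config(text: str) -> list[FieldSpec]:
    """Parse a field-processing config; this sketch accepts only version 1."""
    config = json.loads(text)
    if config.get("version") != 1:
        raise ValueError(f"unsupported config version: {config.get('version')!r}")
    # Unknown keys in an entry raise TypeError from the dataclass constructor.
    return [FieldSpec(**entry) for entry in config["fields"]]


CONFIG = """
{
  "version": 1,
  "fields": [
    {"key": "user_email", "encryption": true}
  ]
}
"""

specs = load_config(CONFIG)
print(specs[0])
# FieldSpec(key='user_email', encryption=True, hash=False, transformation=None, transformation_type=None)
```

Each entry is passed with `**entry`, so an unexpected key raises a `TypeError`. This catches typos such as `"hashed": true` before any data is processed.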

2. Hashing

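Use hashing when you need to track metrics by a value without storing the original value. SHA-256 is one-way, so a hashed value cannot be decrypted. However, an attacker can match unsalted hashes of predictable values (such as sequential IDs or email addresses) by hashing candidate inputs. Treat hashed identifiers as pseudonymized data rather than anonymous data.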

Our implementation uses:

Algorithm: SHA-256
Encoding: UTF-8
Output Format: Hexadecimal digest (64 characters)

These specifications ensure that the same input produces the same hash on any system that follows them. Non-string values are first converted with str(). The code implementation is:

from hashlib import sha256

def hash_value(value):
    return sha256(str(value).encode("utf-8")).hexdigest()  # 64-char hex digest of the UTF-8 bytes
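You can check the properties listed above directly against the function. This sketch repeats `hash_value` so it runs on its own, then asserts each property. The expected digest for `"abc"` is the standard published SHA-256 test vector:

```python
from hashlib import sha256


def hash_value(value):
    return sha256(str(value).encode("utf-8")).hexdigest()


# Deterministic: the same input always yields the same digest.
assert hash_value("org_456") == hash_value("org_456")

# Output is always 64 hexadecimal characters.
assert len(hash_value("org_456")) == 64

# Standard SHA-256 test vector for the string "abc".
assert hash_value("abc") == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# str() is applied first: the integer 456 and the string "456" hash identically,
# but 1 and 1.0 do not ("1" vs "1.0").
assert hash_value(456) == hash_value("456")
assert hash_value(1) != hash_value(1.0)
```

The last two assertions matter when hashes must match across systems. Because `str()` is applied first, another implementation must reproduce Python's string formatting (for example `"1.0"` for floats and `"True"` for booleans) to produce the same digest.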

Example Use Cases:

Organization IDs for analytics
Device IDs for unique user counting
Transaction IDs for deduplication

{
  "version": 1,
  "fields": [
    {
      "key": "organization_id",
      "hash": true
    }
  ]
}

Results in:

{
  "organization_id": "sha256.8f4e8d9c..."
}
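Hashing preserves the counting and deduplication use cases above because equal inputs always produce equal digests. Below is a minimal sketch that hashes identifiers the way a `"hash": true` field would. The event data is invented for illustration:

```python
from hashlib import sha256


def hash_value(value):
    return sha256(str(value).encode("utf-8")).hexdigest()


# Invented sample events with raw identifiers.
events = [
    {"device_id": "dev-a", "transaction_id": "tx-1"},
    {"device_id": "dev-b", "transaction_id": "tx-2"},
    {"device_id": "dev-a", "transaction_id": "tx-1"},  # duplicate delivery
    {"device_id": "dev-c", "transaction_id": "tx-3"},
]

# Hash every identifier before storage.
hashed = [{k: hash_value(v) for k, v in event.items()} for event in events]

# Unique counting: equal inputs give equal digests, so the distinct count is
# unchanged (SHA-256 collisions are not a practical concern).
unique_devices = len({e["device_id"] for e in hashed})

# Deduplication: keep the first event seen for each hashed transaction ID.
seen = set()
deduplicated = []
for e in hashed:
    if e["transaction_id"] not in seen:
        seen.add(e["transaction_id"])
        deduplicated.append(e)

print(unique_devices)     # 3
print(len(deduplicated))  # 3
```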

Data Transformations

1. Expression Transformations

Use expressions when you need to modify values using simple mathematical or string operations.

Example Use Cases:

Converting units (bytes to MB, seconds to milliseconds)
Normalizing string formats
Basic calculations

{
  "version": 1,
  "fields": [
    {
      "key": "memory_usage",
      "transformation": "value / (1024 * 1024)",
      "transformation_type": "expression"
    }
  ]
}

This transforms memory usage from bytes to MB (here 1 MB = 1024 × 1024 bytes):

Input:  { "memory_usage": 1048576 }
Output: { "memory_usage": 1.0 }
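An expression such as `value / (1024 * 1024)` can be evaluated without `eval()`. The approach is to walk the expression's syntax tree and allow only known node types. The sketch below shows one way to do this with Python's standard `ast` module. Some limits apply:

- `evaluate_expression` is a hypothetical helper.
- It covers only arithmetic, not the string operations mentioned above.
- rated-parser's own validator may permit a different set of operations.

```python
import ast
import operator

# Arithmetic operators this sketch permits; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expression: str, value):
    """Evaluate an arithmetic expression in which `value` is the only name."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only (bool is a subclass of int, so exclude it).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Calls, attribute access, other names, strings, etc. are all refused.
        raise ValueError(f"disallowed syntax: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


print(evaluate_expression("value / (1024 * 1024)", 1048576))  # 1.0
print(evaluate_expression("value * 1000", 0.45))              # 450.0
```

Exponentiation (`**`) is deliberately left out, because an input such as `9 ** 9 ** 9` could consume unbounded CPU and memory. Any function call, attribute access, or name other than `value` raises `ValueError`.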

2. Function Transformations

Use predefined functions for more complex transformations.

Example Use Cases:

Duration string parsing
HTTP status code categorization
String normalization

{
  "version": 1,
  "fields": [
    {
      "key": "duration",
      "transformation": "duration_to_ms",
      "transformation_type": "function"
    }
  ]
}

This converts duration strings to milliseconds:

Input:  { "duration": "1.5s" }
Output: { "duration": 1500.0 }
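A function like `duration_to_ms` must split a string into a number and a unit. The sketch below shows one way to do this with a regular expression. `parse_duration_to_ms` is a hypothetical stand-in. The accepted units (`ms`, `s`, `m`, `h`) are assumptions, because this guide does not list the units the built-in function supports:

```python
import re

# Hypothetical unit table (milliseconds per unit).
_UNIT_TO_MS = {"ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}
# "ms" is listed before "s" and "m" so "250ms" matches the longer unit.
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h)\s*$")


def parse_duration_to_ms(text: str) -> float:
    """Convert a string such as "1.5s" or "250ms" to milliseconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MS[unit]


print(parse_duration_to_ms("1.5s"))   # 1500.0
print(parse_duration_to_ms("250ms"))  # 250.0
```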

Built-in Safety Features

Field Protection:
- Cannot combine encryption and hashing on the same field
- Automatic validation of transformation expressions
- Protection against injection attacks

Transformation Safety:
- Restricted to safe mathematical operations
- Limited to approved string methods
- No access to system functions or dangerous operations
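The first Field Protection rule can be enforced by checking each field entry before processing. In this minimal sketch, `validate_field` and `KNOWN_FUNCTIONS` are hypothetical. The checks beyond the encryption/hashing conflict are assumptions about what a validator might reasonably enforce:

- `transformation` and `transformation_type` must be set together.
- Function names must appear in an allowlist.

```python
# Hypothetical allowlist built from the function names used in this guide.
KNOWN_FUNCTIONS = {"duration_to_ms", "status_class"}
TRANSFORMATION_TYPES = {"expression", "function"}


def validate_field(field: dict) -> list[str]:
    """Return the problems found in one field entry (empty list if none)."""
    problems = []
    key = field.get("key")
    if not isinstance(key, str) or not key:
        problems.append("missing or empty 'key'")
    # Documented rule: a field cannot be both encrypted and hashed.
    if field.get("encryption") and field.get("hash"):
        problems.append(f"{key}: cannot combine encryption and hashing")
    has_transform = "transformation" in field
    has_type = "transformation_type" in field
    if has_transform != has_type:
        problems.append(f"{key}: 'transformation' and 'transformation_type' must be set together")
    elif has_type:
        kind = field["transformation_type"]
        if kind not in TRANSFORMATION_TYPES:
            problems.append(f"{key}: unknown transformation_type {kind!r}")
        elif kind == "function" and field["transformation"] not in KNOWN_FUNCTIONS:
            problems.append(f"{key}: unknown function {field['transformation']!r}")
    return problems


print(validate_field({"key": "user_id", "encryption": True, "hash": True}))
# ['user_id: cannot combine encryption and hashing']
print(validate_field({"key": "status_code", "transformation": "status_class",
                      "transformation_type": "function"}))
# []
```

Returning a list of problems, rather than raising on the first one, lets a config loader report every invalid field at once.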

Example Implementation

Here's a complete example showing different types of field processing:

{
  "version": 1,
  "fields": [
    {
      "key": "user_id",
      "encryption": true
    },
    {
      "key": "organization_id",
      "hash": true
    },
    {
      "key": "response_time",
      "transformation": "value * 1000",
      "transformation_type": "expression"
    },
    {
      "key": "status_code",
      "transformation": "status_class",
      "transformation_type": "function"
    }
  ]
}

Input data:

{
  "user_id": "user_123",
  "organization_id": "org_456",
  "response_time": 0.45,
  "status_code": 404
}

Output data:

{
  "user_id": "AES256.cbc.a1b2c3...",
  "organization_id": "sha256.d4e5f6...",
  "response_time": 450.0,
  "status_code": "4xx"
}
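The complete example can be sketched as a single pass over the field rules. In this Python sketch, `process_record`, `status_class`, `scale_expression` and `placeholder_encrypt` are hypothetical names:

- `scale_expression` handles only `value * N` and `value / N`.
- `placeholder_encrypt` is explicitly not encryption. Real AES-256-CBC needs a cryptography library and key management, both out of scope here.
- The `sha256.` prefix is copied from the example output above.

```python
import re
from hashlib import sha256
from typing import Any, Callable


def status_class(code) -> str:
    """Collapse an HTTP status code to its class, e.g. 404 -> "4xx"."""
    return f"{int(code) // 100}xx"


FUNCTIONS: dict[str, Callable[[Any], Any]] = {"status_class": status_class}


def scale_expression(expression: str, value: float) -> float:
    """Stand-in evaluator: supports only 'value * N' and 'value / N'."""
    match = re.fullmatch(r"\s*value\s*([*/])\s*(\d+(?:\.\d+)?)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    op, number = match.group(1), float(match.group(2))
    return value * number if op == "*" else value / number


def process_record(record: dict, fields: list, encrypt: Callable[[Any], str]) -> dict:
    """Apply each field rule to a copy of `record`; other keys pass through."""
    out = dict(record)
    for field in fields:
        key = field["key"]
        if key not in out:
            continue
        if field.get("encryption"):
            out[key] = encrypt(out[key])
        elif field.get("hash"):
            # "sha256." prefix mirrors the example output in this guide.
            out[key] = "sha256." + sha256(str(out[key]).encode("utf-8")).hexdigest()
        elif field.get("transformation_type") == "expression":
            out[key] = scale_expression(field["transformation"], out[key])
        elif field.get("transformation_type") == "function":
            out[key] = FUNCTIONS[field["transformation"]](out[key])
    return out


def placeholder_encrypt(value) -> str:
    # NOT encryption: stands in for a real AES-256-CBC routine and key.
    return "AES256.cbc.PLACEHOLDER"


fields = [
    {"key": "user_id", "encryption": True},
    {"key": "organization_id", "hash": True},
    {"key": "response_time", "transformation": "value * 1000", "transformation_type": "expression"},
    {"key": "status_code", "transformation": "status_class", "transformation_type": "function"},
]
record = {"user_id": "user_123", "organization_id": "org_456",
          "response_time": 0.45, "status_code": 404}

result = process_record(record, fields, placeholder_encrypt)
print(result["response_time"], result["status_code"])  # 450.0 4xx
```

Encryption is passed in as a callable, so the dispatch logic can be tested without real keys.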

This processed data is now ready for storage or analysis. Its identifying fields are encrypted or hashed to support your privacy and compliance requirements.


Last updated 9 months ago