# Aegis (IAM) Specification

> Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

## 1. Overview

### 1.1 Purpose

Aegis is the Identity and Access Management (IAM) platform providing authentication, authorization, and multi-tenant access control for all cloud services. It implements RBAC (Role-Based Access Control) with ABAC (Attribute-Based Access Control) extensions.

The name "Aegis" (shield of Zeus) reflects its role as the protective layer that guards access to all platform resources.

### 1.2 Scope

- **In scope**: Principals (users, service accounts, groups), roles, permissions, policy bindings, scope hierarchy (System > Org > Project > Resource), internal token issuance/validation, external identity federation (OIDC/JWT), authorization decision service (PDP), audit event generation
- **Out of scope**: User password management (delegated to external IdP), UI for authentication, API gateway/rate limiting

### 1.3 Design Goals

- **AWS IAM / GCP IAM compatible**: Familiar concepts and API patterns
- **Multi-tenant from day one**: Full org/project hierarchy with scope isolation
- **Flexible RBAC + ABAC hybrid**: Roles with conditional permissions
- **High-performance authorization**: Sub-millisecond decisions with caching
- **Zero-trust security**: Default deny, explicit grants, audit everything
- **Cloud-grade scalability**: Handle millions of decisions per second

## 2. Architecture

### 2.1 Crate Structure

```
iam/
├── crates/
│   ├── iam-api/          # gRPC service implementations
│   ├── iam-audit/        # Audit logging (planned)
│   ├── iam-authn/        # Authentication (tokens, OIDC)
│   ├── iam-authz/        # Authorization engine (PDP)
│   ├── iam-client/       # Rust client library
│   ├── iam-server/       # Server binary
│   ├── iam-store/        # Storage backends (Chainfire, FlareDB, Memory)
│   └── iam-types/        # Core types
└── proto/
    └── iam.proto         # gRPC definitions
```

### 2.2 Authorization Flow

```
[Client Request] → [IamAuthz Service]
                          ↓
                  [Fetch Principal]
                          ↓
              [Build Resource Context]
                          ↓
                  [PolicyEvaluator]
                          ↓
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
   [Get Bindings]    [Get Roles]   [Cache Lookup]
          ↓               ↓               ↓
          └───────────────┼───────────────┘
                          ↓
               [Evaluate Permissions]
                          ↓
                  [Condition Check]
                          ↓
                   [ALLOW / DENY]
```

### 2.3 Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| tokio | 1.x | Async runtime |
| tonic | 0.12 | gRPC framework |
| prost | 0.13 | Protocol buffers |
| dashmap | 6.x | Concurrent cache |
| ipnetwork | 0.20 | CIDR matching |
| glob-match | 0.2 | Resource pattern matching |

## 3. Core Concepts

### 3.1 Principals

Identities that can be authenticated and authorized.

```rust
pub struct Principal {
    pub id: String,                        // Unique identifier
    pub kind: PrincipalKind,               // User | ServiceAccount | Group
    pub name: String,                      // Display name
    pub org_id: Option<String>,            // Organization membership
    pub project_id: Option<String>,        // For service accounts
    pub email: Option<String>,             // For users
    pub oidc_sub: Option<String>,          // Federated identity subject
    pub node_id: Option<String>,           // For node-bound service accounts
    pub metadata: HashMap<String, String>,
    pub created_at: u64,
    pub updated_at: u64,
    pub enabled: bool,
}

pub enum PrincipalKind {
    User,           // Human users
    ServiceAccount, // Machine identities
    Group,          // Collections (future)
}
```

**Principal Reference**: `kind:id` format
- `user:alice`
- `service_account:compute-agent`

### 3.2 Roles

Named collections of permissions.
```rust
pub struct Role {
    pub name: String,                 // e.g., "ProjectAdmin"
    pub display_name: String,
    pub description: String,
    pub scope: Scope,                 // Where role can be assigned
    pub permissions: Vec<Permission>,
    pub builtin: bool,                // System-defined, immutable
    pub created_at: u64,
    pub updated_at: u64,
}
```

**Builtin Roles**:

| Role | Scope | Description |
|------|-------|-------------|
| SystemAdmin | System | Full cluster access |
| OrgAdmin | Org | Full organization access |
| ProjectAdmin | Project | Full project access |
| ProjectMember | Project | Own resources + read all |
| ReadOnly | Project | Read-only project access |
| ServiceRole-ComputeAgent | Resource | Node-scoped compute |
| ServiceRole-StorageAgent | Resource | Node-scoped storage |

### 3.3 Permissions

Individual access rights within roles.

```rust
pub struct Permission {
    pub action: String,               // e.g., "compute:instances:create"
    pub resource_pattern: String,     // e.g., "org/*/project/${project}/instance/*"
    pub condition: Option<Condition>,
}
```

**Action Format**: `service:resource:operation`
- Wildcards: `*`, `compute:*`, `compute:instances:*`
- Examples: `compute:instances:create`, `storage:volumes:delete`

**Resource Pattern Format**: `org/{org_id}/project/{project_id}/{kind}/{id}`
- Wildcards: `org/*/project/*/instance/*`
- Variables: `${principal.id}`, `${project}`

### 3.4 Policy Bindings

Assignments of roles to principals within a scope.

```rust
pub struct PolicyBinding {
    pub id: String,                   // UUID
    pub principal_ref: PrincipalRef,
    pub role_ref: String,             // "roles/ProjectAdmin"
    pub scope: Scope,
    pub condition: Option<Condition>,
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,
    pub expires_at: Option<u64>,      // Time-limited access
    pub enabled: bool,
}
```

## 4. Scope Hierarchy

Four-level hierarchical boundary for permissions.
```
System (level 0)                      ← Cluster-wide
  └─ Organization (level 1)           ← Tenant boundary
       └─ Project (level 2)           ← Workload isolation
            └─ Resource (level 3)     ← Individual resource
```

### 4.1 Scope Types

```rust
pub enum Scope {
    System,
    Org { id: String },
    Project { id: String, org_id: String },
    Resource { id: String, project_id: String, org_id: String },
}
```

### 4.2 Scope Containment

```rust
impl Scope {
    // System contains everything
    // Org contains its projects and resources
    // Project contains its resources
    fn contains(&self, other: &Scope) -> bool;

    // Get parent scope
    fn parent(&self) -> Option<Scope>;

    // Get all ancestors up to System
    fn ancestors(&self) -> Vec<Scope>;
}
```

### 4.3 Scope Storage Keys

```
system
org/{org_id}
org/{org_id}/project/{project_id}
org/{org_id}/project/{project_id}/resource/{resource_id}
```

## 5. API

### 5.1 Authorization Service (PDP)

```protobuf
service IamAuthz {
  rpc Authorize(AuthorizeRequest) returns (AuthorizeResponse);
  rpc BatchAuthorize(BatchAuthorizeRequest) returns (BatchAuthorizeResponse);
}

message AuthorizeRequest {
  PrincipalRef principal = 1;
  string action = 2;           // "compute:instances:create"
  ResourceRef resource = 3;
  AuthzContext context = 4;    // IP, timestamp, metadata
}

message AuthorizeResponse {
  bool allowed = 1;
  string reason = 2;
  string matched_binding = 3;
  string matched_role = 4;
}

message ResourceRef {
  string kind = 1;             // "instance"
  string id = 2;               // "vm-123"
  string org_id = 3;           // Required
  string project_id = 4;       // Required
  optional string owner_id = 5;
  optional string node_id = 6;
  optional string region = 7;
  map<string, string> tags = 8;
}
```

### 5.2 Admin Service (Management)

```protobuf
service IamAdmin {
  // Principals
  rpc CreatePrincipal(CreatePrincipalRequest) returns (Principal);
  rpc GetPrincipal(GetPrincipalRequest) returns (Principal);
  rpc UpdatePrincipal(UpdatePrincipalRequest) returns (Principal);
  rpc DeletePrincipal(DeletePrincipalRequest) returns (Empty);
  rpc ListPrincipals(ListPrincipalsRequest) returns (ListPrincipalsResponse);

  // Roles
  rpc CreateRole(CreateRoleRequest) returns (Role);
  rpc GetRole(GetRoleRequest) returns (Role);
  rpc UpdateRole(UpdateRoleRequest) returns (Role);
  rpc DeleteRole(DeleteRoleRequest) returns (Empty);
  rpc ListRoles(ListRolesRequest) returns (ListRolesResponse);

  // Bindings
  rpc CreateBinding(CreateBindingRequest) returns (PolicyBinding);
  rpc GetBinding(GetBindingRequest) returns (PolicyBinding);
  rpc UpdateBinding(UpdateBindingRequest) returns (PolicyBinding);
  rpc DeleteBinding(DeleteBindingRequest) returns (Empty);
  rpc ListBindings(ListBindingsRequest) returns (ListBindingsResponse);
}
```

### 5.3 Token Service

```protobuf
service IamToken {
  rpc IssueToken(IssueTokenRequest) returns (IssueTokenResponse);
  rpc ValidateToken(ValidateTokenRequest) returns (ValidateTokenResponse);
  rpc RevokeToken(RevokeTokenRequest) returns (Empty);
  rpc RefreshToken(RefreshTokenRequest) returns (RefreshTokenResponse);
}

message InternalTokenClaims {
  string principal_id = 1;
  PrincipalKind principal_kind = 2;
  string principal_name = 3;
  repeated string roles = 4;       // Pre-loaded roles
  Scope scope = 5;
  optional string org_id = 6;
  optional string project_id = 7;
  optional string node_id = 8;
  uint64 iat = 9;                  // Issued at (TSO)
  uint64 exp = 10;                 // Expires at (TSO)
  string session_id = 11;
  AuthMethod auth_method = 12;     // Jwt | Mtls | ApiKey
}
```

## 6. Authorization Logic

### 6.1 Evaluation Algorithm

```
evaluate(request):
    1. Default DENY
    2. resource_scope = Scope::from(request.resource)
    3. bindings = get_effective_bindings(principal, resource_scope)
    4. For each binding where binding.is_active(now):
       a. role = get_role(binding.role_ref)
       b. If binding.condition exists and !evaluate_condition(binding.condition): continue
       c. If evaluate_role(role, request): return ALLOW
    5. Return DENY
```

### 6.2 Role Permission Evaluation

```
evaluate_role(role, request):
    For each permission in role.permissions:
        1. If !matches_action(permission.action, request.action): continue
        2. resource_path = request.resource.to_path()
           pattern = substitute_variables(permission.resource_pattern)
           If !matches_resource(pattern, resource_path): continue
        3. If permission.condition exists and !evaluate_condition(permission.condition): continue
        4. return true  // Permission matches
    return false
```

### 6.3 Action Matching

```rust
matches_action("compute:*", "compute:instances:create")         // true
matches_action("compute:instances:*", "compute:volumes:create") // false
matches_action("*", "anything:here:works")                      // true
```

### 6.4 Resource Matching

```rust
// Path format: org/{org}/project/{proj}/{kind}/{id}
matches_resource("org/*/project/*/instance/*", "org/org-1/project/proj-1/instance/vm-1") // true
matches_resource("org/org-1/project/proj-1/*", "org/org-1/project/proj-1/instance/vm-1") // true (trailing /*)
```

## 7. Conditions (ABAC)

### 7.1 Condition Types

```rust
pub enum Condition {
    // String
    StringEquals { key: String, value: String },
    StringNotEquals { key: String, value: String },
    StringLike { key: String, pattern: String },          // Glob pattern
    StringEqualsAny { key: String, values: Vec<String> },

    // Numeric
    NumericEquals { key: String, value: i64 },
    NumericLessThan { key: String, value: i64 },
    NumericGreaterThan { key: String, value: i64 },

    // Network
    IpAddress { key: String, cidr: String },              // CIDR matching
    NotIpAddress { key: String, cidr: String },

    // Temporal
    TimeBetween { start: String, end: String },           // HH:MM or Unix timestamp

    // Existence
    Exists { key: String },

    // Boolean
    Bool { key: String, value: bool },

    // Logical
    And { conditions: Vec<Condition> },
    Or { conditions: Vec<Condition> },
    Not { condition: Box<Condition> },
}
```

### 7.2 Variable Context

```rust
// Available variables for condition evaluation
principal.id, principal.kind, principal.name
principal.org_id, principal.project_id, principal.node_id
principal.email, principal.metadata.{key}

resource.kind, resource.id
resource.org_id, resource.project_id
resource.owner, resource.node, resource.region
resource.tags.{key}
request.source_ip, request.time
request.method, request.path
request.metadata.{key}
```

### 7.3 Variable Substitution

```rust
// In permission patterns
"org/${principal.org_id}/project/${project}/*"

// In conditions
Condition::string_equals("resource.owner", "${principal.id}")
```

### 7.4 Example: Owner-Only Access

```rust
Permission {
    action: "compute:instances:*",
    resource_pattern: "org/*/project/*/instance/*",
    condition: Some(Condition::string_equals(
        "resource.owner",
        "${principal.id}"
    )),
}
```

## 8. Storage

### 8.1 Backend Abstraction

```rust
pub trait StorageBackend: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<(Vec<u8>, u64)>>;
    async fn put(&self, key: &str, value: &[u8]) -> Result;
    async fn cas(&self, key: &str, expected: u64, value: &[u8]) -> Result;
    async fn delete(&self, key: &str) -> Result;
    async fn scan_prefix(&self, prefix: &str, limit: usize) -> Result>;
}
```

**Supported Backends**:
- **Chainfire**: Production distributed KV
- **FlareDB**: Alternative distributed DB
- **Memory**: Testing

### 8.2 Key Schema

**Principals**:
```
iam/principals/{kind}/{id}                    # Primary
iam/principals/by-org/{org_id}/{kind}/{id}    # Org index
iam/principals/by-project/{project_id}/{id}   # Project index
iam/principals/by-email/{email}               # Email lookup
iam/principals/by-oidc/{iss_hash}/{sub}       # OIDC lookup
```

**Roles**:
```
iam/roles/{name}                              # Primary
iam/roles/by-scope/{scope}/{name}             # Scope index
iam/roles/builtin/{name}                      # Builtin marker
```

**Bindings**:
```
iam/bindings/scope/{scope}/principal/{principal}/{id}   # Primary
iam/bindings/by-principal/{principal}/{id}              # Principal index
iam/bindings/by-role/{role}/{id}                        # Role index
```

### 8.3 Caching

```rust
pub struct PolicyCache {
    bindings: DashMap>,
    roles: DashMap,
    config: CacheConfig,
}

impl PolicyCache {
    fn get_bindings(&self, principal: &PrincipalRef) -> Option<Vec<PolicyBinding>>;
    fn put_bindings(&self, principal: &PrincipalRef, bindings: Vec<PolicyBinding>);
    fn invalidate_principal(&self, principal: &PrincipalRef);
    fn invalidate_role(&self, name: &str);
}
```

## 9. Configuration

### 9.1 Config File Format (TOML)

```toml
[server]
addr = "0.0.0.0:50051"

[server.tls]
cert_file = "/etc/aegis/tls/server.crt"
key_file = "/etc/aegis/tls/server.key"
ca_file = "/etc/aegis/tls/ca.crt"      # For client cert verification
require_client_cert = true             # Enable mTLS

[store]
backend = "chainfire"                  # "memory" | "chainfire" | "flaredb"
chainfire_endpoints = ["http://localhost:2379"]
# flaredb_endpoint = "http://localhost:5000"
# flaredb_namespace = "iam"

[authn]

[authn.jwt]
jwks_url = "https://auth.example.com/.well-known/jwks.json"
issuer = "https://auth.example.com"
audience = "aegis"
jwks_cache_ttl_seconds = 3600

[authn.internal_token]
signing_key = "base64-encoded-256-bit-key"
issuer = "aegis"
default_ttl_seconds = 3600             # 1 hour
max_ttl_seconds = 604800               # 7 days

[logging]
level = "info"                         # "debug" | "info" | "warn" | "error"
format = "json"                        # "json" | "text"
```

### 9.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `IAM_CONFIG` | - | Path to config file |
| `IAM_ADDR` | `0.0.0.0:50051` | Server listen address |
| `IAM_LOG_LEVEL` | `info` | Log level |
| `IAM_SIGNING_KEY` | - | Token signing key (overrides config) |
| `IAM_STORE_BACKEND` | `memory` | Storage backend type |

### 9.3 CLI Arguments

```
aegis-server [OPTIONS]

Options:
  -c, --config       Config file path
  -a, --addr         Listen address (overrides config)
  -l, --log-level    Log level
  -h, --help         Print help
  -V, --version      Print version
```

## 10. Multi-Tenancy

### 10.1 Organization Isolation

- All principals have `org_id` (except System scope)
- All resources require `org_id` and `project_id`
- Scope containment enforces org boundaries

### 10.2 Project Isolation

- Service accounts bound to projects
- Resources belong to projects
- Permissions scoped to `project/${project}/*`

### 10.3 Cross-Tenant Access Patterns

| Pattern | Scope | Use Case |
|---------|-------|----------|
| System Admin | System | Platform operators |
| Org Admin | Org | Organization administrators |
| Project Admin | Project | Project owners |
| Node Agent | Resource | Node-bound service accounts |

### 10.4 Node-Bound Service Accounts

```rust
// Service account with node binding
Principal {
    kind: ServiceAccount,
    node_id: Some("node-001"),
    ...
}

// Permission with node condition
Permission {
    action: "compute:*",
    resource_pattern: "org/*/project/*/instance/*",
    condition: Some(Condition::string_equals(
        "resource.node",
        "${principal.node_id}"
    )),
}
```

## 11. Security

### 11.1 Authentication

**External Identity (OIDC/JWT)**:
- Validate JWT signature using JWKS from configured IdP
- Verify issuer, audience, and expiration claims
- Map OIDC `sub` claim to internal principal
- JWKS cached with configurable TTL

**Internal Tokens**:
- HMAC-SHA256 signed tokens for service-to-service auth
- Contains: principal_id, kind, roles, scope, org_id, project_id, exp, iat, session_id
- Short-lived (default 1 hour, max 7 days)
- Revocable via session_id

**mTLS**:
- Optional client certificate authentication
- Certificate CN mapped to service account ID
- Used for node-to-control-plane communication

### 11.2 Authorization Properties

- **Default Deny**: No binding = denied
- **Explicit Allow**: Must match binding + role + permission
- **Scope Enforcement**: Automatic via containment
- **Temporal Bounds**: `expires_at` for time-limited access
- **Soft Disable**: `enabled` flag for quick revocation

### 11.3 Immutable Builtins

- System roles cannot be
modified/deleted
- Prevents privilege escalation via role modification

### 11.4 Audit Trail

- `created_by` on all entities
- Timestamps for creation/modification
- Audit event generation via the `iam-audit` crate (planned)

## 12. Operations

### 12.1 Deployment

**Single Node**:
```bash
aegis-server --config /etc/aegis/aegis.toml
```

**Cluster Mode**:
- Multiple Aegis instances behind load balancer
- Shared storage backend (Chainfire or FlareDB)
- Stateless - any instance can handle any request
- Session affinity not required

**High Availability**:
- Deploy 3+ instances across availability zones
- Use Chainfire Raft cluster for storage
- Health checks on `/health` endpoint

### 12.2 Initialization

```rust
// Initialize builtin roles (idempotent)
role_store.init_builtin_roles().await?;
```

### 12.3 Client Library

```rust
use iam_client::IamClient;

let client = IamClient::connect("http://127.0.0.1:9090").await?;

// Check authorization
let allowed = client.authorize(
    PrincipalRef::user("alice"),
    "compute:instances:create",
    ResourceRef::new("instance", "org-1", "proj-1", "vm-1"),
).await?;

// Create binding
client.create_binding(CreateBindingRequest {
    principal: PrincipalRef::user("alice"),
    role: "roles/ProjectAdmin".into(),
    scope: Scope::project("proj-1", "org-1"),
    ..Default::default()
}).await?;
```

### 12.4 Monitoring

**Metrics (Prometheus format)**:

| Metric | Type | Description |
|--------|------|-------------|
| `aegis_authz_requests_total` | Counter | Total authorization requests |
| `aegis_authz_decisions{result}` | Counter | Decisions by allow/deny |
| `aegis_authz_latency_seconds` | Histogram | Authorization latency |
| `aegis_token_issued_total` | Counter | Tokens issued |
| `aegis_token_validated_total` | Counter | Token validations |
| `aegis_cache_hits_total` | Counter | Policy cache hits |
| `aegis_cache_misses_total` | Counter | Policy cache misses |
| `aegis_bindings_total` | Gauge | Total active bindings |
| `aegis_principals_total` | Gauge | Total principals |

**Health
Endpoints**:
- `GET /health` - Liveness check
- `GET /ready` - Readiness check (storage connected)

### 12.5 Backup & Recovery

**Backup**:
- Export all principals, roles, and bindings via Admin API
- Or snapshot underlying storage (Chainfire/FlareDB)
- Recommended: Daily full backup + continuous WAL archiving

**Recovery**:
- Restore from storage snapshot
- Or reimport via Admin API
- Builtin roles auto-created on startup

## 13. Compatibility

### 13.1 API Versioning

- gRPC package: `iam.v1`
- Semantic versioning for breaking changes
- Backward compatible additions within major version
- Deprecation warnings before removal

### 13.2 Wire Protocol

- Protocol Buffers v3
- gRPC with HTTP/2 transport
- TLS 1.3 required in production

### 13.3 Storage Migration

- Schema version tracked in metadata key
- Automatic migration on startup
- Backward compatible within major version

## Appendix

### A. Error Codes

| Error | Meaning |
|-------|---------|
| PRINCIPAL_NOT_FOUND | Principal does not exist |
| ROLE_NOT_FOUND | Role does not exist |
| BINDING_NOT_FOUND | Binding does not exist |
| BUILTIN_IMMUTABLE | Cannot modify builtin role |
| SCOPE_VIOLATION | Operation violates scope boundary |
| CONDITION_FAILED | Condition evaluation failed |

### B. Proto Scope Messages

```protobuf
message Scope {
  oneof scope {
    bool system = 1;
    OrgScope org = 2;
    ProjectScope project = 3;
    ResourceScope resource = 4;
  }
}

message OrgScope {
  string id = 1;
}

message ProjectScope {
  string id = 1;
  string org_id = 2;
}

message ResourceScope {
  string id = 1;
  string project_id = 2;
  string org_id = 3;
}
```

### C. Port Assignments

| Port | Protocol | Purpose |
|------|----------|---------|
| 9090 | gRPC | IAM API |

### D. Performance Considerations

- Cache bindings and roles for hot path
- Batch authorization for bulk checks
- Prefix scans for hierarchical queries
- CAS for conflict-free updates

### E. Glossary

- **Principal**: An identity that can be authenticated (user, service account, or group)
- **Role**: A named collection of permissions that can be assigned to principals
- **Permission**: A specific action allowed on a resource pattern with optional conditions
- **Binding**: Assignment of a role to a principal within a specific scope
- **Scope**: Hierarchical boundary for permission application (System > Org > Project > Resource)
- **Condition**: ABAC expression that must evaluate to true for access to be granted
- **PDP**: Policy Decision Point - the authorization evaluation engine
- **RBAC**: Role-Based Access Control - permissions assigned via roles
- **ABAC**: Attribute-Based Access Control - permissions based on attributes/conditions

### F. Example Policies

**Allow user to manage own instances**:
```json
{
  "principal": "user:alice",
  "role": "roles/ProjectMember",
  "scope": { "type": "project", "id": "web-app", "org_id": "acme" }
}
```

**Time-limited admin access**:
```json
{
  "principal": "user:bob",
  "role": "roles/ProjectAdmin",
  "scope": { "type": "project", "id": "staging", "org_id": "acme" },
  "expires_at": 1735689600,
  "condition": {
    "expression": {
      "type": "time_between",
      "start": "09:00",
      "end": "18:00"
    }
  }
}
```

**Node-bound service account**:
```json
{
  "principal": "service_account:compute-agent-node-1",
  "role": "roles/ServiceRole-ComputeAgent",
  "scope": { "type": "system" },
  "condition": {
    "expression": {
      "type": "string_equals",
      "key": "resource.node",
      "value": "${principal.node_id}"
    }
  }
}
```

**IP-restricted access**:
```json
{
  "principal": "user:admin",
  "role": "roles/SystemAdmin",
  "scope": { "type": "system" },
  "condition": {
    "expression": {
      "type": "ip_address",
      "key": "request.source_ip",
      "cidr": "10.0.0.0/8"
    }
  }
}
```
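
### G. Illustrative Sketch: Action Matching

The action wildcards described in section 3.3 and exercised in section 6.3 can be sketched in a few lines of Rust. This is a minimal illustration, not the Aegis implementation: the function name `matches_action` is taken from section 6.3, but its body here is hypothetical. Two behaviors are assumptions that the specification does not state: a `*` in a non-final pattern segment matches exactly one action segment, and a trailing `*` requires at least one remaining action segment (so `compute:*` does not match the bare string `compute`).

```rust
/// Returns true if `pattern` matches `action`, both written as
/// colon-separated segments (`service:resource:operation`).
///
/// Assumed semantics (inferred from the examples in section 6.3):
/// - a `*` in the final pattern segment matches all remaining action segments;
/// - a `*` in any other position matches exactly one segment;
/// - every other segment must match literally.
pub fn matches_action(pattern: &str, action: &str) -> bool {
    let pat: Vec<&str> = pattern.split(':').collect();
    let act: Vec<&str> = action.split(':').collect();

    for (i, seg) in pat.iter().enumerate() {
        let is_last = i + 1 == pat.len();
        if *seg == "*" && is_last {
            // Trailing wildcard: accept whatever is left, as long as the
            // action still has at least one segment at this position.
            return act.len() > i;
        }
        match act.get(i) {
            // Single-segment wildcard or literal match: keep going.
            Some(a) if *seg == "*" || seg == a => continue,
            // Literal mismatch, or the action ran out of segments.
            _ => return false,
        }
    }

    // No trailing wildcard: pattern and action must have the same length.
    pat.len() == act.len()
}
```

Called with the inputs from section 6.3, `matches_action("compute:*", "compute:instances:create")` and `matches_action("*", "anything:here:works")` return `true`, while `matches_action("compute:instances:*", "compute:volumes:create")` returns `false`. Splitting on `:` rather than using a general glob keeps a wildcard from matching across a segment boundary in the middle of a pattern, which seems closer to the `service:resource:operation` structure the spec describes.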