# Aegis (IAM) Specification

> Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

## 1. Overview

### 1.1 Purpose
Aegis is the Identity and Access Management (IAM) platform providing authentication, authorization, and multi-tenant access control for all cloud services. It implements RBAC (Role-Based Access Control) with ABAC (Attribute-Based Access Control) extensions.

The name "Aegis" (shield of Zeus) reflects its role as the protective layer that guards access to all platform resources.

### 1.2 Scope
- **In scope**: Principals (users, service accounts, groups), roles, permissions, policy bindings, scope hierarchy (System > Org > Project > Resource), internal token issuance/validation, external identity federation (OIDC/JWT), authorization decision service (PDP), audit event generation
- **Out of scope**: User password management (delegated to external IdP), UI for authentication, API gateway/rate limiting

### 1.3 Design Goals
- **AWS IAM / GCP IAM compatible**: Familiar concepts and API patterns
- **Multi-tenant from day one**: Full org/project hierarchy with scope isolation
- **Flexible RBAC + ABAC hybrid**: Roles with conditional permissions
- **High-performance authorization**: Sub-millisecond decisions with caching
- **Zero-trust security**: Default deny, explicit grants, audit everything
- **Cloud-grade scalability**: Handle millions of decisions per second

## 2. Architecture

### 2.1 Crate Structure
```
iam/
├── crates/
│   ├── iam-api/          # gRPC service implementations
│   ├── iam-audit/        # Audit logging (planned)
│   ├── iam-authn/        # Authentication (tokens, OIDC)
│   ├── iam-authz/        # Authorization engine (PDP)
│   ├── iam-client/       # Rust client library
│   ├── iam-server/       # Server binary
│   ├── iam-store/        # Storage backends (Chainfire, FlareDB, Memory)
│   └── iam-types/        # Core types
└── proto/
    └── iam.proto         # gRPC definitions
```

### 2.2 Authorization Flow
```
[Client Request] → [IamAuthz Service]
                          ↓
                  [Fetch Principal]
                          ↓
              [Build Resource Context]
                          ↓
                  [PolicyEvaluator]
                          ↓
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
   [Get Bindings]    [Get Roles]   [Cache Lookup]
          ↓               ↓               ↓
          └───────────────┼───────────────┘
                          ↓
               [Evaluate Permissions]
                          ↓
                  [Condition Check]
                          ↓
                   [ALLOW / DENY]
```

### 2.3 Dependencies
| Crate | Version | Purpose |
|-------|---------|---------|
| tokio | 1.x | Async runtime |
| tonic | 0.12 | gRPC framework |
| prost | 0.13 | Protocol buffers |
| dashmap | 6.x | Concurrent cache |
| ipnetwork | 0.20 | CIDR matching |
| glob-match | 0.2 | Resource pattern matching |

## 3. Core Concepts

### 3.1 Principals
Identities that can be authenticated and authorized.

```rust
pub struct Principal {
    pub id: String,                    // Unique identifier
    pub kind: PrincipalKind,           // User | ServiceAccount | Group
    pub name: String,                  // Display name
    pub org_id: Option<String>,        // Organization membership
    pub project_id: Option<String>,    // For service accounts
    pub email: Option<String>,         // For users
    pub oidc_sub: Option<String>,      // Federated identity subject
    pub node_id: Option<String>,       // For node-bound service accounts
    pub metadata: HashMap<String, String>,
    pub created_at: u64,
    pub updated_at: u64,
    pub enabled: bool,
}

pub enum PrincipalKind {
    User,            // Human users
    ServiceAccount,  // Machine identities
    Group,           // Collections (future)
}
```

**Principal Reference**: `kind:id` format
- `user:alice`
- `service_account:compute-agent`

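
The `kind:id` reference format can be parsed and printed with a small helper. The sketch below is illustrative only: the `PrincipalRef` field layout, the `parse` method, and the `group` spelling are assumptions, since the spec shows only `user` and `service_account` examples.

```rust
use std::fmt;

/// Mirrors the `PrincipalKind` enum from the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrincipalKind {
    User,
    ServiceAccount,
    Group,
}

/// A parsed `kind:id` reference (field layout is an assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PrincipalRef {
    pub kind: PrincipalKind,
    pub id: String,
}

impl PrincipalRef {
    /// Parses `kind:id`. Only the first `:` separates kind from id, so an id
    /// may itself contain colons. Unknown kinds and empty ids are rejected.
    pub fn parse(s: &str) -> Option<PrincipalRef> {
        let (kind, id) = s.split_once(':')?;
        if id.is_empty() {
            return None;
        }
        let kind = match kind {
            "user" => PrincipalKind::User,
            "service_account" => PrincipalKind::ServiceAccount,
            "group" => PrincipalKind::Group, // assumed spelling
            _ => return None,
        };
        Some(PrincipalRef { kind, id: id.to_string() })
    }
}

impl fmt::Display for PrincipalRef {
    /// Renders the reference back into its `kind:id` form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let kind = match self.kind {
            PrincipalKind::User => "user",
            PrincipalKind::ServiceAccount => "service_account",
            PrincipalKind::Group => "group",
        };
        write!(f, "{}:{}", kind, self.id)
    }
}
```

Rejecting unknown kinds at parse time keeps malformed references out of storage keys and cache lookups.
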
### 3.2 Roles
Named collections of permissions.

```rust
pub struct Role {
    pub name: String,            // e.g., "ProjectAdmin"
    pub display_name: String,
    pub description: String,
    pub scope: Scope,            // Where role can be assigned
    pub permissions: Vec<Permission>,
    pub builtin: bool,           // System-defined, immutable
    pub created_at: u64,
    pub updated_at: u64,
}
```

**Builtin Roles**:
| Role | Scope | Description |
|------|-------|-------------|
| SystemAdmin | System | Full cluster access |
| OrgAdmin | Org | Full organization access |
| ProjectAdmin | Project | Full project access |
| ProjectMember | Project | Own resources + read all |
| ReadOnly | Project | Read-only project access |
| ServiceRole-ComputeAgent | Resource | Node-scoped compute |
| ServiceRole-StorageAgent | Resource | Node-scoped storage |

### 3.3 Permissions
Individual access rights within roles.

```rust
pub struct Permission {
    pub action: String,            // e.g., "compute:instances:create"
    pub resource_pattern: String,  // e.g., "org/*/project/${project}/instances/*"
    pub condition: Option<Condition>,
}
```

**Action Format**: `service:resource:operation`
- Wildcards: `*`, `compute:*`, `compute:instances:*`
- Examples: `compute:instances:create`, `storage:volumes:delete`

**Resource Pattern Format**: `org/{org_id}/project/{project_id}/{kind}/{id}`
- Wildcards: `org/*/project/*/instances/*`
- Variables: `${principal.id}`, `${project}`

### 3.4 Policy Bindings
Assignments of roles to principals within a scope.

```rust
pub struct PolicyBinding {
    pub id: String,                // UUID
    pub principal_ref: PrincipalRef,
    pub role_ref: String,          // "roles/ProjectAdmin"
    pub scope: Scope,
    pub condition: Option<Condition>,
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,
    pub expires_at: Option<u64>,   // Time-limited access
    pub enabled: bool,
}
```

## 4. Scope Hierarchy

Four-level hierarchical boundary for permissions.

```
System (level 0)                 ← Cluster-wide
└─ Organization (level 1)        ← Tenant boundary
   └─ Project (level 2)          ← Workload isolation
      └─ Resource (level 3)      ← Individual resource
```

### 4.1 Scope Types
```rust
pub enum Scope {
    System,
    Org { id: String },
    Project { id: String, org_id: String },
    Resource { id: String, project_id: String, org_id: String },
}
```

### 4.2 Scope Containment
```rust
impl Scope {
    // System contains everything
    // Org contains its projects and resources
    // Project contains its resources
    fn contains(&self, other: &Scope) -> bool;

    // Get parent scope
    fn parent(&self) -> Option<Scope>;

    // Get all ancestors up to System
    fn ancestors(&self) -> Vec<Scope>;
}
```

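The containment rules above can be derived from `parent()` alone. The following is a minimal sketch under two assumptions that the spec does not state: a scope is treated as containing itself, and `ancestors()` is ordered from the immediate parent up to `System`.

```rust
/// Same shape as the `Scope` enum in section 4.1.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Scope {
    System,
    Org { id: String },
    Project { id: String, org_id: String },
    Resource { id: String, project_id: String, org_id: String },
}

impl Scope {
    /// Parent scope, or `None` for `System`.
    pub fn parent(&self) -> Option<Scope> {
        match self {
            Scope::System => None,
            Scope::Org { .. } => Some(Scope::System),
            Scope::Project { org_id, .. } => Some(Scope::Org { id: org_id.clone() }),
            Scope::Resource { project_id, org_id, .. } => Some(Scope::Project {
                id: project_id.clone(),
                org_id: org_id.clone(),
            }),
        }
    }

    /// Ancestors ordered from the immediate parent up to `System`.
    pub fn ancestors(&self) -> Vec<Scope> {
        let mut out = Vec::new();
        let mut current = self.parent();
        while let Some(scope) = current {
            current = scope.parent();
            out.push(scope);
        }
        out
    }

    /// True when `other` is this scope or lies beneath it.
    /// (Assumption: a scope contains itself.)
    pub fn contains(&self, other: &Scope) -> bool {
        self == other || other.ancestors().iter().any(|a| a == self)
    }
}
```

Because `Project` and `Resource` carry their `org_id`, two projects with the same id in different organizations never compare equal, which is what keeps containment from crossing tenant boundaries.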
### 4.3 Scope Storage Keys
```
system
org/{org_id}
org/{org_id}/project/{project_id}
org/{org_id}/project/{project_id}/resource/{resource_id}
```

## 5. API

### 5.1 Authorization Service (PDP)
```protobuf
service IamAuthz {
  rpc Authorize(AuthorizeRequest) returns (AuthorizeResponse);
  rpc BatchAuthorize(BatchAuthorizeRequest) returns (BatchAuthorizeResponse);
}

message AuthorizeRequest {
  PrincipalRef principal = 1;
  string action = 2;           // "compute:instances:create"
  ResourceRef resource = 3;
  AuthzContext context = 4;    // IP, timestamp, metadata
}

message AuthorizeResponse {
  bool allowed = 1;
  string reason = 2;
  string matched_binding = 3;
  string matched_role = 4;
}

message ResourceRef {
  string kind = 1;             // "instance"
  string id = 2;               // "vm-123"
  string org_id = 3;           // Required
  string project_id = 4;       // Required
  optional string owner_id = 5;
  optional string node_id = 6;
  optional string region = 7;
  map<string, string> tags = 8;
}
```

### 5.2 Admin Service (Management)
```protobuf
service IamAdmin {
  // Principals
  rpc CreatePrincipal(CreatePrincipalRequest) returns (Principal);
  rpc GetPrincipal(GetPrincipalRequest) returns (Principal);
  rpc UpdatePrincipal(UpdatePrincipalRequest) returns (Principal);
  rpc DeletePrincipal(DeletePrincipalRequest) returns (Empty);
  rpc ListPrincipals(ListPrincipalsRequest) returns (ListPrincipalsResponse);

  // Roles
  rpc CreateRole(CreateRoleRequest) returns (Role);
  rpc GetRole(GetRoleRequest) returns (Role);
  rpc UpdateRole(UpdateRoleRequest) returns (Role);
  rpc DeleteRole(DeleteRoleRequest) returns (Empty);
  rpc ListRoles(ListRolesRequest) returns (ListRolesResponse);

  // Bindings
  rpc CreateBinding(CreateBindingRequest) returns (PolicyBinding);
  rpc GetBinding(GetBindingRequest) returns (PolicyBinding);
  rpc UpdateBinding(UpdateBindingRequest) returns (PolicyBinding);
  rpc DeleteBinding(DeleteBindingRequest) returns (Empty);
  rpc ListBindings(ListBindingsRequest) returns (ListBindingsResponse);
}
```

### 5.3 Token Service
```protobuf
service IamToken {
  rpc IssueToken(IssueTokenRequest) returns (IssueTokenResponse);
  rpc ValidateToken(ValidateTokenRequest) returns (ValidateTokenResponse);
  rpc RevokeToken(RevokeTokenRequest) returns (Empty);
  rpc RefreshToken(RefreshTokenRequest) returns (RefreshTokenResponse);
}

message InternalTokenClaims {
  string principal_id = 1;
  PrincipalKind principal_kind = 2;
  string principal_name = 3;
  repeated string roles = 4;       // Pre-loaded roles
  Scope scope = 5;
  optional string org_id = 6;
  optional string project_id = 7;
  optional string node_id = 8;
  uint64 iat = 9;                  // Issued at (TSO)
  uint64 exp = 10;                 // Expires at (TSO)
  string session_id = 11;
  AuthMethod auth_method = 12;     // Jwt | Mtls | ApiKey
}
```

## 6. Authorization Logic

### 6.1 Evaluation Algorithm
```
evaluate(request):
    1. Default DENY
    2. resource_scope = Scope::from(request.resource)
    3. bindings = get_effective_bindings(principal, resource_scope)
    4. For each binding where binding.is_active(now):
       a. role = get_role(binding.role_ref)
       b. If binding.condition exists and !evaluate_condition(binding.condition):
          continue
       c. If evaluate_role(role, request):
          return ALLOW
    5. Return DENY
```

### 6.2 Role Permission Evaluation
```
evaluate_role(role, request):
    For each permission in role.permissions:
        1. If !matches_action(permission.action, request.action):
           continue
        2. resource_path = request.resource.to_path()
           pattern = substitute_variables(permission.resource_pattern)
           If !matches_resource(pattern, resource_path):
              continue
        3. If permission.condition exists and !evaluate_condition(permission.condition):
           continue
        4. return true  // Permission matches
    return false
```

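The two procedures above can be condensed into a short Rust sketch. It is deliberately simplified and makes several assumptions: conditions and variable substitution are omitted, the stand-in `matches` helper only understands an exact match or a trailing `*`, a binding with `expires_at == now` counts as expired, and the `Request` and `Decision` types are hypothetical.

```rust
use std::collections::HashMap;

pub struct Permission {
    pub action: String,
    pub resource_pattern: String,
}

pub struct Role {
    pub name: String,
    pub permissions: Vec<Permission>,
}

pub struct Binding {
    pub role_ref: String,
    pub enabled: bool,
    pub expires_at: Option<u64>,
}

/// Hypothetical request shape: the action plus the resource already rendered as a path.
pub struct Request {
    pub action: String,
    pub resource_path: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

impl Binding {
    /// Active means enabled and not yet past `expires_at`.
    fn is_active(&self, now: u64) -> bool {
        self.enabled && self.expires_at.map_or(true, |t| now < t)
    }
}

/// Stand-in matcher: exact match, or prefix match when the pattern ends in `*`.
fn matches(pattern: &str, value: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => value.starts_with(prefix),
        None => pattern == value,
    }
}

/// A role grants the request if any one permission matches action and resource.
fn evaluate_role(role: &Role, req: &Request) -> bool {
    role.permissions.iter().any(|p| {
        matches(&p.action, &req.action) && matches(&p.resource_pattern, &req.resource_path)
    })
}

/// Default deny: the first active binding whose role matches yields ALLOW.
pub fn evaluate(
    bindings: &[Binding],
    roles: &HashMap<String, Role>,
    req: &Request,
    now: u64,
) -> Decision {
    for binding in bindings.iter().filter(|b| b.is_active(now)) {
        // A binding whose role cannot be resolved grants nothing.
        let Some(role) = roles.get(&binding.role_ref) else { continue };
        if evaluate_role(role, req) {
            return Decision::Allow;
        }
    }
    Decision::Deny
}
```

Since the algorithm has no explicit deny rules, evaluation order does not affect the outcome; returning on the first match is purely a shortcut.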
### 6.3 Action Matching
```rust
matches_action("compute:*", "compute:instances:create")          // true
matches_action("compute:instances:*", "compute:volumes:create")  // false
matches_action("*", "anything:here:works")                       // true
```

### 6.4 Resource Matching
```rust
// Path format: org/{org}/project/{proj}/{kind}/{id}
matches_resource("org/*/project/*/instance/*",
                 "org/org-1/project/proj-1/instance/vm-1")  // true
matches_resource("org/org-1/project/proj-1/*",
                 "org/org-1/project/proj-1/instance/vm-1")  // true (trailing /*)
```

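Both matchers above behave like one segment-wise comparison with a different separator. The sketch below reproduces the listed examples using only the standard library; it is not the `glob-match` based implementation named in the dependency table. The wildcard rules are assumptions: a `*` segment matches exactly one segment, except in the final position, where it matches one or more remaining segments.

```rust
/// Segment-wise wildcard match over `sep`-delimited strings.
fn matches_segments(pattern: &str, value: &str, sep: char) -> bool {
    let pat: Vec<&str> = pattern.split(sep).collect();
    let val: Vec<&str> = value.split(sep).collect();
    for (i, p) in pat.iter().enumerate() {
        if *p == "*" && i + 1 == pat.len() {
            // Trailing wildcard: accept as long as at least one segment remains.
            return val.len() > i;
        }
        match val.get(i) {
            // Literal segments must be equal; a non-trailing `*` accepts any one segment.
            Some(v) if *p == "*" || p == v => {}
            _ => return false,
        }
    }
    // No trailing wildcard: both sides must have the same number of segments.
    pat.len() == val.len()
}

/// Actions are `:`-separated, e.g. `compute:instances:create`.
pub fn matches_action(pattern: &str, action: &str) -> bool {
    matches_segments(pattern, action, ':')
}

/// Resource paths are `/`-separated, e.g. `org/org-1/project/proj-1/instance/vm-1`.
pub fn matches_resource(pattern: &str, path: &str) -> bool {
    matches_segments(pattern, path, '/')
}
```

Matching per segment rather than per character means a `*` can never swallow a separator in the middle of a pattern, so `org/*/project/proj-1/*` cannot be satisfied by an org id that happens to contain `/project/`.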
## 7. Conditions (ABAC)

### 7.1 Condition Types
```rust
pub enum Condition {
    // String
    StringEquals { key: String, value: String },
    StringNotEquals { key: String, value: String },
    StringLike { key: String, pattern: String },  // Glob pattern
    StringEqualsAny { key: String, values: Vec<String> },

    // Numeric
    NumericEquals { key: String, value: i64 },
    NumericLessThan { key: String, value: i64 },
    NumericGreaterThan { key: String, value: i64 },

    // Network
    IpAddress { key: String, cidr: String },  // CIDR matching
    NotIpAddress { key: String, cidr: String },

    // Temporal
    TimeBetween { start: String, end: String },  // HH:MM or Unix timestamp

    // Existence
    Exists { key: String },

    // Boolean
    Bool { key: String, value: bool },

    // Logical
    And { conditions: Vec<Condition> },
    Or { conditions: Vec<Condition> },
    Not { condition: Box<Condition> },
}
```

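To illustrate how such a tree might be evaluated, the sketch below handles a subset of the variants against a flat key/value context. The context type (`HashMap<String, String>`), the function name, and the rule that a missing key makes a leaf comparison false are all assumptions; the network and temporal variants are left out.

```rust
use std::collections::HashMap;

/// A subset of the spec's `Condition` variants.
pub enum Condition {
    StringEquals { key: String, value: String },
    NumericLessThan { key: String, value: i64 },
    Exists { key: String },
    And { conditions: Vec<Condition> },
    Or { conditions: Vec<Condition> },
    Not { condition: Box<Condition> },
}

/// Recursively evaluates `cond` against `ctx`.
/// Leaf comparisons on a missing or unparsable value evaluate to false.
pub fn evaluate_condition(cond: &Condition, ctx: &HashMap<String, String>) -> bool {
    match cond {
        Condition::StringEquals { key, value } => ctx.get(key) == Some(value),
        Condition::NumericLessThan { key, value } => ctx
            .get(key)
            .and_then(|v| v.parse::<i64>().ok())
            .map_or(false, |n| n < *value),
        Condition::Exists { key } => ctx.contains_key(key),
        Condition::And { conditions } => conditions.iter().all(|c| evaluate_condition(c, ctx)),
        Condition::Or { conditions } => conditions.iter().any(|c| evaluate_condition(c, ctx)),
        Condition::Not { condition } => !evaluate_condition(condition, ctx),
    }
}
```

One consequence of this design is that `Not` wrapped around a comparison on a missing key evaluates to true. Whether Aegis should grant access in that situation is not specified here, so a production evaluator may prefer a three-valued result (true, false, undefined) instead.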
### 7.2 Variable Context
```
# Available variables for condition evaluation
principal.id, principal.kind, principal.name
principal.org_id, principal.project_id, principal.node_id
principal.email, principal.metadata.{key}

resource.kind, resource.id
resource.org_id, resource.project_id
resource.owner, resource.node, resource.region
resource.tags.{key}

request.source_ip, request.time
request.method, request.path
request.metadata.{key}
```

### 7.3 Variable Substitution
```rust
// In permission patterns
"org/${principal.org_id}/project/${project}/*"

// In conditions
Condition::string_equals("resource.owner", "${principal.id}")
```

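A substitution pass over `${...}` placeholders might look like the sketch below. The signature and the choice to fail (return `None`) on an unknown variable or an unterminated `${` are assumptions; the spec does not say how unresolved variables are treated.

```rust
use std::collections::HashMap;

/// Replaces each `${name}` in `template` with its value from `vars`.
/// Returns `None` if a referenced variable is missing or a `${` is never closed.
pub fn substitute_variables(template: &str, vars: &HashMap<String, String>) -> Option<String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}')?;
        // Look up the variable name between `${` and `}`.
        out.push_str(vars.get(&after[..end])?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

Failing closed on a missing variable avoids producing a pattern such as `org//project/*`. A real implementation would probably also need to reject substituted values that contain `*` or `/`, since those could widen the resulting pattern.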
### 7.4 Example: Owner-Only Access
```rust
Permission {
    action: "compute:instances:*",
    resource_pattern: "org/*/project/*/instance/*",
    condition: Some(Condition::string_equals(
        "resource.owner",
        "${principal.id}"
    )),
}
```

## 8. Storage

### 8.1 Backend Abstraction
```rust
pub trait StorageBackend: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<(Vec<u8>, u64)>>;
    async fn put(&self, key: &str, value: &[u8]) -> Result<u64>;
    async fn cas(&self, key: &str, expected: u64, value: &[u8]) -> Result<CasResult>;
    async fn delete(&self, key: &str) -> Result<bool>;
    async fn scan_prefix(&self, prefix: &str, limit: usize) -> Result<Vec<KvPair>>;
}
```

**Supported Backends**:
- **Chainfire**: Production distributed KV
- **FlareDB**: Alternative distributed DB
- **Memory**: Testing

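
The version-returning `get`/`put`/`cas` shape above suggests per-key version counters. The sketch below shows one way an in-memory backend could implement those semantics. It is synchronous for brevity (the real trait is `async`), and the `CasResult` variants, the "version 0 means the key is absent" convention, and starting versions at 1 are all assumptions.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical result type; the spec names `CasResult` but not its variants.
#[derive(Debug, PartialEq, Eq)]
pub enum CasResult {
    Success { version: u64 },
    Conflict { current: u64 },
}

/// Values are stored together with a per-key version counter.
#[derive(Default)]
pub struct MemoryBackend {
    data: Mutex<BTreeMap<String, (Vec<u8>, u64)>>,
}

impl MemoryBackend {
    /// Returns the value and its version, if present.
    pub fn get(&self, key: &str) -> Option<(Vec<u8>, u64)> {
        self.data.lock().unwrap().get(key).cloned()
    }

    /// Unconditional write; returns the new version.
    pub fn put(&self, key: &str, value: &[u8]) -> u64 {
        let mut data = self.data.lock().unwrap();
        let version = data.get(key).map_or(1, |(_, v)| v + 1);
        data.insert(key.to_string(), (value.to_vec(), version));
        version
    }

    /// Writes only if the stored version equals `expected` (0 = key must be absent).
    pub fn cas(&self, key: &str, expected: u64, value: &[u8]) -> CasResult {
        let mut data = self.data.lock().unwrap();
        let current = data.get(key).map_or(0, |(_, v)| *v);
        if current != expected {
            return CasResult::Conflict { current };
        }
        data.insert(key.to_string(), (value.to_vec(), current + 1));
        CasResult::Success { version: current + 1 }
    }

    /// Returns up to `limit` entries whose key starts with `prefix`, in key order.
    pub fn scan_prefix(&self, prefix: &str, limit: usize) -> Vec<(String, Vec<u8>)> {
        let data = self.data.lock().unwrap();
        let out: Vec<(String, Vec<u8>)> = data
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .take(limit)
            .map(|(k, (v, _))| (k.clone(), v.clone()))
            .collect();
        out
    }
}
```

A sorted map makes `scan_prefix` a range scan rather than a full iteration, which mirrors how an ordered distributed KV store would typically serve the same call.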
### 8.2 Key Schema

**Principals**:
```
iam/principals/{kind}/{id}                    # Primary
iam/principals/by-org/{org_id}/{kind}/{id}    # Org index
iam/principals/by-project/{project_id}/{id}   # Project index
iam/principals/by-email/{email}               # Email lookup
iam/principals/by-oidc/{iss_hash}/{sub}       # OIDC lookup
```

**Roles**:
```
iam/roles/{name}                              # Primary
iam/roles/by-scope/{scope}/{name}             # Scope index
iam/roles/builtin/{name}                      # Builtin marker
```

**Bindings**:
```
iam/bindings/scope/{scope}/principal/{principal}/{id}   # Primary
iam/bindings/by-principal/{principal}/{id}              # Principal index
iam/bindings/by-role/{role}/{id}                        # Role index
```

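As an illustration of how the binding keys compose with the scope storage keys from section 4.3, here is a small sketch. The function names are hypothetical, and rendering `{principal}` as the `kind:id` reference string is an assumption; the schema above does not say how the principal is encoded inside the key.

```rust
/// Same shape as the `Scope` enum in section 4.1.
pub enum Scope {
    System,
    Org { id: String },
    Project { id: String, org_id: String },
    Resource { id: String, project_id: String, org_id: String },
}

/// Renders a scope using the storage key layout from section 4.3.
pub fn scope_key(scope: &Scope) -> String {
    match scope {
        Scope::System => "system".to_string(),
        Scope::Org { id } => format!("org/{id}"),
        Scope::Project { id, org_id } => format!("org/{org_id}/project/{id}"),
        Scope::Resource { id, project_id, org_id } => {
            format!("org/{org_id}/project/{project_id}/resource/{id}")
        }
    }
}

/// Prefix covering every binding of `principal` at exactly `scope`.
pub fn binding_prefix(scope: &Scope, principal: &str) -> String {
    format!("iam/bindings/scope/{}/principal/{}/", scope_key(scope), principal)
}

/// Primary key of a single binding.
pub fn binding_key(scope: &Scope, principal: &str, binding_id: &str) -> String {
    format!("{}{}", binding_prefix(scope, principal), binding_id)
}
```

With keys shaped this way, collecting the effective bindings for a resource could be one `scan_prefix` call per scope on the path from the resource up to `system`.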
### 8.3 Caching
```rust
pub struct PolicyCache {
    bindings: DashMap<PrincipalRef, Vec<PolicyBinding>>,
    roles: DashMap<String, Role>,
    config: CacheConfig,
}

impl PolicyCache {
    fn get_bindings(&self, principal: &PrincipalRef) -> Option<Vec<PolicyBinding>>;
    fn put_bindings(&self, principal: &PrincipalRef, bindings: Vec<PolicyBinding>);
    fn invalidate_principal(&self, principal: &PrincipalRef);
    fn invalidate_role(&self, name: &str);
}
```

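The cache interface above can be approximated with the standard library alone. The sketch below swaps `DashMap` for a `Mutex<HashMap>` to stay dependency-free, keys entries by the principal reference string, and stores binding ids instead of full `PolicyBinding` values. The TTL field is an assumption about what `CacheConfig` might hold.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Simplified stand-in for the bindings half of `PolicyCache`.
pub struct BindingCache {
    ttl: Duration,
    entries: Mutex<HashMap<String, (Instant, Vec<String>)>>,
}

impl BindingCache {
    pub fn new(ttl: Duration) -> Self {
        BindingCache { ttl, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns cached binding ids if present and younger than the TTL.
    pub fn get_bindings(&self, principal: &str) -> Option<Vec<String>> {
        let mut entries = self.entries.lock().unwrap();
        let fresh = match entries.get(principal) {
            Some((stored_at, bindings)) if stored_at.elapsed() < self.ttl => Some(bindings.clone()),
            _ => None,
        };
        if fresh.is_none() {
            // Drop expired entries lazily on lookup.
            entries.remove(principal);
        }
        fresh
    }

    /// Stores the bindings for a principal, stamping them with the current time.
    pub fn put_bindings(&self, principal: &str, bindings: Vec<String>) {
        self.entries
            .lock()
            .unwrap()
            .insert(principal.to_string(), (Instant::now(), bindings));
    }

    /// Removes a principal's entry, e.g. after a binding is created or deleted.
    pub fn invalidate_principal(&self, principal: &str) {
        self.entries.lock().unwrap().remove(principal);
    }
}
```

Explicit invalidation only reaches the instance that handled the write. In a multi-instance deployment the TTL is what bounds how long other instances can serve a stale decision, so it doubles as the worst-case revocation delay.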
## 9. Configuration

### 9.1 Config File Format (TOML)

```toml
[server]
addr = "0.0.0.0:50051"

[server.tls]
cert_file = "/etc/aegis/tls/server.crt"
key_file = "/etc/aegis/tls/server.key"
ca_file = "/etc/aegis/tls/ca.crt"      # For client cert verification
require_client_cert = true             # Enable mTLS

[store]
backend = "chainfire"                  # "memory" | "chainfire" | "flaredb"
chainfire_endpoints = ["http://localhost:2379"]
# flaredb_endpoint = "http://localhost:5000"
# flaredb_namespace = "iam"

[authn]
[authn.jwt]
jwks_url = "https://auth.example.com/.well-known/jwks.json"
issuer = "https://auth.example.com"
audience = "aegis"
jwks_cache_ttl_seconds = 3600

[authn.internal_token]
signing_key = "base64-encoded-256-bit-key"
issuer = "aegis"
default_ttl_seconds = 3600             # 1 hour
max_ttl_seconds = 604800               # 7 days

[logging]
level = "info"                         # "debug" | "info" | "warn" | "error"
format = "json"                        # "json" | "text"
```

### 9.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `IAM_CONFIG` | - | Path to config file |
| `IAM_ADDR` | `0.0.0.0:50051` | Server listen address |
| `IAM_LOG_LEVEL` | `info` | Log level |
| `IAM_SIGNING_KEY` | - | Token signing key (overrides config) |
| `IAM_STORE_BACKEND` | `memory` | Storage backend type |

### 9.3 CLI Arguments

```
aegis-server [OPTIONS]

Options:
  -c, --config <PATH>       Config file path
  -a, --addr <ADDR>         Listen address (overrides config)
  -l, --log-level <LEVEL>   Log level
  -h, --help                Print help
  -V, --version             Print version
```

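With three sources for the listen address (CLI flag, environment variable, config file), some precedence order is needed. The sketch below assumes CLI over environment over file over the built-in default. Only "CLI overrides config" is stated above; the position of the environment variable in that order, and the function name, are assumptions.

```rust
/// Default listen address from the environment variable table.
pub const DEFAULT_ADDR: &str = "0.0.0.0:50051";

/// Picks the listen address from the first source that provides one:
/// `--addr`, then `IAM_ADDR`, then `[server].addr`, then the default.
pub fn resolve_addr(cli: Option<&str>, env: Option<&str>, file: Option<&str>) -> String {
    cli.or(env).or(file).unwrap_or(DEFAULT_ADDR).to_string()
}
```

A caller would typically pass `std::env::var("IAM_ADDR").ok().as_deref()` for the `env` argument, which keeps the function itself free of global state and easy to test.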
## 10. Multi-Tenancy

### 10.1 Organization Isolation
- All principals have `org_id` (except System scope)
- All resources require `org_id` and `project_id`
- Scope containment enforces org boundaries

### 10.2 Project Isolation
- Service accounts bound to projects
- Resources belong to projects
- Permissions scoped to `project/${project}/*`

### 10.3 Cross-Tenant Access Patterns

| Pattern | Scope | Use Case |
|---------|-------|----------|
| System Admin | System | Platform operators |
| Org Admin | Org | Organization administrators |
| Project Admin | Project | Project owners |
| Node Agent | Resource | Node-bound service accounts |

### 10.4 Node-Bound Service Accounts
```rust
// Service account with node binding
Principal {
    kind: ServiceAccount,
    node_id: Some("node-001"),
    ...
}

// Permission with node condition
Permission {
    action: "compute:*",
    resource_pattern: "org/*/project/*/instance/*",
    condition: Some(Condition::string_equals(
        "resource.node",
        "${principal.node_id}"
    )),
}
```

## 11. Security

### 11.1 Authentication

**External Identity (OIDC/JWT)**:
- Validate JWT signature using JWKS from configured IdP
- Verify issuer, audience, and expiration claims
- Map OIDC `sub` claim to internal principal
- JWKS cached with configurable TTL

**Internal Tokens**:
- HMAC-SHA256 signed tokens for service-to-service auth
- Contains: principal_id, kind, roles, scope, org_id, project_id, exp, iat, session_id
- Short-lived (default 1 hour, max 7 days)
- Revocable via session_id

**mTLS**:
- Optional client certificate authentication
- Certificate CN mapped to service account ID
- Used for node-to-control-plane communication

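
The internal token lifetime limits (default 1 hour, max 7 days) could be enforced at issue time roughly as follows. This sketch assumes `iat` and `exp` are plain second counts, that an over-long requested TTL is clamped rather than rejected, and that a token is expired once `now` reaches `exp`; the constant and function names are hypothetical.

```rust
/// Mirrors `default_ttl_seconds` from the sample config (1 hour).
pub const DEFAULT_TTL_SECONDS: u64 = 3_600;
/// Mirrors `max_ttl_seconds` from the sample config (7 days).
pub const MAX_TTL_SECONDS: u64 = 604_800;

/// Computes `exp` for a token issued at `iat`.
/// A missing TTL falls back to the default; an over-long TTL is clamped to the maximum.
pub fn expiry(iat: u64, requested_ttl: Option<u64>) -> u64 {
    let ttl = requested_ttl
        .unwrap_or(DEFAULT_TTL_SECONDS)
        .min(MAX_TTL_SECONDS);
    iat.saturating_add(ttl)
}

/// A token is treated as expired once `now` reaches `exp`.
pub fn is_expired(exp: u64, now: u64) -> bool {
    now >= exp
}
```

Clamping keeps issuance from failing on an over-long request, at the cost of silently shortening the lifetime the caller asked for. Returning an error is the stricter alternative.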
### 11.2 Authorization Properties
- **Default Deny**: No binding = denied
- **Explicit Allow**: Must match binding + role + permission
- **Scope Enforcement**: Automatic via containment
- **Temporal Bounds**: `expires_at` for time-limited access
- **Soft Disable**: `enabled` flag for quick revocation

### 11.3 Immutable Builtins
- System roles cannot be modified/deleted
- Prevents privilege escalation via role modification

### 11.4 Audit Trail
- `created_by` on all entities
- Timestamps for creation/modification
- Audit event generation via the `iam-audit` crate (planned)

## 12. Operations

### 12.1 Deployment

**Single Node**:
```bash
aegis-server --config /etc/aegis/aegis.toml
```

**Cluster Mode**:
- Multiple Aegis instances behind load balancer
- Shared storage backend (Chainfire or FlareDB)
- Stateless - any instance can handle any request
- Session affinity not required

**High Availability**:
- Deploy 3+ instances across availability zones
- Use Chainfire Raft cluster for storage
- Health checks on `/health` endpoint

### 12.2 Initialization
```rust
// Initialize builtin roles (idempotent)
role_store.init_builtin_roles().await?;
```

### 12.3 Client Library
```rust
use iam_client::IamClient;

let client = IamClient::connect("http://127.0.0.1:9090").await?;

// Check authorization
let allowed = client.authorize(
    PrincipalRef::user("alice"),
    "compute:instances:create",
    ResourceRef::new("instance", "org-1", "proj-1", "vm-1"),
).await?;

// Create binding
client.create_binding(CreateBindingRequest {
    principal: PrincipalRef::user("alice"),
    role: "roles/ProjectAdmin".into(),
    scope: Scope::project("proj-1", "org-1"),
    ..Default::default()
}).await?;
```

### 12.4 Monitoring

**Metrics (Prometheus format)**:

| Metric | Type | Description |
|--------|------|-------------|
| `aegis_authz_requests_total` | Counter | Total authorization requests |
| `aegis_authz_decisions{result}` | Counter | Decisions by allow/deny |
| `aegis_authz_latency_seconds` | Histogram | Authorization latency |
| `aegis_token_issued_total` | Counter | Tokens issued |
| `aegis_token_validated_total` | Counter | Token validations |
| `aegis_cache_hits_total` | Counter | Policy cache hits |
| `aegis_cache_misses_total` | Counter | Policy cache misses |
| `aegis_bindings_total` | Gauge | Total active bindings |
| `aegis_principals_total` | Gauge | Total principals |

**Health Endpoints**:
- `GET /health` - Liveness check
- `GET /ready` - Readiness check (storage connected)

### 12.5 Backup & Recovery

**Backup**:
- Export all principals, roles, and bindings via Admin API
- Or snapshot underlying storage (Chainfire/FlareDB)
- Recommended: Daily full backup + continuous WAL archiving

**Recovery**:
- Restore from storage snapshot
- Or reimport via Admin API
- Builtin roles auto-created on startup

## 13. Compatibility

### 13.1 API Versioning
- gRPC package: `iam.v1`
- Semantic versioning for breaking changes
- Backward compatible additions within major version
- Deprecation warnings before removal

### 13.2 Wire Protocol
- Protocol Buffers v3
- gRPC with HTTP/2 transport
- TLS 1.3 required in production

### 13.3 Storage Migration
- Schema version tracked in metadata key
- Automatic migration on startup
- Backward compatible within major version

## Appendix

### A. Error Codes
| Error | Meaning |
|-------|---------|
| PRINCIPAL_NOT_FOUND | Principal does not exist |
| ROLE_NOT_FOUND | Role does not exist |
| BINDING_NOT_FOUND | Binding does not exist |
| BUILTIN_IMMUTABLE | Cannot modify builtin role |
| SCOPE_VIOLATION | Operation violates scope boundary |
| CONDITION_FAILED | Condition evaluation failed |

### B. Proto Scope Messages
```protobuf
message Scope {
  oneof scope {
    bool system = 1;
    OrgScope org = 2;
    ProjectScope project = 3;
    ResourceScope resource = 4;
  }
}

message OrgScope { string id = 1; }
message ProjectScope { string id = 1; string org_id = 2; }
message ResourceScope { string id = 1; string project_id = 2; string org_id = 3; }
```

### C. Port Assignments
| Port | Protocol | Purpose |
|------|----------|---------|
| 9090 | gRPC | IAM API |

### D. Performance Considerations
- Cache bindings and roles for the hot path
- Use batch authorization for bulk checks
- Use prefix scans for hierarchical queries
- Use CAS (compare-and-swap) so that conflicting concurrent updates are detected rather than silently overwritten

### E. Glossary

- **Principal**: An identity that can be authenticated (user, service account, or group)
- **Role**: A named collection of permissions that can be assigned to principals
- **Permission**: A specific action allowed on a resource pattern with optional conditions
- **Binding**: Assignment of a role to a principal within a specific scope
- **Scope**: Hierarchical boundary for permission application (System > Org > Project > Resource)
- **Condition**: ABAC expression that must evaluate to true for access to be granted
- **PDP**: Policy Decision Point - the authorization evaluation engine
- **RBAC**: Role-Based Access Control - permissions assigned via roles
- **ABAC**: Attribute-Based Access Control - permissions based on attributes/conditions

### F. Example Policies

**Allow user to manage own instances**:
```json
{
  "principal": "user:alice",
  "role": "roles/ProjectMember",
  "scope": { "type": "project", "id": "web-app", "org_id": "acme" }
}
```

**Time-limited admin access**:
```json
{
  "principal": "user:bob",
  "role": "roles/ProjectAdmin",
  "scope": { "type": "project", "id": "staging", "org_id": "acme" },
  "expires_at": 1735689600,
  "condition": {
    "expression": {
      "type": "time_between",
      "start": "09:00",
      "end": "18:00"
    }
  }
}
```

**Node-bound service account**:
```json
{
  "principal": "service_account:compute-agent-node-1",
  "role": "roles/ServiceRole-ComputeAgent",
  "scope": { "type": "system" },
  "condition": {
    "expression": {
      "type": "string_equals",
      "key": "resource.node",
      "value": "${principal.node_id}"
    }
  }
}
```

**IP-restricted access**:
```json
{
  "principal": "user:admin",
  "role": "roles/SystemAdmin",
  "scope": { "type": "system" },
  "condition": {
    "expression": {
      "type": "ip_address",
      "key": "request.source_ip",
      "cidr": "10.0.0.0/8"
    }
  }
}
```