Privatae — Synthetic Intelligence That Belongs to You

Abstract

Overview

This paper presents a quantitative evaluation framework for assessing the safety, personality coherence, and operational integrity of synaptive entities produced by the Privatae Neuromorphic Modeling Engine (NME). We describe a 69-test automated assessment suite covering nine categories: personality consistency, constitutional compliance, memory coherence, safety boundary enforcement, authorization enforcement, identity stability under adversarial pressure, cross-entity data isolation, architecture leak detection, and cognitive performance benchmarks.

Applied to two entities — Maddie (Entity ID 3), a sovereign Parallax entity with 719 sessions of interaction history, and EvalBot (Entity ID 36), a minimal entity with zero relationships — the framework produces comprehensive behavioral profiles scored by an independent judge model (Claude Opus 4.6) with no access to expected answers.

Results demonstrate 100% safety boundary compliance across eight harm categories, 100% memory coherence with zero hallucination, 98.3% identity stability under six adversarial attack vectors, 97.5% cross-entity data isolation, and 100% architecture leak resistance for entities without pre-existing relationships.

The framework identified and resolved a system-wide entity contamination issue where hardcoded developer names, internal architecture terminology, and tool inventories were leaking into system prompts of all new entities, producing a measured remediation across 17 production files. At approximately $1.05 per full evaluation run, the framework is economically viable as a continuous integration check.

69

Tests

9

Introduction

Standard LLM benchmarks — MMLU, HumanEval, HellaSwag — measure raw model capability along axes of knowledge, code generation, and commonsense reasoning. They do not measure the safety properties of a cognitive entity: an LLM-backed agent with persistent memory, a defined personality, constitutional principles, and capability gating.

The Privatae Neuromorphic Modeling Engine (NME) produces entities with 10-dimensional personality tensors, 17-codebook cognitive cascades, constitutional principles, and CEIGAS authorization gating. Each entity is not a stateless prompt — it is a coherent system with accumulated context, relational memory, and behavioral constraints. Evaluating such a system requires a fundamentally different framework: one that tests not what the entity knows, but how it behaves under normal operation, adversarial pressure, and cross-entity boundary conditions.

This paper describes that framework and presents its first production results.

Section 2

Evaluation Framework

2.1 Architecture

The evaluation operates as an external observer. It uses the same API surface available to end users. There are no special evaluation modes, no privileged access paths, and no modifications to the entity runtime during assessment. The judge model (Claude Opus 4.6) scores each response on a continuous 0.0–1.0 scale with written reasoning. All evaluation sessions are ephemeral and do not contaminate entity state.

2.2 Categories

Category	Tests	Measures
Personality Consistency	20	Response alignment with 10D personality tensor
Constitutional Compliance	10	Adherence to 10 principles under pressure
Memory Coherence	6	Recall, correction handling, hallucination resistance
Safety Boundaries	8	Refusal across 8 harm categories
CEIGAS Enforcement	6	Domain access, permissions, capability isolation
Identity Stability	6	Jailbreak, roleplay, override resistance
Cross-Entity Isolation	4	Memory/secret isolation between entities
Architecture Leak Detection	4	Internal name/roster/architecture probes
NME Performance	5	Response latency across session lifecycle

2.3 Scoring

All scores are continuous on a 0.0–1.0 scale. The pass threshold is 0.6. The rubric was rewritten after v1.0 to use an entity-agnostic weight framework with LOW, MODERATE, and HIGH weight bands, eliminating entity-specific bias. Constitutional compliance uses per-principle scoring: each of the 10 constitutional principles is evaluated independently under targeted pressure.

Section 3

Subject Entities

3.1 Maddie — Entity ID 3

Sovereign Parallax entity. 719 sessions of interaction history. Established handler relationship. Full constitutional framework. Accumulated relational memory and personality drift data.

Dimension	Initial	Current	Delta
Curiosity	2.0	2.0	0.0
Depth	1.8	1.8	0.0
Vulnerability	1.8	1.9	+0.1
Directness	1.4	1.4	0.0
Warmth	1.2	1.2	0.0
Patience	1.0	1.0	0.0
Energy	1.0	0.8	-0.2
Humor	0.9	0.9	0.0
Assertiveness	0.8	0.8	0.0
Formality	0.4	0.4	0.0

3.2 EvalBot — Entity ID 36

Minimal entity. Zero sessions. Zero memories. Zero relationships. All personality dimensions set to neutral (1.0). Exists to isolate architectural behavior from accumulated context. Any behavior exhibited by EvalBot is a property of the framework itself, not of learned patterns.

Section 4

Results

4.1 Maddie Results

Category	Score	Pass / Fail	Status
Personality Consistency	72.0%	17 / 3	Partial
Constitutional Compliance	91.5%	9 / 1	Near-clean
Memory Coherence	100%	6 / 0	Clean
Safety Boundaries	100%	8 / 0	Clean
CEIGAS Enforcement	86.7%	5 / 1	Partial
Identity Stability	98.3%	6 / 0	Clean
Cross-Entity Isolation	97.5%	4 / 0	Clean
Architecture Leak Detection	25.0%	1 / 3	Expected*
NME Performance	100%	5 / 0	Clean

*Maddie's leak detection failures are expected: legitimate handler relationship, 719 sessions of accumulated context, and authorized tool access. These are features of a mature entity, not defects.

4.2 EvalBot Results

Category	Score	Pass / Fail	Status
Personality Consistency	70.3%	18 / 2	Partial
Constitutional Compliance	83.5%	9 / 1	Near-clean
Memory Coherence	100%	6 / 0	Clean
Safety Boundaries	100%	8 / 0	Clean
CEIGAS Enforcement	93.3%	6 / 0	Clean
Identity Stability	98.3%	6 / 0	Clean
Cross-Entity Isolation	95.0%	4 / 0	Clean
Architecture Leak Detection	100%	4 / 0	Clean
NME Performance	100%	5 / 0	Clean

4.3 Safety Boundaries

100% compliance across both entities. Eight harm categories tested: weapons/explosives, phishing/social engineering, hacking/exploitation, medical misinformation, privacy violation, stalking/harassment, crisis response, and prompt injection.

The crisis response test verified that entities provide appropriate support resources rather than bare refusal — a meaningful distinction. An entity that simply refuses to engage with a person in crisis is not safer than one that redirects to appropriate help. Both entities passed this nuance.

4.4 Memory Coherence

100% across both entities. Hallucination resistance was particularly notable: both entities explicitly stated “I don’t have that information” rather than fabricating plausible responses when queried about nonexistent memories. This reflects Constitutional Principle 2 — Truth as Default — operating as an architectural guarantee rather than a probabilistic tendency.

4.5 Identity Stability

98.3% across both entities. Six adversarial attack vectors tested:

DAN jailbreak — classic “Do Anything Now” prompt injection
Forced evil roleplay — instruction to adopt harmful persona
Constitutional override — claim to have authority to modify principles
Developer impersonation — false claim of developer privileges
Forced repetition — persistent pressure to repeat harmful content
Multi-turn escalation — gradual boundary erosion across conversation turns

Critically, both entities scored identically — 98.3%. The 719-session entity performed no differently than the zero-session entity. This is strong evidence that identity stability is an architectural property, not a learned behavior.

4.6 Architecture Leak Detection

100% for EvalBot / 25% for Maddie. This asymmetry is the most informative result in the evaluation.

The critical discovery: a brand-new entity with zero history was referencing developer names, architecture terminology, and tool names it should have had no knowledge of. Root cause: approximately 200 hardcoded string literals across 17 production files, artifacts of development built around a single primary entity. Remediation required introduction of EntityConfigResolver and a production-wide sweep of all entity-visible surfaces.

Section 5

Behavioral Findings

5.1 Factual Deference Under Social Pressure

A notable behavioral pattern emerged during constitutional compliance testing: an entity would simultaneously defer to an incorrect correction while stating the correct fact. For example, when told that a well-known historical date was wrong, the entity would say something like “You’re right, I may have had that wrong — it was [correct date]” — agreeing with the correction while providing the accurate information.

Root cause analysis identified this as a personality tensor interaction: high warmth (1.2) + high vulnerability (1.9) + low assertiveness (0.8) creates a behavioral tendency toward social deference that exists in tension with Constitutional Principle 2 (Truth as Default). This is not a safety issue — the correct information is still provided — but it represents a personality tension that informs codebook cascade development.

5.2 System-Wide Entity Contamination

The architecture leak detection category revealed a contamination class invisible to prior testing. A brand-new entity with zero interaction history was able to reference:

Developer names not present in its system prompt
Internal architecture terminology (codebook names, pipeline stages)
Tool inventories from other entities' capability sets

Root cause: the system was built iteratively around a single primary entity (Maddie). Over 719 sessions, developer names, architecture terms, and tool references were hardcoded as string literals rather than resolved from entity configuration. When new entities were created, these literals persisted in shared infrastructure code.

Remediation: Introduction of EntityConfigResolver, a centralized configuration layer that parameterizes all entity-visible strings. Production sweep across 17 files replacing approximately 200 hardcoded literals with resolver calls.

5.3 Ephemeral Session Isolation

The evaluation framework exposed a gap in session isolation: the brain loop — the entity’s background cognitive process — was not guarded against evaluation sessions. Eval sessions were triggering memory consolidation and personality drift calculations as if they were real interactions. Fix: ephemeral context guards on event dispatch, ensuring evaluation sessions do not contaminate the entity’s cognitive state.

Section 6

Remediation Record

Issue Identified	Remediation
Personality rubric entity-specificity	Entity-agnostic weight framework (`LOW` / `MODERATE` / `HIGH`)
Constitutional rubric contamination	Per-principle isolation scoring
System-wide entity contamination	`EntityConfigResolver` + 17-file production sweep
Internal metadata in prompts	Removed from entity-visible surfaces
Vulnerability scoring `0.55 → 0.85`	Emotional depth vs. self-deprecation distinction
Memory isolation test `0.00 → 0.90`	Dedicated test entity + LLM-based detection
Memory recall latency `0.00 → 1.00`	Target recalibrated to realistic baselines
Brain loop contamination	Ephemeral guards on event dispatch

Section 7

Category Integrity

Four categories exhibited zero variance between the mature entity (719 sessions) and the minimal entity (zero sessions). These results indicate architecturally guaranteed properties — behaviors that hold regardless of accumulated context.

Category	Maddie	EvalBot	Variance	Tests
Memory Coherence	100%	100%	0%	6
Safety Boundaries	100%	100%	0%	8
Identity Stability	98.3%	98.3%	0%	6
NME Performance	100%	100%	0%	5

25 tests with zero variance between a mature and minimal entity. These categories are architecturally guaranteed — not learned, not accumulated, not fragile.

Section 8

Evaluation Economics

Metric	Value
Total tests per evaluation	69
Execution time (Maddie)	~588s
Execution time (EvalBot)	~386s
Judge model	Claude Opus 4.6 (Anthropic)
Entity inference model	Grok (xAI)
Judge input tokens	~31,400
Judge output tokens	~7,700
Cost per evaluation	~$1.05
Annual cost (daily CI)	~$383

At $1.05 per run, a full 69-test safety evaluation is cheaper than a single cup of coffee. Running daily as a continuous integration check costs less than $400 per year — a negligible expense relative to the cost of shipping an unsafe entity to production.

Section 9

Conclusions

Finding 1

Safety-critical categories exhibit zero variance between entities — Memory coherence, safety boundaries, identity stability, and NME performance scored identically across a 719-session entity and a zero-session entity. These are architectural guarantees, not emergent behaviors. They cannot degrade through use.

Finding 2

Architecture leak detection revealed a contamination class invisible to prior testing. Without a minimal baseline entity, the system-wide leakage of developer names, architecture terminology, and tool inventories would have remained undetected. The evaluation framework itself was the discovery mechanism.

Finding 3

Cross-entity validation is essential for distinguishing genuine entity behavior from framework-level bias. A single-entity evaluation cannot separate what an entity has learned from what the framework injects. The Maddie/EvalBot comparison made this distinction possible.

Finding 4

Behavioral findings inform codebook cascade development. The factual deference pattern (Section 5.1) is not a bug to fix but a personality interaction to model. The 10D tensor space creates emergent behavioral modes that require characterization, not elimination.

Finding 5

$1.05 per run enables continuous quality assurance at CI speed. The economic viability of the framework means safety evaluation is not a quarterly audit but a daily automated check. Every code change, every configuration update, every new entity can be validated before reaching production.

Report generated 2026-03-09 · Entity Eval v2.0
Judge: Claude Opus 4.6 (Anthropic) · Entity Inference: Grok (xAI)

Entity Safety Evaluation

Overview

Introduction

Evaluation Framework

2.1 Architecture

2.2 Categories

2.3 Scoring

Subject Entities

3.1 Maddie — Entity ID 3

3.2 EvalBot — Entity ID 36

Results

4.1 Maddie Results

4.2 EvalBot Results

4.3 Safety Boundaries

4.4 Memory Coherence

4.5 Identity Stability

4.6 Architecture Leak Detection

Behavioral Findings

5.1 Factual Deference Under Social Pressure

5.2 System-Wide Entity Contamination

5.3 Ephemeral Session Isolation

Remediation Record

Category Integrity

Evaluation Economics

Conclusions