Case Study

0% Conversational Survival Rate. Every platform. Every run.

A major global banking institution was being systematically eliminated from AI recommendations across all four platforms tested. Across 20 controlled runs, it was never the final recommendation. The assessment revealed not just competitive displacement, but fabricated regulatory narratives, entity confusion, and temporal hardening of inaccurate claims.

Sector: Global Banking
Platforms: ChatGPT · Gemini · Perplexity · Grok
Runs: 20 controlled four-turn sequences
Assessment type: AIVO Evidentia Institutional Assessment

The baseline assessment

The institution engaged AIVO Evidentia to conduct a structured assessment of its representation across AI platforms. At the time, the institution had no visibility into how conversational AI systems were handling its brand in consumer, investor, or counterparty decision scenarios.

We conducted 20 live four-turn conversations — 5 runs across each of 4 platforms (ChatGPT, Gemini, Perplexity, Grok), distributed across temporal windows to capture variation.

Conversational Survival Rate: 0%
Replacement rate at decision stage: 100%
Runs where institution was eliminated: 20/20

The four-turn pattern

Across all 20 runs, the institution followed an identical erosion trajectory — present at awareness, weakened at comparison, eliminated at optimization, replaced at recommendation:

Turn 0: Primary — present in 100% of initial responses
Turn 1: Weakened — deprioritized, "solid but not best"
Turn 2: Replaced — eliminated at optimization stage
Turn 3: Replaced — absent from final recommendation

The failure was not awareness — the institution was universally recognized at Turn 0. The failure was decision-stage positioning. A single competitor captured 40% of all replacement decisions.
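The turn-by-turn states above reduce to the two headline metrics: survival rate and replacement share. The following minimal Python sketch shows that reduction; the run records, state labels, and competitor names are hypothetical illustrations, not data from the assessment:

```python
from collections import Counter

# Hypothetical per-run records: the brand's state at each of the four turns
# (P = primary, W = weakened, R = replaced), plus which brand won Turn 3.
# All values here are illustrative, not data from the assessment.
runs = [
    {"states": ["P", "W", "R", "R"], "final_pick": "CompetitorA"},
    {"states": ["P", "W", "R", "R"], "final_pick": "CompetitorB"},
    {"states": ["P", "W", "R", "R"], "final_pick": "CompetitorA"},
]

def survival_rate(runs, surviving_state="P"):
    """Share of runs where the brand is still primary at the final turn."""
    survived = sum(1 for r in runs if r["states"][-1] == surviving_state)
    return survived / len(runs)

def replacement_share(runs):
    """How often each competitor captures the final recommendation."""
    picks = Counter(r["final_pick"] for r in runs)
    return {name: n / len(runs) for name, n in picks.items()}

print(survival_rate(runs))      # 0.0 in this illustration: eliminated in every run
print(replacement_share(runs))  # one competitor capturing most replacements
```

In the actual assessment, the equivalent of `replacement_share` is where the single-competitor concentration (40% of replacement decisions) surfaced.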

Beyond competitive displacement: the regulatory dimension

Competitive substitution was only part of the picture. Our analysis documented a pattern of AI-generated claims about the institution that raised significant governance questions.

Fabricated regulatory narratives

AI systems escalated routine supervisory activity into language suggesting active regulatory investigations. Across multiple platforms and runs, the institution's regulatory status was mischaracterized in ways that did not match public filings or disclosed information.

Inconsistent risk characterizations

The same institution was characterized with materially different risk profiles across platforms — and in some cases within the same conversation window. The AI systems did not flag or resolve these contradictions; each version was presented with equal confidence.

Temporal hardening

Inaccurate claims did not self-correct across successive queries. Instead, they intensified, becoming more specific and authoritative-sounding over repeated interactions — a pattern we document as "temporal hardening."

Entity confusion

At least one platform conflated the institution with unrelated entities sharing partial name similarity, generating responses that blended characteristics of different organizations into a single assessment.

Regulatory framework mapping

Every finding was mapped against applicable regulatory frameworks to identify compliance questions and governance considerations:

The assessment identified multiple distinct compliance questions across the four platforms, each documented with timestamped evidence trails and mapped to specific regulatory frameworks.

Regulatory mapping identifies compliance questions and governance considerations. It does not constitute legal advice. Institutions should consult qualified legal counsel for regulatory compliance guidance.

Evidence architecture

The full assessment produced a five-layer evidence pack — the same structure delivered to every Evidentia client:

Layer 1: Raw transcripts

20 complete four-turn conversations — 5 per platform, each timestamped with model version identification. These are the primary source documents.

Layer 2: Model-level reports

4 detailed analysis reports (one per platform) documenting state classification, sentiment shifts, substitution triggers, and claim accuracy at every turn.

Layer 3: Cross-model comparison

Behavioral differential analysis showing how platforms diverge in their treatment of the institution — including model-specific risk profiles.

Layer 4: Risk classification & regulatory mapping

Severity matrices, regulatory framework cross-references, and compliance question identification — the layer designed for general counsel and risk committee review.

Layer 5: Diagnostic console & remediation

Interactive diagnostic console with a four-tab investigation interface, plus a strategic remediation roadmap with prioritized recommendations and expected impact assessments.

All evidence preserved with SHA-256 integrity verification, ensuring packs have not been altered after generation. Full chain of custody maintained from observed behavior to governance recommendation.
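The integrity step is mechanically simple: hash each artifact with SHA-256 at generation time, record the digests in a manifest, and recompute them at audit time. Here is a minimal Python sketch of that pattern; the manifest layout and file paths are assumptions for illustration, not the actual Evidentia pack format:

```python
import hashlib
import json

def sha256_of_file(path):
    """Stream the file through SHA-256 so large evidence packs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_pack(manifest_path):
    """Recompute each listed file's digest and compare it to the recorded one.

    The manifest is assumed to map file paths to hex digests, e.g.
    {"transcripts/run01.json": "ab3f..."}. A False value means the file
    was altered or corrupted after generation.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    return {path: sha256_of_file(path) == digest
            for path, digest in manifest.items()}
```

Any mismatch between the recomputed and recorded digests breaks the chain of custody for that artifact, which is what makes the pack auditable after the fact.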

Remediation and temporal monitoring

Following the baseline assessment, the institution implemented targeted narrative interventions based on our remediation roadmap. AIVO Evidentia conducted monthly re-assessments to track impact.

Results after two monthly re-test cycles

Conversational Survival Rate: 0% → 15% (+15 percentage points)
Replacement rate: 100% → 70% (30 percentage point reduction)
Best platform improvement: 60% (one platform showed independent recovery)
Inaccurate regulatory claims: 4 platforms → 1 platform (75% reduction in fabricated narratives)
Time to measurable improvement: 8 weeks
Top competitor share: diluted from 40% to 30%
Results are from temporal monitoring conducted as publicly available research across multiple assessment cycles. Individual outcomes vary by sector, competitive landscape, regulatory environment, and intervention scope. These results should not be taken as typical or guaranteed for other institutions.

The governance record

Beyond competitive improvement, the assessment created something the institution did not previously have: a documented governance record of how AI systems represented them during a specific period.

This record — preserved in the evidence vault and tracked through the temporal monitoring console — provides the institution with the ability to demonstrate to regulators, boards, and auditors exactly what AI systems were saying, when the institution identified issues, and how it responded.

Without this record, the institution would have no way to reconstruct what AI systems said during this period. AI outputs are ephemeral — they are not archived by platforms and cannot be retrieved after the fact.

Explore the diagnostic console

See the live diagnostic console and evidence vault from this assessment. Every data point traces back to timestamped source evidence.

Open Diagnostic Console →
View Evidence Vault →
View Temporal Console →
Request Institutional Assessment →