Clinical Governance, Safety & Regulation

Issue Date: 2025-12-09
Document Version: v1.1.0

Version Log

v1.1.0 (2025-12-09): Updated Issue Date, Document Version, and aligned structure with related safety/legal documents; no material changes to governance controls.

v1.0.0 (2025-12-05): Initial publication covering training & competency, AI risk mitigation, and reliability reporting scaffold.

Purpose

This document summarizes PepperChat's clinical governance approach for AI‑assisted documentation, including: required training and competency for staff, our AI risk mitigation strategy, and a pointer to reliability evaluations with versioned results.

Training & Competency Requirements

Staff involved in clinical safety tasks (e.g., reviewing AI‑assisted outputs, evaluating reliability results, curating clinical templates) must complete and maintain the following training:

Documentation standards: payer rules, clinical accuracy, tone, and completeness

AI behavior & limitations: bias, hallucination avoidance, non‑determinism, prompt hygiene

Privacy & security: HIPAA‑aligned safeguards, PHI handling, minimum necessary, access control

Risk management & escalation: when to escalate concerns, incident reporting, correction workflow

Product workflows: Human‑in‑the‑Loop review and finalization controls in PepperChat

Training Evidence:

Records of completion (dates, version of curriculum, assessor) are maintained in PepperChat's internal HR/compliance system. Proof of completion is available upon request during audits.

Newly onboarded reviewers must complete training prior to performing safety reviews; annual refreshers are required.

Clinical Oversight:

PepperChat was co‑founded by a practicing therapist. We work with multiple licensed clinicians who regularly review product behavior, output samples, and outcomes against approved standards.

AI Risk Mitigation Strategy (Summary)

PepperChat's risk controls center on Human‑in‑the‑Loop review and transparent UX:

Human‑in‑the‑Loop gating: clinicians review, edit, and approve outputs before finalization.

Structure & preview: standardized clinical sections and print‑preview encourage careful review.

Change control: AI behavior changes are reviewed and regression‑checked prior to release.

Monitoring & feedback: issues are triaged; high‑severity items drive prompt/UX updates.

Privacy/security: encryption in transit/at rest, role‑based access, audit logging of key actions.

Standards & Frameworks (informative):

NIST AI Risk Management Framework (RMF)

ISO/IEC 23894 (AI risk management guidance)

HIPAA Security Rule safeguards (administrative, technical, physical)

For a full Risk Management Framework, see Safety, Risk & Suitability.

Risk Management System — Roles, Responsibilities, Oversight

This section documents the end‑to‑end governance system for AI‑assisted clinical documentation in PepperChat.

Roles and responsibilities (RACI):

Clinical Governance Lead (Accountable/Responsible)

Owns Safety, Risk & Suitability documentation; approves AI changes that impact clinical output.

Chairs the Clinical Safety Review (CSR) and maintains the risk register.

Licensed Clinical Safety Reviewers (Responsible)

Clinicians who review samples of AI‑assisted notes for accuracy, omissions, hallucinations, tone, and appropriateness.

Recommend mitigations, updates to prompts/UX, and additional training.

AI Reliability Lead (Responsible)

Designs/executes reliability test suites, publishes versioned reports and regression comparisons.

Maintains test data, scoring rubrics, and CI pipelines for automated checks where feasible.

Product & Engineering Owners (Consulted)

Implement mitigations, prompts, and UX changes. Ensure release gates and change control.

Data Protection & Security (Consulted)

Verifies logging, access control, and privacy safeguards for datasets and evaluation artifacts.

Executive Sponsor (Informed/Accountable)

Receives quarterly governance reports and approves resourcing for remediation.

Oversight & review cadence:

Weekly CSR huddle: triage new issues, assign owners, set due dates.

Monthly risk review: update risk register (likelihood/severity, residual risk), confirm control status.

Quarterly report to Executive Sponsor: outcomes, reliability trends, outstanding risks, planned mitigations.

Annual comprehensive review: refresh Safety & Risk documentation; update training curricula; re‑validate controls.

Risk register & evidence:

Single risk register with: ID, title, scenario, harm, likelihood, severity, initial risk, controls, owner, target date, residual risk, decision.

Evidence store includes: training records, reliability outputs (JSON/plots), change‑control notes, decision logs, and audited samples.

All artifacts are versioned and linked from the register.
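The register columns listed above can be modeled as a simple record type. The sketch below is illustrative only; the field names and types are assumptions, not PepperChat's production schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RiskRegisterEntry:
    # Fields mirror the register columns named above; types are assumed.
    id: str
    title: str
    scenario: str
    harm: str
    likelihood: str          # e.g. "Low" / "Medium" / "High"
    severity: str
    initial_risk: str
    controls: list[str]
    owner: str
    target_date: date
    residual_risk: str
    decision: str
    # Linked, versioned evidence artifacts (training records, reliability
    # outputs, change-control notes, decision logs, audited samples).
    evidence_links: list[str] = field(default_factory=list)
```

Keeping evidence links on the entry itself is one way to satisfy the requirement that all artifacts be linked from the register.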

Escalation process (textual flow):

Detection (Reviewer/Monitoring/Support) → log issue, propose severity.

CSR triage (same day for High/Critical). If High/Critical:

Immediate containment (e.g., feature flag, warning), assign Incident Lead.

Notify Executive Sponsor and Security as applicable.

Root‑cause & mitigation plan (Owner + Product/Eng + Reliability + Reviewer).

Validation (reliability suite + targeted clinical review), update register.

Communicate outcome; update Safety docs if controls change.
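The textual flow above can be sketched as a small triage function that maps an issue's severity to the ordered escalation steps. This is an illustrative sketch of the documented flow, not PepperChat code; the step names are paraphrased from the list above.

```python
def triage_steps(severity: str) -> list[str]:
    """Return the escalation steps implied by the flow above (sketch).

    High/Critical issues get same-day CSR triage, immediate containment,
    an Incident Lead, and sponsor/security notification; all issues then
    proceed through root cause, validation, register update, and comms.
    """
    steps = ["log_issue_and_propose_severity"]
    if severity in ("High", "Critical"):
        steps += [
            "same_day_csr_triage",
            "immediate_containment",      # e.g. feature flag, warning
            "assign_incident_lead",
            "notify_sponsor_and_security",
        ]
    else:
        steps += ["csr_triage"]
    steps += [
        "root_cause_and_mitigation_plan",
        "validation",                     # reliability suite + clinical review
        "update_risk_register",
        "communicate_outcome",
    ]
    return steps
```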

Qualified healthcare professional involvement:

Multiple licensed clinicians participate as Clinical Safety Reviewers; the Clinical Governance Lead is a practicing therapist. These professionals review outputs, advise on safety‑related decisions, and sign off on residual‑risk acceptance (see "Acceptance of residual risks").

Summary of Key Risks and Status (High‑Level)

Incomplete extraction in category detection

Initial risk: Medium (L: Medium, S: Medium) → Control: mandatory human review + structured preview.

Residual risk: Low. Impact: minor omissions; mitigated by reviewer confirmation before finalization.

Actions: continue prompt refinements and targeted regression tests.

Misinterpretation of ambiguous phrasing

Initial risk: Medium → Control: clinician review; suggestion labeling; neutral tone prompts.

Residual risk: Low. Impact: low likelihood of incorrect clinical statements post‑review.

Actions: expand examples in prompts; reviewer tips in UI.

Transcription or dictation inaccuracies (where used)

Initial risk: Medium → Control: no automated save; highlight verification; human edit required.

Residual risk: Low. Impact: low; user must verify text and can easily correct.

Data migration inconsistencies (historical imports)

Initial risk: Medium → Control: schema validation, timestamp normalization, reconciliation reports.

Residual risk: Low. Impact: low; discrepancies surfaced for manual fix prior to use.

Audit & Review Schedule

Internal audits:

AI note accuracy sampling (monthly): check omissions, hallucinations, contradictions.

Error‑rate trends (monthly): track variation across features; investigate spikes.

Clinical appropriateness checks (monthly): tone, structure, goal alignment.

Data migration checks (as needed): timestamp, format, completeness verification with reconciliation logs.

Outcomes & evidence:

Findings recorded in risk register with links to sample sets and remediation tasks.

Aggregated summaries included in the quarterly governance report.

Potential Harms, Clinical Impact, and Causes

Potential harms

Inaccurate summaries; missing key statements; misinterpreted dictation; ambiguous phrasing; inappropriate tone.

Clinical impact

Could lead to confusion during follow‑up sessions; extra time to reconcile; risk of incomplete documentation.

Human‑in‑the‑Loop review mitigates impact before finalization.

Causes (examples)

Background noise, low‑quality audio, overlapping speech, domain‑specific terminology, ambiguous source text.

Model non‑determinism in generation tasks; edge phrasing not covered by prompt examples.

Risk Assessment & Acceptability Criteria

Acceptability criteria (examples)

No hallucinations permitted in finalized documentation.

Omission tolerance: residual risk acceptable if omissions are minor, flagged by UI for review, and corrected in normal editing.

Transcription accuracy: within defined thresholds; all drafts require clinician verification.

Initial assessment

Each hazard is rated for likelihood and severity; initial risk recorded in the register.

Controls (examples)

Mandatory human review and finalization gate; structured templates/preview; low‑confidence cues and reviewer tips; change‑control and regression checks.

Residual risk evaluation

After controls applied, re‑rate likelihood/severity; compare to acceptability criteria.

If residual risk exceeds criteria, add further controls or justify deferral with risk‑benefit analysis.
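The re-rating step above amounts to a likelihood x severity lookup compared against an acceptability ceiling. The 3x3 matrix and the "residual risk must be Low" ceiling below are illustrative assumptions for the sketch; the actual scheme lives in the risk register and acceptability criteria.

```python
# Illustrative 3x3 rating matrix (likelihood, severity) -> rating.
# Consistent with the register examples above, where Medium likelihood
# and Medium severity yield an initial risk of Medium.
RATING = {
    ("Low", "Low"): "Low",       ("Low", "Medium"): "Low",      ("Low", "High"): "Medium",
    ("Medium", "Low"): "Low",    ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("High", "Low"): "Medium",   ("High", "Medium"): "High",    ("High", "High"): "High",
}

def residual_acceptable(likelihood: str, severity: str,
                        ceiling: str = "Low") -> bool:
    """Re-rate risk after controls and compare to the acceptability
    ceiling (assumed here to be 'residual risk must be Low')."""
    order = ("Low", "Medium", "High")
    return order.index(RATING[(likelihood, severity)]) <= order.index(ceiling)
```

If `residual_acceptable` returns False, the flow above calls for further controls or a documented risk-benefit justification.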

Additional control measures for unacceptable residual risks:

Secondary review for flagged cases; automatic checks (e.g., required‑field prompts); re‑processing low‑confidence dictation; targeted prompt refinements.

Risk‑benefit analysis:

If residual risk cannot be further reduced, weigh the remaining risk against benefits (faster notes, reduced admin burden, improved consistency). Proceed only if residual risk remains within acceptability criteria.

Evidence of implementation:

Confirmation of deployed controls is linked from the register (feature flags, UI cues, logs, training completion, and reliability artifacts).

Acceptance of residual risks:

CSR documents formal sign‑off by the Clinical Governance Lead (and Executive Sponsor for High/Critical items) confirming that residual risks meet acceptability criteria.

Justification for rating reductions:

Where severity/likelihood are reduced, attach evidence (e.g., reliability re‑run demonstrating improved accuracy; production sampling showing lower error rates).

Reliability Evaluation (Latest)

We maintain versioned reliability studies that stress‑test PepperChat against clinical criteria such as accuracy, grounding, absence of hallucinations, and completeness.

Latest report:

Title: Reliability Evaluation of PepperChat's Progress Note Builder

Report Version: 2025.12.05

Issue Date: 2025‑12‑05

Summary of results:

PNB‑1 (Freehand → Preview/Suggestions): 100% accuracy across 20 tests (CI [1.00, 1.00])

PNB‑2 (Suggestions from minimal notes): 100% accuracy across 20 tests (CI [1.00, 1.00])

PNB‑11 (Category detection): 85% accuracy (no hallucinations; misses due to occasional omissions)

Pattern: conservative outputs; variation driven by occasionally incomplete extraction
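Accuracy figures over small test counts like those above are usually reported with a binomial confidence interval. As a generic statistical illustration (the report's own interval method is not specified here), a Wilson score interval for 17/20 correct, i.e. 85% over 20 tests, can be computed as follows:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% with z=1.96).

    Generic sketch of how an accuracy CI can be derived from a pass
    count; not necessarily the method used in the reliability report.
    """
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

# e.g. wilson_interval(17, 20) -> roughly (0.64, 0.95)
```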

View details:

A public summary is available on the Reliability page (versioned); the full internal JSON outputs are archived for audit.

Document Control

Owner: Clinical Governance Lead

Review Cadence: At least annually, and after material AI or workflow changes

Next Scheduled Review: 2026‑01‑31