CoSignal Agentic Risk Detection Case Study

3 Agents in pipeline5 Live API calls4 Risk rules triggered7 AI-generated interventions

Executive Summary

CoSignal is an agentic capability risk detection system built to surface workforce performance gaps before they become operational failures. This case study documents the design, build, and live pipeline run of CoSignal V1 — a portfolio project demonstrating agentic pipeline architecture, provider-agnostic AI design, and measurement-first L&D systems thinking.

The scenario: a Microsoft Dynamics 365 warehouse rollout, three weeks post go-live, 30 employees across three workflow areas. CoSignal ingested simulated signal data, classified risk across all three areas, and generated seven AI-written interventions via live Anthropic API calls — all in a single pipeline run.

CoSignal V1 is not a concept or a mockup. Every number in this case study came from a live pipeline run using real Anthropic API calls. The intervention text was generated by claude-sonnet-4-6 reading actual signal data.

Key Results

OutcomeResult
Pipeline execution3 agents ran in sequence, all stages validated
API calls5 live calls to claude-sonnet-4-6 via streaming
Risk classificationInventory Adjustments = HIGH (4 rules), Receiving = MODERATE, Scanner = WATCH
Interventions generated7 total — 5 AI-generated by Claude, 2 static fallbacks for WATCH area
Provider portabilityAI_PROVIDER=azure activates Azure OpenAI — zero code changes required

The Problem CoSignal Solves

Large technology rollouts fail when capability gaps surface too late. The typical pattern in a D365 implementation looks like this:

  • Week 1-2: Training delivered, completion rates high, leadership satisfied
  • Week 3-4: Support ticket volume climbs, employees call the help desk for basic tasks
  • Week 5-6: Supervisors report frustration, employees revert to spreadsheets and workarounds
  • Week 8+: Leadership asks why the rollout is underperforming — but the damage is done

Traditional reporting identifies these problems after operational damage has occurred. There is no mechanism to detect the capability gap while it is still emerging — before it becomes a business problem.

CoSignal answers one question: where are capability risks emerging right now, before they show up as operational failures? It does not replace training. It detects when training is not transferring and prescribes targeted responses before the window closes.

The D365 Warehouse Context

The V1 scenario is grounded in the real dynamics of a D365 warehouse implementation. Three workflow areas are particularly vulnerable post-go-live:

Workflow AreaTypical Post-Go-Live Risk
Inventory AdjustmentsComplex multi-step process with high exception volume — most common source of errors
ReceivingHigh transaction frequency with tight timing requirements — confidence gaps surface quickly
Scanner OperationsDevice-dependent workflow — near-threshold risk accumulates before rules formally trigger

System Architecture

CoSignal is built on the CoBuild pipeline formula: sequential Python agent scripts, a run_pipeline.py orchestrator, file-based intermediate outputs, and a provider-agnostic AI layer. Each agent has one job. Every intermediate state is inspectable.

The Three-Agent Pipeline

seed_data.json       ↓ signal_aggregator.py    →  aggregated_signals.json   [pure Python, no AI]       ↓ risk_detection.py       →  risk_scores.json          [pure Python, no AI]       ↓ prescription_engine.py  →  prescriptions.json        [claude-sonnet-4-6, streaming]       ↓                         llm_client.py             [Anthropic OR Azure OpenAI]       ↓ React Dashboard         ←  FastAPI /results endpoint

Agent 1: Signal Aggregator — Pure Python

Reads seed_data.json and normalizes three signal types per workflow area into a unified signal object. No AI call — deterministic aggregation.

Signal TypeFields Captured
Support Ticketsticket_volume_7d, unresolved_tickets_48h, unresolved_rate
LMS Recordsavg_quiz_score, high_retake_count, completion_pct
Pulse Surveyavg_d365_confidence, avg_process_clarity, change_readiness

Agent 2: Risk Detection — Pure Python

Applies four threshold-based detection rules. Classification is deterministic — same inputs always produce the same output. No AI call.

Detection RuleThreshold
Process Confusionticket_volume_7d > 8
Knowledge Decayavg_quiz_score > 80 AND high_retake_count > 2
Confidence Gapavg_d365_confidence < 3.0 OR avg_process_clarity < 3.0
SME Bottleneckunresolved_tickets_48h / ticket_volume_7d > 0.60
Risk LevelClassification Logic
HIGH2 or more detection rules triggered for the same workflow area
MODERATEExactly 1 detection rule triggered
WATCHNo rules triggered, but any signal within 20% of a threshold
OKNo rules triggered, no signals near threshold

Agent 3: Prescription Engine — Calls Claude

For each HIGH and MODERATE risk area, sends the signal data to claude-sonnet-4-6 via the Anthropic streaming API and generates a specific, grounded intervention. The system prompt instructs Claude to reference the actual numbers. WATCH areas receive static fallback interventions with no API call.

llm_client.py — Provider-Agnostic Layer

The single most important architectural decision in CoSignal V1. All provider-specific code lives in one file. The Prescription Engine imports only from llm_client — it never imports anthropic or openai directly. Switching providers is a .env change, not a code change.

# .env AI_PROVIDER=anthropic  # or: AI_PROVIDER=azure   # That is the entire switch. # Pipeline, detection logic, dashboard — zero changes.

Live Pipeline Run — Results

The following results are from a live pipeline run on May 24, 2026, using the Anthropic API with claude-sonnet-4-6. All intervention text was generated in real time.

Risk Classification Output

Workflow AreaRisk Level
Inventory AdjustmentsHIGH (Priority 1)
ReceivingMODERATE (Priority 2)
Scanner OperationsWATCH (Priority 3)
The Inventory Adjustments finding is notable: quiz score 85.5 suggests employees know the material, but 64% of tickets remain unresolved past 48 hours. This is the Knowledge Decay pattern — surface-level recall that does not transfer to operational performance under pressure. Claude’s prescriptions directly address this gap.

AI-Generated Prescriptions — Inventory Adjustments (HIGH)

Four interventions were generated — one per triggered rule. Reproduced as generated by claude-sonnet-4-6:

1. Inventory Adjustment Process Clarity Sprint

Triggered by: Process Confusion

Immediately convene a 90-minute hands-on session with all 10 inventory team members where they walk through the exact adjustment scenarios generating the 14 support tickets, using live D365 sandbox data. Create a one-page quick-reference card mapping each adjustment type (write-offs, transfers, corrections) to its specific D365 transaction path, and post it at each workstation. Assign a designated “process buddy” from the 2-3 highest-confidence team members to field same-day questions for the next two weeks, targeting resolution of the 9 tickets stalled beyond 48 hours. Rationale: With process clarity averaging 2.8 and confidence at 2.6 across a small 10-person team, the volume and age of unresolved tickets indicates workers lack a reliable mental model of which D365 path to follow for which scenario — making structured, scenario-based process reinforcement more urgent than additional system training.

2. Inventory Adjustments Hands-On Knowledge Reinforcement Sprint

Triggered by: Knowledge Decay

Schedule two 90-minute in-person lab sessions this week where all 10 employees work through the five most common inventory adjustment scenarios that generated the 14 support tickets, using live D365 sandbox data. Pull the 9 unresolved tickets and convert them into the session’s practice cases, so staff resolve real problems with a trainer present. Identify the 4 employees with 2+ quiz retakes and pair each with a high-performing peer for a dedicated 30-minute walkthrough of their specific failure points before the group sessions begin. Rationale: Quiz scores sitting just above the 80-point threshold, combined with 64% of tickets unresolved past 48 hours, signal that staff have surface-level recall but cannot apply knowledge under operational pressure — requiring practice-based reinforcement against real failure cases rather than additional instructional content.

3. Redistribute Inventory Adjustment Support Load Now

Triggered by: SME Bottleneck

Identify the one or two employees currently handling all escalated tickets and immediately pair them with the 4 employees who have retaken quizzes twice or more, running two 45-minute live walkthroughs this week using the actual unresolved tickets as practice cases. Simultaneously, create a one-page quick-reference card covering the top three recurring ticket issues and post it in the team’s shared channel so all 10 employees can self-serve before escalating. Rationale: With 64% of tickets stalled beyond 48 hours and only a 2.6 confidence score across the team, a single point of SME dependency is creating a resolution backlog that will worsen as go-live volume increases, making immediate load redistribution more urgent than any training or system fix.

AI-Generated Prescription — Receiving (MODERATE)

Receiving Team Hands-On Confidence Building Sessions

Triggered by: Confidence Gap

Schedule two 90-minute supervised practice sessions this week where Receiving staff complete live D365 receiving transactions alongside a system-proficient coach, using actual purchase orders from the past 7 days. Pair the three employees with unresolved tickets directly with the coach first, walking through their specific stuck points in the system rather than generic training. End each session with a 5-minute “I can do this alone” checklist that staff self-sign to mark which tasks they now feel ready to perform independently. Rationale: With average confidence at 2.85 and half of open tickets unresolved past 48 hours, staff are hesitant to act without reassurance, meaning guided repetition on real tasks will close the confidence gap faster than additional instructional content.

Key Architectural Decisions

1. Pure Python for Detection, Claude for Language

Signal aggregation and risk classification are deterministic processes. There is no reason to involve an AI model in threshold comparison. Using pure Python for Agents 1 and 2 means the detection logic is auditable, fast, free to run, and always produces the same output from the same inputs.

Claude is invoked only in Agent 3, where the task is language generation — taking structured risk data and producing specific, readable intervention text. This is the right use of a language model. Classification is not.

2. Structured Markdown Output, Not JSON

The Prescription Engine instructs Claude to return output using labeled plain-text sections rather than JSON. This is a lesson learned from the CoBuild pipeline: JSON responses truncate silently when output is large, producing parse errors. Structured markdown fails gracefully — a partial response produces a partial file, not a crash.

3. Provider Isolation in One File

llm_client.py is the only file in the codebase that imports anthropic or openai. Every other file — including the Prescription Engine — imports only from llm_client. This means the provider can change without touching any business logic, and the codebase can be audited for external API usage by checking one file.

Azure Deployment Guide

CoSignal V1 is designed to run inside enterprise Azure environments. The switch from Anthropic to Azure OpenAI requires a .env change and nothing else.

Prerequisites

  • Active Azure subscription (the same subscription used for D365 if applicable)
  • Azure OpenAI resource provisioned: Azure Portal → Create Resource → Azure OpenAI
  • A deployed model — gpt-4o recommended: Azure OpenAI Studio → Deployments → Create
  • CoSignal V1 running on Anthropic first — verify the Anthropic run before switching

The Switch

# Step 1: In backend/.env, change one line: AI_PROVIDER=azure   # Step 2: Fill in Azure credentials: AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/ AZURE_OPENAI_API_KEY=your-key-from-azure-portal AZURE_OPENAI_DEPLOYMENT=gpt-4o AZURE_OPENAI_API_VERSION=2024-02-01   # Step 3: Run normally. Nothing else changes. python backend/run_pipeline.py   # Expected output: CoSignal Pipeline Orchestrator AI Provider: azure   Calling gpt-4o (streaming) via Azure OpenAI…   [ok]  Agent 3 — Prescription Engine — prescriptions.json verified

What Does Not Change

ComponentChanges for Azure?
signal_aggregator.pyNo — pure Python, no API call
risk_detection.pyNo — pure Python, no API call
prescription_engine.pyNo — imports only from llm_client
run_pipeline.pyNo — orchestrator is provider-agnostic
main.py (FastAPI)No — serves results regardless of provider
React dashboardNo — reads ai_provider from prescriptions.json
Detection thresholdsNo — entirely independent of AI layer
Output JSON schemasNo — identical from both providers
For Greentree and government D365 implementations: Azure OpenAI Service runs on the same Azure subscription used for D365. Adding CoSignal requires one resource deployment and one .env change. No new vendor contracts. No data leaving the Azure boundary. No security review for external API calls.

V2 Roadmap

CoSignal V1 demonstrates the architecture. V2 connects it to real data sources.

V1 (Current)V2 (Planned)
Simulated seed dataReal signals from D365, ServiceNow, LMS APIs
Single warehouse locationMulti-location risk aggregation and comparison
Static detection thresholdsAdaptive thresholds based on historical baseline
Manual pipeline executionScheduled execution via Azure Functions or Logic Apps
Local JSON file storageAzure Blob Storage or Azure SQL
React dashboard on localhostAzure Static Web Apps or Power BI embedded
No alert deliveryEmail/Teams alerts for HIGH risk classifications

Portfolio Summary

CoSignal V1 demonstrates four capabilities relevant to AI L&D systems roles and enterprise technology consulting:

  • Agentic pipeline design — CoBuild formula: sequential agents, run_pipeline.py orchestrator with –from N restart, per-stage validation, inspectable intermediate outputs
  • Provider-agnostic AI architecture — llm_client.py isolation pattern, Anthropic to Azure OpenAI with one environment variable, zero code changes
  • Selective AI use — deterministic Python for classification, Claude only for language generation where it adds value
  • Measurement-first L&D thinking — performance signals drive intervention recommendations, not training completion rates
CoSignal positions at the intersection of enterprise L&D, AI systems design, and workforce performance analytics — exactly where D365 implementation firms need capability leadership.
Jason Bouchard Blue EdgeWater  •  AI L&D Systems Design  •  May 2026 blueedgewater.com