Pro Module Concept Brief
This is a concept-and-design document, not a production product. It documents the technical basis (FMECA + KG + ML + NLP architecture, 12 engineering gaps, 5-phase roadmap, and an 826-row worldwide FMECA seed dataset). The production industrial-advisor build (Maintenance Intelligence Workbench: 11 screens, multi-tenant RBAC, real CMMS / BMS / edge integrations) is a separate, multi-year initiative tracked in docs/plans/2026-05-25-ai-maintenance-product-roadmap.md. AI is advisory-only by default; physical control remains in SIS / protection relays / BMS engineered sequences (IEC 61508).
Prescriptive Maintenance · FMECA + KG + ML + NLP

AI Engineering Maintenance — Intelligent Advisor System Concept

A concept-and-design brief for a prescriptive maintenance advisor that fuses FMECA documentation (IEC 60812), a Neo4j knowledge graph, a Random Forest + PCA diagnostic classifier, and a rule-based NLP layer (Aho-Corasick) into a single closed-loop advisor for engineered assets. Two operator interaction modes: ask in natural language (By API), or upload sensor data and let the engine diagnose (By Engine). This page synthesizes the paper, surfaces 12 engineering gaps, and proposes concrete enhancements with a phased build roadmap.

Source paper: Lin, H. & Ompusunggu, A. P. (2026). Intelligent Advisor System for Prescriptive Maintenance of Engineered Assets Using FMECA, Knowledge Graph and Machine Learning. Artificial Intelligence for Engineering (Wiley / IET). DOI: 10.1049/aie2.70019. Case-study dataset (Cranfield): 10.17862/cranfield.rd.5097649.

MODULE 1
FMECA Documentation
IEC 60812 tabular failure analysis. Components, Faults, Failures, Actions, Mechanisms, Effects, Steps, SOD·RPN.
MODULE 2
FMECA Knowledge Graph
Neo4j graph: 10 node types · 12 relationship types. Cypher-queryable single source of truth for asset reasoning.
MODULE 3
Maintenance Analytics (ML)
Random Forest + PCA over 26 statistical features from position-error + motor-current signals. Bayesian-tuned.
MODULE 4
NLP Layer
Rule + dictionary based: Aho-Corasick word segmentation, NER to FMECA entity classes, Cypher template selection.

Architecture — The Paper's Baseline System

The advisor system has four cooperating modules. FMECA is the source-of-truth documentation; the Knowledge Graph turns the tabular FMECA into a queryable Neo4j instance; the Maintenance Analytics module diagnoses faults from raw sensor data; and the NLP layer translates human questions into Cypher queries. The same Cypher-template tail is reused by both interaction modes, which keeps the answer surface deterministic.

Diagram legend: Knowledge / FMECA path · Data / NLP path · Reply / Advisor output · External integration (abstract in paper)

The Four Modules

Each module has a distinct role, a clear input contract, a clear output contract, and a specific tech-stack pick. The cards below name each one, with a sub-flowchart showing its internal pipeline.

M1
FMECA Documentation
IEC 60812 tabular
Failure Mode, Effects & Criticality Analysis. The source-of-truth worksheet built from design knowledge and operational degradation mechanisms. Defines the canonical entities the rest of the system reasons over.
Input
Design specs, expert interviews, historical failure reports
Output
Tabular FMECA worksheet (Components, Faults, Failures, Mechanisms, Effects, Actions, Steps, SOD·RPN)
Tech
IEC 60812 standard · Excel / structured CSV
Pipeline: Asset Design + Field History → FMECA Workshop → IEC 60812 Worksheet (RPN = S × O × D)
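The worksheet's risk ranking follows directly from RPN = S × O × D. A minimal sketch, assuming the common 1-10 rating scale for each factor; the scale check and the sample ratings are illustrative assumptions, not values from the paper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SodRating:
    """One FMECA row's Severity / Occurrence / Detectability rating."""
    fault: str
    severity: int
    occurrence: int
    detectability: int

    def rpn(self) -> int:
        # Risk Priority Number as written on the worksheet: S x O x D.
        for value in (self.severity, self.occurrence, self.detectability):
            if not 1 <= value <= 10:  # assumed 1-10 rating scale
                raise ValueError(f"rating out of range: {value}")
        return self.severity * self.occurrence * self.detectability


# Illustrative ratings only; these are NOT the paper's SOD tuples.
rows = [
    SodRating("Backlash", 6, 5, 3),
    SodRating("Spalling", 8, 4, 6),
    SodRating("Lack-of-lubrication", 5, 6, 2),
]

# Highest RPN first, which is how a criticality ranking is usually read.
ranked = sorted(rows, key=lambda r: r.rpn(), reverse=True)
```

Sorting by RPN gives the order in which faults would be reviewed; any tie-breaking rule (for example, severity first) would be a project decision.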
M2
FMECA Knowledge Graph
Neo4j · Cypher
Converts the tabular FMECA into a graph: 10 node types linked by 12 relationship types (HAS_COMPONENTS, HAS_FAULTS, LEADS_TO, HAS_MECHANISM, HAS_RPN, HAS_STEPS, HAS_ACTIONS_OF_FAILURES and similar). Queryable via Cypher templates from both modes.
Input
FMECA worksheet (M1 output)
Output
Live Neo4j graph; Cypher response sets
Tech
Neo4j 4.x · py2neo or Cypher driver · schema mapper script
Case
12 components, 23 faults, 20 failures, 34 actions, 35 mechanisms, 32 effects, 11 SOD·RPN tuples
Graph sketch: Comp · Fault · Fail · Effect · Mech · Action · Steps · RPN nodes; 12 relationship types · Cypher queryable
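A sketch of what the schema mapper script could emit for one worksheet row. The relationship type names come from this brief; the node labels, property names, and edge directions are assumptions made for illustration, and the paper's actual schema may differ:

```python
def row_to_cypher(row):
    """Return (query, params) pairs that upsert one FMECA row into the graph.

    Hypothetical schema: labels (Component, Fault, Failure, Mechanism),
    property names and edge directions are assumptions.
    """
    return [
        (
            "MERGE (c:Component {name: $component}) "
            "MERGE (f:Fault {name: $fault}) "
            "MERGE (c)-[:HAS_FAULTS]->(f)",
            {"component": row["component"], "fault": row["fault"]},
        ),
        (
            "MERGE (f:Fault {name: $fault}) "
            "MERGE (x:Failure {description: $failure}) "
            "MERGE (f)-[:LEADS_TO]->(x)",
            {"fault": row["fault"], "failure": row["failure"]},
        ),
        (
            "MERGE (f:Fault {name: $fault}) "
            "MERGE (m:Mechanism {name: $mechanism}) "
            "MERGE (f)-[:HAS_MECHANISM]->(m)",
            {"fault": row["fault"], "mechanism": row["mechanism"]},
        ),
    ]


# Example row assembled from the spalling answer quoted in this brief.
row = {
    "component": "bearings",
    "fault": "spalling",
    "failure": "Surface defects cause irregular movement and increased wear",
    "mechanism": "surface fatigue",
}
statements = row_to_cypher(row)
```

MERGE keeps the load idempotent, so re-running the mapper on an unchanged worksheet should not duplicate nodes. Each pair could be passed to a Neo4j driver session as a parameterised query rather than built by string concatenation.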
M3
Maintenance Analytics (ML)
Random Forest + PCA
From raw sensor windows, extracts 26 statistical features (overshoot OVy/OVz, peak values Py/Pz, peak locations, mean steady-state S_z, σ_z, RMS_y/z, skewness SK_y/z, crest factor C_y/z, max-peak-near-max-speed Ps_z). PCA reduces dimensionality; RF classifies. Bayesian optimisation tunes hyperparameters.
Input
Position-error + motor-current sensor stream, 25 Hz, 16-s windows
Output
Fault-class label (Normal / Backlash / Spalling / Lack-of-lubrication)
Tech
scikit-learn RandomForestClassifier · PCA · scikit-optimize / Optuna
Optimal
51 trees · 7 PCA components · 80/20 train/test split
Pipeline: Sensors → 26 features (OV, RMS, SK, C ...) → PCA (7 comp.) → RF (51 trees, Bayesian opt) → Label. Macro F1 = 84.84% · Precision 85.00% · Recall 84.76%
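Three of the listed statistics (RMS, skewness, crest factor) can be sketched in plain Python. At 25 Hz over a 16-s window each cycle holds 400 samples. The definitions below are the textbook ones and the test signal is synthetic; the paper's exact feature definitions may differ:

```python
import math

SAMPLE_RATE_HZ = 25
WINDOW_SECONDS = 16
SAMPLES_PER_WINDOW = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 400 samples per cycle


def rms(x):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def skewness(x):
    """Third standardised moment (population form)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in x) / (len(x) * std ** 3)


def crest_factor(x):
    """Peak absolute value divided by RMS."""
    r = rms(x)
    return max(abs(v) for v in x) / r if r else 0.0


def extract_features(signal):
    """Compute a small subset of the 26 features for one motion cycle."""
    if len(signal) != SAMPLES_PER_WINDOW:
        raise ValueError("expected one 16-s window sampled at 25 Hz")
    return {
        "rms": rms(signal),
        "skewness": skewness(signal),
        "crest_factor": crest_factor(signal),
    }


# Synthetic 1 Hz sine standing in for a position-error trace (not real data).
window = [math.sin(2 * math.pi * n / SAMPLE_RATE_HZ)
          for n in range(SAMPLES_PER_WINDOW)]
features = extract_features(window)
```

For a unit sine spanning whole periods, the RMS comes out at about 0.707 and the skewness at about zero, which makes the sketch easy to sanity-check before pointing it at real cycles.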
M4
NLP Layer
Aho-Corasick · rule-based
Three-stage rule-based NLP: (a) word segmentation; (b) Named Entity Recognition (NER) maps words to FMECA entity classes — Action / Component / Effect / Failure / Fault / Mechanism; (c) question classification picks a Cypher template, runs it, formats the answer. Deterministic, explainable, no hallucination.
Input
Operator question in natural language
Output
Cypher query · result set · templated answer
Tech
Python pyahocorasick · dictionary built from FMECA entities · lookup-table classifier
Pipeline: Question ("what causes...?") → Word seg. (Aho-Corasick) → NER (entity class) → Question classifier → Cypher template selected
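The three stages can be sketched end to end. This version implements a small Aho-Corasick automaton in pure Python so that it runs without pyahocorasick; the dictionary subset, the question cues, and the Cypher template text are hypothetical placeholders, not the paper's:

```python
from collections import deque


def build_automaton(terms):
    """Build Aho-Corasick goto, failure and output tables for the terms."""
    goto, fail, out = [{}], [0], [[]]
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit suffix matches
    return goto, fail, out


def find_terms(text, automaton):
    """Return every dictionary term found in text, in order of match end."""
    goto, fail, out = automaton
    state, hits = 0, []
    for ch in text.lower():
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend(out[state])
    return hits


# Illustrative subset of a dictionary built from FMECA entities.
ENTITY_CLASSES = {"spalling": "Fault", "backlash": "Fault",
                  "bearings": "Component", "surface fatigue": "Mechanism"}
QUESTION_CUES = {"cause": "root_cause", "fix": "actions"}  # hypothetical cues
TEMPLATES = {  # hypothetical lookup table and Cypher text
    ("root_cause", "Fault"):
        "MATCH (f:Fault {name: $name})-[:HAS_MECHANISM]->(m) RETURN m.name",
    ("actions", "Fault"):
        "MATCH (f:Fault {name: $name})-[:LEADS_TO]->(x)"
        "-[:HAS_ACTIONS_OF_FAILURES]->(a) RETURN a.name",
}
AUTOMATON = build_automaton(list(ENTITY_CLASSES) + list(QUESTION_CUES))


def select_template(question):
    """Segment, tag entities, classify the question, pick a template."""
    hits = find_terms(question, AUTOMATON)
    entity = next((t for t in hits if t in ENTITY_CLASSES), None)
    cue = next((t for t in hits if t in QUESTION_CUES), None)
    if entity is None or cue is None:
        return None  # nothing matched; the rule-based layer cannot answer
    cypher = TEMPLATES.get((QUESTION_CUES[cue], ENTITY_CLASSES[entity]))
    return (cypher, {"name": entity}) if cypher else None
```

Calling `select_template("What is the cause of spalling?")` returns the root-cause template with `{"name": "spalling"}`, while a phrasing outside the dictionary such as "lube starvation" returns `None`.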

Two Interaction Modes

The user reaches the advisor through one of two entry points. Mode A is for the engineer who knows what to ask; Mode B is for the technician with raw data but no diagnosis. Both modes terminate at the same Cypher-template tail, so the answer surface is consistent — the upstream arrival path is the only thing that changes.

Mode A · By API
Natural-Language Query Path
Operator types a question. NLP segments, classifies, picks a Cypher template, runs it on the FMECA-KG, and returns a templated answer. Bounded to the paper's predefined Cypher templates.
Flow: Operator types question → Word seg. (Aho-Corasick) → NER → Q. class (tpl. selector) → Cypher run on FMECA-KG (Neo4j) → Formatted answer to operator (root cause + actions + steps). Strength: deterministic, explainable, traceable
Q: "what is the cause of spalling?"
A: "Spalling has descriptions: Surface defects cause irregular movement and increased wear. May arise from components: nuts, screws, balls, bearings. Mechanisms: surface fatigue, contact-stress concentration. Prescribed actions: replace contaminated grease, inspect raceway, schedule bearing overhaul. RPN tuple = (S, O, D)."
Strength: Explainable. Same query always returns the same answer. No hallucination.
Weakness: Bounded to predefined Cypher templates. Cannot handle open-ended or unseen phrasings.
Mode B · By Engine Algorithms
Sensor-Upload Diagnostic Path
Operator uploads raw measurement data (paper's CLI: predict ./dataset/output_3). PCA + RF classify the fault. Predicted label feeds the same Cypher tail as Mode A — same answer surface, different arrival path.
Flow: Operator uploads data → Feature ext. (26 stats) → PCA (7) → RF (51; vote & class) → Cypher run on FMECA-KG (label -> root cause) → Formatted answer to operator (label + cause + actions + steps). Strength: data-driven, no human framing
Upload: predict ./dataset/output_3 (label = Backlashneg40)
A: "All 10 motion cycles classified as Backlash. Linked components: bearings, nuts, balls, screws. Failure descriptions and effects attached. Corrective actions: check pre-load, re-shim, replace nut if wear exceeds tolerance. Steps: 1. de-energise; 2. lock-out; 3. dismantle..."
Strength: Catches faults from raw signal anomalies. No expert framing required.
Weakness: Bounded to 4 known classes. Cannot detect novel failure modes; will force-fit.
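The shared tail can be sketched as one function that both modes call. The query text and function names are hypothetical, and combining per-cycle labels by majority vote is an assumption; the brief only shows a case where all 10 cycles agree:

```python
from collections import Counter


def cypher_tail(fault_name):
    """Shared template tail used by both modes (hypothetical query text)."""
    query = (
        "MATCH (f:Fault {name: $name}) "
        "OPTIONAL MATCH (f)-[:HAS_MECHANISM]->(m) "
        "RETURN f.name, collect(m.name)"
    )
    return query, {"name": fault_name}


def mode_a(entity_from_nlp):
    """Mode A: the NLP layer has already resolved the fault entity."""
    return cypher_tail(entity_from_nlp)


def mode_b(cycle_labels):
    """Mode B: aggregate per-cycle RF labels, then reuse the same tail.

    Majority vote across cycles is an assumption made for this sketch.
    """
    label, votes = Counter(cycle_labels).most_common(1)[0]
    return cypher_tail(label), votes / len(cycle_labels)


query_a = mode_a("Backlash")
query_b, vote_share = mode_b(["Backlash"] * 10)
```

Because both paths end in `cypher_tail`, an operator asking about backlash and an upload diagnosed as backlash produce the same query and therefore the same answer surface.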

Case-Study Numbers — What the Paper Actually Reports

Asset under test: a single linear actuator (Cranfield dataset). Four fault states × three loads × 50 motion cycles = 600 events. Results below are quoted verbatim from the paper — no fabricated numbers, no averaged-up restatements.

Faults
4
Normal, Backlash, Spalling, Lack-of-lubrication
Loads
3
20 kg, 40 kg, -40 kg
Total events
600
12 conditions × 50 cycles
Sample rate
25 Hz
16-second motion cycles
Macro F1
84.84%
overall, after Bayesian opt
Precision / Recall
85.00% / 84.76%
RF trees
51
optimum
PCA components
7
optimum
Fault class | Per-fault F1 | Notes
Lack-of-lubrication | 97.12% | Best performance; all events correctly classified across the 3 loads
Backlash | 83.50% | Solid; backlash signal has distinct overshoot/undershoot signature
Normal | 80.77% | Some leakage with weak Spalling cases (early-stage degradation overlaps)
Spalling | 77.98% | Weakest; early-stage surface defects share signal characteristics with normal/backlash. This is the documented model weakness
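The headline macro F1 is consistent with the unweighted mean of the four per-class scores above, which fits the balanced design of 150 events per fault class. A quick arithmetic check using only the figures quoted in this section:

```python
# Per-class F1 scores (percent) as quoted in the table above.
per_class_f1 = {
    "Lack-of-lubrication": 97.12,
    "Backlash": 83.50,
    "Normal": 80.77,
    "Spalling": 77.98,
}

# Macro F1 is the unweighted mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Class with the lowest score, i.e. the one with most room to improve.
weakest = min(per_class_f1, key=per_class_f1.get)

# Dataset size check: 4 fault states x 3 loads x 50 cycles.
total_events = 4 * 3 * 50
events_per_class = total_events // 4
```

The mean works out to 84.8425, which rounds to the reported 84.84%.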

The 77.98% Spalling F1 is the canonical opportunity for enhancement (see Gap #10 below). Spalling is also the most economically valuable diagnosis — catching it early prevents secondary damage to nuts, balls, and screws.

Gaps & Proposed Enhancements

Twelve concrete gaps surfaced from a critical read of the paper. Each is paired with a proposed enhancement that names the algorithm or library, not just the wishful direction.

01 Single-asset case study — no fleet generalization
Gap
The paper validates on one linear actuator. No discussion of how the KG, ML, or NLP behave across asset families (pumps, chillers, switchgear, gensets) or across multiple sites.
Why it matters
Engineering operations rarely have one asset. A fleet has thousands. Transfer learning, cross-asset reasoning, and federated training are required for operational value.
Enhancement
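Build an ontology-driven KG with asset taxonomies (parent: RotatingMachine, children: Pump, Compressor, Motor; FMECA inherits through the hierarchy). Layer federated learning on the diagnostic model so multiple sites contribute model updates without sharing raw sensor data. FedAvg and FedProx average the parameters of gradient-trained models, which a Random Forest does not have, so either share locally trained trees and merge them into one ensemble, or move the classifier to a gradient-trained model before applying FedAvg or FedProx.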
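The inheritance rule ("FMECA inherits through the hierarchy") can be sketched with a parent map. The class names come from the enhancement text above; the fault modes attached to each class are illustrative assumptions:

```python
# Asset taxonomy from the enhancement text: child -> parent.
PARENT = {
    "RotatingMachine": None,
    "Pump": "RotatingMachine",
    "Compressor": "RotatingMachine",
    "Motor": "RotatingMachine",
}

# Fault modes declared at each level (illustrative, not from the paper).
FAULT_MODES = {
    "RotatingMachine": {"bearing spalling", "lack of lubrication"},
    "Pump": {"seal leakage"},
}


def inherited_fault_modes(asset_class):
    """Collect fault modes declared on the class and on all its ancestors."""
    modes = set()
    node = asset_class
    while node is not None:
        modes |= FAULT_MODES.get(node, set())
        node = PARENT[node]
    return modes


pump_modes = inherited_fault_modes("Pump")
motor_modes = inherited_fault_modes("Motor")
```

A Pump picks up its own seal-leakage mode plus the two declared on RotatingMachine, while a Motor with no class-specific entries still inherits the shared ones. In the graph itself this walk would be a variable-length path query over the taxonomy edges.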
02 Static FMECA — no operational feedback loop
Gap
The KG is frozen at design time. Field failures, novel fault patterns, and lessons learned from completed work orders never propagate back into the FMECA or the KG.
Why it matters
The most valuable knowledge in an engineering operation is what the technicians learn doing the job. A static FMECA wastes it. The advisor stays as smart as the day it was commissioned.
Enhancement
Close the loop with an experience-capture pipeline: when a CMMS work order is closed, an NER pass over the completion notes extracts (action, observed-fault, observed-mechanism) triples and proposes a KG diff. A reliability engineer reviews and accepts in a lightweight admin UI. Auditable, reversible, traceable.
03 No CMMS integration — prescription stops at advice
Gap
Paper mentions CMMS work orders abstractly. Advisor output is a text answer — not an action in the maintenance system. No bi-directional sync.
Why it matters
True "prescriptive" maintenance requires the prescription to land in a planner's queue, with parts reserved, technicians assigned, and an estimated MTTR. Otherwise it's just decision support.
Enhancement
Bi-directional CMMS hook via REST or webhook (IBM Maximo, SAP PM, Infor EAM, Fiix, UpKeep). On advisor output: auto-create WO with prescribed steps, BOM, MTTR estimate, criticality from RPN. On WO completion: capture outcome back into the KG via Gap #02's experience-capture pipeline.
04 PCA opacity — loss of feature interpretability
Gap
PCA collapses the 26 engineering-meaningful features (RMS, skewness, crest factor, overshoot...) into 7 abstract components. A technician cannot tell which signal feature drove a classification.
Why it matters
Maintenance staff need explainable diagnoses. "The RF says backlash" is not actionable; "the RF says backlash because overshoot OVy is 3σ above baseline" is.
Enhancement
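Complement PCA with SHAP (TreeSHAP is fast for tree ensembles), computed on a parallel RF trained on the original 26 features, since attributions on the production model would otherwise refer to the 7 PCA components rather than to engineering features. Built-in Gini importance gives a global ranking only, so use it for model review rather than per-prediction explanations. Surface the top-3 driving features per prediction in the advisor UI. Optional: keep PCA only for visualization and run RF on raw features for production scoring.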
05 No anomaly detection — novel faults are force-fit
Gap
RF only chooses among the 4 known classes. Any unseen fault mode (e.g. a sudden seal failure) gets force-classified into the nearest known label, silently and confidently wrong.
Why it matters
"Unknown unknowns" are the most expensive failures. A diagnostic system that lies confidently in their presence destroys operator trust.
Enhancement
Add an anomaly pre-filter: Isolation Forest or One-Class SVM trained only on Normal data. Score on every new cycle. If anomaly-score > threshold AND RF confidence < threshold ⇒ flag "anomalous but unrecognised" and route to a reliability engineer (human-in-the-loop) instead of force-fitting.
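The routing rule above can be sketched as a small decision function. It assumes the anomaly score is scaled so that higher means more anomalous (scikit-learn's IsolationForest `score_samples` runs the other way, lower means more abnormal, so it would need negating first). Both thresholds are placeholders to be tuned on site data:

```python
def route_prediction(label, rf_confidence, anomaly_score,
                     anomaly_threshold=0.6, confidence_threshold=0.7):
    """Decide whether an RF label goes to the advisor or to a human.

    Assumptions: anomaly_score is higher for more anomalous cycles, and
    both thresholds are hypothetical defaults, not values from the paper.
    """
    if anomaly_score > anomaly_threshold and rf_confidence < confidence_threshold:
        # Looks unlike Normal data AND the RF is unsure: do not force-fit.
        return {
            "route": "reliability_engineer",
            "label": None,
            "reason": "anomalous but unrecognised",
        }
    return {"route": "advisor", "label": label, "reason": "known class"}


confident = route_prediction("Backlash", rf_confidence=0.92, anomaly_score=0.8)
novel = route_prediction("Spalling", rf_confidence=0.41, anomaly_score=0.8)
```

A confident backlash call still reaches the advisor even though the cycle looks abnormal, which is expected for a genuine known fault. Only the combination of a high anomaly score and low RF confidence is diverted to the human-in-the-loop queue.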
06 No Remaining Useful Life — diagnostic, not prognostic
Gap
The system diagnoses what is failing, not when. RUL prediction is absent. "True" prescriptive maintenance must be time-aware so the planner can schedule the work before the fault occurs, not after.
Why it matters
Without RUL, the advisor can only react. With RUL, the advisor can schedule the work order to land 2 weeks before predicted failure, with parts in the cage and a technician booked.
Enhancement
Add a survival-analysis regressor (Cox Proportional Hazards or DeepSurv) or a recurrent RUL regressor (LSTM/GRU over the feature time-series). Train on degradation paths in the dataset. Output a posterior over RUL with quantile bands, not a single number.
07 No vendor / spare-parts / cost integration
Gap
Prescription stops at "do action X" + "follow steps Y". No link to part availability, vendor catalog, lead-time, or cost. The planner has to do all of that lookup separately.
Why it matters
If the prescribed action requires a bearing the warehouse doesn't carry with a 12-week lead time, the advisor's recommendation is operationally useless.
Enhancement
Link the FMECA-KG to the existing Spares Readiness Calculator spare-parts catalog. KG nodes get HAS_SPARE_PART edges into SPARES_CATALOG entries with availability, lead-time, cost, and supplier. Advisor output becomes: "replace bearing X — in stock at Bekasi warehouse, $420, 0-day lead time".
08 No edge / offline mode — cloud-only assumption
Gap
Architecture assumes cloud Neo4j + cloud ML inference. Industrial sites often have intermittent connectivity, data-sovereignty constraints (data must not leave the plant), and latency requirements that exclude a cloud round-trip.
Why it matters
Petrochemical, defence, and primary-industry sites won't deploy a cloud-only advisor. Many DC operations sites have the same constraint.
Enhancement
Distill the RF to a quantized ONNX / tflite model runnable on a Raspberry Pi 5 or Jetson Nano gateway. Replace Neo4j with SQLite + recursive CTEs or an embedded graph store (TerminusDB-embedded). Sync deltas to cloud KG when connectivity returns. Same template surface, two backends: on SQLite, each Cypher template needs an equivalent SQL query registered under the same template ID.
09 Rule-based NLP fragility — can't handle unseen vocabulary
Gap
Aho-Corasick is a literal-string matcher. If a technician types "lube starvation" instead of "lack of lubrication", or "pitting" instead of "spalling", the NLP fails silently.
Why it matters
Field vocabulary varies by region, language, vendor, and seniority. A diagnostic system that needs the operator to use exact dictionary terms won't survive contact with reality.
Enhancement
Add a small-LLM rewrite stage (Phi-3 mini, Gemma-2B, or Mistral-7B-Instruct quantized) that maps the user's input onto canonical FMECA vocabulary, gated so that the LLM never produces free-form output. It only emits a structured Cypher template ID + entity slots, which the deterministic backend executes. Free-text hallucination is ruled out because the answer surface stays templated; a wrong template or slot choice is still possible, so validate the emitted ID and slots against the template list and the FMECA dictionary before execution.
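The gate between the LLM and the deterministic backend can be sketched as a strict validator. The template IDs, slot names, and JSON shape are hypothetical; the point is that anything other than a known template ID with canonical slot values is rejected:

```python
import json

# Hypothetical template registry: template ID -> required slot names.
ALLOWED_TEMPLATES = {
    "root_cause_of_fault": {"fault"},
    "actions_for_fault": {"fault"},
}
# Canonical vocabulary per slot, built from the FMECA dictionary (subset).
CANONICAL_TERMS = {"fault": {"spalling", "backlash", "lack of lubrication"}}


def gate_llm_output(raw):
    """Return (template_id, slots) if the LLM output is valid, else None."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form text is never passed through
    if not isinstance(payload, dict):
        return None
    template_id = payload.get("template_id")
    slots = payload.get("slots")
    if not isinstance(template_id, str) or template_id not in ALLOWED_TEMPLATES:
        return None
    if not isinstance(slots, dict) or set(slots) != ALLOWED_TEMPLATES[template_id]:
        return None
    for slot, value in slots.items():
        if not isinstance(value, str) or value not in CANONICAL_TERMS[slot]:
            return None  # slot value is not canonical FMECA vocabulary
    return template_id, slots


accepted = gate_llm_output(
    '{"template_id": "root_cause_of_fault", "slots": {"fault": "lack of lubrication"}}'
)
rejected = gate_llm_output("Spalling is usually caused by worn grease.")
```

If the model fails to rewrite "lube starvation" into the canonical term, the gate returns `None` and the request can fall back to a clarification prompt instead of running an unvetted query.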
10 Spalling early-stage misclassification (F1 = 77.98%)
Gap
The paper's documented weakest class. Early-stage spalling has signal characteristics overlapping normal and backlash; the 26 time-domain stats miss the fault-mechanism-specific harmonic content.
Why it matters
Spalling is the most expensive failure mode to miss — it cascades. Early detection saves the raceway, the balls, the seal, and possibly the entire bearing.
Enhancement
Add wavelet-domain features: Continuous Wavelet Transform energy at bearing-fault-mechanism-specific frequencies (BPFO, BPFI, BSF, FTF computed from geometry + speed). Optionally pre-train a self-supervised contrastive encoder (SimCLR-style) on unlabeled motion cycles before RF, so embeddings cluster spalling cases tighter even at early-stage. Target, not yet validated: lift spalling F1 from 77.98% to above 88%.
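The four characteristic frequencies follow from bearing geometry and shaft speed. The sketch below uses the standard rolling-element formulas for a stationary outer race; the geometry values are illustrative, since this brief does not give the actuator's bearing dimensions. It also checks which frequencies fall below the Nyquist limit, which is 12.5 Hz at the 25 Hz sampling rate:

```python
import math


def bearing_fault_frequencies(shaft_hz, n_balls, ball_d, pitch_d,
                              contact_angle_deg=0.0):
    """Characteristic defect frequencies in Hz (outer race stationary)."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1 - ratio),             # cage frequency
        "BPFO": 0.5 * n_balls * shaft_hz * (1 - ratio),  # outer-race defect
        "BPFI": 0.5 * n_balls * shaft_hz * (1 + ratio),  # inner-race defect
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1 - ratio ** 2),  # ball spin
    }


def observable(freqs, sample_rate_hz):
    """Flag which frequencies sit below the Nyquist limit of the sensor."""
    nyquist = sample_rate_hz / 2
    return {name: f < nyquist for name, f in freqs.items()}


# Illustrative geometry: 8 balls, 10 mm ball, 50 mm pitch diameter.
slow = bearing_fault_frequencies(shaft_hz=1.0, n_balls=8, ball_d=10, pitch_d=50)
fast = bearing_fault_frequencies(shaft_hz=10.0, n_balls=8, ball_d=10, pitch_d=50)
slow_ok = observable(slow, sample_rate_hz=25)
fast_ok = observable(fast, sample_rate_hz=25)
```

With this example geometry, a 1 Hz shaft keeps all four frequencies under 12.5 Hz, but at 10 Hz only the cage frequency remains observable. The wavelet features are therefore only informative if the defect frequencies at the actual operating speed fit inside the sampled band, or if a faster sensor is added.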
11 No multi-fault concurrency — single-label assumption
Gap
RF emits exactly one class per cycle. Real degraded assets often have multiple concurrent failure modes — e.g. spalling combined with lack-of-lubrication — that the current model collapses to whichever has the higher vote share.
Why it matters
If a bearing is both spalling AND lubrication-starved, prescribing only "re-grease" while ignoring spalling will accelerate the spalling. Single-label diagnosis can prescribe a partially correct, partially harmful action.
Enhancement
Convert to multi-label RF via one-vs-rest binary RFs per class, each with its own probability threshold; or migrate to a multi-head neural classifier (shared encoder, per-class sigmoid heads). KG response template then aggregates multiple HAS_FAULTS matches into a combined advisory.
12 No uncertainty quantification — raw vote share, not calibrated
Gap
RF outputs vote share, which is correlated with confidence but not calibrated. There is no statistical guarantee on the prediction; no prediction interval.
Why it matters
A reliability engineer needs to know how confident the advisor is before authorising an unplanned shutdown. "85% probable backlash" is not the same as "95% confident the true class is in {backlash, lack-of-lubrication}".
Enhancement
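Calibrate the vote shares on a validation set using Platt scaling or isotonic regression (temperature scaling is designed for neural-network logits, not for RF vote shares), then layer conformal prediction (Mondrian conformal for per-class coverage) to emit prediction sets with guaranteed coverage at a chosen level (e.g. 95%), assuming calibration and live data are exchangeable. Advisor UI shows: "Likely Backlash; 95% confidence set = {Backlash, Lack-of-lubrication}".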
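A minimal sketch of split conformal prediction on top of class probabilities. This is the plain marginal variant, not the Mondrian per-class variant named above, and the calibration probabilities are made-up numbers used only to exercise the code:

```python
import math


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Threshold on the score 1 - p(true class) from a calibration set."""
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs, qhat):
    """All classes whose nonconformity score is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


# Made-up calibration data: probability assigned to the true class.
true_class_probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.3, 0.9, 0.75, 0.8, 0.65]
cal_probs = [{"Backlash": p} for p in true_class_probs]
cal_labels = ["Backlash"] * len(true_class_probs)

qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.1)  # 90% target

new_cycle = {"Backlash": 0.60, "Lack-of-lubrication": 0.35,
             "Normal": 0.03, "Spalling": 0.02}
pred_set = prediction_set(new_cycle, qhat)
```

With ten calibration points and a 90% target, the threshold is the largest calibration score, so the new cycle's set contains both Backlash and Lack-of-lubrication. A real deployment would need a much larger calibration set, and a per-class threshold for the Mondrian variant.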

Proposed Enhanced Engine Architecture

The diagram below incorporates the 12 enhancements into a single architecture. Green dashed lines are the feedback / experience-capture loops. Red blocks are safety nets (anomaly detection, uncertainty quantification). Amber blocks are knowledge surfaces. Cyan blocks are inference.

Diagram legend: Inference stack · Knowledge surface · Output / feedback (dashed = closed-loop) · Safety net (anomaly / uncertainty)

Phased Build Roadmap

Five phases. Phase 1 reproduces the paper's baseline; Phases 2-5 add the 12 enhancements in dependency order. Phase complexity is rated by effort, not difficulty — the actual hard part is the cross-functional sign-off, not the code.

Phase 1 · MVP
Paper Baseline
  • FMECA worksheet for 1 asset
  • Neo4j KG (10 node types)
  • 26-feature extraction + PCA + RF
  • Aho-Corasick NLP
  • CLI for Mode A & Mode B
  • Replicates 84.84% F1
Effort: 4-6 weeks · 1 ML eng + 1 reliability eng
Phase 2 · Close the loop
CMMS + Spares + SHAP
  • Gap #03 — bi-directional CMMS hook
  • Gap #07 — Spares Catalog edges
  • Gap #04 — SHAP explainer in UI
  • Web UI replaces CLI
  • Auth tiering integration (Pro+)
Effort: 6-8 weeks · +1 full-stack eng
Phase 3 · Safety nets
Anomaly + RUL + Uncertainty
  • Gap #05 — Isolation Forest pre-filter
  • Gap #06 — RUL regressor (Cox-PH or LSTM)
  • Gap #12 — conformal prediction
  • Human-in-the-loop queue UI
  • Calibration test rig
Effort: 8-10 weeks · +1 ML eng
Phase 4 · Lift the ceiling
LLM NLP + Multi-Label + Ontology
  • Gap #09 — gated small-LLM rewrite
  • Gap #11 — multi-label RF
  • Gap #01 — ontology-driven KG
  • Gap #10 — wavelet features
  • Gap #02 — experience-capture
Effort: 10-14 weeks · cross-functional
Phase 5 · Field-ready
Edge + Federated
  • Gap #08 — ONNX edge model
  • Gap #08 — SQLite-graph fallback
  • Gap #01 — FedAvg across sites
  • Provenance ledger + audit export
  • Pilot site deployment
Effort: 12+ weeks · field eng + DevOps

Open Questions for the Owner

Eight questions to answer before we can scope an MVP. Each one shifts the architecture materially.

Q1
Target asset class? The paper validates on a linear actuator. Are we targeting data-center rotating machines (chillers, CRAH fans, gensets, UPS rotating mass), electrical assets (switchgear, transformers, breakers), or industrial process equipment? Each implies a different FMECA seed and sensor topology.
Q2
What sensor streams are available today? Vibration accelerometers? Motor-current signature analysis (MCSA)? BMS analog points (temp, pressure, position)? Acoustic emission? The 26-feature spec only fits if the data look like the paper's. List what's available now vs what would need to be retrofitted.
Q3
Do we have existing FMECA documents? A live operations site usually has either nothing, a vendor-supplied OEM FMECA, or an in-house spreadsheet. Whichever it is — that's our M1 seed. If nothing exists, Phase 1 starts with an FMECA workshop, not code.
Q4
Which CMMS, if any? Maximo, SAP PM, Infor EAM, Fiix, UpKeep, or an Excel sheet? This decides the Gap #03 integration shape: REST API hook, ETL batch sync, or webhook listener.
Q5
Deploy target? Pure cloud (Neo4j Aura + sklearn on Vercel/Render)? Edge (Pi5/Jetson per asset)? Hybrid (edge inference + cloud KG)? The choice drives Gap #08 priority and the model-distillation budget.
Q6
Labeling capacity? Who labels the historical sensor data into fault classes — a reliability engineer with hours per week, or do we need self-supervised pre-training because labels are scarce? If labels are scarce, Gap #10's contrastive pre-training jumps to Phase 2.
Q7
Success metric? Macro F1 like the paper? Mean-time-to-detect (MTTD)? Avoided unplanned downtime (hours/year)? Avoided cost? The metric drives the loss function, the validation rig, and the cost-justification story.
Q8
Regulatory / safety case scope? Is the advisor advisory-only (no closed-loop control) or does it actuate anything (auto-shutdown, auto-throttle)? If actuating, Gap #12 conformal prediction + Gap #05 anomaly net become safety-critical, not optional. IEC 61508 / 62443 may apply.

Knowledge Base — Worldwide FMECA Seed Dataset (2026-05-23)

The Lin & Ompusunggu architecture is unopinionated about which failure modes it ingests. The graph is only as useful as its seed data. To take the concept beyond a single-actuator prototype, we commissioned a parallel research run for worldwide industrial-asset failure-mode data scoped to the data-center estate. The output is in docs/research/2026-05-23-fmeca-kg-worldwide-asset-failure-data.md plus eight CSV seed files ready for Neo4j ingestion.

20
Asset families covered (electrical · cooling · controls · fire/life-safety · mechanical/civil)
109
Fault modes documented (avg ~5 per family)
826
KG-ready data rows across 8 CSV seed files (834 lines incl. headers)
46
Primary citations (CIGRE, IEEE 493, ASHRAE TC 9.9, NFPA, NETA, OREDA 7e, NPRD-2016, FMD-2016)

Headline findings

Confidence tiers

Every fault row carries a confidence_tier column (high / medium / thin). Default stance is advisory-only. Confidence tiers gate the engine's recommendation routing, not autonomous action. Engine treatment:

CSV seed inventory

File | Rows | Maps to KG node / edge
components.csv | 144 | Component nodes
faults.csv | 109 | Fault nodes (1 per row)
failures.csv | 109 | Failure-state edges
actions.csv | 138 | Corrective + preventive action nodes
mechanisms.csv | 99 | Physical degradation mechanism nodes
effects.csv | 42 | Effect-of-failure rows (local / system / business)
steps.csv | 76 | Procedure-step nodes (tool · skill · duration · safety)
sod_rpn.csv | 109 | Severity / Occurrence / Detectability + RPN values
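The inventory reconciles with the headline figures. A quick check, assuming one header line per file, that the per-file counts sum to 826 data rows and 834 lines:

```python
# Row counts per seed file, as listed in the inventory above.
seed_rows = {
    "components.csv": 144,
    "faults.csv": 109,
    "failures.csv": 109,
    "actions.csv": 138,
    "mechanisms.csv": 99,
    "effects.csv": 42,
    "steps.csv": 76,
    "sod_rpn.csv": 109,
}

total_rows = sum(seed_rows.values())
total_lines = total_rows + len(seed_rows)  # assumes one header line per file

# Fault-level files should agree on the number of documented fault modes.
fault_level_counts = {seed_rows[name]
                      for name in ("faults.csv", "failures.csv", "sod_rpn.csv")}
```

The same counts could serve as a post-ingestion assertion: after loading, the number of Fault nodes in Neo4j should match the 109 rows in faults.csv.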
Gap #13 — Liquid-cooling fault-mode telemetry below industry benchmark (new, surfaced by this dataset)
Liquid-cooling and immersion-cooling primary sources thin out to ASHRAE TC 9.9 + OCP + one ASME paper. Magnetic-bearing chillers and flywheel UPS depend largely on vendor-stated MTBF rather than independently audited data. Mitigation: a vendor-outreach handoff doc has been scheduled (Vertiv, CoolIT, Asetek, Boyd for liquid; Starline / Universal Electric for busway; Trane / York / Daikin for magnetic-bearing chillers; Piller / Hitec / Active Power for flywheel UPS). NDA-backed telemetry requests in flight.

Full report: docs/research/2026-05-23-fmeca-kg-worldwide-asset-failure-data.md · CSV seed files in docs/research/csv/ · Standard: KNOWLEDGE_BASE_STANDARD.md
