Topic
Anomaly Detection
- Field Notes, Jun 22, 2026
Retention and downsampling on InfluxDB OSS, and why the detector cannot eat a downsampled stream
The test cell's bucket is filling, and the question the series deferred five times is forced. A year of full-rate vibration does not fit on a 40 GB disk, so the raw stream gets a short window and the long-term record gets built from rollups. The catch is that the anomaly detector from issues 06 through 08 reads the exact transient structure that downsampling throws away, so the long-term memory has to be the feature table, not a coarsened waveform.
Since issue 05 the test cell has written full-rate vibration and five process variables into a single InfluxDB OSS bucket on the 40 GB Hetzner CX22, and every issue since has deferred the question of what happens when the disk fills. Issue 09 forces it, because the bucket is filling on a measurable schedule. Two assets streaming an accelerometer channel at two kilohertz plus their process variables produce on the order of a gigabyte a day after compression, which fills the usable disk in roughly three weeks, so retention is no longer optional. The naive fix, a short retention period on the raw bucket, drops history the plant will want the day a bearing fails. The standard fix, downsampling raw data into coarser rollup buckets, is lossy in exactly the dimension that matters, because the anomaly detector from issues 06 through 08 computes its features from the transient shape of the waveform and a one-minute mean keeps none of it. The resolution is to separate the two records the system actually needs. The raw waveform is a short forensic buffer with a seven-day retention period. The long-term memory is the per-window feature vector the edge already computes, which is small enough to keep at full event cadence for years. Downsampled rollups serve the dashboards. A flagged anomaly triggers a pin that copies its raw window into a permanent forensic bucket before the retention sweep can delete it, because a retention policy is a destructive operation on a timer and the window you needed is always the one that just expired. The cost line is unchanged at $5.50 a month plus an optional storage box for cold export, and the reader retrofit window has still not closed.
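The claim that a rollup mean keeps none of the transient shape can be checked on synthetic data. The sketch below is illustrative only: the burst shape, the amplitudes, the 100-sample block (a short stand-in for the one-minute mean), and crest factor as the feature are all assumptions, not the features the detector in issues 06 through 08 computes.

```python
import math
import random


def crest_factor(window):
    """Peak absolute amplitude divided by RMS; large when a window holds a sharp transient."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    return max(abs(x) for x in window) / rms


def block_mean(samples, block):
    """Downsample by averaging consecutive non-overlapping blocks of `block` samples."""
    return [
        sum(samples[i:i + block]) / block
        for i in range(0, len(samples) - block + 1, block)
    ]


rng = random.Random(0)
SAMPLE_RATE_HZ = 2000  # the two-kilohertz channel from the text

# One second of unit-variance noise standing in for normal vibration.
raw = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE_HZ)]

# Hypothetical impact: a 5 ms oscillating burst at 20x the noise amplitude.
for i in range(1000, 1010):
    raw[i] += 20.0 * (-1) ** i

# Rollup: 2 kHz down to 20 Hz by block mean (assumed block size).
rolled = block_mean(raw, 100)

raw_cf = crest_factor(raw)
rolled_cf = crest_factor(rolled)
print(f"crest factor raw={raw_cf:.1f} rolled={rolled_cf:.1f}")
```

The burst dominates the raw crest factor and averages out of the rollup entirely, which is the argument for keeping the per-window feature vector rather than a coarsened waveform as the long-term record.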
InfluxDB·Retention·Downsampling·Flux·Telegraf·Time Series
- Field Notes, Jun 15, 2026
Tool-change auto-detection, and the case for asking the controller instead of guessing from the signal
The last unbuilt feature from the issue 06 vendor comparison. A signature detector on the spindle stream can learn what a tool change looks like, but the controller already knows. This issue builds the controller-fed path with MTConnect and keeps the matched-filter detector as the fallback for machines that will not talk.
Issue 06 priced a vendor condition-monitoring quote line by line against the open stack and left four features to build. Issues 06 and 07 built three of them: the single-asset Isolation Forest, the cross-asset common-mode gate, and the threshold alerting. The fourth was tool-change auto-detection, the feature that stops the model from paging on the large vibration and current transient a planned tool change produces, which issue 06 logged as a false positive that turned out to be a real but miscategorized event. Issue 08 builds it. The signal-only approach treats the tool-change transient as a known signature and matches it with a matched filter, the optimal detector for a known template in noise. It works, and it has the failure mode every blind detector has: it fires on anything that resembles the template and stays silent on the tool change that does not. The controller-fed approach skips the inference entirely. The CNC already knows it is changing a tool, and MTConnect exposes that as a structured data item the edge can read and publish alongside the sensor stream. The model gates its own scoring on the controller's ground truth. Both are built. The controller-fed path is the recommendation, the matched filter is the fallback for machines with no open data interface, and the honest failure mode of gating any planned event is named: a real fault that begins inside the gate window is suppressed until the window closes.
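The two paths can be sketched together. This is a minimal illustration, not the issue's implementation: it uses normalized cross-correlation (a scale-invariant variant of the matched filter), a synthetic decaying-sinusoid template, and an assumed 0.8 threshold. The controller flag is passed in as a plain boolean, or `None` when the machine exposes no data interface; how that flag is read from MTConnect is not shown, and every function name here is hypothetical.

```python
import math
import random


def normalized_correlation(signal, template):
    """Cosine similarity between the template and each same-length slice of the signal."""
    n = len(template)
    t_norm = math.sqrt(sum(t * t for t in template))
    scores = []
    for start in range(len(signal) - n + 1):
        segment = signal[start:start + n]
        s_norm = math.sqrt(sum(s * s for s in segment))
        dot = sum(s * t for s, t in zip(segment, template))
        scores.append(dot / (s_norm * t_norm) if s_norm > 0 else 0.0)
    return scores


def detect_tool_change(signal, template, threshold=0.8):
    """Return the offset of the best template match if it clears the threshold, else None."""
    scores = normalized_correlation(signal, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


def should_score(controller_tool_change, matched_offset):
    """Gate anomaly scoring: trust the controller flag when present, else the signal match."""
    if controller_tool_change is not None:
        return not controller_tool_change
    return matched_offset is None


# Synthetic data: an assumed transient shape buried in noise at offset 120.
rng = random.Random(1)
template = [math.exp(-k / 15) * math.sin(2 * math.pi * k / 10) for k in range(50)]
signal = [rng.gauss(0.0, 0.2) for _ in range(400)]
for k, t in enumerate(template):
    signal[120 + k] += 3.0 * t

offset = detect_tool_change(signal, template)
print("matched offset:", offset)
print("score this frame (no controller):", should_score(None, offset))
```

Note that `should_score` returns `False` for the whole gate window regardless of what else is happening, which is the suppression failure mode the text names.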
MTConnect·Tool Change·Anomaly Detection·Matched Filter·DTW·Change Point
- Field Notes, Jun 8, 2026
Cross-asset correlation on the open stack, and what it actually buys: fewer false alarms, not more catches
Two spindles, one PyOD model that scores them jointly, rendered on the same Grafana dashboard. The cross-asset feature the vendor quote priced at a premium, built for the cost of the second carrier the second asset needed anyway. The result is mostly a false-positive filter, which is the opposite of how the feature is sold.
Issue 06 deployed a single-asset Isolation Forest in 80 lines of Python and priced the vendor quote against it line by line. One line item survived as genuinely not built: cross-asset correlation, which the SaaS markets as ensemble-learning anomaly detection with cross-asset correlation and which issue 06 estimated at roughly one engineer-week. Issue 07 builds it. A second i.MX 8M Plus carrier on a second spindle publishes to the same broker, a PyOD model scores both assets in one joint feature frame, and a common-mode detector flags the case where both assets move together. The result reframes what cross-asset correlation is for. It catches almost nothing the single-asset model does not. Its real value is the inverse: it rejects the environmental false positives from issue 06, the cooling-system cross-talk and the floor-vibration bleed-through, by recognizing them as plant-wide rather than asset-specific. That is a meaningful feature, and it is not the feature the marketing describes.
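The common-mode idea reduces to a small decision rule, sketched here under simplifying assumptions: each asset already has a per-instant anomaly score where higher means more anomalous, the 0.5 threshold is arbitrary, and "all assets flagged at once" stands in for whatever joint test the PyOD model in the issue applies. The function and label names are hypothetical.

```python
def common_mode_gate(scores, threshold=0.5):
    """Classify one scoring instant across assets.

    scores: dict mapping asset name to anomaly score (higher is more anomalous).
    Returns (label, flagged_assets).
    """
    flagged = {asset for asset, score in scores.items() if score >= threshold}
    if not flagged:
        return "normal", flagged
    if len(scores) > 1 and len(flagged) == len(scores):
        # Every asset moved together: treat as plant-wide and do not page.
        return "common_mode", flagged
    return "asset_specific", flagged


print(common_mode_gate({"spindle_a": 0.9, "spindle_b": 0.8}))
print(common_mode_gate({"spindle_a": 0.9, "spindle_b": 0.1}))
```

A rule like this can only remove alerts the single-asset model would have raised; it never adds one, which matches the finding that the feature works as a false-positive filter.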
PyOD·Cross Asset Correlation·Anomaly Detection·ECOD·Common Mode Rejection·Self Hosted
- Field Notes, Jun 1, 2026
Isolation Forest in 80 lines of Python, against the vendor 'AI/ML machine health' pitch
The model layer on top of the open IIoT stack. Scikit-learn trained on InfluxDB data, anomaly scores written back, Grafana renders the result on the same dashboard. What the model catches that the threshold rules miss, and what the $1,200-per-asset SaaS sells on top.
Issue 05 stood up Grafana, InfluxDB, and Telegraf on the same $5.50/mo Hetzner VM that runs the Sparkplug B broker. The alerts in that issue were threshold rules: RMS velocity above ISO 10816 Class II, drive current above 110% nameplate, anomaly score above a fixed line. Issue 06 builds the model that produces the anomaly score, in 80 lines of Python. Scikit-learn Isolation Forest, trained nightly on the prior 14 days of telemetry from InfluxDB, scoring live frames every 10 seconds and writing the score back as its own measurement. The comparison: the same vendor SaaS quote from issue 05, this time looking at what its 'AI-driven machine health' line item actually buys on top of the open implementation. The model catches one real failure pattern that the threshold rules miss. It also produces a class of false positives that the threshold rules do not. The honest read on which of the two layers should sit in front of the maintenance team.
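The three threshold rules from issue 05 amount to the following check, written here in plain Python rather than as Grafana alert rules. The field names and the numeric limits in the usage lines are placeholders chosen for illustration; the real vibration limit should come from the standard's table for the machine class, and only the 110% nameplate factor is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    rms_velocity_mm_s: float
    drive_current_a: float
    anomaly_score: float


@dataclass
class Limits:
    rms_velocity_mm_s: float      # vibration severity limit (placeholder value below)
    nameplate_current_a: float    # motor nameplate current
    anomaly_score: float          # the fixed line on the model's score
    current_factor: float = 1.10  # 110% of nameplate, as stated in the text


def threshold_alerts(frame, limits):
    """Return the names of the threshold rules this frame violates."""
    alerts = []
    if frame.rms_velocity_mm_s > limits.rms_velocity_mm_s:
        alerts.append("rms_velocity")
    if frame.drive_current_a > limits.current_factor * limits.nameplate_current_a:
        alerts.append("drive_current")
    if frame.anomaly_score > limits.anomaly_score:
        alerts.append("anomaly_score")
    return alerts


limits = Limits(rms_velocity_mm_s=2.8, nameplate_current_a=10.0, anomaly_score=0.6)
print(threshold_alerts(Frame(1.0, 10.5, 0.2), limits))
print(threshold_alerts(Frame(3.0, 11.5, 0.7), limits))
```

The first two rules are fixed physical limits, while the third depends entirely on the model that produces `anomaly_score`, which is the layer issue 06 adds.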
Isolation Forest·scikit-learn·Anomaly Detection·Machine Learning·Self Hosted·InfluxDB
- Field Notes, May 11, 2026
When the $240 pilot graduates
The mid-range step — a real industrial accelerometer on an NXP i.MX 8M Plus carrier — and the moment in-house ML stops being cheaper than a vendor service.
Issue 02 ran a $240 edge-ML bench on a 1995 spindle. Issue 03 is the next hop: industrial IEPE accelerometers on an NXP i.MX 8M Plus carrier with a hardware NPU. BOM $2,847, inference latency 4 ms, recall 96%. The build is real. The harder question is when it stops being cheaper than a vendor service like Augury or Sight Machine — and the answer is sharper than I expected.
Edge ML·Anomaly Detection·i.MX 8M Plus·IEPE Accelerometer·Predictive Maintenance·Augury
- Field Notes, May 4, 2026
The $240 spindle retrofit
An edge-ML anomaly bench on a twenty-year-old spindle motor — exact BOM, exact data pipeline, exact failure modes.
Last week's scorecard ranked Edge Impulse on the Arduino Opta the highest-floor pilot in the AI-on-the-PLC category. This week the bench. Total BOM $238.94, two days of data capture, three failure-mode labels, a GMM running inference in 23 ms. What worked, what didn't, and the four mistakes that nearly killed the pilot.
Edge ML·Anomaly Detection·Arduino Opta·Edge Impulse·Spindle·Predictive Maintenance