Sensor-Level AI: A 380-Parameter Architecture Resistant to Drift and Noise / Habr

Much attention is currently focused on the size of neural networks and the gigawatts of power consumed by data centers. However, the future lies not only in giant clusters but also in tiny chips embedded directly into the sensing elements of hardware. When a neural network is placed directly inside a sensor chip, it must be exceptionally efficient.

Through experimentation, I have successfully built a neural network architecture with 380 parameters (with potential for further reduction), capable of operating in conditions considered unsuitable for conventional algorithms.

Technology Stack

The main feature of the architecture is the capability to be embedded directly into the sensor logic:

Integer-only Nature: The algorithm is implemented entirely using addition, subtraction, and bitwise shifts (int-only). This allows for the deployment of AI into controllers lacking a Floating Point Unit (FPU).
Minimal Resource Footprint: The Nano-class model utilizes 380 trainable parameters. This allows the neural network and its overhead to fit within a couple of kilobytes of memory.
Linearity and Stability: Only linear calculations are used. The system is not prone to nonlinear artifacts, inference uncertainty, or exploding gradients. If the signal exceeds normal limits, the architecture maintains predictability due to rigid physical constraints embedded in the structure.
Environmental Robustness: Built-in protection against baseline drift and automatic adaptation to the noise floor.
High Fault Tolerance: High resilience to neuron loss; in scenarios where chips operate unstably or suffer partial failure, the sensor continues to function.

Experiments conducted on synthetic data are presented below.

Used Network Tiers

Level	Parameter Count	RAM (INT16)	Potential Application
Nano	~380	~0.8 KB	Ultra-small sensors, single-channel monitoring.
Micro	~4,000 – 5,000	~8 – 10 KB	Complex signal analysis (morphology + stability) within 16 KB.
Medium	~10,000 – 12,000	~20 – 24 KB	Simultaneous analysis of 2-3 coupled signals (e.g., pressure + RPM + temperature).
Large	~40,000	~80 KB	Complex predictive analytics, operation at the level of entire assemblies (complete diesel engine), search for rare correlations.

In all experiments, the name Goliath refers to a classic Convolutional Neural Network (CNN) with standard convolution and pooling layers, lacking specific signal cleaning mechanisms.

Experiment 1: Algorithm Testing Without Task-Specific Tuning

The same architecture can solve different tasks without code modification—basic training is sufficient. Simple tuning dramatically improves noise resilience without increasing the model size. I tested the vanilla version of the new architecture (David) on three signal types: heart (ECG), bearing vibration, and accelerometer data.

TASK: ECG Rhythm

D = David
G = Goliath (Standard Neural Network)

Size	D_Ideal	D_Noise3x	D_Drift50	D_Both	G_Ideal	G_Noise3x	G_Drift50	G_Both
Nano	100.0±0.0	53.7±5.9	100.0±0.0	53.1±6.0	100.0±0.1	75.3±13.6	50.5±2.0	49.1±2.3
Small	100.0±0.0	56.5±7.8	100.0±0.0	56.0±8.1	100.0±0.0	87.9±5.7	48.9±2.3	49.4±2.0
Medium	100.0±0.0	75.4±12.9	100.0±0.0	76.1±12.1	100.0±0.1	91.7±2.4	50.0±2.1	49.2±2.5
Large	100.0±0.0	79.9±12.1	100.0±0.0	78.8±13.7	100.0±0.0	90.1±7.8	49.7±2.0	50.8±3.0

TASK: BEARING (Bearing Vibration)

The most stable task for David: even under noise, accuracy exceeds 80% on the Medium size, and drift is completely ignored.

Size	D_Ideal	D_Noise3x	D_Drift50	D_Both	G_Ideal	G_Noise3x	G_Drift50	G_Both
Nano	98.4±0.6	77.0±5.6	98.5±0.6	76.8±5.2	98.1±0.6	72.3±4.8	49.6±2.3	50.9±2.5
Small	98.7±0.8	79.4±4.2	98.6±0.6	79.4±3.2	98.4±0.8	72.2±5.7	50.3±2.8	50.4±2.5
Medium	98.9±0.4	80.7±2.9	98.9±0.5	80.6±4.1	98.7±0.5	75.8±3.9	50.1±2.5	50.7±1.8
Large	99.0±0.5	82.1±3.4	99.0±0.5	82.1±3.5	98.9±0.5	79.2±4.8	50.3±1.6	49.3±2.6

TASK: Pedometer (Accelerometer)

Here, David Large achieves 91% accuracy under drift + noise conditions, whereas Goliath merely guesses at 50%.

Size	D_Ideal	D_Noise3x	D_Drift50	D_Both	G_Ideal	G_Noise3x	G_Drift50	G_Both
Nano	100.0±0.0	57.1±5.9	100.0±0.0	58.1±5.9	100.0±0.1	83.1±8.0	51.0±7.9	50.7±2.9
Small	100.0±0.0	68.3±8.3	100.0±0.0	67.5±8.9	100.0±0.0	92.1±3.7	48.9±2.7	49.4±3.0
Medium	100.0±0.0	80.7±9.0	100.0±0.0	80.6±9.4	100.0±0.0	93.7±4.0	49.7±2.3	49.4±2.2
Large	100.0±0.0	91.6±4.9	100.0±0.0	91.0±5.2	100.0±0.0	96.1±2.3	49.4±2.5	49.3±2.7

Drift Invariance: David demonstrates 100.0% stability under 50g drift across all tasks. Meanwhile, the classic CNN (Goliath) degrades to the level of random guessing (~50%). This is direct proof that the architecture successfully separates the AC component (variable) from the DC component (constant).
Noise Barrier: The performance drop in the HEART task (D_Noise3x) confirms the need for task-specific tuning.
Scaling Advantage: As the size increases from Nano to Large, David adapts significantly better to noise while maintaining drift stability.

The result of Goliath (CNN) under drift at the 50% level is the mathematical equivalent of coin flipping. This proves that standard neuron weights get lost in the baseline offset, while David is architecturally transparent to the constant component.

Universal Model Results:

Drift 50g: 100% accuracy (architectural immunity).
Noise (baseline): 85–90% accuracy.
Fault Tolerance: Retains functionality with random damage (pruning) of up to 30% of model weights.

Even without fine-tuning, David outperforms classic convolutional networks of similar size, which are susceptible to signal offsets.

Experiment 2: Dropout

This table demonstrates how the Large-sized model handles random neuron deletion under ideal conditions versus extreme signal drift.

Death %	D_Ideal (%)	D_Drift50 (%)	G_Ideal (%)	G_Drift50 (%)
0% (Control)	98.8 ± 0.5	99.0 ± 0.4	98.6 ± 0.5	50.9 ± 1.6
15%	93.8 ± 11.6	94.0 ± 10.6	79.9 ± 21.6	50.9 ± 1.6
30%	77.8 ± 21.7	77.8 ± 22.4	63.2 ± 16.4	50.3 ± 10.5
50%	70.3 ± 19.7	70.4 ± 18.9	58.3 ± 14.1	51.8 ± 8.2

Experiment 3: Task-Specific Tuning

I selected heart rhythm recognition under conditions of extreme chaos. The noise level exceeds the useful signal by a factor of 3 (SNR < 0.5), combined with drift. The tuning did not alter the core algorithm; the code remained practically unchanged. Accuracy and Stability (Std Dev) were measured over 20 independent runs.

Scenario	David V5 (Nano)	Goliath (Nano)	Goliath (Large)
Parameters (weights)	~380	~380	~40,000
1. Ideal (Clean)	100.0 ± 0.0	100.0 ± 0.1	100.0 ± 0.1
2. Drift 50g (Drift)	100.0 ± 0.0	49.1 ± 2.6	49.1 ± 2.7
3. Noise 3x (Noise)	96.6 ± 3.2	91.3 ± 2.1	96.7 ± 0.9
4. Total Chaos (Noise+Drift)	96.8 ± 2.8	48.6 ± 1.9	49.2 ± 2.3

Parametric Parity (Row 3): David (380 parameters) catches up to Goliath Large (40,000 parameters) under noisy conditions.
CNN Blind Spot (Rows 2 and 4): Standard convolutions handle drift much worse. Even the large model (Goliath Large) shows 49.2%, which is essentially guessing. For the CNN, the baseline offset is noise that it cannot filter without external preprocessing.
David's Stability: 100% accuracy on drift and 96.8% on "noise + drift" proves that the algorithm functions as a physical filter embedded within the neural network weights.

Example

The new architecture implemented in a 380-parameter micro-network for ECG analysis.

Task: Diagnosing ECG deviations from the norm. Datasets: 1. Normal rhythm, 2. Arrhythmia, 3. Tachycardia (Standard datasets for these tasks were used).

Model Output:

logs

(venv) C:\Users\admin\Desktop\PGHM>bio_12_heart_test.py

>>> VORTEX V5 REPORT: HEALTHY

TIME | SHAPE | LOCAL CV | FINAL VERDICT

-----------------------------------------------------------------

0085 | NORMAL | --- | ❤️ HEALTHY

0376 | NORMAL | --- | ❤️ HEALTHY

0670 | NORMAL | --- | ❤️ HEALTHY

0954 | NORMAL | 0.01 | ❤️ HEALTHY

1239 | NORMAL | 0.01 | ❤️ HEALTHY

1523 | NORMAL | 0.01 | ❤️ HEALTHY

1817 | NORMAL | 0.02 | ❤️ HEALTHY

2052 | NORMAL | 0.08 | ❤️ HEALTHY

>>> VORTEX V5 REPORT: PVC/ARRHYTHMIA

TIME | SHAPE | LOCAL CV | FINAL VERDICT

-----------------------------------------------------------------

0072 | ABNORMAL | --- | ⚠️ PVC (Anomaly)

0361 | NORMAL | --- | ❤️ HEALTHY

0462 | NORMAL | --- | ☁️ WEAK SIGNAL

0734 | NORMAL | 0.38 | 🌀 AFIB (Chaos)

0829 | NORMAL | 0.48 | 🌀 AFIB (Chaos)

1095 | NORMAL | 0.43 | 🌀 AFIB (Chaos)

1457 | NORMAL | 0.48 | 🌀 AFIB (Chaos)

1839 | NORMAL | 0.37 | 🌀 AFIB (Chaos)

AFIB/TACHY

TIME | SHAPE | LOCAL CV | FINAL VERDICT

-----------------------------------------------------------------

0168 | NORMAL | --- | ❤️ HEALTHY

0423 | NORMAL | --- | ❤️ HEALTHY

0695 | NORMAL | --- | ❤️ HEALTHY

0914 | NORMAL | 0.09 | ❤️ HEALTHY

1197 | NORMAL | 0.09 | ❤️ HEALTHY

1431 | NORMAL | 0.09 | ❤️ HEALTHY

1629 | NORMAL | 0.13 | ❤️ HEALTHY

1974 | NORMAL | 0.21 | 🌀 AFIB (Chaos)

Technical Specifications

At the current stage of research, the architecture is optimized for 1D signals (time series). The Nano-class model (~380 parameters) is sensitive to high-frequency noise if it overlaps with the useful signal spectrum. In such cases, a transition to the Micro or Medium tier is required.

David Nano: Occupies less than 2 KB of RAM and runs on the simplest controllers, such as the ARM Cortex-M0.
Computations: Linear computations. Strictly integer-based operations (int-only). No need for a hardware Floating Point Unit (FPU).
Training Efficiency: The model required only 300 iterations on 64 examples to reach a 96% accuracy plateau.
Resilience: The sensor can operate under conditions of chip degradation.

Porting this new architecture directly into the sensor offers hardware manufacturers the following advantages:

Cost Reduction: Enables the use of low-cost sensors.
Energy Efficiency: Due to integer-only operations and a low parameter count, energy consumption is reduced by orders of magnitude compared to classic DSP algorithms.
A New Class of Devices: The ability to create sensors that output high-quality analysis rather than just raw data.

This is extremely useful for deployment on the most constrained microcontrollers, wearable electronics, Industrial IoT (IIoT), and in harsh environments.

Collaboration

All experiments were conducted on synthetic data. If anyone has the opportunity to provide real sensor readings from similar domains for validation, I would be very grateful.

In general, I am actively looking for partners for further research and the implementation of this new architecture into real-world projects.

Kamil Gadeev Telegram

P.S. The article presents real experimental data based on synthetic datasets. The probability that real-world trials might reveal critical issues in the algorithm is non-zero. And that is exactly what makes it interesting.