Manifestro LabResearch

DREAM: Dynamic Recall and Elastic Adaptive Memory
Technical Specification and Analysis of Experimental Results (v1.0)
Manifestro Team

Abstract

Contemporary neural network architectures—including Transformers, Mamba, and Long Short-Term Memory (LSTM) networks—are fundamentally characterized by functional parameter staticity following the completion of the training phase. The prevailing paradigm in machine learning presupposes a rigid separation between training and inference stages, thereby establishing a fundamental barrier to model deployment in non-stationary environments. In real-world operational scenarios—such as processing streaming audio under dynamically varying acoustic conditions, interpreting sensor data from autonomous systems operating under fluctuating illumination, or adapting to individual user behavioral patterns—the statistical characteristics of input signals may substantially and continuously deviate from the distribution of the original training dataset. Static parameters encode a "frozen" representation of the world, precluding the model's ability to adjust its internal logical structures in real time. This limitation inevitably leads to progressive degradation in predictive accuracy, accumulation of systematic bias, and an inability to respond adequately to novel contextual configurations without engaging in computationally expensive and temporally protracted retraining cycles.

In this work, we investigate an alternative paradigm for machine learning: the DREAM architecture (Dynamic Recall and Elastic Adaptive Memory), which overcomes these limitations through the integration of principles from Active Inference and Spike-Timing-Dependent Plasticity (STDP) directly into the inference loop. In the proposed model, the adaptation process ceases to be an external auxiliary procedure and instead becomes a fundamental, continuous property of the system itself. The key innovation is a hierarchical predictive coding mechanism that generates internal forecasts and continuously compares them against observed reality. The resulting prediction error signal is transformed into an informative modulatory impulse. This impulse functions as a dynamic controller, governing instantaneous low-rank restructuring of so-called "fast weights."

This approach enables the system to dynamically "recall" and recalibrate its filters in response to environmental changes (Dynamic Recall), while simultaneously preserving Elasticity—the capacity to adapt to local contextual variations without catastrophic interference with globally acquired, long-term knowledge. The utilization of low-rank update mechanisms ensures computational efficiency, allowing the model to learn "on the fly" without necessitating full-batch gradient descent across the entire network parameter space. Consequently, this framework opens a pathway toward the development of autonomous systems exhibiting high robustness and capacity for Lifelong Learning under conditions of uncertainty, while fully preserving accumulated knowledge and maintaining the structural integrity of the base architecture.

Keywords: predictive coding, synaptic plasticity, active inference, continual learning, adaptive inference, low-rank updates, liquid time-constants, non-stationary environments, edge AI, biologically-inspired neural architectures.


1. Introduction and Theoretical Foundations

Foundational neurobiological research over recent decades has consistently indicated that the biological brain does not function as a passive recipient of sensory information, but rather operates as a sophisticated predictive machine. According to the theory of predictive processing (Friston, 2010), cognitive systems continuously generate internal generative models, striving to minimize the divergence—termed prediction error—between anticipated and actual sensory input. A critical aspect of this living system is the dynamic modulation of synaptic connections: in response to error signals, neural pathways undergo reconfiguration on millisecond timescales, enabling instantaneous adaptation to environmental perturbations. In classical machine learning, this process is virtually absent, as model weights are "frozen" following the completion of gradient-based optimization.

The DREAM architecture investigates the hypothesis that direct integration of plasticity mechanisms into the inference structure enables the overcoming of critical limitations inherent to traditional static models, thereby achieving flexibility comparable to biological systems:

  1. Erasure of Boundaries Between Learning and Inference: In traditional state-of-the-art (SOTA) architectures, the model merely applies previously learned patterns. DREAM unifies the processes of perception and physical weight correction. Each prediction, upon encountering reality, generates an error signal that is immediately utilized for localized parameter adaptation. This capability is critical in scenarios exhibiting dataset shift: for instance, upon an abrupt change in room acoustics, the model does not merely produce a less accurate output, but actively restructures its filters, "absorbing" the new acoustic characteristics of the session in real time.

  2. Hierarchical Memory Structure (Slow-Fast Weights): The model implements a two-component weight system, addressing the fundamental stability-plasticity dilemma. Basal ("slow") weights store robust, invariant dependencies acquired during pre-training. Operating in parallel is a plastic layer ("fast weights") responsible for short-term, elastic contextual adaptation. This enables the system, for example, to instantly adapt to a new speaker's accent without eroding general linguistic knowledge encoded in the slow weights.

  3. Informational Homeostasis Mechanism: A primary challenge in online learning is the risk of overfitting to stochastic noise or outliers. DREAM addresses this through a Surprise Gate mechanism. This regulatory component modulates plasticity: the system activates synaptic restructuring only when the prediction error contains structural novelty ("surprise"), rather than mere statistical fluctuation. Thus, homeostasis is maintained—internal parameter stability is preserved while retaining high sensitivity to meaningful environmental changes.

  4. Energy Efficiency and Computational Adaptivity: Static models expend identical computational resources processing both trivial and complex sequences. DREAM introduces the concept of an adaptive uncertainty threshold: if the incoming signal is fully predictable (prediction error near zero), the plasticity module remains inactive. Computationally intensive weight-update operations are triggered only during moments of informational deficit. This emulates biological attentional mechanisms and opens prospects for developing Edge AI devices with extremely low power consumption coupled with high intellectual autonomy.


2. DREAM Architecture: Four Functional Blocks

Block 1: Predictive Coding (Perception Flow)

The functional core of the first block is grounded in the principle of top-down forecasting. The system actively generates hypotheses regarding the next state of the input vector \mathbf{x}_t, utilizing the current hidden state \mathbf{h}_t as a carrier of accumulated contextual information.

  1. Prediction Generation:
    \hat{\mathbf{x}}_t = \tanh(\mathbf{W}_{\text{dec}} \mathbf{h}_t)
    Here, the matrix \mathbf{W}_{\text{dec}} functions as a generative decoder, transforming abstract latent features back into the sensory input space. The use of the \tanh activation function ensures bounded output values and promotes dynamical stability within the recurrent loop.

  2. Informational Signal Formation (Innovation): The residual \boldsymbol{\varepsilon}_t = \mathbf{x}_t - \hat{\mathbf{x}}_t is interpreted as innovation—the pure informational component that the current world model failed to explain. When prediction error is minimal, the system remains in a state of stable signal tracking. However, the emergence of significant divergence is interpreted as a deficit in the precision of internal parameters, thereby initiating an immediate adaptation process in subsequent blocks. Thus, within DREAM, prediction error serves not merely as a quality metric, but as a functional driver for the entire system.
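The two steps above can be sketched as a single function. The dimensions and the Gaussian initialization of \mathbf{W}_{\text{dec}} are illustrative assumptions, not values from the specification:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 8, 16                          # illustrative dimensions (not from the paper)
W_dec = rng.normal(0.0, 0.1, (d_in, d_h))  # generative decoder, assumed Gaussian-initialized

def predict_and_innovate(x_t, h_t):
    """One Block-1 step: top-down prediction followed by the innovation residual."""
    x_hat = np.tanh(W_dec @ h_t)   # bounded prediction of the current input
    eps = x_t - x_hat              # innovation: the part the model failed to explain
    return x_hat, eps
```

By construction x_t = x_hat + eps, so a perfectly predicted frame yields a zero innovation and, downstream, a closed Surprise Gate.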

Block 2: Surprise Gate

This block functions as an intelligent nonlinear filter, regulating the intensity of adaptation. It prevents model overfitting to stochastic fluctuations or transient artifacts.

  1. Modulation Function (Surprise):
    s_t = \sigma\left(\beta \cdot (\|\boldsymbol{\varepsilon}_t\| - \theta_t)\right)
    The coefficient s_t \in [0, 1] determines the degree of "gate opening" for plasticity. The parameter \beta controls the steepness of the sigmoidal function, thereby setting the sensitivity threshold to anomalies.

  2. Adaptive Homeostatic Threshold:
    \theta_t = \theta_0 + \gamma \cdot H(\mathbf{x}_{t-w:t})
    The activation threshold is dynamically adjusted based on the local signal entropy H(\cdot) (e.g., background noise level). For instance, under high acoustic noise conditions, the threshold automatically increases, requiring a more powerful and structured error signal to trigger weight updates. This enables the system to maintain stability in adversarial environments, responding exclusively to significant structural changes in the data distribution.
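A minimal sketch of the gate follows. The text leaves H(\cdot) unspecified, so a Gaussian differential-entropy proxy on the recent input window is used here; that proxy and the constants \beta, \theta_0, \gamma are our assumptions:

```python
import numpy as np

def surprise_gate(eps, x_window, beta=8.0, theta0=0.1, gamma=0.05):
    """Block-2 sketch: sigmoid gate on (error norm minus adaptive threshold).
    H(.) is approximated by the differential entropy of a Gaussian with the
    window's variance; proxy and constants are illustrative assumptions."""
    H = 0.5 * np.log(2.0 * np.pi * np.e * (np.var(x_window) + 1e-8))
    theta_t = theta0 + gamma * H               # homeostatic threshold rises with noise
    s_t = 1.0 / (1.0 + np.exp(-beta * (np.linalg.norm(eps) - theta_t)))
    return s_t, theta_t
```

Against a unit-variance background, a near-zero innovation leaves the gate mostly closed, while a large structured residual drives s_t toward 1.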

Block 3: Fast Weights and STDP-Plasticity

This block is responsible for the operational physical restructuring of synaptic connections, emulating short-term plasticity in biological neurons.

  1. Low-Rank Update Dynamics:
    \Delta \mathbf{W}^{\text{fast}}_t = \eta \cdot s_t \cdot \left( \boldsymbol{\varepsilon}_t \mathbf{h}_t^\top \right) \mathbf{V}
    1.a. The term \boldsymbol{\varepsilon}_t \mathbf{h}_t^\top implements a local Hebbian rule, correlating current neural activity with the sign and magnitude of prediction error.
    1.b. The fixed orthogonal matrix \mathbf{V} \in \mathbb{R}^{d \times r} (with r \ll d) enables projection of the high-dimensional error into a compact feature subspace, substantially reducing the computational complexity of the update operation.

  2. Elasticity and Recuperation: The coefficient \lambda \in (0, 1) ensures elastic memory behavior. Upon cessation of the "surprise" signal (s_t \rightarrow 0), fast weights begin to smoothly revert toward their basal state \mathbf{W}^{\text{fast}}_0. This guarantees that temporary contextual adaptations do not crystallize into irreversible distortions of the global model, thereby resolving the parameter drift problem.
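A sketch of one plasticity step, combining the gated Hebbian term with \lambda-decay. The exact decay form (geometric relaxation toward W0) is our assumption, since the text only states that fast weights revert when s_t \rightarrow 0; input and hidden dimensions are taken equal for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4                                   # illustrative; the paper reports r = 16
V = np.linalg.qr(rng.normal(size=(d, r)))[0]   # fixed orthonormal basis (columns)

def fast_weight_step(U, eps, h, s_t, W0=None, eta=0.1, lam=0.95):
    """Block-3 sketch: gated low-rank Hebbian update plus elastic decay toward
    the basal state W0 (decay form assumed, not specified in the text)."""
    if W0 is None:
        W0 = np.zeros_like(U)
    dU = eta * s_t * np.outer(eps, h) @ V      # Hebbian term eps h^T, projected to rank r
    return W0 + lam * (U - W0) + dU            # elastic recuperation + plastic update
```

With s_t held at zero the factor matrix relaxes geometrically toward its basal state, which is the recuperation behavior described above.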

Block 4: Liquid Time-Constants (LTC Update)

The Liquid Time-Constants (LTC) mechanism endows the system with dynamic inertia, varying the rate of information integration within the hidden state.

  1. Dynamic Time Constant:
    \tau_t = \tau_{\text{max}} - (\tau_{\text{max}} - \tau_{\text{min}}) \cdot s_t
    The time constant \tau_t governs the system's "memory" of prior states.

  2. Adaptive Effect: During detection of sharp anomalies (s_t \approx 1), the time constant decreases, corresponding to a transition of the system into a "liquid" state. This enables the model to instantly discard outdated context and capture the new signal structure. Under stable conditions (s_t \approx 0), \tau_t increases, transforming the cell into a conservative integrator that smoothly filters noise and maintains context over extended intervals. Such synergistic block interaction provides DREAM with a unique balance between reactivity and robustness.
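A minimal LTC step might look as follows. The \tau_t schedule is from the text; the leaky-integrator state update and the \tau bounds are illustrative assumptions:

```python
def ltc_update(h, h_cand, s_t, tau_min=1.0, tau_max=20.0, dt=1.0):
    """Block-4 sketch: surprise shortens the time constant, so the hidden state
    chases the candidate state faster. Leaky-integrator form and tau bounds
    are assumptions; the text specifies only the tau_t schedule."""
    tau_t = tau_max - (tau_max - tau_min) * s_t
    return h + (dt / tau_t) * (h_cand - h)
```

At s_t = 1 the state jumps fully to the candidate in one step ("liquid" mode); at s_t = 0 it integrates only a 1/tau_max fraction per step (conservative mode).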


3. Mathematical Framework and Stabilization Mechanisms

To ensure numerical stability of the architecture under conditions of continuous in-situ weight updating (i.e., directly during inference), DREAM incorporates a suite of mathematical safeguards against divergence and loss of representational expressivity:

  1. Low-Rank Factorization of Adaptive Weights: Instead of modifying the full synaptic matrix \mathbf{W} \in \mathbb{R}^{d \times d}, the system updates only a low-rank representation via the decomposition \mathbf{W}^{\text{fast}} = \mathbf{U}\mathbf{V}^\top. In the present implementation, rank r = 16 is employed.
    1.a. Advantages: This substantially mitigates the risk of feature space collapse, as modifications are constrained to the subspace defined by the orthogonal basis \mathbf{V}. Furthermore, factorization accelerates convergence of online adaptation by reducing the number of degrees of freedom, thereby preventing overfitting to local input noise.
    1.b. Role of Matrix V: The fixed matrix \mathbf{V} serves as a set of stable basis filters, while the trainable matrix \mathbf{U} dynamically adjusts the gain coefficients and saliency of these filters.

  2. Synaptic Homeostasis and Weight Projection: Continuous application of Hebbian plasticity rules (STDP) can lead to unbounded growth of weight coefficients, resulting in activation saturation and loss of sensitivity to input variations.
    2.a. Control Mechanism: DREAM employs a hard constraint on the Frobenius norm of the fast weight matrix, \|\mathbf{W}^{\text{fast}}\|_F, relative to a target value \kappa.
    2.b. Implementation: If the current norm exceeds the threshold, rescaling is performed: \mathbf{W}^{\text{fast}} \leftarrow \kappa \cdot \mathbf{W}^{\text{fast}} / \|\mathbf{W}^{\text{fast}}\|_F. This homeostatic process guarantees that plasticity remains within the dynamic range of the activation functions, preserving neuronal capacity for pattern differentiation even after prolonged adaptation sessions.

  3. Adaptive Statistical Smoothing and Innovation Filtering: For correct operation of the Surprise Gate, the system requires a reference point for assessing the "normality" of the current signal.
    3.a. Exponential Smoothing: The running mean \mu_t and variance \sigma^2_t of the prediction-error norm are updated via an Exponential Moving Average (EMA) with decay coefficient \alpha:
    \mu_t = \alpha \mu_{t-1} + (1-\alpha)\|\boldsymbol{\varepsilon}_t\|, \quad \sigma^2_t = \alpha \sigma^2_{t-1} + (1-\alpha)(\|\boldsymbol{\varepsilon}_t\| - \mu_t)^2
    3.b. Consequences: This enables computation of relative surprise magnitude against the backdrop of current signal power. For example, in a noisy environment, the mean error increases, and the system automatically desensitizes the Surprise Gate to avoid interpreting persistent noise as a trigger for weight updates. Such statistical filtering ensures robustness to non-stationary processes, enabling the model to effectively distinguish structural changes from background fluctuations.
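Items 2 and 3 reduce to two small helpers. The default \kappa and \alpha values are illustrative assumptions:

```python
import numpy as np

def project_norm(W_fast, kappa=1.0):
    """Homeostatic projection (item 2): rescale whenever ||W||_F exceeds kappa."""
    n = np.linalg.norm(W_fast)                 # Frobenius norm for a 2-D array
    return W_fast if n <= kappa else kappa * W_fast / n

def ema_update(mu, var, eps_norm, alpha=0.99):
    """Running error statistics (item 3a): EMA of the error-norm mean and variance."""
    mu_new = alpha * mu + (1 - alpha) * eps_norm
    var_new = alpha * var + (1 - alpha) * (eps_norm - mu_new) ** 2
    return mu_new, var_new
```

The projection is idempotent once the norm is at or below \kappa, so it can be applied after every plasticity step without further side effects.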


4. Experimental Results and Analysis

Verification of the DREAM architecture was conducted on tasks involving high-dimensional non-stationary signal processing, specifically audio streams. The selection of audio as a test environment is motivated by its high nonlinearity and dynamic variability of local statistics (timbre, pitch, harmonics), which presents a critical challenge for static models.

Test 1: Real-Time Signal Reconstruction

In this experiment, we evaluated the model's capacity to reconstruct the original mel-spectrogram of audio (LJ Speech dataset) in inference mode, without prior speaker-specific fine-tuning. On this dataset, DREAM demonstrates a 40–70× reduction in reconstruction error compared to baseline models upon completion of sequence processing.

Model        Parameters   Initial Loss   Final Loss   Improvement
DREAM        82K          0.9298         0.0010       99.9%
LSTM         893K         0.7889         0.0478       93.9%
Transformer  551K         0.9416         0.0696       92.6%

Analysis of Efficiency and Convergence:
Results indicate high efficacy of online plasticity mechanisms. Notably, DREAM possesses an order of magnitude fewer trainable parameters (82K vs. 893K for LSTM), yet achieves substantially more precise approximation. This is explained by the fact that static models must employ "averaged" weights optimized across the entire dataset distribution, whereas DREAM utilizes the initial frames of a sequence to calibrate its fast weights to the specific spectral characteristics of the current signal. The 47× difference between DREAM's Final Loss and that of LSTM confirms that the plastic layer successfully captures residual information (innovation) that the static component of the architecture disregards as noise.

Test 2: Context Switch Adaptation (Speaker Switch)

One of the most illustrative evaluations was the "context switch" test, wherein the speaker in the audio stream changes instantaneously. This creates an abrupt discontinuity in signal statistics, triggering a sharp surge in prediction error. The experiment demonstrated that DREAM responds to such anomalies instantaneously—within a single discrete time step (sampling interval).

  1. Dynamics of Surprise Spike: At the moment of voice switching, the prediction error \|\boldsymbol{\varepsilon}_t\| instantaneously exceeds the adaptive threshold \theta_t, resulting in a Surprise Spike (amplitude 0.119). This impulse serves as a trigger for immediate inertia reset in the LTC block (reduction of the time constant) and activation of intensive updates to the factor matrix \mathbf{U}.

  2. Behavioral Comparison: Static models (LSTM, Transformer) exhibit sustained error growth in this scenario, persisting throughout the audio segment, as their weights cannot be modified. In contrast, DREAM performs a "functional reset" and, within milliseconds, retunes its decoding filters to the new timbre, restoring accuracy without external fine-tuning commands.

  3. Implications for Real-Time Systems: Such adaptation speed enables deployment of DREAM in critical systems where latency for traditional fine-tuning is unacceptable—for instance, in adaptive control interfaces or autonomous agents operating in unpredictable environments. The model effectively performs "self-calibration" during operation, representing a fundamental advantage of the active inference paradigm.


5. Memory Consolidation (Sleep Mechanism)

The consolidation process emulates the biologically grounded transfer of significant adaptive modifications from short-term structures ("fast weights") into the long-term architectural base (long-term target or "slow weights"). This mechanism addresses the classical stability-plasticity dilemma, enabling the system to accumulate experience without eroding previously acquired invariants.

  1. Selective Consolidation Criterion: The knowledge transfer process is not continuous; it is triggered only upon satisfaction of an informational significance criterion. The mechanism activates when the mean accumulated "surprise" over a defined period, \bar{s}_{t-T:t}, exceeds a threshold \xi. This approach guarantees that the long-term parameter structure \mathbf{W}^{\text{slow}} is updated only in response to statistically persistent environmental changes, whereas random fluctuations or transient noise remain confined to the "fast" elastic memory and gradually decay.

  2. Mathematical Transfer Dynamics:
    \mathbf{W}^{\text{slow}}_{t+1} = \mathbf{W}^{\text{slow}}_t + \rho \cdot \bar{s}_{t-T:t} \cdot \left( \mathbf{W}^{\text{fast}}_t - \mathbf{W}^{\text{slow}}_t \right)
    Here, the coefficient \rho regulates the "sleep learning" rate, or consolidation depth. The multiplier \bar{s}_{t-T:t} ensures proportionality of updates: the higher the uncertainty (surprise) during inference, the more intensive the integration of new data into the target matrix. Thus, \mathbf{W}^{\text{slow}} functions as a dynamic attractor toward which the plastic matrix \mathbf{W}^{\text{fast}} continually strives during periods of absent external stimulation.

  3. Functional Consequences and Stabilization: Through the sleep mechanism, the system transforms temporary contextual adaptation (e.g., transient calibration to a specific microphone or accent) into robust knowledge. This prevents so-called parameter drift, which would inevitably arise from infinite accumulation of minor corrections in fast memory. Furthermore, this mechanism enables a recuperation effect: following "sleep," the model resumes inference with an updated basal configuration, thereby reducing initial prediction error upon subsequent encounters with similar contexts—emulating the formation of long-term cognitive skills.
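The gated transfer rule from items 1 and 2 can be sketched directly; the \rho and \xi defaults are illustrative, not values from the specification:

```python
import numpy as np

def consolidate(W_slow, W_fast, s_bar, rho=0.01, xi=0.3):
    """Sleep-phase sketch: when mean surprise s_bar over the window exceeds xi,
    pull the slow weights toward the fast weights in proportion to s_bar.
    rho and xi defaults are illustrative assumptions."""
    if s_bar <= xi:
        return W_slow                          # below threshold: no consolidation
    return W_slow + rho * s_bar * (W_fast - W_slow)
```

Because the step is proportional to (W_fast - W_slow), repeated consolidation converges rather than diverges: once the two matrices agree, further "sleep" passes are no-ops.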


6. Comparative Paradigm Analysis

Characteristic           Static-Parameter Models                            Active Inference Models (DREAM)
Training Paradigm        Offline optimization on fixed datasets             Continuous online adaptation
Weight Parameters        Fixed coefficients (static)                        Dynamically evolving structure
Role of Error            Penalization function (gradient) during training   Informational signal for inference correction
Context Memory           Attention windows or hidden vectors                Physical synaptic modification
Response to Anomalies    Error growth without corrective capacity           Activation of modulatory mechanisms ("surprise")
Computational Profile    Uniform resource expenditure                       Adaptive, event-driven computation
Lifelong Learning        Requires explicit retraining cycles                Intrinsic, continuous capability

7. Discussion: Results, Limitations, and Interpretation

The presented results constitute a preliminary validation and confirm the potential of the proposed adaptive inference paradigm. Nevertheless, transitioning from experimental benchmarks to industrial deployment necessitates addressing several critical challenges.

7.1 Interpretation of Dynamic Learning

Experimental data indicate that DREAM effectively implements a form of inference-as-optimization. The high reconstruction accuracy (Final Loss 0.0010) demonstrates that low-rank updates successfully capture signal-specific correlations typically lost during statistical averaging in traditional networks. This corroborates the hypothesis that ultra-compact models (82K parameters) can compete with architectures orders of magnitude larger through dynamic contextual specialization.

7.2 Computational Capacity and Parallelism

The current implementation of synaptic plasticity algorithms (STDP) imposes significant performance constraints:

  1. Sequential Update Bottleneck: Unlike static networks, where sequence processing can be partially parallelized (particularly in Transformer-like architectures), DREAM requires strictly sequential weight updates at each discrete time step. This creates a computational bottleneck on standard GPU/TPU hardware, which is not optimized for stepwise mutation of weight matrices.

  2. Engineering Challenges: Development of custom CUDA kernels is required to accelerate low-rank STDP operations and minimize overhead associated with reading/writing weights to accelerator global memory.

7.3 Threshold Sensitivity and the "Edge of Chaos"

The efficacy of the Surprise Gate mechanism critically depends on the balance of hyperparameters: the temperature \beta and the baseline threshold \theta_0.

  1. Risks of Imbalance: Excessive sensitivity leads to "plasticity hallucinations," wherein the model attempts to learn chaotic noise, misinterpreting it as novel structure. Insufficient sensitivity reduces DREAM to a conventional static RNN, nullifying its adaptive advantages.

  2. Mitigation Strategy: Development of meta-learning methods for automatic homeostatic threshold tuning, conditioned on data domain characteristics, is required.

7.4 Scalability Challenges (DREAMStack)

Investigating the applicability of these principles to deep hierarchical structures remains a priority.

  1. Coordination Problem: In multi-layer stacks, plasticity at each layer must be coordinated with higher-level representations. Uncontrolled simultaneous restructuring across all layers may lead to loss of structural representational integrity.

  2. Hierarchical Prediction: We propose employing a pyramidal structure wherein each layer predicts the activity of the layer below; however, mathematical stability of such a system at ultra-large parameter scales requires independent verification.


8. Conclusion

The DREAM architecture proposes an alternative paradigm for constructing intelligent systems, oriented toward achieving extreme adaptivity in dynamic and unpredictable environments. The synergy of predictive coding mechanisms and regulated synaptic plasticity enables transcending the limitations of static inference and advancing toward autonomous cognitive agents capable of continuous, lifelong learning. The proposed approach demonstrates that compact models can successfully compete with large-scale architectures through operational contextual specialization and efficient separation of structural information from background noise.

We regard this project not as a finalized product, but as an open research platform for investigating deep principles of neural network self-organization. Our priority objectives for the next phase include: enhancing stability of long-term memory consolidation algorithms; optimizing computational kernels to accelerate STDP operations; and scaling the architecture to multi-layer hierarchical stacks (DREAMStack). Successful realization of these plans will lay the foundation for a new generation of AI systems that do not merely process data, but evolve alongside it in real time—ensuring unprecedented levels of autonomy and reliability in mission-critical applications.


9. Acknowledgements

The authors express sincere and profound gratitude to the Manifestro Team, whose dedicated work on the algorithmic core and software implementation enabled the conduct of this research. We are indebted to the engineers and researchers for extensive discussions, development of experimental testbeds, and meticulous analysis of primary benchmark results on non-stationary audio streams.

Special appreciation is extended to the international academic community for advancing fundamental theories in Predictive Coding and Liquid Neural Networks. We particularly highlight the work of laboratories investigating active inference and differentiable plasticity, whose publications provided the conceptual foundation for the DREAM architecture. Your insights regarding "the brain as a predictive machine" and dynamic time constants have fundamentally shaped our research trajectory.

We also thank the creators of open datasets, such as LJ Speech and Mozilla Common Voice, which served as indispensable tools for hypothesis verification under conditions approximating real-world deployment. Finally, we are grateful to all reviewers and participants in open discussions for valuable critiques that helped refine the mathematical apparatus of homeostatic regulation within the Surprise Gate. This work would not have been possible without collective intellectual contribution and a supportive research ecosystem.


10. References

  1. Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
  2. Hasani, R., Lechner, M., Amini, A., Rus, D., & Grosu, R. (2021). Liquid Time-constant Networks. Proceedings of the AAAI Conference on Artificial Intelligence.
  3. Friston, K. (2010). The free-energy principle: a rough guide to the brain? Nature Reviews Neuroscience, 11(2), 127–138.
  4. Miconi, T., Stanley, K. O., & Clune, J. (2018). Differentiable Plasticity: Training Plastic Neural Networks with Backpropagation. International Conference on Machine Learning (ICML).
  5. Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS).
  6. Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752.
  7. Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z., & Ionescu, C. (2016). Using Fast Weights to Attend to the Recent Past. Advances in Neural Information Processing Systems (NeurIPS).
  8. Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley.
  9. Millidge, B., Tschantz, A., & Buckley, C. L. (2020). Whence the expectation? Predictive coding, active inference, and the free energy principle. Neural Computation, 32(7), 1275–1315.
  10. Zenke, F., Agnes, E. J., & Gerstner, W. (2015). Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nature Communications, 6, 6922.