Activation Function Stress Test: GELU vs Tanh

In modern neural networks, including Transformer-based LLMs, unbounded activation functions such as ReLU and GELU have become the standard. Their main advantages are good gradient flow and fast training of deep models.
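To make the contrast concrete, here is a minimal NumPy sketch (my own illustration, not code from the experiments discussed later) comparing tanh with the widely used tanh approximation of GELU on large-magnitude inputs: tanh saturates at ±1, while GELU keeps growing with its argument.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in many Transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0, 20.0])
print("x    :", x)
print("tanh :", np.tanh(x))  # bounded: saturates at -1 / +1
print("gelu :", gelu(x))     # unbounded above: ~x for large positive x
```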

In practice, however, a problem arises: when dominant patterns or high-frequency noise appear in the input context (long dialogues, noisy data, repetitive or dominant tokens), models become unstable and prone to degraded generation and hallucinations.
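As a rough illustration of that failure mode (a hypothetical sketch written for this intro, not the stress test from the article itself), consider a hidden state in which one dominant direction has a much larger pre-activation than the rest. A bounded nonlinearity clips the outlier to the same scale as everything else, while an unbounded one lets it dominate the output:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
h = rng.normal(0.0, 1.0, size=512)  # typical pre-activations
h[0] = 15.0                         # one dominant component (repeated pattern / outlier token)

for name, act in [("tanh", np.tanh), ("gelu", gelu)]:
    out = act(h)
    ratio = np.abs(out[0]) / np.abs(out[1:]).mean()
    print(f"{name}: dominant/typical output magnitude ~ {ratio:.1f}")
```

With tanh, the dominant component is squashed to roughly the same magnitude as the rest; with GELU, it remains an order of magnitude larger.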

In this article, I try to find out whether the choice of activation function could be fundamentally linked to LLM hallucinations.
