Activation Function Stress Test: GELU vs Tanh
8 min
Opinion

In modern neural networks, including Transformer-based LLMs, unbounded activation functions such as ReLU and GELU have become the standard. Their main advantages are good gradient flow and fast training of deep models.
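For concreteness, here is a minimal PyTorch sketch (my own illustration, not tied to any particular model) of the boundedness difference: GELU grows roughly like x for large positive pre-activations, while tanh saturates at ±1.

```python
import torch
import torch.nn.functional as F

# GELU is unbounded above: for large positive x, GELU(x) -> x.
# Tanh is bounded: for large |x|, tanh(x) saturates at +/-1.
x = torch.tensor([0.5, 2.0, 5.0, 20.0, 100.0])

print("x    :", x.tolist())
print("gelu :", [round(v, 3) for v in F.gelu(x).tolist()])      # ~[0.345, 1.954, 5.0, 20.0, 100.0]
print("tanh :", [round(v, 3) for v in torch.tanh(x).tolist()])  # ~[0.462, 0.964, 1.0, 1.0, 1.0]
```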
In practice, however, a problem appears: when the input context contains dominant patterns or high-frequency noise (long dialogues, noisy data, repetitive tokens), models become unstable and prone to degraded generation and hallucinations.
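As a toy illustration of this failure mode (my own sketch, not the experiment described later in the article), one can push a hidden vector containing a single dominant feature through both nonlinearities and compare how much that feature still stands out afterwards:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical "dominant pattern" scenario: a pre-activation vector in which one
# feature (e.g. driven by a repeated token) is far larger than the rest.
h = torch.randn(512)   # background features, roughly N(0, 1)
h[0] = 30.0            # the dominant feature

for name, act in [("gelu", F.gelu), ("tanh", torch.tanh)]:
    out = act(h)
    ratio = (out[0].abs() / out[1:].abs().mean()).item()
    print(f"{name}: dominant feature is ~{ratio:.0f}x the mean magnitude after activation")

# GELU passes the outlier through almost unchanged (GELU(30) ~ 30), so it keeps
# dominating downstream computation; tanh clips it to ~1, putting it back on the
# same scale as every other feature.
```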
In this article, I attempt to find out whether the choice of activation function could be fundamentally linked to LLM hallucinations.