Activation Function Stress Test: GELU vs Tanh

In modern neural networks, including Transformer-based LLMs, unbounded activation functions such as ReLU and GELU have become the standard. Their main advantages are good gradient flow and fast training of deep models.
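To make the contrast concrete, here is a minimal NumPy sketch (my own illustration, not code from the experiments discussed later) comparing tanh with the widely used tanh approximation of GELU on large-magnitude inputs: tanh saturates at ±1, while GELU keeps growing with its argument.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in many Transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0, 20.0])
print("x    :", x)
print("tanh :", np.tanh(x))  # bounded: saturates at -1 / +1
print("gelu :", gelu(x))     # unbounded above: ~x for large positive x
```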

In practice, however, a problem arises: when dominant patterns or high-frequency noise appear in the input context (long dialogues, noisy data, repetitive or dominant tokens), models become unstable and prone to degraded generation and hallucinations.
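As a rough illustration of that failure mode (a hypothetical sketch written for this intro, not the stress test from the article itself), consider a hidden state in which one dominant direction has a much larger pre-activation than the rest. A bounded nonlinearity clips the outlier to the same scale as everything else, while an unbounded one lets it dominate the output:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
h = rng.normal(0.0, 1.0, size=512)  # typical pre-activations
h[0] = 15.0                         # one dominant component (repeated pattern / outlier token)

for name, act in [("tanh", np.tanh), ("gelu", gelu)]:
    out = act(h)
    ratio = np.abs(out[0]) / np.abs(out[1:]).mean()
    print(f"{name}: dominant/typical output magnitude ~ {ratio:.1f}")
```

With tanh, the dominant component is squashed to roughly the same magnitude as the rest; with GELU, it remains an order of magnitude larger.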

In this article, I try to find out whether the choice of activation function could be fundamentally linked to LLM hallucinations.
