Resolving “Out of Memory” (OOM) killer events on Linux servers requires a systematic approach to identify the cause and implement appropriate solutions. Here are the steps and strategies to address OOM issues:
1. Analyze Logs and Identify the Cause
- Check System Logs:
Examine /var/log/messages or /var/log/syslog for OOM-related entries. Search for "oom-killer" or "Out of memory" messages to identify which process was killed:
grep -i "oom-killer" /var/log/syslog
- Use dmesg:
Run dmesg | grep -i "oom" to list recent OOM-related kernel events.
- Monitor Resource Usage:
Use tools like top, htop, or vmstat to identify processes consuming excessive memory.
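To quickly see which process the kernel killed and which processes are currently the heaviest memory consumers, the following commands are a useful starting point (standard dmesg/ps usage; the exact kernel log wording can vary between kernel versions):
# Show kernel OOM events with human-readable timestamps
dmesg -T | grep -iE "out of memory|killed process"
# List the ten processes using the most memory right now
ps aux --sort=-%mem | head -n 10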
2. Optimize Memory Usage
- Adjust Application Configuration:
If a specific application is causing the OOM, review its resource requirements and configuration settings. For example:
  - Reduce memory limits for caching.
  - Optimize queries or workloads.
- Enable Swap Space:
If the server runs out of physical RAM, adding or increasing swap space can help:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add the swap file to /etc/fstab for persistence:
/swapfile swap swap defaults 0 0
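Afterwards, you can verify that the new swap area is active and check overall memory and swap usage:
# Confirm the swap file is in use and review memory headroom
sudo swapon --show
free -h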
3. Configure OOM Killer Behavior
- Adjust oom_score and oom_score_adj:
Lower the OOM priority for critical processes by changing their oom_score_adj value:
echo -1000 > /proc/<pid>/oom_score_adj
For non-critical processes, increase their score to make them more likely candidates for termination.
- Use cgroups:
Configure memory limits using control groups (cgroups) to prevent a single process from consuming all memory:
cgcreate -g memory:/mygroup
echo 1G > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
cgexec -g memory:/mygroup your_command
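Note that the memory.limit_in_bytes path above belongs to the cgroup v1 layout; on distributions that default to cgroup v2, a comparable one-off limit can be applied through systemd. A minimal sketch, reusing the placeholder your_command from above:
# cgroup v2 style: run a command in a transient scope capped at 1 GiB of RAM
sudo systemd-run --scope -p MemoryMax=1G your_command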
4. Upgrade Hardware
If the server consistently runs out of memory despite optimizations, consider upgrading hardware:
- Add More RAM: Increase physical memory to handle larger workloads.
- Use Faster Storage: For swap space, use SSDs instead of HDDs for better performance.
5. Monitor and Scale
- Implement Monitoring Tools:
Use tools like Prometheus, Grafana, or Nagios to track memory usage trends and set up alerts for high utilization; a minimal shell-based check is sketched after this list.
- Scale Infrastructure:
If the workload exceeds the current server's capacity, consider scaling:
  - Horizontal Scaling: Add more servers.
  - Vertical Scaling: Upgrade the server with more powerful hardware.
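As a lightweight stopgap before a full monitoring stack is in place, a cron-friendly shell check can flag high memory utilization. This is a minimal sketch; the 90% threshold and the mem-watch syslog tag are arbitrary placeholders:
#!/usr/bin/env bash
# Log a warning to syslog when memory utilization crosses a threshold
THRESHOLD=90
USED_PCT=$(free | awk '/^Mem:/ {printf "%d", $3/$2*100}')
if [ "$USED_PCT" -ge "$THRESHOLD" ]; then
    logger -t mem-watch "Memory utilization at ${USED_PCT}% (threshold ${THRESHOLD}%)"
fi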
6. Optimize Kernel Parameters
Adjust kernel settings to better manage memory usage:
- Modify vm.swappiness:
Lowering the swappiness value reduces the tendency to use swap space:
echo 10 > /proc/sys/vm/swappiness
- Enable Memory Overcommit:
If safe, allow memory overcommit by adjusting vm.overcommit_memory:
echo 1 > /proc/sys/vm/overcommit_memory
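Keep in mind that writes to /proc/sys are lost at reboot. To persist the settings, use sysctl and a drop-in file (the 90-oom-tuning.conf filename is just an example):
# Apply immediately
sudo sysctl -w vm.swappiness=10
# Persist across reboots
echo "vm.swappiness = 10" | sudo tee /etc/sysctl.d/90-oom-tuning.conf
sudo sysctl --system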
7. Use Memory-Limited Containers
If you’re using Kubernetes or Docker:
- Set Memory Limits:
Define memory requests and limits for containers to prevent them from consuming excessive resources (a plain-Docker equivalent is sketched after this list):
resources:
  limits:
    memory: "1Gi"
  requests:
    memory: "512Mi"
- Use Horizontal Pod Autoscaling (HPA):
Scale pods based on resource utilization.
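For containers run directly with Docker (outside Kubernetes), comparable limits can be passed on the command line; --memory and --memory-reservation are the relevant docker run flags, and the image name below is only a placeholder:
# Hard cap at 1 GiB with a soft reservation of 512 MiB
docker run --memory=1g --memory-reservation=512m your_image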
8. Investigate and Optimize Code
If the OOM issue is due to your application:
- Fix Memory Leaks: Investigate the application for memory leaks and optimize its code.
- Profile Memory Usage: Use tools like valgrind, heaptrack, or gperftools to analyze memory allocation.
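As a starting point, a Valgrind leak check on a debug build might look like this (./your_app is a placeholder for your binary; the flags are standard memcheck options):
# Report every leaked allocation with full backtraces and origin tracking
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./your_app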
9. Consider GPU Memory (if applicable)
If the issue is related to GPU memory (e.g., in AI workloads):
- Optimize GPU Workloads: Ensure efficient memory usage in frameworks like TensorFlow or PyTorch.
- Use Mixed Precision: Reduce memory consumption by using mixed-precision computations.
- Monitor GPU Utilization: Use tools like nvidia-smi to monitor GPU memory usage.
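For continuous visibility into GPU memory, nvidia-smi can be queried on an interval using its standard --query-gpu selectors:
# Print GPU memory usage every 5 seconds in CSV form
nvidia-smi --query-gpu=timestamp,name,memory.used,memory.total --format=csv -l 5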
10. Reboot (Last Resort)
If OOM events persist and memory is not reclaimable, rebooting the server may be necessary as a temporary solution.
By implementing these strategies, you can reduce the likelihood of OOM killer events and optimize memory usage on your Linux servers.

Ali YAZICI is a Senior IT Infrastructure Manager with 15+ years of enterprise experience. While a recognized expert in datacenter architecture, multi-cloud environments, storage, and advanced data protection and Commvault automation, his current focus is on next-generation datacenter technologies, including NVIDIA GPU architecture, high-performance server virtualization, and implementing AI-driven tools. He shares his practical, hands-on experience through a combination of personal field notes and "Expert-Driven AI": he uses AI tools as an assistant to structure drafts, which he then heavily edits, fact-checks, and infuses with his own practical experience, original screenshots, and "in-the-trenches" insights that only a human expert can provide.
If you found this content valuable, [support this ad-free work with a coffee]. Connect with him on [LinkedIn].






