Monitoring Apache Logs to Prevent Abusive Practices

From MattWiki

Revision as of 13:25, 20 September 2025

This article documents my current efforts to analyze Apache logs and, perhaps in the future, to prevent offending and abusive practices against private servers.

Introduction

Lately I noticed fairly high CPU load on this very wiki server and wondered what was causing it.

CPU load regularly sits at 50 % and sometimes climbs to 80 % or even 100 %.

htop shows that much of the CPU utilization comes from the mariadbd process, the MariaDB database server.

After checking my Apache logs I found a lot of requests from different crawlers and from frequently recurring IPs and IP ranges.

The logs showed hundreds or even thousands of requests per day for different resources.

Ranking the requests in my Apache logs over the last 60 days, the worst offenders by IP address or IP address range, according to various IP databases, seemed to be:

  • Microsoft
  • Meta / Facebook
  • OpenAI

and a couple of individual IPs:

  • 65.109.100.155
  • 185.177.72.54

among others.

After some googling I found the following Hacker News thread, https://news.ycombinator.com/item?id=44971487, which refers to an article on The Register titled:

"AI crawlers and fetchers are blowing up websites, with Meta and OpenAI the worst offenders" https://www.theregister.com/2025/08/21/ai_crawler_traffic/

How to analyze Apache Logs

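# Top 20 client IPs in the current log (field 1 is the client address)
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20
# Same ranking across the rotated, gzip-compressed logs
zcat access.log.*.gz | awk '{print $1}' | sort | uniq -c | sort -nr | head -20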
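
The per-IP ranking can be widened to IP ranges and crawler names. The following is only a minimal sketch, assuming Apache's default "combined" log format and mostly IPv4 clients. The function names top_ranges and top_agents are made up for illustration, and grouping by /24 is a rough stand-in for the real network ranges that an IP database would report.

```shell
# Rank clients by /24 network instead of by single IP.
# IPv4 only: anything that does not split into four octets is printed unchanged.
top_ranges() {
    awk '{
        n = split($1, o, ".")
        if (n == 4) print o[1] "." o[2] "." o[3] ".0/24"
        else print $1
    }' "$@" | sort | uniq -c | sort -nr | head -20
}

# Rank user agents. When a combined-format line is split on double quotes,
# the User-Agent string is field 6 (request = 2, referer = 4).
top_agents() {
    awk -F'"' '{print $6}' "$@" | sort | uniq -c | sort -nr | head -20
}
```

Both functions read the files given as arguments or, without arguments, standard input, so `top_ranges access.log` and `zcat access.log.*.gz | top_agents` should both work. Lines whose request contains escaped double quotes will throw off the user-agent field.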

Further reading: https://www.tecmint.com/find-top-ip-address-accessing-apache-web-server/
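
To see how the requests of one suspicious address spread over time, the hits can also be counted per day. This is a sketch under the assumption that field 4 holds the timestamp in Apache's default form, such as [20/Sep/2025:13:25:00; hits_per_day is a hypothetical helper name.

```shell
# Requests per day for one client IP.
# Usage: hits_per_day IP [logfile...]   (reads stdin if no file is given)
hits_per_day() {
    ip=$1
    shift
    awk -v ip="$ip" '$1 == ip {
        # $4 looks like "[20/Sep/2025:13:25:00"; keep only "20/Sep/2025"
        day = substr($4, 2, 11)
        if (!(day in count)) order[++n] = day
        count[day]++
    }
    END {
        # Print days in the order they first appeared in the input
        for (i = 1; i <= n; i++) print count[order[i]], order[i]
    }' "$@"
}
```

For example, `hits_per_day 65.109.100.155 access.log` prints one line per day with the request count in front. Days appear in the order they occur in the input, so rotated logs should be passed oldest first.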

Databases for IP Addresses

https://www.ipqualityscore.com/

https://www.abuseipdb.com/
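
Lookups can also be scripted instead of done in the browser. The sketch below assumes AbuseIPDB's v2 check endpoint, an API key from a registered account, and curl being installed. The helper name abuse_check and the environment variable ABUSEIPDB_KEY are my own choices for illustration, and the 90-day window is an arbitrary example value.

```shell
# Look up one IP address in AbuseIPDB and print the JSON response.
# ABUSEIPDB_KEY is an assumed variable name, not something AbuseIPDB prescribes.
abuse_check() {
    if [ -z "${ABUSEIPDB_KEY:-}" ]; then
        echo "ABUSEIPDB_KEY is not set" >&2
        return 1
    fi
    # -G sends the -d/--data-urlencode parameters as a GET query string
    curl -sS -G https://api.abuseipdb.com/api/v2/check \
        --data-urlencode "ipAddress=$1" \
        -d maxAgeInDays=90 \
        -H "Key: $ABUSEIPDB_KEY" \
        -H "Accept: application/json"
}
```

With the key exported, `abuse_check 185.177.72.54` would query a single address. Free accounts are rate limited, so it makes sense to check only the top entries of the ranking rather than every IP in the log.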