The Apache access log shows a lot of requests from the Yisou crawler (now apparently called "Shenma Search").
It seems to ignore robots.txt completely:
User-agent: *
Disallow:
Crawl-delay: 60
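As a quick sanity check on what this robots.txt actually asks for, here is a minimal sketch using Python's standard urllib.robotparser. The user-agent token "YisouSpider" and the sample path are assumptions for illustration; they do not come from the log. The empty Disallow line blocks nothing, so the only request made of crawlers is the 60-second delay (Crawl-delay is a non-standard directive that crawlers are free to ignore).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Crawl-delay: 60
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "YisouSpider" is an assumed user-agent token, used only for illustration.
agent = "YisouSpider"

# An empty Disallow means every path may be fetched.
can_fetch = parser.can_fetch(agent, "/any/page.html")

# The delay (in seconds) that a polite crawler should wait between requests.
delay = parser.crawl_delay(agent)

print(can_fetch, delay)
```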
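So I simply blocked the IP ranges in Tomcat's server.xml:
<Context ...>
  <!-- 220.181.108.*, 123.125.71.*: baidu spider, 106.11.15*.*: yisou -->
  <!-- Dots are escaped so they match a literal "." only. -->
  <!-- On Tomcat 7 and later, deny is a single regular expression: join the patterns with | instead of commas. -->
  <Valve className="org.apache.catalina.valves.RemoteAddrValve"
         allow=""
         deny="106\.11\.15\d\.\d+, 220\.181\.108\.\d+, 123\.125\.71\.\d+"/>
</Context>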
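Before restarting Tomcat, it can help to check the deny patterns against a few sample addresses. The sketch below is only an approximation: Tomcat evaluates these patterns with java.util.regex and requires the whole address to match, which Python's re.fullmatch mimics for patterns this simple. The dots are written escaped here, and the helper name is_denied and the sample IPs are made up for illustration.

```python
import re

# Deny patterns from the Valve configuration, with literal dots escaped.
DENY_PATTERNS = [
    r"106\.11\.15\d\.\d+",   # yisou
    r"220\.181\.108\.\d+",   # baidu spider
    r"123\.125\.71\.\d+",    # baidu spider
]


def is_denied(ip):
    """Return True if the whole address matches any deny pattern."""
    # re.ASCII keeps \d limited to 0-9.
    return any(re.fullmatch(p, ip, flags=re.ASCII) for p in DENY_PATTERNS)


# Hypothetical sample addresses, not taken from the access log.
for ip in ["106.11.152.7", "106.11.160.7", "220.181.108.99", "8.8.8.8"]:
    print(ip, "denied" if is_denied(ip) else "allowed")
```

Note that 106.11.160.7 falls outside the 106.11.15*.* range and is therefore still allowed.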
Checking the result:
References:
Tomcat doc - Remote Address Filter
Feel free to repost.
Please keep the original link: https://bjzhanghao.com/p/920