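A commonly repeated claim on the internet is that roughly 50% of web traffic comes from crawlers. Some of these are so-called well-behaved crawlers, such as those of the major search engines (though some search engines crawl heavily while indexing very little, which is annoying in its own right). Others are malicious crawlers that bring no visitors at all and, through their sheer volume of requests, waste the host's CPU and bandwidth. Those are worth blocking.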
How to block unwanted crawlers in the BaoTa (宝塔) panel:
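1. In the /www/server/nginx/conf directory, create a new empty file named kill_bot.conf (a plain-text nginx config file, not a PHP file) and put the following code in it: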
```nginx
# Block unwanted search spiders and scrapers.
# ~* is a case-insensitive regex match: a keyword anywhere in the User-Agent triggers it.
if ($http_user_agent ~* "CheckMarkNetwork|Synapse|Nimbostratus-Bot|Dark|scraper|LMAO|Hakai|Gemini|Wappalyzer|masscan|crawler4j|Mappy|Center|eright|aiohttp|MauiBot|Crawler|researchscan|Dispatch|AlphaBot|Census|ips-agent|NetcraftSurveyAgent|ToutiaoSpider|EasyHttp|Iframely|sysscan|fasthttp|muhstik|DeuSu|mstshash|HTTP_Request|ExtLinksBot|package|SafeDNSBot|CPython|SiteExplorer|SSH|MegaIndex|BUbiNG|CCBot|NetTrack|Digincore|aiHitBot|SurdotlyBot|null|SemrushBot|Test|Copied|ltx71|Nmap|DotBot|AdsBot|InetURL|Pcore-HTTP|PocketParser|Wotbox|newspaper|DnyzBot|redback|PiplBot|SMTBot|WinHTTP|Auto Spider 1.0|GrabNet|TurnitinBot|Go-Ahead-Got-It|Download Demon|Go!Zilla|GetWeb!|GetRight|libwww-perl|Cliqzbot|MailChimp|Dataprovider|XoviBot|linkdexbot|SeznamBot|Qwantify|spbot|evc-batch|zgrab|Go-http-client|FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|EasouSpider|LinkpadBot|Ezooms") {
    return 403;
}
# Block scanning-tool clients
if ($http_user_agent ~* "crawl|curb|git|Wtrace|Scrapy") {
    return 403;
}
```
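Note: return 403 makes Nginx answer every matching request to the origin server with HTTP 403 Forbidden instead of serving the page. More crawlers can be added to either list as needed, separated by |.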
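To see which User-Agent strings a rule would catch without touching the server, the matching can be approximated offline. nginx's ~* operator is a case-insensitive regular-expression match that succeeds if the pattern occurs anywhere in $http_user_agent, and Python's re.search with re.IGNORECASE behaves the same way for simple alternations like these. This is a minimal sketch, not nginx itself: BOT_KEYWORDS is a hypothetical, abridged subset of the keywords above, and would_be_blocked is an illustrative helper name.

```python
import re

# Hypothetical, abridged subset of the keywords in kill_bot.conf (not the full list).
BOT_KEYWORDS = [
    "MJ12bot", "AhrefsBot", "SemrushBot", "DotBot", "Center",
    "Python-urllib", "Go-http-client",
    # keywords from the second "scanning tools" rule
    "crawl", "curb", "git", "Wtrace", "Scrapy",
]

# Same shape as the nginx rule: keywords joined with "|" and left unescaped,
# matched case-insensitively (the equivalent of ~*).
BOT_PATTERN = re.compile("|".join(BOT_KEYWORDS), re.IGNORECASE)


def would_be_blocked(user_agent: str) -> bool:
    """Approximate `if ($http_user_agent ~* "...") { return 403; }`.

    The search is unanchored, so a keyword anywhere in the string is enough.
    An empty User-Agent matches nothing and is therefore not blocked.
    """
    return BOT_PATTERN.search(user_agent) is not None
```

For example, would_be_blocked("Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)") returns True, while a current desktop Chrome string returns False. Because matching is by substring, short generic keywords from the list can also hit ordinary browsers: some older Internet Explorer User-Agents contain the token "Media Center PC", which matches Center, so it is worth running real visitor strings from your access log through a check like this before deploying.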
2. Activate the file
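In BaoTa, open Website / Settings / Config file for the site, and on the blank line above #SSL-START SSL相关配置 (the SSL-related configuration marker), add: include kill_bot.conf; (including the semicolon).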
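The edit in step 2 can also be scripted. This is a sketch under assumptions: it expects the site config to contain a line beginning with #SSL-START, as described above, and the function add_kill_bot_include and the SAMPLE_CONF text are hypothetical illustrations, not BaoTa's exact generated file. It inserts include kill_bot.conf; on its own line directly above that marker, reusing the marker's indentation, and leaves the text unchanged if the directive is already there. The include has to land inside the server { ... } block, since nginx's if directive is only valid in server and location contexts; a relative include path like this one is resolved against nginx's configuration directory, which is why the file is created in /www/server/nginx/conf.

```python
INCLUDE_DIRECTIVE = "include kill_bot.conf;"
SSL_MARKER = "#SSL-START"


def add_kill_bot_include(conf_text: str) -> str:
    """Return conf_text with the include directive inserted above the #SSL-START line."""
    if INCLUDE_DIRECTIVE in conf_text:
        return conf_text  # already activated; nothing to change
    lines = conf_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(SSL_MARKER):
            indent = line[: len(line) - len(stripped)]  # copy the marker's indentation
            lines.insert(i, indent + INCLUDE_DIRECTIVE + "\n")
            return "".join(lines)
    raise ValueError("no #SSL-START line found; add the include manually")


# Hypothetical, abridged site config used only to illustrate the placement.
SAMPLE_CONF = """server
{
    listen 80;
    server_name example.com;
    root /www/wwwroot/example.com;

    #SSL-START (SSL-related configuration)
    #SSL-END
}
"""
```

Calling add_kill_bot_include(SAMPLE_CONF) yields the same text with "    include kill_bot.conf;" on the line just before "#SSL-START"; calling it again on the result changes nothing.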
The change takes effect as soon as it is saved. To test it, scan the site with one of these spiders or tools; the request should be refused with 403 Forbidden.
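Instead of running an actual crawler, the test can also be done by sending a request that carries one of the blocked User-Agent keywords. A minimal standard-library sketch: status_for_user_agent is a hypothetical helper that fetches a URL with the given User-Agent header and returns the HTTP status code, so a string containing a listed keyword should come back as 403 once the rule is active, and a normal browser string as 200.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url while sending the given User-Agent header; return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers such as 403; the code is on err.code
        return err.code
```

For example, status_for_user_agent("http://your-site.example/", "MJ12bot/v1.4.8") should return 403 after step 2, where the URL is a placeholder for your own site. Network failures (DNS errors, timeouts) are not caught and surface as urllib.error.URLError.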
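Note: this method only blocks non-anonymous crawlers that identify themselves in the User-Agent header; a crawler that sends no User-Agent, or one that imitates an ordinary browser, is not caught.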
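Article link: http://78moban.cn/post/9600.html
Copyright notice: all articles on this site are reposted from the internet and are provided only for template demonstration purposes, with no other intent.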