78模板网分享cms建站教程,提供网站模板、网站插件、办公模板等模板教程免费学习,找模板教程就上78模板网!

.htaccess 阻止不良网络蜘蛛爬虫访问

修改 .htaccess 阻止不良爬虫访问

一些网络爬虫如 360 很霸道. 不管您服务器有少资源, 同时几百几千个线程并发爪取你的网站, 这样给服务器带来不少的压力。

APACHE 服务器可以通过在每个网站目录下添加一些规则来限制访问. 在根目录下加入以下规则就可以阻止大部分不良爬虫的访问 减少服务器的压力。

# 开始 - 阻止不良爬虫访问

SetEnvIfNoCase User-Agent "Abonti|aggregator|AhrefsBot|asterias|BDCbot|BLEXBot|BuiltBotTough|Bullseye|BunnySlippers|ca-crawler|CCBot|Cegbfeieh|CheeseBot|CherryPicker|CopyRightCheck|cosmos|Crescent|discobot|DittoSpyder|DOC|DotBot|Download Ninja|EasouSpider|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Exabot|ExtractorPro|Fasterfox|FeedBooster|Foobot|Genieo|grub-client|Harvest|hloader|httplib|HTTrack|humanlinks|ieautodiscovery|InfoNaviRobot|IstellaBot|Java/1.|JennyBot|k2spider|Kenjin Spider|Keyword Density/0.9|larbin|LexiBot|libWeb|libwww|LinkextractorPro|linko|LinkScan/8.1a Unix|LinkWalker|LNSpiderguy|lwp-trivial|magpie|Mata Hari|MaxPointCrawler|MegaIndex|Microsoft URL Control|MIIxpc|Mippin|Missigua Locator|Mister PiX|MJ12bot|moget|MSIECrawler|NetAnts|NICErsPRO|Niki-Bot|NPBot|Nutch|Offline Explorer|Openfind|panscient.com|PHP/5.{|ProPowerBot/2.14|ProWebWalker|Python-urllib|QueryN Metasearch|RepoMonkey|RMA|SemrushBot|SeznamBot|SISTRIX|sitecheck.Internetseer.com|SiteSnagger|SnapPreviewBot|Sogou|SpankBot|spanner|spbot|Spinn3r|suzuran|Szukacz/1.4|Teleport|Telesoft|The Intraformant|TheNomad|TightTwatBot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot|UbiCrawler|UnisterBot|URLy Warning|VCI|WBSearchBot|Web Downloader/6.9|Web Image Collector|WebAuto|WebBandit|WebCopier|WebEnhancer|WebmasterWorldForumBot|WebReaper|WebSauger|Website Quester|Webster Pro|WebStripper|WebZip|Wotbox|wsr-agent|WWW-Collector-E|Xenu|yandex|Zao|Zeus|ZyBORG|coccoc|Incutio|lmspider|memoryBot|SemrushBot|serf|Unknown|uptime files" bad_bot

SetEnvIfNoCase Referer "Abonti|aggregator|AhrefsBot|asterias|BDCbot|BLEXBot|BuiltBotTough|Bullseye|BunnySlippers|ca-crawler|CCBot|Cegbfeieh|CheeseBot|CherryPicker|CopyRightCheck|cosmos|Crescent|discobot|DittoSpyder|DOC|DotBot|Download Ninja|EasouSpider|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Exabot|ExtractorPro|Fasterfox|FeedBooster|Foobot|Genieo|grub-client|Harvest|hloader|httplib|HTTrack|humanlinks|ieautodiscovery|InfoNaviRobot|IstellaBot|Java/1.|JennyBot|k2spider|Kenjin Spider|Keyword Density/0.9|larbin|LexiBot|libWeb|libwww|LinkextractorPro|linko|LinkScan/8.1a Unix|LinkWalker|LNSpiderguy|lwp-trivial|magpie|Mata Hari|MaxPointCrawler|MegaIndex|Microsoft URL Control|MIIxpc|Mippin|Missigua Locator|Mister PiX|MJ12bot|moget|MSIECrawler|NetAnts|NICErsPRO|Niki-Bot|NPBot|Nutch|Offline Explorer|Openfind|panscient.com|PHP/5.{|ProPowerBot/2.14|ProWebWalker|Python-urllib|QueryN Metasearch|RepoMonkey|RMA|SemrushBot|SeznamBot|SISTRIX|sitecheck.Internetseer.com|SiteSnagger|SnapPreviewBot|Sogou|SpankBot|spanner|spbot|Spinn3r|suzuran|Szukacz/1.4|Teleport|Telesoft|The Intraformant|TheNomad|TightTwatBot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot|UbiCrawler|UnisterBot|URLy Warning|VCI|WBSearchBot|Web Downloader/6.9|Web Image Collector|WebAuto|WebBandit|WebCopier|WebEnhancer|WebmasterWorldForumBot|WebReaper|WebSauger|Website Quester|Webster Pro|WebStripper|WebZip|Wotbox|wsr-agent|WWW-Collector-E|Xenu|yandex|Zao|Zeus|ZyBORG|coccoc|Incutio|lmspider|memoryBot|SemrushBot|serf|Unknown|uptime files" bad_bot

Deny from env=bad_bot

# 结束 - 阻止不良爬虫访问

我决定使用apache的.htacess文件来限制这两个爬虫。

在网站的根目录有个隐藏文件.htaccess 修改此文件 在</IfModule>上方行插入

SetEnvIfNoCase User-Agent “^Yisou|Sogou web spider” bad_bot

Deny from env=bad_bot

设置之后 如果 USER AGENT 是以上列表中的, 服务端就会给出一个 403 错误

403-forbidden-error 修改 .htaccess 阻止不良爬虫访问 

403-forbidden-error

本文链接:http://78moban.cn/post/10824.html

版权声明:站内所有文章皆来自网络转载,只供模板演示使用,并无任何其它意义!

联系技术
文章删除 友链合作 技术交流群
1050177837
公众号
公众号
公众号
返回顶部