How do you mitigate the potential risks of scraping other companies' data?

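My love of Star Wars began in high school, when the movie channel's Friday-night program 佳片有约 happened to schedule it. After watching it, I could never forget the figure of Anakin Skywalker racing under the blood-red setting sun of Tatooine. His tragic fall to the dark side and his urgent craving for power became the footnote to my rebellious youth.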

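Back in reality, though, I have stumbled my way through starting a company and run into countless hard problems. Raw "strength" does not seem to solve everything: you have to raise your "intelligence", and your team has to learn "agility", especially on this soil of ours.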

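Besides, things in the real world are not always like science-fiction films, where the dark side and the light side of the Force are clearly marked out and you simply pick one to stand with. Much of the time you have to move through gray areas, carrying a heart of light while drawing on dark energy. One careless misstep along the way and you may fall beyond redemption into the flames forever. Yet there are always masters who, in straw sandals and with a bamboo staff in hand, step lightly across, pull the chestnuts from the fire, and walk away unscathed.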

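《增长黑客》 (Growth Hacker) mentions several cases in which data scraping was used to solve the cold-start problem. In fact, obtaining data by scraping is not rare in the internet industry (if it strikes you as novel, you simply have not been around long enough). But because I was willing to put it in writing, I have naturally drawn some doubts and criticism, and at events large and small someone always asks how to avoid the risks.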

Here I will briefly set out my views on the question "How do you mitigate the potential risks of scraping other companies' data?":

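First, technology is neutral and carries no inherent bias. If you think that using "right-click, Save As" to take material from another platform and put it into your own product poses no problem, then writing a script to scrape in bulk does not change the nature of the act. It only turns what would take three days of manual work into something done automatically within three hours.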

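Second, each platform has its own policies, so the scraping strategy differs from platform to platform. Some platforms state explicitly in their user agreements that "this platform only stores data; copyright in the content belongs to the original author." In that case you can obtain the original author's consent and then fetch the content in whatever way you find convenient. As for how to obtain consent from a large number of authors quickly: write a script that sends private messages in bulk to the target group. For the nature of that act, see the first point.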
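As one hedged illustration of "each platform has its own policies": many sites publish part of their crawling policy in a machine-readable robots.txt file. The post itself does not mention robots.txt, so treat this as an assumption, and note that it does not replace reading the user agreement. The sketch below uses Python's standard `urllib.robotparser`; the sample rules, the `example.com` URLs, the `example-bot` user agent, and the helper names `build_policy` and `may_fetch` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice it would be downloaded from
# the platform's own /robots.txt before any scraping starts.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(policy: RobotFileParser, url: str,
              user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules allow this user agent to fetch url."""
    return policy.can_fetch(user_agent, url)


policy = build_policy(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/posts/1", "https://example.com/private/1"):
    print(url, may_fetch(policy, url))
```

Running the check per platform, rather than hard-coding one rule set, matches the point that the strategy has to differ from platform to platform.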

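Third, distinguish learning purposes from commercial purposes. I specifically looked up copyright law and the related laws and regulations, and they treat use "for the purpose of learning" as a separate category. If the scraped data is used for internal testing, reference, and decision support before the product launches, I believe it falls within that scope (though each case still has to be judged on its own facts). Once the product formally launches as a commercial offering, caution is the better course.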

Finally, whether you have crossed the line is something you know perfectly well yourself.
