How do you avoid the potential risks of scraping other companies' data?

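My love of Star Wars began in high school, when the movie channel's Friday-night program 佳片有约 happened to air it. After watching it, I could never forget the figure of Anakin Skywalker racing beneath Tatooine's blood-red setting sun. His tragic fall to the dark side and his desperate craving for power became the footnote to my rebellious youth.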

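Back in reality, though, stumbling along the startup road, I have run into countless hard problems. Pure "strength" does not seem to solve everything: you have to raise your "intelligence", and your team has to learn "agility", especially on the soil of this country of ours.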

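Besides, worldly affairs are not all like science-fiction films, where the dark side and the light side of the Force can be clearly marked out and you simply pick one to stand with. Much of the time you have to walk the gray zone, carrying a heart of light while drawing on dark energy. Along the way, one careless misstep and you may fall beyond redemption into the fire for good; yet there are always masters who, with bamboo staff and straw sandals, tread lightly across the waves, pull the chestnuts from the fire, and walk away unscathed.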

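《增长黑客》 (Growth Hacker) mentions several cases of using data scraping to solve the cold-start problem. In fact, relying on scraping to obtain data is not rare in the internet industry (if it strikes you as novel, you simply have not been around long enough). But because I dared to put it in writing, I naturally drew some doubt and criticism, and at events large and small someone always asks how to avoid the risks.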

Here I will briefly set out my view on the question "How do you avoid the potential risks of scraping other companies' data?":

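First, technology is neutral and carries no inherent bias. If you think that using "right-click, Save As" to take material from another platform and put it into your own product poses no problem, then writing a script to scrape in bulk does not change the nature of the act; it merely compresses what would take three days of manual work into under three hours of automation.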

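Second, every platform has its own policies, so the scraping strategy differs from platform to platform. Some platforms state explicitly in their user agreements that "this platform only stores the data; copyright in the content belongs to the original author." In that case, once you have the original author's consent, you are free to fetch the content in whatever way you find convenient. As for how to obtain consent from many original authors quickly: write a script that sends private messages in bulk to the target group. For the nature of that act, see the first point.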

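Third, distinguish study purposes from commercial purposes. I specifically looked up copyright law and related laws and regulations, which treat use "for the purpose of study" as a separate category. If scraped data is used for internal testing, reference, and decision-making before a product launches, I believe it falls within that scope (though each case still has to be judged on its own facts). Once the product formally launches as a commercial offering, it is better to be careful.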

Finally, you know in your own heart whether you have crossed the line.
