Note to Self: Data Science Reading List:

  • Leo Breiman: Statistical Modeling: The Two Cultures: "The data are generated by a given stochastic data model... [vs.] algorithmic models... treat[ing]... the data mechanism as unknown. The statistical community['s]... commit[ment]... to... models... has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems..."
  • Ben Fry: Computational Information Design: "Fields such as information visualization, data mining and graphic design... each solv[e]... an isolated part of the specific problem but fail... in a broader sense.... This dissertation proposes that the individual fields be brought together as part of a singular process titled Computational Information Design..."
  • Hal Varian: How the Web challenges managers: "There’s already been a big revolution in how we view intellectual property.... It’s not so much the question of what’s owned or what’s not owned. It’s a question of how can you leverage the assets you have to realize the most value.... Disseminating content... has become intensely competitive..."
  • Drew Conway: The Data Science Venn Diagram: "The primary colors of data: hacking skills, math and stats knowledge, and substantive expertise..."
  • Gil Press: A Very Short History Of Big Data: "1944 Fremont Rider... publishes The Scholar and the Future of the Research Library..."
  • Gil Press: A Very Short History Of Data Science: "The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists and others for years..."