Here's another take on what was ethically incorrect about the Emil Kirkegaard data dump of nearly 70,000 OKCupid users.
There are essentially four points to consider:
1. OKC users did not consent to their data being used in the way Kirkegaard used them, or to their being shared. In fact, even a cursory reading of the terms of service at OKC makes clear how users' data are to be used.
2. OKC itself did not consent to the data being used as they were. We can quibble about how much concern a company's well-being deserves, but there is certainly precedent for researchers obtaining a company's permission to analyze its members' data (with safeguards for anonymity put in place). The reason I add that last sentence is that some of my first research experiences as an undergrad involved analyzing data from a computer matching service, from which my mentor had obtained all necessary permission. Everything was above board. In the process, we were able to make a number of statements with those data, including confirming that successfully matched couples tended to have considerably more in common than couples that were not successful matches. It made for a couple of conference presentations, if nothing else, and no one's privacy was harmed in the process. A win-win in my book.
3. Which leads to a third point - the way Kirkegaard went about releasing the database could lead to real harm to real people. That is not how we operate as scientists. A guiding ethical principle that was imparted to me and that I share with my students is that we do no harm to those we study. Putting others' privacy at risk is a good reason to not make a database public or go forward with a particular project. The potential for individuals in the OKCupid database to be personally identified is one I find rather unsettling (and that is really an understatement), especially to the extent that their being personally identified could lead to real physical or financial harm. The publishers of the database showed cavalier disregard for that possibility, and those who are using these data going forward are doing likewise.
4. Any work involving human participants requires review by an institution's ethics panel. That applies not only to faculty and staff members of an institution, but to its students as well. Students in particular are still beginning to learn the research process, and the idea of students (at any level) being turned loose to simply run whatever they want without oversight is not something a reputable institution will stand for. However, regardless of whether one is a student or a seasoned professional, and regardless of how excited one might be about their research ideas, there is a need for an impartial third party to provide some modicum of oversight - if nothing else to determine the ethical soundness of a particular project. In addition, the Kirkegaard data dump along with its accompanying paper presents another problem: conflict of interest. Not only did Kirkegaard entirely sidestep his institution's IRB, but he made sure that another potential safeguard was sidestepped: the role of the journal editor and peer reviewers. As the editor in chief of the "journal" in which his work was published, Kirkegaard is his own judge, jury, and executioner.
I am sure we all have our own various war stories to tell about our institutions' IRBs or ethics panels, but at the end of the day, they do serve an important function. Regardless of any frustrations I've had with my own, ultimately I and my students get to conduct our research. We may have a wait on our hands, but we'd rather err on the side of safety.
I am sure that this incident will cast a shadow over the open science movement, which is a shame since I do think that this is a movement that serves a purpose as well. The appeal of the open science movement was its apparent dedication to transparency, leading to better and more ethically sound science. OSF, which housed the data set in question, should simply remove the data set posthaste. In other words, an organization that is the face of the movement needs to take a strong stand now, rather than freeze in a time of crisis. In any form of communication, there is a fine line between openness and TMI (a.k.a. too much information). Kirkegaard has erred on the side of TMI, and could drag down a potentially beneficial movement within the sciences with him in the process. That should concern all of us.