Case Study
Respecting Data Privacy: Anonymizing Practices at DataShop
Submitted by:
Ken Koedinger, Carnegie Mellon University
Intervention Types
Process
Software
Respecting learner privacy in sharing data can be accomplished by implementing some simple practices that still give learning researchers reasonable access to such data. Privacy risks for data sharing depend on the nature of the data, and thus the access privileges that researchers receive can be matched to the degree of risk for any given dataset. For example, the researchers who contribute to DataShop collect mostly clickstream data (i.e., data about student interactions in online courses). There is nothing self-identifying about such data. We considered including demographic and record information for individual students or for schools, but, to be on the safe side with respect to student privacy, we decided that public datasets would not include that information. So for DataShop public-access data, the risk of connecting data to a specific student is near zero.
Further, any student identifier in our data should already be anonymized before it reaches us, but as an extra precaution, we re-anonymize every student identifier we are given.
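To make the re-anonymization step concrete, the sketch below shows one common way such a mapping can be done. It is not DataShop's actual implementation: the keyed-hash approach, the salt value, and the function name are illustrative assumptions. The idea is simply that each incoming identifier is deterministically mapped to a fresh internal one, so rows from the same student stay linked while the supplied identifier is never stored or exposed.

```python
import hmac
import hashlib

# Hypothetical secret salt held only by the repository; illustrative, not
# part of DataShop's documented practice.
SECRET_SALT = b"replace-with-a-securely-stored-random-value"

def re_anonymize(student_id: str) -> str:
    """Map an incoming (already anonymized) student identifier to a new
    repository-internal identifier using a keyed hash, so the original
    identifier never needs to be stored."""
    digest = hmac.new(SECRET_SALT, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened ID for readability

# Rows from the same student map to the same new ID, preserving
# within-dataset linkage while hiding the identifier that was supplied.
print(re_anonymize("Stu_8843"))
print(re_anonymize("Stu_8843"))
```

Because the mapping is keyed by a secret held only by the repository, a contributor's original identifiers cannot be recovered from the published ones, yet all of a student's transactions remain connected for analysis.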
DataShop also offers a set of interfaces that support research managers in checking that each entered dataset meets all standards and requirements (e.g., that there is appropriate IRB approval) for either public access or for more limited sharing among an identified set of researchers.
- Privacy risks for data sharing depend on the nature of the data, and appropriate access privileges can be associated with the degree of risk for any given set of data.
- DataShop takes extra precautions by anonymizing data a second time to eliminate any privacy concerns for its public-access data.
- DataShop interfaces are designed to support research managers in checking that datasets meet all privacy standards and requirements.