Understanding values in the data science process

New information and communication technologies have facilitated the collection of data, both personal and non-personal, at a large scale, creating what some call the “data-driven society”.1 This data collection has facilitated the growth of “data science”, and the development of automated algorithmic tools for making or informing decisions about people in a wide range of areas with legal or societal impact, including employment, policing and the media.

Concerns about these uses of data science have led to research from a technological perspective, such as attempts to create interpretable classification algorithms2 or to prevent bias.3 Legal academics have also examined possible remedies through data subject rights in the new EU General Data Protection Regulation.4 There is debate, however, over the effectiveness or appropriateness of ex ante versus ex post solutions: should system designers predict and mitigate harm before it occurs, or mediate after the fact? There is also concern over the effectiveness of individual data subject rights when harm might affect an entire cohort of citizens; e.g. access to an individual’s personal data may not detect discrimination across a group of individuals.5

This has led to calls for system designers to think about societal concerns at the outset of the design process, e.g. through impact assessment6 or the GDPR’s requirements for data protection by design. At the same time, there has been some resistance from developers who claim that their algorithms are merely neutral tools.7 This project complements existing technological and legal analysis by examining the processes by which data science projects are created. In contrast to work that has focused on particular end uses of data science,8 we propose to study the entire data science process, from initial research to system deployment. Do system developers and designers consider the ways in which their algorithms and systems might be used, and if so, are these concerns expressed throughout the process? In particular, the project aims to address the following questions:

• how are values expressed in the data science process, from the original research into classifiers, to the sharing of findings and datasets, to the development of systems?
• where, and how, should we embed values in data science to achieve desirable societal outcomes?

Project outline
This project will:
• collect a corpus of information about large data science projects, from the initial research papers that developed the algorithms, to the datasets that were subsequently shared, to any subsequent projects that used these datasets or algorithms.
• record information about how, if at all, values and concerns are expressed during these projects. Are datasets released under particular licenses? Do papers record ethical concerns? Do projects include privacy policies or record the existence of privacy by design processes or privacy impact assessments?
• use a methodology derived from science and technology studies to first map the clusters of expertise which form data science projects and the connections between them (a sketch of this mapping appears after this list). Networks may cross fields of computational expertise, organizations and jurisdictions.9
• trace the flow of values across these dispersed networks, interrogating where values are formed, how they are transferred, and how their trajectory across the network is shaped.10
• propose solutions to augment existing privacy by design, data protection by design, and impact assessment techniques to embed values in the data science design process.
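
To make the mapping and recording steps concrete, here is a minimal sketch of how such a corpus might be represented and clustered, assuming Python and the networkx library. The node names, metadata fields (e.g. license, ethics_statement, dpia) and the choice of modularity-based community detection are illustrative assumptions for this example, not a prescribed methodology.

```python
# Minimal sketch: papers, datasets and downstream projects become nodes;
# "released"/"used_by" relationships become edges. All names and metadata
# fields are illustrative assumptions, not findings from the project.
import networkx as nx
from networkx.algorithms import community

G = nx.DiGraph()

# Nodes carry the value-relevant metadata the project proposes to record.
G.add_node("paper:classifier", kind="paper", ethics_statement=False)
G.add_node("dataset:training-corpus", kind="dataset", license="CC BY-NC 4.0")
G.add_node("project:deployed-system", kind="project", privacy_policy=True,
           dpia=False)  # dpia: was a data protection impact assessment recorded?

# Edges trace how artefacts flow from research to deployment.
G.add_edge("paper:classifier", "dataset:training-corpus", relation="released")
G.add_edge("dataset:training-corpus", "project:deployed-system", relation="used_by")

# Where are values expressed (or absent) along the network?
for node, attrs in G.nodes(data=True):
    recorded = {k: v for k, v in attrs.items() if k != "kind"}
    print(f"{node}: {recorded}")

# Clusters of expertise: community detection on the undirected projection.
for i, cluster in enumerate(
        community.greedy_modularity_communities(G.to_undirected())):
    print(f"cluster {i}: {sorted(cluster)}")
```

A graph representation of this kind makes the project’s research questions directly queryable: values (or their absence) appear as node attributes, and their flow appears as paths through the network.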

The proposed research should result in novel research outputs, datasets of use to the data science research community, and potential impact routes through standards bodies and policy makers.

Interdisciplinarity
This is an inherently interdisciplinary project which combines aspects of both computer science and management. The use case itself is embedded in computer science, as are some of the techniques used to collect and analyse data. Management provides the tools needed to analyse the structure of the data science ecosystem. We are therefore open to applicants with a wide variety of interdisciplinary and multidisciplinary backgrounds, ranging from management to computer science to technology law, and can tailor the project to suit the chosen candidate.

Supervisors
This PhD project will be co-supervised by Kirstie Ball and Tristan Henderson.
Kirstie Ball is Professor of Management at the University of St Andrews. Her research interests include surveillance, privacy and employee monitoring.
Tristan Henderson is Senior Lecturer in Computer Science at the University of St Andrews. He has a multidisciplinary background (MA Economics; PhD Computer Science; LLM Innovation, Technology and the Law).

How to Apply
Candidates must meet the usual standards for entry to PhD programmes at the University of St Andrews, including a Master’s degree due to be completed by summer 2018, and appropriate English language skills. You should apply via the usual University of St Andrews online procedure, specifying ‘Computer Science’ or ‘Management’ as the primary programme for administrative purposes, depending on your interests. Please include a research proposal of up to 1000 words, including some indication of how your education, interests, and professional experience (if relevant) prepare you to undertake a project of this nature.

Contact Dr Henderson and/or Professor Ball for informal enquiries.

Deadline: Friday 23rd February 2018

Endnotes
1. Alex Pentland, ‘The Data-Driven Society’ (2013) 309(4) Scientific American 78.
2. Finale Doshi-Velez and Been Kim, ‘Towards A Rigorous Science of Interpretable Machine Learning’ (March 2017).
3. David Danks and Alex J London, ‘Algorithmic Bias in Autonomous Systems’ (Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 2017).
4. Lilian Edwards and Michael Veale, ‘Slave to the Algorithm? Why a ‘Right to Explanation’ is Probably Not the Remedy You are Looking for’ [2017] Duke Law & Technology Review; Sandra Wachter, Brent Mittelstadt and Luciano Floridi, ‘Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation’ (2017) 7(2) International Data Privacy Law 76.
5. Tristan Henderson, ‘Does the GDPR help or hinder fair algorithmic decision-making?’ (LLM dissertation, Edinburgh Law School 2017).
6. Lilian Edwards, Derek McAuley and Laurence Diver, ‘From Privacy Impact Assessment to Social Impact Assessment’ (IEEE Security and Privacy Workshops (SPW), San Jose, CA, USA, May 2016).
7. Xiaolin Wu and Xi Zhang, ‘Responses to Critiques on Machine Learning of Criminality Perceptions (Addendum of arXiv:1611.04135)’ (May 2017).
8. Michael Veale, ‘Logics and practices of transparency and opacity in real-world applications of public sector machine learning’ (Proceedings of the 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017), Halifax, NS, Canada, August 2017).
9. Greg Satell, ‘What makes an Organization “Networked”?’ [2015] Harvard Business Review.
10. Kirstie Ball, ‘Elements of surveillance: A new framework and future directions’ (2002) 5(4) Information, Communication & Society 573.