Data Hazards is a project about the worst-case scenarios of data science. Data scientists are well practised at selling our work, for example by communicating gains in efficiency and accuracy, but we are less well practised at thinking through its ethical implications. These implications go beyond the remit of most Institutional Review Boards, extending to questions about the wider societal impact of data science and algorithmic work.
We aim to create resources to:
Create a shared vocabulary of Data Hazards in the form of Data Hazard Labels.
Make ethical and futures thinking more accessible to data scientists, computer scientists, and applied mathematicians, so that they can apply it to their own work.
Bring together diverse and interdisciplinary viewpoints on this work, and treat them with respect, through workshops and mailing lists.
Find out in what circumstances, and for whom, these resources work best.
To support our aims we will:
Get feedback on our draft Data Hazard Labels, to develop them with the communities who will be using them.
Create prompts, frameworks, and forms that help data scientists reflect on their own work.
Run workshops and mailing lists where data scientists can listen to diverse perspectives and broaden their ideas of what is possible, and where interdisciplinary researchers and members of the public are heard, respected, and listened to by the people doing computational and mathematical work.
Listen to our community’s feedback through surveys.
Why are the Hazard Labels so scary-looking?
We know that the Data Hazards labels are a bit frightening. Argh, there’s a skull! Please know that we don’t want these labels to scare anyone away from considering ethics or from doing data science, and we will do everything we can to make applying Data Hazards labels as welcoming and approachable as possible. That said, we have some good reasons for choosing these images.
We chose this format because of its similarity to COSHH hazard labels - the hazard labels used for chemicals. We made this choice because we want a similar response from people:
They grab attention, asking people to stop and think, and to take safety precautions seriously rather than treating them as an optional extra.
They ask people to “handle with care”, not to stop doing the work. We still use chemicals, but we think about how the work can be done safely and how to avoid emergencies.
They are familiar, especially to scientists, who (within universities) tend to have the least experience of applying ethics.
Here’s a rough project timeline to let you know what we’ll be up to:
March-April 2021: Behind-the-scenes planning
Thinking, reading and planning
Getting feedback on initial ideas
Sept 2021: Run first Data Hazards workshop (academic-focused)
Run the first Data Hazards workshop on 21st September.
Oct-Dec 2021: Use workshop feedback to improve Data Hazards
Review workshop feedback and make improvements to the Data Hazard labels.
Nov 2021: Trial asynchronous Data Hazards (without group discussion) as a tool for assessing funding applications.
Early 2022: Run second Data Hazards workshop (public- and company/local-government-focused)
Spring 2022: Write up the Data Hazards paper