Data Hazards

Data Hazard warning signs labelled algorithmic decision-making, ranking people, difficult to understand.

Summary

Data Hazards is a project about the worst-case scenarios of Data Science. As data scientists, we are great at selling our work, for example by communicating gains in efficiency and accuracy, but we are less practised at thinking about its ethical implications. These implications go beyond the remit of most Institutional Review Boards (ethics committees), extending to questions about the wider societal impact of Data Science and algorithms.

Aims

We aim to:

  1. Create a shared vocabulary of Data Hazards in the form of Data Hazard Labels.

  2. Make ethical and future-focused thinking more accessible to data scientists, computer scientists and applied mathematicians, so that they can apply it to their own work.

  3. Bring diverse and interdisciplinary viewpoints together on this work, and make sure they are respected, through workshops and mailing lists.

  4. Find out in which circumstances, and for whom, these resources work best.

How

To support our aims we will:

  1. Get feedback on our draft Data Hazard Labels, to develop them with the communities who will be using them.

  2. Create resources that help data scientists reflect on their own work, such as prompts, frameworks, and forms for them to consider.

  3. Run workshops and mailing lists where data scientists can listen to diverse perspectives and broaden their ideas of what is possible, and where interdisciplinary researchers and the public can be heard, respected, and listened to by the people doing computational and mathematical work.

  4. Listen to our community’s feedback through surveys.

Why are the Hazard Labels so scary-looking?

We know that the Data Hazards labels are a bit frightening. Argh, there’s a skull! Please know that we don’t want these labels to scare anyone away from considering ethics or from doing data science. We will do everything we can to make applying Data Hazards labels as welcoming and approachable as possible, but we also have some good reasons for choosing these images.

We chose this format because of its similarity to COSHH (Control of Substances Hazardous to Health) hazard labels, which are used for chemicals. We made this choice because we want a similar response from people:

  1. They are attention-grabbing, asking people to stop and think, and to take safety precautions seriously rather than treating them as an optional extra.

  2. We’re asking people to “handle with care”, not to stop doing the work. We still use chemicals, but we think about how to use them safely and how to avoid emergencies.

  3. They are familiar, especially to scientists, who (within universities) tend to have the least experience of applying ethics.

Project timeline

Here’s a rough project timeline to let you know what we’ll be up to:

March-April 2021: Behind-the-scenes plans

  • Thinking, reading and planning

  • Writing the proposal

  • Getting feedback on initial ideas

May-Aug 2021: Prepare for the first Data Hazards workshop

  • Get website online

  • Submit ethics application

  • Get initial feedback on Data Hazards labels

  • Draft workshop materials

  • Get feedback on workshop materials

  • Begin advertising workshop

  • Set up an Open Science Framework project and preregister the analysis

Sept 2021: Run the first Data Hazards workshops (academic-focused)

Oct-Dec 2021: Use workshop feedback to improve Data Hazards

  • Look at workshop feedback to make improvements to:

    • Data Hazards labels

    • workshop exercises/materials

Nov 2021: Trial asynchronous Data Hazards (without group discussion) as a tool for assessing funding applications.

Details TBC

Early 2022: Run the second Data Hazards workshop (public and company/local-government focused)

Details TBC

Spring 2022: Write up the Data Hazards paper

Details TBC