Data Hazard labels
This page contains the Data Hazard labels themselves. These labels, descriptions, examples, and safety precautions will evolve as we develop the hazard labels with the communities who will use them. We welcome you to suggest changes via email or GitHub.
Each hazard has:
A hazard image, title, and description, which represent and describe the risk.
Examples to clarify what the hazard covers.
Safety Precautions - things that we would want to see done before the research is deployed.
They are designed to help us think about the different types of hazards.
Hazard: Data Hazard
Data Science is being used in this output, and any negative outcomes of using this work are not the fault of “the algorithm” or “the software”.
This hazard applies to all Data Science research outputs.
All other Data Hazard examples could feature as examples here.
Hazard: Reinforces existing biases
Reinforces unfair treatment of individuals and groups. This may be due to, for example, input data, algorithm or software design choices, or society at large.
Note: this is a hazard in its own right, even if it isn’t then used to harm people directly, due to e.g. reinforcing stereotypes.
Hazard: Ranks or classifies people
Rankings and classifications of people are hazards in their own right and should be handled with care.
To see why, we can think about what happens when the ranking/classification is inaccurate, when people disagree with how they are ranked/classified, as well as who the ranking/classification is and is not working for, how it can be gamed, and what it is used to justify or explain.
Example 2: School league tables (which rank the performance of schools).
Hazard: High Environmental Cost
This hazard is appropriate where methodologies are energy-hungry, data-hungry (requiring more and more computation), or require special hardware that requires rare materials.
Hazard: Lacks community involvement
This applies when technology is being produced without input from the community it is supposed to serve.
Hazard: Danger of misuse
There is a danger of misusing the algorithm, technology, or data collected as part of this work.
Example 1: Statistical method to do impossible tasks, for example predicting future human behaviour.
Example 2: The collection of a large data set of individuals, which could be hacked, or used for other purposes.
Hazard: Difficult to understand
There is a danger that the technology is difficult to understand. This could be because the technology itself is hard to interpret (e.g. neural nets), or because of problems with its implementation (i.e. code is not provided, or not documented).
Depending on the circumstances of its use, this could mean that incorrect results are hard to identify, or that the technology is inaccessible to people (difficult to implement or use).
Example 1: Deep learning is used to perform credit-scoring (i.e. could deny people credit), but it is difficult to understand (and therefore check) what these decisions are based on.
Example 2: Even when journals have code and data availability policies, published researchers can be unaware of what they agreed to and resist sharing their code and data, as this paper surveying Science publications shows.
Hazard: May cause direct harm
The application area of this technology means that it is capable of causing direct physical or psychological harm to someone even if used correctly. For example, healthcare and driverless vehicles may be expected to directly harm someone unless they have 100% accuracy.
Hazard: Privacy
This technology may risk the privacy of individuals whose data is processed by it.
Example 1: Facial recognition technologies are a widely recognised risk to privacy.
Example 2: Apps designed to monitor children’s digital activity risk children’s privacy, and can be repurposed maliciously as stalkerware.
Hazard: Automates decision making
Automated decision making can be hazardous for a number of reasons, and these will be highly dependent on the field in which it is being applied. We should ask ourselves whose decisions are being automated, what automation can bring to the process, and who benefits from or is harmed by this automation.
Example 1: Predictive policing is used to decide where to deploy officers.
Example 2: Credit scores are produced automatically and rarely involve human input.
Hazard: Lacks Informed Consent
This hazard applies to datasets or algorithms that use data which has not been provided with the explicit consent of the data owner/creator. Such data often lacks other contextual information, which can also make it difficult to understand how the dataset may be biased.
Example 1: Large public social media datasets rarely collect informed consent from ‘participants’.
Example 2: Data linkage projects tend not to involve informed consent.