Computer bits vs. California fish

Can computational toxicology reproduce lab tests for consumer products?

In 2017, the California Department of Toxic Substances Control (DTSC) published a study on the aquatic toxicity of 417 consumer products.

They sent each product – vitamins, soaps, cleaners, ink cartridges, toothpastes, and more – to multiple labs and ran tests on fish to determine the products’ toxicity. The tests measured LC50 – or Lethal Concentration 50 – which is the concentration of a substance needed to kill 50% of a tested species (in this case fish). Conducting these lab tests is technically a DTSC requirement for determining whether a substance is hazardous waste.

That got us thinking.

Is it possible to reproduce those lab results using computational toxicology instead of killing fish? And is it possible using only publicly available data? The answer, we think, is “yes”.

Here’s our stepwise analysis of one product to see what is possible.

Picking Our Product

We limited our set to products that have two valid LC50 measurements from the DTSC study, as well as a Safety Data Sheet (SDS) and a full ingredients list. Seems reasonable.

Then we picked a lucky winner: Febreze Fabric Refresher Allergen Reducer Clean Splash.

Here’s what you need to know about it:

Pay particular attention to the last three rows of the table. Those are the results of DTSC’s lab tests on our Febreze product. In two tests, it took 201 mg/L and 330 mg/L to kill 50% of fish in the respective test. The average is 266 mg/L. Hold those numbers in your brain.

Step 1: What does the Safety Data Sheet tell us?

To compute the aquatic toxicity for Febreze, we first need to know something about the chemicals in the product.

Let’s start by looking at the product’s Safety Data Sheet.

Section 3 gives us our first glimpse of chemical detail:

The SDS says the product has 1%-5% ethanol. If you’re surprised that only one ingredient is listed, don’t be. You’ll soon see this Febreze has far more ingredients, but due to OSHA’s requirements for Section 3 of an SDS, only the ethanol must be listed here.

We can work with that. We’ll start measuring the product’s aquatic toxicity immediately and then see how that measurement evolves as we uncover more data. If you’re a toxicology nerd, this may even be fun.

We measure aquatic toxicity using Acute Toxicity Estimate (ATE) – a formula that weighs the toxicity of a mixture’s chemical constituents and estimates the overall toxicity of the mixture.

ATE is a broadly accepted alternative to conducting live animal tests, and is the preferred method for evaluating toxicity by many state, federal and international regulatory bodies. And – good news – it is measured in the same units as LC50 (milligrams per liter or mg/L).

Before we can measure, we need to know something about ethanol, so we consult our vast library of toxicology studies, limiting our results to aquatic studies.

Here’s what we find:

A little help reading this: We have 29 distinct aquatic toxicity values for ethanol. The distribution covers a lot of ground – from 100 mg/L to 28,100 mg/L – which isn’t uncommon. We take the median (11,200 mg/L).

Now we can take our first hack at calculating ATE, using both the minimum and maximum percentages listed in the SDS (1%-5%).

ATEmix(min concentration) = 100 / (1 / 11,200) = 1,120,000 mg/L
ATEmix(max concentration) = 100 / (5 / 11,200) = 224,000 mg/L
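
For readers who want to follow along, the two calculations above can be reproduced in a few lines of Python. This is a minimal sketch, assuming the additivity formula implied by the equations: ATEmix = 100 / sum(Ci / ATEi), where Ci is an ingredient's concentration in percent and ATEi is its toxicity value in mg/L. The function name ate_mix is made up for this example.

```python
def ate_mix(components):
    """Estimate mixture toxicity (mg/L) with the additivity formula.

    components: iterable of (concentration_percent, toxicity_mg_per_l) pairs.
    """
    total = sum(conc / tox for conc, tox in components)
    return 100 / total


ETHANOL_MEDIAN = 11_200  # mg/L, the median aquatic toxicity value for ethanol

ate_min_conc = ate_mix([(1, ETHANOL_MEDIAN)])  # ethanol at 1%
ate_max_conc = ate_mix([(5, ETHANOL_MEDIAN)])  # ethanol at 5%

print(f"{ate_min_conc:,.0f} mg/L")  # 1,120,000 mg/L
print(f"{ate_max_conc:,.0f} mg/L")  # 224,000 mg/L
```

Because the function takes a list of pairs, adding ingredients later only means adding more pairs to the list.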

Higher ATE values indicate that more of the substance is needed to be lethal. So ATE values ranging from 224,000 to 1,120,000 mg/L describe a mixture that is almost entirely non-toxic, meaning that our estimate is very “fish-friendly”.

Remember, the DTSC lab results had a range of 201 – 330 mg/L, which shows that this ethanol-only mixture isn’t telling the full story. We’re currently way off.

Let’s throw our data on a graph – with a logarithmic axis – so we can begin measuring our computational ecotoxicity against the lab results, shown here as horizontal red lines.

That’s not satisfying yet. Let’s go further.

Step 2: Can we extract more useful data from the SDS?

Take a look at Section 15.

We picked up two new ingredients. CAS Numbers 111-46-6 and 110-16-7 are both listed as Pennsylvania Right To Know chemicals.

The fact that these ingredients are listed in Section 15 – but not Section 3 – tells us something.

OSHA rules regarding SDS authorship state that potentially hazardous chemicals must be listed in Section 3 if they exceed concentrations of 1%, unless they are carcinogenic, in which case they must be listed in Section 3 if they exceed concentrations of 0.1%.

Logically, what this means is that if these two chemicals are carcinogens, then they have a theoretical maximum concentration of 0.1%. If they are not carcinogens, then they have a theoretical maximum concentration of 1%.

A quick check of our two carcinogen databases (EPA and IARC) lets us know that our chemicals are non-carcinogenic.

Great! We have two more chemicals to add to our ATE formula. We can assume a maximum concentration of 1% for both of these chemicals.

We’ll also assume a minimum concentration of 0.01% based on standards set by the California Cleaning Products Right to Know Act. More on this later.

Now let’s update our picture of this product:

Time to run our ATE calculator:

ATEmix(min concentration) = 100 / ((1 / 11,200) + (.01 / 500) + (.01 / 5)) = 47,409 mg/L
ATEmix(max concentration) = 100 / ((5 / 11,200) + (1 / 500) + (1 / 5)) = 494 mg/L
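
The bounding logic from this step can also be sketched in Python. Treat it as an illustration rather than our actual calculator: the helper names assumed_bounds and ate_mix are made up for the example, and the toxicity values are simply the ones that appear in the two equations above.

```python
def assumed_bounds(is_carcinogen):
    """Assumed (min %, max %) for an ingredient that is absent from SDS Section 3."""
    # Floor: the 0.01% (100 ppm) disclosure threshold.
    # Ceiling: the Section 3 cutoff, 0.1% for carcinogens and 1% otherwise.
    return (0.01, 0.1 if is_carcinogen else 1.0)


def ate_mix(components):
    """Additivity formula: 100 / sum(concentration % / toxicity mg/L)."""
    return 100 / sum(conc / tox for conc, tox in components)


low, high = assumed_bounds(is_carcinogen=False)

# Toxicity values (mg/L) as used in the equations above.
ETHANOL, SECTION_15_A, SECTION_15_B = 11_200, 500, 5

ate_min_conc = ate_mix([(1, ETHANOL), (low, SECTION_15_A), (low, SECTION_15_B)])
ate_max_conc = ate_mix([(5, ETHANOL), (high, SECTION_15_A), (high, SECTION_15_B)])

print(round(ate_min_conc))  # 47409
print(round(ate_max_conc))  # 494
```

Note how the most toxic ingredient (5 mg/L) dominates the result at its assumed maximum, which is why the upper concentration bound matters so much.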

And update our visualization:


The ATE for the assumed maximum concentration (494 mg/L) is far more in line with the lab average (266 mg/L). That made a difference. But the range is still vast.

Let’s keep digging.

Step 3: What does a publicly-available ingredients list tell us?

Thanks to Procter & Gamble’s participation with SmartLabel, we found an Ingredients List for this formulation of Febreze.

Now we’ve grown our knowledge of this product from three ingredients to sixteen. Assuming minimum and maximum concentrations as we did in the last section – and adding more LC50 values from our toxicology library – we again refresh the picture.

A few things to note.

First, let’s return to why we’re assuming a minimum of 0.01% for these ingredients. As of this year, the California Cleaning Products Right to Know Act requires that brands list on their website every cleaning product ingredient present at or above 100 parts per million (0.01%). Therefore, we assume that P&G is listing the ingredients that meet that criterion.

Second, note that we’ve called out the order in which the ingredients are listed on the label. Brands are required to list ingredients in order of descending concentration. In the next section, we’ll use this rule to further our analysis. Or not. I don’t want to give too much away.

You’ll also note that we’ve called out the ingredients that are fragrances. Fragrances are often included in much smaller concentrations, but here we’ll still assume the same 0.01%-1% range. (There’s one exception: CAS 123-35-3 is an IARC 2B carcinogen and therefore has a maximum assumed concentration of 0.1%.)

Now with far more information than we had to start – “Section 3 only” feels like a distant memory – we can rerun the ATE calculation, which yields the following:

ATEmix(max concentration) = 18 mg/L
ATEmix(min concentration) = 457 mg/L

Now we’re getting somewhere.

That’s still a wide range of values, but it’s far, far tighter and our target lab value of 266 mg/L is in between our minimum and maximum.

In fact, the midpoint of our “SDS + Full Ingredients List” range is 238 mg/L, which is within spitting distance of our lab average of 266 mg/L.

Check out how much our predictive ability has improved as we’ve introduced new data.

Of course, that graph has its own lie. We’ve been using a logarithmic axis to keep everything in a readable graph that fits on your computer monitor. When we map our values on a linear axis you can see for real how much closer our computation is getting to the lab values.

We push on.

Step 4: What can statistical modeling tell us?

We can’t know the exact concentration of each ingredient using only publicly available information. But that doesn’t mean we’re done. Frankly it doesn’t even necessarily mean we need to know exact concentrations.

Let’s use what we know about the Rules of the Label to model informed predictions of each chemical’s concentration.

We know that the ingredients are listed in descending order of concentration. Therefore, we can apply that logic to assign randomized concentration values with a descending maximum limit. We did this 1,000 times – creating 1,000 simulations of what the concentration could be – and then arrived at a distribution of ATE values.
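
One way to sketch this kind of simulation in Python follows. It is an illustration under stated assumptions, not the exact model we ran: the uniform sampling, the function names, the fixed random seed, and the three-ingredient example (reusing the toxicity values from Step 2 in an assumed label order) are all placeholders.

```python
import random
import statistics


def sample_descending(bounds, rng):
    """Draw one concentration (%) per ingredient, in label order.

    bounds: list of (min %, max %) pairs in label order. Each draw is capped
    by the previous draw, so concentrations never increase down the list
    (assuming the minimums themselves do not increase down the list).
    """
    concentrations = []
    cap = float("inf")
    for low, high in bounds:
        upper = max(low, min(high, cap))  # fall back to the floor if the cap dips below it
        value = rng.uniform(low, upper)
        concentrations.append(value)
        cap = value
    return concentrations


def ate_mix(concentrations, toxicities):
    """Additivity formula: 100 / sum(concentration % / toxicity mg/L)."""
    return 100 / sum(c / t for c, t in zip(concentrations, toxicities))


def simulate(bounds, toxicities, runs=1000, seed=0):
    """Return one ATE value (mg/L) per simulated set of concentrations."""
    rng = random.Random(seed)
    return [ate_mix(sample_descending(bounds, rng), toxicities) for _ in range(runs)]


# Hypothetical three-ingredient example in an assumed label order.
bounds = [(1.0, 5.0), (0.01, 1.0), (0.01, 1.0)]
toxicities = [11_200, 500, 5]  # mg/L

results = simulate(bounds, toxicities)
print(f"min {min(results):.0f}, median {statistics.median(results):.0f}, "
      f"max {max(results):.0f} mg/L")
```

Sampling each ingredient below the one before it is only one reading of "a descending maximum limit". Other choices, such as log-uniform sampling, would shift the resulting distribution.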

Here are the end results:

What did we achieve? Let’s graph it again.

The confidence interval got slightly tighter (29.0 mg/L to 455.0 mg/L).

The median of the simulations (138 mg/L) is lower than the median of our previous range. That 75th Percentile value lines up nicely, but who’s to say what that means? Again, we’re only analyzing one product here – hardly a trend.

But there’s comfort in knowing that 1,000 simulations – modeled on the logic of industry regulations – produce a range so in line with the lab results that DTSC obtained for this product.

Plus, the ATE values we got agree with the labs in that the product has an aquatic toxicity less than 500 mg/L, which is the threshold for a product being a hazardous waste in California due to aquatic toxicity.
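
That threshold comparison is easy to make explicit. The snippet below is a small sketch that checks the numbers quoted in this post against the 500 mg/L figure; the constant and function names are ours, chosen for illustration.

```python
HAZARD_THRESHOLD = 500  # mg/L, the California aquatic toxicity threshold cited above


def below_threshold(value_mg_per_l):
    """True when a toxicity value falls under the 500 mg/L threshold."""
    return value_mg_per_l < HAZARD_THRESHOLD


# Values quoted earlier in this post (mg/L).
estimates = {
    "DTSC lab test 1": 201,
    "DTSC lab test 2": 330,
    "Full list, max concentration": 18,
    "Full list, min concentration": 457,
    "Simulation median": 138,
}

for label, value in estimates.items():
    print(f"{label}: {value} mg/L, below threshold: {below_threshold(value)}")
```

By contrast, the ethanol-only estimates from Step 1 (224,000 mg/L and up) would land far above the threshold.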

Where does this leave us?

Even without exact concentration values for a product’s ingredients, one can begin replicating toxicity results from a laboratory by using publicly available product data and historical toxicological data.

The more information the better, but we saw how close we got armed only with a Safety Data Sheet, an ingredients list, our tox library, statistics and gumption. We’re actually armed with a lot, but you get the point. We didn’t do an aquatic bioassay.

We believe this analysis points toward a potential future wherein ATE could replace live fish testing. It’s just one product, but it’s a start. We’re going to keep testing and sharing our results to see if we can take this further.

Once we analyze a greater number of products, we’ll be better able to spot trends and tweak our models in the name of computational toxicology for consumer products – and California’s fish.

We’ll report back soon.
