
> fairly similar reporting and testing regimes thanks to a long shared history, such as the UK and Australia.

Why would this result in such a difference with Canada then?




The US is such an outrageous outlier that it does seem suspicious. Stereotyping the US health system, my first bet would be that doctors in the US can charge patients slightly more if they identify a cancer.


It could also be due to the EU's stricter pollution, chemical and food regulations compared to the US. The Brussels effect tends to make them apply globally to smaller markets.


I honestly think it could be as simple as being proportional to the obesity, sedentary lifestyles, and sugar consumption that exist in the US and Canada.

The US and Canada really stand out specifically in their suburban infrastructure, which may strongly correlate with cancers in very obscure ways, such as needing more preservatives (sugars in bread, for example) because people buy more in bulk due to the culture of supermarkets.

I'll be honest, I think the chart needs a full explanation, but when I think of the US and Canada, and what makes them really unique, it's the complete lack of walking anywhere except in the few urban centers.


In 1990, the US was 18.7% obese with a cancer incidence of 1,760, while the UK and Australia were at 780.

The most recent data is from 2016, showing Australia and the UK at 30% obesity, yet their cancer incidence is lower than ever at 750 and 682, respectively.

In every country but the US (and Poland), obesity is increasing while cancer incidence is flat or decreasing: https://ourworldindata.org/grapher/share-of-adults-defined-a...


I would really like to see a regional breakdown within the US.

At least by state. If possible also by urban/rural populations.

There are areas which are notorious for greater cancer risk, e.g., "cancer alley" through the Mississippi delta region. And there are regions with far greater access to healthcare, notably the north-east and northern plains states (especially Minnesota), and western states (CA, OR, WA).

Many of the more rural states (WY, ID, NM, ND, SD) would have correspondingly less access to healthcare.

The mix of potential causes (e.g., polluted regions, lifestyle) and available healthcare (Dx / Rx / Tx) would be useful in distinguishing the cause of reported incidence.


I suspect you can't really get much from that kind of geographic breakdown in practice, at least when it comes to pollution or lifestyle, because people move. Nowadays especially, people move quite a lot from where they were born, and exposure to harmful stuff during pregnancy and youth is suspected to be a much stronger factor than comparable exposure during adulthood.


Even if people move:

- Those who are exposed to a specific area for a prolonged period of time will leave a strong signal.

- Particularly those exposed during pregnancy, infancy, and early childhood, which you note. Even if significant numbers subsequently move you should see a trace amongst those who do not, and the movement will be somewhat arbitrary.

- Those who move often and readily tend to move among specific types of areas, mostly large and coastal cities.

- Those who do not move often tend to be concentrated among regions of lower wealth. Where they do move, it's typically among such regions, or often within poorer regions (e.g., from a smaller town to a nearer larger city).

Even then, overall numbers are still ... comparatively small. Looking at California, 6.1 million people left the state from 2010 to 2020, of a total population of just under 40 million, or 15%. Which leaves us with 85% of state residents who did not move out of the state. (They may of course have moved within the state.)

<https://worldpopulationreview.com/state-rankings/people-leav...>

So I'm still pretty sure that state and major city stats would be illuminating.
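
As a rough check, the California arithmetic above can be reproduced in a few lines of Python. This is only a sketch using the two figures quoted in the comment (6.1 million leavers, roughly 40 million residents). The variable names are made up for illustration, and dividing a decade of outflow by a single population snapshot is itself an approximation.

```python
# Figures quoted in the comment above; "just under 40 million" is rounded to 40 million.
left_state_2010_2020 = 6_100_000
state_population = 40_000_000

# Share of the (snapshot) population that left, and the complement that stayed in-state.
share_left = left_state_2010_2020 / state_population
share_stayed = 1 - share_left

print(f"left: {share_left:.1%}, stayed: {share_stayed:.1%}")
```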


The problem I'm thinking about is that you can end up with as many people who grew up in, say, a pesticide-polluted rural environment and later live in big cities as you have people remaining where the problem comes from. The map can then seem to show literally the opposite of the phenomenon occurring.

You can correct for these things, but it's going to require more data, and a more complex causal model describing the phenomenon if you don't want your controls to introduce bias themselves.
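
A toy calculation makes the concern concrete. Everything here is hypothetical: the function name, the cohort size, the share that moves, and the excess risk are invented for illustration and do not come from any dataset. It only shows how raw case counts, attributed to where people live at diagnosis, can shift away from the place where the exposure happened.

```python
def excess_cases_by_residence(exposed_born, frac_moved_to_city, excess_risk):
    """Split exposure-attributable cases by residence at diagnosis.

    exposed_born: people exposed in childhood in the source region (hypothetical)
    frac_moved_to_city: fraction of them living in a city by diagnosis (hypothetical)
    excess_risk: extra probability of a case due to the exposure (hypothetical)
    """
    moved = exposed_born * frac_moved_to_city
    stayed = exposed_born - moved
    return {
        "source_region": stayed * excess_risk,
        "city": moved * excess_risk,
    }

# Assumed scenario: 100,000 exposed, 60% later move to a city, 1% excess risk.
cases = excess_cases_by_residence(100_000, 0.6, 0.01)
print(cases)
```

With more than half of the exposed cohort relocating, a map of raw counts by current residence places more of the excess in the city than at the source. Per-capita rates would behave differently, since a large city population dilutes the movers' contribution.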


So, first, that's a fair point.

It's also a challenge that public-health epidemiologists have been dealing with for a long time, for which there's been a tremendous recent explosion in both data and research methods. And there are ways to test for this, which I'm not fully aware of, though I've some basic familiarity.

I've already addressed some of this above, so with some repetition:

- People simply don't move that much, have been moving less over time within the U.S.,[1] and moreover don't move consistently. So wherever there's an initial strong cause, you'll have a fairly large cohort remaining on that site and showing impacts over time, particularly those who are most susceptible to such influences. Again: neonates, infants, children. Many cancer / disease clusters are found by such mechanisms.

- Where people do move, the end result is something of a "blurring of the signal". You'll get a blob at the origin, and maybe scattered points elsewhere. Those will tend to be at likely points of migration: nearby neighbourhoods and towns, nearby large cities, regional/national cities of prominence, and (sometimes) locations with established immigrant communities (whether intranational or international). These are ... somewhat ... predictable patterns. The signal will tend to be strongest at or near the source.

- Deeper and extended data. Where topical data (e.g., diagnosis and current residence) don't seem to correlate with a known possible cause, or show a rare-but-below-threshold cluster, epidemiologists will dig for further information. Possibly with patient surveys, possibly other methods. What they're looking for in that case will be recent, or non-recent, movement patterns. Once a probable cluster source is identified, that can be used as a specific clue for further research. This is of more use to an epidemiologist who can conduct such further research than to a data scientist who's working off extant databases (partial, limited data capture, etc., etc.), but it is possible. And yes, this is one of the fundamental limitations of strictly broadly-captured data research.

There's a lot of medical research, even within healthcare and governmental organisations, which relies on fairly low-quality and easily-collected data. The reason is that those data exist and are cheap. The questions are how to maximise utility of such sources, and knowing when to dig deeper.

Again: people moving really isn't the major problem you're making it out to be. Yes, it makes the job somewhat more challenging. But it's still generally tractable.

________________________________

Notes:

1. See "Despite the pandemic narrative, Americans are moving at historically low rates" (2021) <https://www.brookings.edu/articles/despite-the-pandemic-narr...>, "Americans are moving at historically low rates, in part because Millennials are staying put" (2017) <https://www.pewresearch.org/short-reads/2017/02/13/americans...>, and "Americans no longer want to move for work. Here's why." (2023) <https://www.cbsnews.com/news/moving-for-work-mobilty-record-...>. In the 1950s and 60s, as much as 20% of Americans moved every year. By 2021 that had fallen to 8%.


I'm not saying that people moving is a complete showstopper for epidemiologists. What I'm saying is that it makes the map visualization a poor fit for the task, because it will lead the casual reader with access to only the map (that is, not epidemiologists with access to the full data) to draw wrong conclusions.



