Hacker News

I'd certainly be interested to hear how quantities whose dimensions would normally divide out should be modelled, and to see some concrete statistical examples of how this really helps avoid likely errors.



I spent some time a while back looking at it from an algebraic point of view, and I would like to make connections to specific analyses. The challenge for me is finding real mistakes to test it against.

One example area is percentages, which seem dimensionless. Grams per gram yield a percentage: grams of sodium per gram of cheese, say, or grams of potassium per gram of cheese. I can sum these percentages to get grams of alkali per gram of cheese ONLY if they come from the same or "sufficiently similar" sample.
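As a sketch of what enforcing that might look like, here's a hypothetical "ratio that remembers its context" type. All of the names (Ratio, sample_id, and so on) are invented for illustration, not taken from any real library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ratio:
    value: float          # e.g. 0.02 for 2 g sodium per 100 g cheese
    numerator: str        # substance measured, e.g. "sodium"
    denominator: str      # reference substance, e.g. "cheese"
    sample_id: str        # which physical sample the measurement came from

    def __add__(self, other):
        # g/g + g/g is only meaningful over the same (or a declared-
        # equivalent) sample and the same denominator substance.
        if (self.sample_id, self.denominator) != (other.sample_id, other.denominator):
            raise ValueError("ratios come from incompatible contexts")
        return Ratio(self.value + other.value,
                     f"{self.numerator}+{other.numerator}",
                     self.denominator, self.sample_id)

na = Ratio(0.02, "sodium", "cheese", "wheel-42")
k = Ratio(0.001, "potassium", "cheese", "wheel-42")
alkali = na + k   # fine: same cheese wheel, same denominator

other = Ratio(0.03, "sodium", "cheese", "wheel-7")
# na + other  -> ValueError: different samples, even though both are "g/g"
```

A plain unit checker would accept both additions; it's the extra sample tag that rejects the second one.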

Looking only at the unit of the measurement, without also looking at the set of measurements, can quickly go awry. “Dimensionless” values can be used as multipliers/scalars only when the context makes sense.

Here’s a subtle one: let’s say I have an additive quantity that I’m measuring, like gallons of paint in various buckets. What are the units for the total paint across all buckets, and what are the units for the mean quantity of paint per bucket? If these truly had the same units, I could meaningfully sum averages and totals together, or even sum averages alone. But we know that’s not the case.


How do you propose to encode the additional information that is needed to detect these mistakes though?


The unit expressions need more information. Some of it is simply extending the algebra with more operators: for example, logarithms and exponents can be carried through, so that log(e, height in cm) has units of log(e, cm), by allowing the log() function itself in a unit expression. This allows validation that if we add log(x) to log(y) and later raise the base to the result, we expect units of x * y.
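A minimal sketch of carrying log() through unit expressions, assuming units are represented as symbolic terms (the Unit/Log/Mul names here are made up for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str

@dataclass(frozen=True)
class Log:
    inner: object          # the unit expression inside the log

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

def add_units(a, b):
    # log(x) + log(y) is permitted, and the result records that
    # exponentiating later must yield units of x * y
    if isinstance(a, Log) and isinstance(b, Log):
        return Log(Mul(a.inner, b.inner))
    raise TypeError("can only add log-units to log-units")

def exp_unit(u):
    # raising the base to a log-unit recovers the inner unit expression
    if isinstance(u, Log):
        return u.inner
    raise TypeError("exp of a non-log unit is not tracked here")

cm, s = Unit("cm"), Unit("s")
combined = add_units(Log(cm), Log(s))
assert exp_unit(combined) == Mul(cm, s)   # units of x * y, as expected
```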

For sums vs. means, we need to include the count of samples in the mean's unit expression. If we forget it and it becomes a free variable N, we lose the ability to combine means by addition, though other operations may still hold. The unit expression for a sum of counted quantities is therefore different from the unit expression for a mean. You can derive unit expressions for standard deviation and other statistics as well.
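One way this could look in code, purely as an illustration (the Summed/Mean/combine names are assumptions): a mean carries its sample count, direct addition of means is rejected, and the retained count makes correct pooling possible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Summed:
    value: float
    unit: str              # e.g. "gallon" — totals add freely

    def __add__(self, other):
        assert self.unit == other.unit
        return Summed(self.value + other.value, self.unit)

@dataclass(frozen=True)
class Mean:
    value: float
    unit: str
    n: int                 # count of samples the mean was taken over

    def __add__(self, other):
        # adding two means head-on is not meaningful; the count must
        # be used to un-average first (possible only because n is kept)
        raise TypeError("cannot add means directly; use combine()")

def combine(a: Mean, b: Mean) -> Mean:
    assert a.unit == b.unit
    total = a.value * a.n + b.value * b.n
    return Mean(total / (a.n + b.n), a.unit, a.n + b.n)

m1 = Mean(2.0, "gallon", 4)    # mean over 4 buckets
m2 = Mean(3.0, "gallon", 2)    # mean over 2 buckets
m = combine(m1, m2)            # pooled mean over all 6 buckets
```

If n had been dropped (the "free variable N" case above), combine() could not be written and the means would be stuck.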

Ultimately you also need a subjective ontology. Your usual unit system says it’s fine to add mass to mass and mass per volume to mass per volume. That misses the semantics: mass of what per volume of what? Mass of sodium per liter of water plus mass of potassium per liter of water, maybe. Mass of sodium per liter of water plus mass of hydrogen per km³ of star, I don’t think so.
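A sketch under an assumed ontology: a concentration tagged with what the mass and the volume refer to, so the semantic check can fail even when the bare units match. The type and field names here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concentration:
    value: float
    mass_of: str           # substance being weighed, e.g. "sodium"
    per_volume_of: str     # medium, e.g. "water"
    unit: str              # e.g. "g/L"

    def __add__(self, other):
        # the plain unit check (g/L + g/L) would pass for all of these;
        # it is the ontology check that rejects sodium-in-water plus
        # hydrogen-in-star
        if self.unit != other.unit or self.per_volume_of != other.per_volume_of:
            raise ValueError("incompatible semantics, not just units")
        return Concentration(self.value + other.value,
                             f"{self.mass_of}+{other.mass_of}",
                             self.per_volume_of, self.unit)

na = Concentration(0.4, "sodium", "water", "g/L")
k = Concentration(0.1, "potassium", "water", "g/L")
total = na + k                                   # same medium: OK

h = Concentration(7e-23, "hydrogen", "star", "g/L")
# na + h  -> ValueError, even though the bare units agree
```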

Unit tracking is all about validating that a piece of math is sensible to perform, and the more detail you need, the closer the unit expressions become to a signature for the math plus the objects being quantified.

Time permitting from my day job, I’ve been trying to flesh this out for a while, and could use interested parties to poke at it and spur me to refine it further. Counts of objects, linear measurements, and rational combinations work out pretty well in this, as do stats and transforms like log and exp. Angular units are TBD; I think they’ll be tractable, just a little different than conventionally expected.





