Hacker News new | past | comments | ask | show | jobs | submit login
88% of all integers have a factor under 100 (datascienceworld.com)
116 points by psangrene on April 2, 2017 | hide | past | favorite | 41 comments



A few comments asked if the limit is 100%. The answer is yes, but it use some advanced math. Luckily the main steps are nice and easy to explain theorems. So if you blindly believe the theorems then you will be fine.

First, it's easier to think about

  q(m) = 1 - p(m) =      PROD      (1 - 1/prim(k))
                     {m = 1 to k}
i.e. the proportion of integers that are not divisible by any of the first m prime numbers. We want to prove the limit of q(m) is 0.

We need an auxiliary result. It is well known that the sum of the reciprocals of the primes is infinite. https://en.wikipedia.org/wiki/Divergence_of_the_sum_of_the_r...

       SUM      1/prim(m)  =   + ∞
  {m = 1 to ∞}
["Intuitive fake explanation": There are infinite primes, they are not so far away, so the sum of 1/primes is infinite.]

The second result is usual in complex analysis. https://en.wikipedia.org/wiki/Infinite_product The real version says that if we have infinite numbers x(1), x(2) ,... x(m) such that 0 <= x(m) < 1 then

* If

         SUM      x(m)  =   + ∞
    {m = 1 to ∞}
then

         PROD      (1 + x(m))  =  +∞
    {m = 1 to ∞}
and

         PROD      (1 - x(m))  =  0
    {m = 1 to ∞}
["Intuitive fake explanation": For the first result, (1+x) * (1+y) * (1+z) > 1 + x + y + z . So if x + y + z + ... goes to infinite, then (1+x) * (1+y) * (1+z) * ... goes to infinite too. The second part uses a similar idea, with more hand waving and the fact that log(1+x) ~= x.]

Now we can join both results. If we take x(m) = 1/prim(m) then using the first result we have that

         SUM      1/prim(m)  =   + ∞
    {m = 1 to ∞}
and then using the second part of the second theorem

         PROD      (1 - 1/prim(m))  =   0
    {m = 1 to ∞}


An amazing explanation I must say regardless of all the fake disclaimers does illuminate the key point :-)


It's more surprising that it grows so slowly. 50% of integers are have a factor under 3, 73% - under 6, but then it slows down to a trickle. 88% under 100, and just 92% under 1000.

Another interesting fact is that the sum of 1/p for every prime p diverges. It shows that the set of primes is "large" in a certain sense. It would probably be much easier to factorize integers if it wasn't.


I don't find the growth rate surprising at all!

> 50% of integers are have a factor under 3

every other number is divisible by 2, that is literally half.

> 73% - under 6

Well I think about it this way: every other number is divisible by two, every third number is divisible by 3 and every fifth number is divisible by 5. It just so happens that some of the numbers that are divisible by 2 are also divisible by 3 or 5 such as 10 for instance. Thus the 73% is nor rly surprising.

1/2 + 1/3rd of what remains + 1/5 of what remains - the intersections!

> but then it slows down to a trickle. 88% under 100, and just 92% under 1000.

I honestly expected it to be a bit lower than 92 but I am clearly wrong about this. Here is an interesting quick visualization I made. Blue are primes, whites are composites!

https://imgur.com/a/7EhdI

Edit: you can clearly notice a diagonal shape in the pic above. If you however change the table from 5 cols to any N cols, you will get a distinct shape for each N. It's pretty cool to play with actually. There are a ton of shapes that have been known and used for centuries. I'd highly recommend googling around and reading a bit about them on wikipedia!


The growth rate of a function obviously has nothing to do with the first few values, I don't know why you would assume those were the ones that surprised me, or that I don't know how to calculate them. What I wanted to point out is that this function approaches its limit (100%) very slowly, much slower than 100(1-1/n), or even 100(1-1/log(n)).


> or that I don't know how to calculate them.

apologies, I don't know your background nor do I have an insight as to what goes on in your head. All I wanted to point out is that it's actually not a surprising growth rate. And while yes, you are right about the "speed" of the function. However that is not "unexpected". At least I don't find that to be the case personally!


Nice explanation. Thanks! A bit off topic, but how do people make images that are so tall (in px)?


Np!

I used a chrome plugin called "Full Page Screen Capture". You can use similar extension for FF if you are using that!

The table itself is a long html page! :D


Wouldn't the function sort of "eat itself" if it grew too fast? I can't make this precise, but if it grew fast enough, then most numbers would have many factors accounted for among the smaller numbers already (in a sort of "greedy" manner).


I'm too tired to work out a closed form, but...

f_n = (1 - f_(n-1)) * P_n^(-1) + f_(n-1), where f_n is the fraction divisible by the first n primes and P_n is the nth prime.

So we can see the faster f_n approaches 1, the faster the first term approaches 0, and the faster f_n ~ f_(n-1). Perhaps doubly so, as if f_n approaches 1 faster, then P_n is smaller and it's reciprocal bigger as well. (We could plug in an estimate for P_n if we were converting to a closed form, to work out the asymptotic implications.)


Somehow it reminds me of the inverse Ackermann function.. Especially how \alpha(n) has to do w. the time complexity of union-find - maybe related to disjoint factors somehow?


This post has a little graph that shows the curve you get when you plot how many integers have factors in the first n primes: http://alquerubim.blogspot.com.br/2016/08/cobertura-dos-n-pr...


Do we know if this fonction has a limit different from 100 ? I mean, is there a number 'x'<100 where, for any number 'n' the percentage of all integer having a divisor below 'n' is always < x ?


We can look at this question as asking whether the product of (p-1)/p for all primes p goes to zero. Taking the log, this is the same as asking whether the sum of log((p-1)/p) is unbounded. log((p-1)/p) = log(p-1) - log(p), which is less than -1/p (greater in magnitude).

The sum of the reciprocals of the primes is unbounded (https://en.wikipedia.org/wiki/Divergence_of_the_sum_of_the_r...). So the sum of the logs is less than the negative of the sum of the reciprocals of the primes, it is unbounded as well. Therefore the limit is 100.


You can evaluate this for different limits: 88% of numbers have a divisor below 100, 92% below 1000, 50% below 2, etc.


...at least 33% have factors <=3


50% have 2 as a factor.


2/3, you mean :)


We know the limit is not 100 because of Euclid's proof of the infinitude of primes.


I don't think that tells you anything without comparing to the infinitude of composites too. The fraction of numbers less than n that are prime is roughly 1/logn, which goes to zero in the limit. So I think the limit of numbers with factors less than n as n goes to infinity must indeed be 100%.

That doesn't mean there's an n with 100%, just that there's no smaller limit that some n doesn't exceed.


The primes become more sparse in the integers as you go farther. For most reasonable definitions of the density of composite integers in the integers, the density would be 100%.

Iirc, it is believed that the number of primes less than n is O(n/log(n)), so primes/integers among the first n integers should go as O(1/log(n)), so goes to 0 as n goes to infinity.


Not just believed, pi(n) ~ n / log(n) is the prime number theorem.


I wasn't sure if the proof depended on the Riemann hypothesis or not.

So, I wasn't sure if it had been proven, or just proven given that assumption. So that's why I said that it was believed.


Riemann hypothesis gives better error estimates for the asymptotics in the prime number theorem.


Ah, alright, thanks!


There are infinite perfect squares also, but only 0% of all integers are perfect squares.


Yet there are just as many perfect squares as integers.


it's so strange to see Victor Granville, there is also Andrew Granville and he writes professionally on patterns of the primes.

"Mathematicians Discover Prime Conspiracy" (Quanta Magazine, 2016)

A previously unnoticed property of prime numbers seems to violate a long-standing assumption about how they behave. https://www.quantamagazine.org/20160313-mathematicians-disco...


50% have 2 as a factor! Many more than once. Wow!


Generalize much.


Should be:

88 percent of all integers have a PRIME factor under 100

All integers have a factor under 100, which is 1.


The concept of factorization, properly defined, usually excludes "units", that is, numbers with an inverse. So 1 is a unit, since we can solve

  1x = 1 
by letting x = 1. -1 is a unit too, since (-1)x = 1 is also solvable.

Factorization is then defined "upto units and reordering", so, for instance,

  6 = 2 * 3 = (-2) * (-3) = 3 * 2 = (-3) * (-2)
are all considered the same factorization.

Why might this be useful? Sometimes, number systems ("rings") we care about might have more units, and in order for there to be a sensible theory of factorization[1], we have to take these things into account. Not all rings are as nice as Z[2].

For instance, allow me to go off on a tangent: consider the set of all numbers of the form

  a + ib (a, b integers)
called the Gaussian integers (written Z[i]). One can show that the units here are 1, -1, and also two new ones: i and -i. Studying factorization in this ring -- what numbers can we factorize further now? -- is very interesting. It turns out that not all of our usual prime numbers "remain prime":

   2 = 1  + 1 = 1 -  i^2   = (1 + i)^2
   3 = 3 
   5 = 1  + 4 = 1 - (2i)^2 = (1 + 2i)(1 - 2i)
   7 = 7
  11 = 11
  13 = 4  + 9 = 4 - (3i)^2 = (2 + 3i)(2 - 3i)
  ...
The key to this puzzle is to notice that the primes that "split"[3] all leave a remainder of 1 modulo 4. It is a beautiful fact that the converse also holds: if

  p = 1 (mod 4) 
then it must factor in Z[i], and, further, it always factors as

  p = (a + ib)(a - ib) = a^2 + b^2
(And, of course, if we have p = a^2 + b^2, it can be factored just like we did above.) In other words,

  Theorem[4] (Fermat, sometimes called the "Christmas theorem"). 
  The primes that can be written as a sum of two squares are exactly those congruent to 1 mod 4.
Neither the question (which primes can be written as a sum of two squares) nor the answer (those congruent to 1 mod 4) involve any of the machinery we used in getting from one to the other. :)

PS. Yes, I may or may not have ignored the case of 2 above. Tons of theorems in number theory are stated for "odd primes". 2 is, after all, the oddest prime.

--

[1]: e.g. it turns out that "p is prime", defined as

  p | ab implies p | a or p | b
and "p is irreducible" (i.e. not factorizable in any sensible way), defined as

  p = ab implies either a or b is a unit
are not equivalent! One direction always holds, but the other doesn't come for free.

https://en.wikipedia.org/wiki/Prime_element

[2]: Each step up the ladder of class inclusions adds some nice property that makes life easier, but diminishes the power of the theorems one proves. (Euclidean domains are almost perfect in that sense!) Behold, a bestiary most vile:

https://en.wikipedia.org/wiki/Euclidean_domain

[3]: Notice that 2 factorizes as a power, not as a product of distinct factors. This is technically "ramification", not "splitting" (that term is reserved for the distinct-factors case). The following Wikipedia page treats the entire content of this post far better, and has links to the "real" algebraic number theory pages ;)

https://en.wikipedia.org/wiki/Splitting_of_prime_ideals_in_G...

The numbers that don't factorize at all are said to remain "inert". (And there's something related called an "inertia group" that disappointingly has little to do with mechanics.)

[4]: https://en.wikipedia.org/wiki/Fermat%27s_theorem_on_sums_of_...


What a beautiful response to a completely unnecessary comment! Thanks.


School has forced me to take a break (thankfully coming to an end now) from learning math for the past few months, so throwing walls of text at the gracious HN crowd simultaneously makes for good practice and expository-itch-scratching. You're welcome.


Your factorization of 2 is wrong.

(1 + i)² = 2i

2 = (1 + i)(1 - i)

I don't think that invalidates anything else you have here, much of which is interesting.


The factorization of 2 is fine -- factorizations are equivalent if they're off by a unit, and i is a unit.

By the same token, 1+i and 1-i are essentially the same prime, since you can multiply one by a unit to get the other. That's how we can get away with saying 2 is a power.

edit: Just noticed the sibling comment by mrkgnao, which is a very good overview of some of the technical nitty-gritty that goes into this.


Apologies. I was remembering more than I was thinking: you are right in that the relation doesn't hold at the level of elements[0]. The correct statement is

  (2) = (1+i)^2
where (2) and (1+i) represent the ideals generated by 2 and 1+i respectively. Officially, an ideal of a ring R is a subset I of R which is closed under addition and under and multiplication by elements of R:

  i, j ∈ I implies  i + j ∈ I
  i    ∈ I implies     ri ∈ I for all r ∈ R
For our purposes, a simpler case -- that of principal ideals -- suffices, and that's what I'll discuss in the rest of this comment.

The ideal generated by an element a of a ring R, usually denoted (a) or aR, is essentially the subset of all multiples of a in R. For instance, over the integers, we have ideals like

  (2) = {...,  -4, -2, 0, 2, 4,  ...} 
  (3) = {...,  -6, -3, 0, 3, 6,  ...}
  (4) = {...,  -8, -4, 0, 4, 8,  ...}
  (6) = {..., -12, -6, 0, 6, 12, ...}
One can also define the ideal generated by the elements a,b,c,... of R:

  (a,b,c,...) = {aa' + bb' + cc' + ... : a',b',c' ∈ R}
i.e. the "linear combinations" of a, b, c, and so on.

Here are a few things you might like to verify (i.e. just nod along) about some ideals of Z, to get familiar with the notation:

  (1) = Z (i.e. all the integers)
  (2) = (-2)
  (6) = (2) ∩ (3) 
  15Z =  3Z ∩ 5Z
where ∩ is the intersection of the two sets, or the set of common elements. ("Fact": The intersection of two ideals is also an ideal.)

In general, if you have an ideal of the form (a), then we always have

  (a) = (ua) for all units u
Write G (for Gauss?) for Z[i]. In G, the previous statement means that (2) = (2i). To check this, note that we have

      x ∈ (2) 
  iff x =  2y            for some y
  iff x = (2i)(-yi)
  iff x ∈ (2i)
so all elements of (2) are in (2i), and vice versa.

Now one defines the product of ideals: given ideals I and J in some ring R,

  IJ = ideal generated by the elements {ij : i ∈ I, j ∈ J}
In our case, if I = (a) and J = (b), IJ is just (ab). Question: what's the relation between (ab) and (a) ∩ (b)?

For instance, you can verify that

  (2)(3) = (6)
and, to bring us back to our original point,

  (1+i)(1-i) = (2i) = (2) = (1+i)^2
where all the products are ideal products.

--

There is a very incomplete treatment at [1] (with a bit of the LaTeX broken) that is meant to provide background for a sequence of posts on algebraic geometry -- it's been suspended for a while, but I'm going to look over a couple of drafts this week!

[1]: https://mrkgnao.github.io/prime-ideals/

[0]: As always in math: if you can't prove it, just add a few adjectives or simply change the language. Facts arrived at by alternative definitions, if you will. :)


> 88 percent of all integers have a PRIME factor under 100

In fact, all integers have at least one prime factor under 100, since is_prime(n) implies is_prime(-n).


I'm pretty sure the standard definition of primes permits only natural numbers.


Nope, we usually mean both negative and positive when talking about prime integers. Still, the question on hand is clearly restricted to the positive case, otherwise it would be trivial.

EDIT: better yet, the proper formulation with regard to all integers would be "integers having a prime factor p such that |p| < 100"


Prime elements in a ring (usually an integral domain) are elements which satisfy Euclid's lemma (so p is a prime element if p | a b implies p | a or p | b), so the prime elements in the ring of integers are 2, -2, 3, -3, and so on. But if someone refers to prime numbers, they mean 2, 3, 5, ... - it's the standard definition. In fact, if I meant to include both positive and negative elements in the context of the integers, I would go out of my way to say so to avoid confusion! I think the key is the use of the word "number" as opposed to "element" (say).

I think the terminology is a convenience - so many results pertain to 2, 3, 5, ... that it's easier to let "prime numbers" refer to them, rather than to have to qualify everything by saying "positive primes" or taking absolute values.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: