Hacker News new | past | comments | ask | show | jobs | submit login
My uncle's factorization algorithms (github.com/daoudclarke)
420 points by daoudc on Dec 27, 2011 | hide | past | favorite | 51 comments



This is very interesting. Thank you for devoting your time to putting this on the web. Otherwise it might have rotted in your attic!

It's unclear to me whether there is anything theoretically new in here yet. The algorithms are all familiar to me except for the P^2 + 1 and P^2 + P + 1 algorithms. But don't read too much into that. They are probably well known.

What I can tell you is that there is an extraordinary amount of work in implementing all those algorithms. I do this stuff for a living and it is a tremendously impressive feat for a single individual over any span of time.

I'd be interested in hearing how fast the MPQS is. The state of the art for factoring a number like 840931001586212064794450601167289569811131781103613687750579 on a single core is probably around 6s or so on a modern 2GHz x86 processor.

UBASIC already had a 32 bit x86 assembly optimised MPQS in it, and it performed pretty well actually. So it would be interesting if your uncle improved on that, especially if he had theoretical improvements.

The state of the art for the number field sieve should factor the 79 digit number here: http://www.loria.fr/~zimmerma/records/rsa.html in around 10-20 minutes on a single core, though in general the GNFS is for much larger integers. It seems unlikely that a little UBASIC program could manage those, however.

Having said that, I am still in shock that your uncle actually implemented the GNFS. That is a staggering accomplishment and is something to be genuinely impressed by.

I don't know if you have a gold mine there or the cutting edge life's work of someone from yesteryear.


OK, after some effort I got the first of the MPQS programs to compile. Unfortunately it fails with a runtime error so I can't evaluate it.

The program itself requires two files to exist before it can operate. Then it asks for the input number in base 10000 as a set of digits separated by spaces or returns. This is pretty bizarre by today's standards.

It does succeed in factoring very tiny numbers, but it seems to have bugs which prevent it from working for large numbers.

The double large prime variant of the MPQS seems to allow numbers up to a pretty small bound. This strongly indicates that it is not competitive.

I'll see if I can get anywhere with the double large prime variant version (edit: I tried the second program and it fails with the same runtime error), but given the extremely poor state of the code (no comments, full of gotos, no indication of structure, no documentation, no test suite, buggy, very poor interface, etc.) I would say this code is not going to be particularly interesting to modern day researchers.

At the same time, it is a tremendously amazing accomplishment that one man essentially toiled away in secret and created a massive number theory library like this! When I google his name I find nothing. Was he associated with an institution? Did he publish any papers? I know he didn't use the web, so that explains why he doesn't have a web presence, but it is odd not to find him referred to anywhere. Or am I just looking in the wrong place?


He didn't publish any papers that I'm aware of, and he wasn't associated with an institution. I'm hoping to write a short biography at some time.


Looking through the code I see a lot of references to a group of japanese researchers. See for example this file which references "A.KUDO & Y.KIDA" and is complete with japanese comments. https://github.com/daoudclarke/wartonlegacy/blob/master/src/...

Looks like K.Aoki (Kazumaro Aoki) A.KUDO & Y.KIDA used to post to academic factorization related mailing lists see: http://www.loria.fr/~zimmerma/records/6353


Kida wrote UBASIC I believe.


Thanks for investigating this. I'd be grateful if you could make the changes you made to get it to compile available.


The mpqs code actually compiles unmodified (but with many warnings) if you use f95. When the code is run it complains of a missing file. One can create an empty file of this name to keep it happy. In fact it fills the file with data. Then it complains of a second missing file, however the same trick does not work here. It complains of missing records indicating that it expects this file to contain data. I've no idea what data it should contain or in what format.


I have the files but I haven't uploaded them as they are very big. I imagined one of the other programs would create the files, but I've yet to discover it. I believe these files are precomputations that are meant to speed up the algorithm.


Perhaps they are tables of precomputed inverses of primes to speed up the trial division. There might be up to 100000 entries in the file if this is the case.

I can't think what else would be in the file. I would have guessed the file was generated by the program and would contain relations from a sieving run for the last large factorisation your uncle did. But I also don't see how these files get generated by the code.


If p^2+p+1 is smooth or if p^2+1 is smooth then n can be factored with cyclotomic methods. http://www.ams.org/journals/mcom/1989-52-185/S0025-5718-1989...


You say you do this stuff for a living - Could I pick your brain about your job/experience/education? What is the best way to contact you?


I work in academia as a mathematician. I maintain a library called FLINT (Fast Library for Number Theory).


We are finally coming to the era when the first amateurs who grew up with Personal Computers are starting to die of old age.

Stuff in BASIC coded by somebody in the anonymity of the pre-Internet. Maybe something truly novel in its time or even now. There's something extremely romantic to this, like a rescued roll from the Library of Alexandria.

Naturally, most of that anonymous work will be lost forever. Heck, most of my 8-bit era stuff stored in tapes and 5 1/4 floppies must be dead by now. I had some original Sountracker mod files from Amiga times and when I tried recovering those it was too late.


This reminds me that my father had a printout for the first SGML (standard generalized markup language) parser which is only interesting as that's the parent language for HTML, XML, etc. Apparently he was on the standard committee while working for the navy and wrote a parser to verify that the standard was correct, but it was far to slow for machines of that time period or some such.

http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Lan...

PS: Not that I would be able to recognize what some random FORTRAN code did, but it's probably worth poking around the barn.


  >> first amateurs who grew up with Personal Computers are starting to die of old age
There are few people older than early 50's who had a computer in high school. Although mortality starts to rise in the early 50's, I wouldn't call this dying of old age.


The error is in the phrase "grew up with", as it suggests they were kids in the early days of personal computers.

The adult amateurs who went from programmable calculators to early single-board microcomputers are the generation who are starting to die off.

My Dad is of that cohort. In his 40s he went from programmable calculators to building a simple computer with an octal keypad and a row of LEDs for output, to working on an AIM-65, coding in basic and (I think) machine code.

He worked in elevator maintenance, being in charge of service in much of Connecticut for one company. At one point he designed and built, with circuit boards he etched himself, some kind of electronic box which I think the company adopted? Maybe? For some kind of elevator-related purpose. I was never clear on what it did, and he probably doesn't remember now.

He didn't have an academic background that would have exposed him to computing, having joined the Navy at 15. He later took some radio repair/electronics night school classes.

He's 80 now, and has forgotten most of that stuff, and uses a MacBook and an iPad.


Actually "old age" is never a reason to die, but heart stroke and cancer are most common natural life terminators in our species and they are already likely in your 50's. So yep... they are starting to basically die of old age.


The prime numbers group on yahoo is quite good. You might be able to entice someone to explore what your uncle has done.

http://tech.groups.yahoo.com/group/primenumbers

The group contains experts in this field that manage to do a commendable job of responding to questions at many levels.

It's a shame that your uncle didn't have a chance to connect with them.

Good luck.


Thanks for the suggestion, I will explore this.


>Sadly, he was never able to benefit from this work himself, although he tried many times to sell the software developed here. Instead, he entrusted his work to me as his death neared, and he told me that he wanted it to benefit his family. Since there is little hope of me succeeding in selling this, instead, I hope that by releasing it under the GPL it may benefit mankind. I only ask that if you use it, you use it for good, not evil, to liberate and not oppress, and to educate and not to spread ignorance.

Thank you so much for this.


This feels like the modern day equivalent of the secret to alchemy being found in some musty trunk in the attic. Does any of this actually advance the field?


That's a good question, and I'm not entirely sure of the answer. I'm pretty sure my uncle believed it did, but I'm not sure how in touch he was with the state of the art, as he didn't use the internet. His idea of research was photocopying articles (and whole books, in particular I remember vol 2 of Knuth's TAOCP) from the British library.

My plan is to get some of these algorithms running and compare running times with the other open source ones available.


While perhaps not an advance, the algorithms implemented are state-of-the-art (GNFS, Pollard's p-1 test, Williams' p+1 test, cyclotomic polynomial test). But whatever the running times, it is an astounding artifact.


There is a year or two of work on top of the documentation I have included. Also, I think it is likely that there are some incremental improvements to the existing algorithms that my uncle put in place without necessarily documenting. Whilst these might not be theoretically interesting, their combined effect could be significant. There may also be some new maths in there too, as he was undoubtedly a brilliant mathematician.


Do you have any kind of bio on him? Even a picture?


I'm hoping to get both of these. It needs some collaboration with my family to do it properly.


It's remarkable that we refer to algorithms invented as "late" as the 90's as state-of-the-art. But sadly, it is so.


Even more remarkable is that they build directly upon the 400-year-old method of Fermat (http://en.wikipedia.org/wiki/Fermat%27s_factorization_method).


Well, directly is a bit of a stretch. It's probably more fair to say they're directly based off the Powers-Lehmer-Morrison-Brilhart (also Kraitchik, for the discrete log) method, which introduced the concept of factor bases and combining relations to find a difference of squares.


The officially published ones. It remains an open question what the NSA knows.

I was going to back up the idea that the NSA might potentially be ahead of the public community of number theorists and cryptographers by citing the relative sizes of the two groups. Unfortunately, those numbers are classified. :(


The NSA recently stated that they are only a couple of years ahead. And by that I take it they mean their computers are bigger. Recall that they are a military outfit. Only a small part of what they do is crypto related. Despite them having 40,000 employees or so, one shouldn't read that as meaning they have 40,000 people working on breaking RSA. They do many other things.


Did you find any copies of a data input file for (at least some of) the Fortran programs, called recl.dat?

EDIT: I am so intrigued by this that I asked the question above without saying thank you for releasing this. So thank you! I don't know much Fortran but if ever there was something to induce me to dive in, it's this. I've compiled his test program using gfortran and am looking into some of the others.


Yes, it seems to contain a lot of prime numbers (36,500 to be precise). I haven't uploaded any data files since the total size is 3Gb.


Can you make a torrent of the data files and put the .torrent file on GitHub?


I've collected what I think are the most important files. They compress very well because they seem to be very inefficiently stored (lots of repetition). The compressed collection is now on github.


I would love to have an uncle whom I could talk about cryptography with. You were very lucky!


To be honest, most of the time I didn't have a clue what he was talking about, and I was by far the most mathematical of the family! He wasn't a great teacher, but he was certainly an inspiration. And apart from all this, he was a great uncle, and great company.


...and somewhere out there, right now, P(=|!=)NP is probably solved on some 5 1/2" floppy, rotting away in an attic...


I doubt it.

EDIT (after down vote): I doubt it because back in the day this problem was not as widely recognized as it is today, and no expert in the field today (that I am aware of) thinks we are even close.


Daoud, did you add this In the name of God, the Merciful, the Compassionate. or did your uncle?


I added this - my uncle was not religious.


I'm not religious either... and I wouldn't like anyone adding that kind of stuff to my work, but I guess you meant to do good.


Sad that someone would mod you down for having religious convictions.

Thank you for giving a portion of your inheritance to the world.


Sad that someone already forked it for the sole purpose of removing it https://github.com/oscardelben/wartonlegacy/commit/bc528d4d7...

Pathetic "contribution"


So you start it with something religious, which would probably go against your uncle's wishes if he wasn't religious, then in the first paragraph state you hope people use it "to educate and not to spread ignorance".

I find the contradiction amusing.


I was pleasantly surprise to see it. It would be interesting to see if you title all your work with it. Very nice, indeed.


My fear is that this contains some sort of mathematical break through which is then copied and claimed by another person as their own work. Didn't something like this happen to Grigori Perelman where another man attempted to claim Perelman's work? This is really amazing, I hope something comes from it.



I can't be of much help, but I'm extremely interested in how this all unfolds. Good luck.


Language files blank comment code Fortran 90 224 50999 8235 315981 ubasic 248 ? ? 107383


This is brilliant. Thank you.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: