More

jre · on Dec 17, 2017

Thanks.

You're right about Merkle tree. This is a whole section of the bitcoin paper and it's pretty important. But as far as I understand, it's "only" an optimization to save disk space, so it doesn't change the underlying logic.

jre · on Dec 17, 2017

OP here.

Swapping MD5 for SHA256 is very easy. I'll actually do it - see my other answer above for why MD5.

For the other differences to bitcoin and from the top of my head :

- In my implementation, wallet addresses are the public key of the owner. Bitcoin addresses are slightly more complicated [1] and a wallet can (and should) generate a new address for each transaction.

- Bitcoin uses ECDSA instead of RSA

- Bitcoin transactions use a (simpler than ethereum but still) scripting language [2].

- The whole communication part was left out : you need a way to broadcast blocks. I haven't looked into that

- Bitcoin uses a Merkle tree to store transactions (and prune spent ones).

I think the scripting and communication would be the two biggest tasks. But it would also require unit testing and obviously wouldn't fit in a single notebook.

[1] https://en.bitcoin.it/wiki/Technical_background_of_version_1...

[2] https://en.bitcoin.it/wiki/Script

ecesena · on Dec 17, 2017

Bitcoin also hashes twice, i.e. it computes sha256(sha256(.)). Supposedly, this is to protect against extension attacks [1].

Was wondering, any specific reason to choose RSA vs ECDSA? Signatures would be smaller.

[1] https://en.wikipedia.org/wiki/Length_extension_attack

jre · on Dec 17, 2017

No good reason for RSA vs ECDSA. I was just more familiar with RSA, but apparently pycryptodome supports ECDSA as well, so I guess the change should be minimal.

jre · on Dec 17, 2017

Thanks ! The awesome-blockchains is a great resource, thanks for sharing.

jre · on Dec 17, 2017

Thanks ! I've completely left out communication from this because it wouldn't fit in the notebook and I haven't researched it. Would also appreciate if anybody has good resources on it.

jre · on Dec 17, 2017

OP here.

erikb is spot on in the sibling comment. This hasn't been expert-reviewed, hasn't been audited so I'm pretty confident there is a bug somewhere that I don't know about.

It's educational in the sense that I tried as best a I could to implement the various algorithmic parts (mining, validating blocks & transactions, etc...).

I originally used MD5 because I thought I would do more exploration regarding difficulty and MD5 is faster to compute than SHA. In the end, I didn't do that exploration, so I could easily replace MD5 with SHA. I'll update the notebook to use SHA, but I'm still not gonna remove the warning :)

I'll also try to point out more explicitly which parts I think are not secure.

magnat · on Dec 17, 2017

> I'll also try to point out more explicitly which parts I think are not secure.

Things I've noticed:

* Use of floating point arithmetic.

* Non-reproducible serialization in verify_transaction can produce slightly different, but equivalent JSON, which leads to rejecting transactions if produced JSON is platform-dependent (e.g. CRLFs, spaces vs tabs).

* Miners can perform DoS by creating a pair of blocks referencing each other (recursive call in verify_block is made before any sanity checks or hash checks, so they can modify block's ancestor without worrying about changing its hash).

* mine method can loop forever due to integer overflow.

* Miners can put in block a transaction with output sum greater than input sum - only place where it is checked is in compute_fee and no path from verify_block leads there.

jre · on Dec 17, 2017

Those are all very good points I didn't think about, thanks for these.

I'll fix the two bugs with verify_block and the possibility for a miner to inject invalid a output > input transaction.

I'll add a note for the 3 others.

westurner · on Dec 17, 2017

For deterministic serialization (~canonicalization), you can use sort_keys=True or serialize OrderedDicts. For deseialization, you'd need object_pairs_hook=collections.OrderedDict.

Most current blockchains sign a binary representation with fixed length fields. In terms of JSON, JSON-LD is for graphs and it can be canonicalized. Blockcerts and Chainpoint are JSON-LD specs:

> Blockcerts uses the Verifiable Claims MerkleProof2017 signature format, which is based on Chainpoint 2.0.

https://github.com/blockchain-certificates/cert-verifier-js/...

Cyph0n · on Dec 17, 2017

FYI, dicts are now ordered by default as of Python 3.6.

cpburns2009 · on Dec 18, 2017

That's an implementation detail, and shouldn't be relied upon. If you want an ordered dictionary, you should use collections.OrderedDict.

westurner · on Dec 18, 2017

It's now the spec for 3.6+.

> #python news: @gvanrossum just pronounced that dicts are now guaranteed to retain insertion order. This is the end of a long journey.

https://twitter.com/raymondh/status/941709626545864704

More here: https://www.reddit.com/r/Python/comments/7jyluw/dict_knownor...

OrderedDicts are backwards-compatible and are guaranteed to maintain order after deletion.

Cyph0n · on Dec 18, 2017

True, but if you need both key ordering and performance, dict is the better option.

tfha · on Dec 17, 2017

For what it's worth, MD5 is perfectly safe for use as a proof of work algorithm. You just don't want to use it for authentication / data integrity

jre · on Dec 16, 2017

You forgot to mention that nodes always consider the longest block chain to be the consensus. A 51% (majority) attack is when one miner is able to produce more blocks than the rest of the network, therefore controlling consensus.

Such a miner could do a double-spend transaction by first spending on the short chain and then reverting his transaction on the longest chain. See https://en.bitcoin.it/wiki/Majority_attack

jre · on March 2, 2017

I don't know about the others, but the two visions dataset they compare to (MNIST and the face recognition one) are small datasets and the CNN they compare to doesn't seem very state of the art.

It also seems each layer of random forest just concatenates a class distribution to the original feature vector. So this doesn't seem to get the same "hierarchy of features" benefit that you get in large-scale CNN and DNN.

jkbschwarz · on March 2, 2017

To your point that they are comparing small datasets. I dont see that as a problem. If they achieve better results on small datasets that is a great achievement, as often the bottleneck is the size of the dataset rather than computation time.

krona · on March 2, 2017

> often the bottleneck is the size of the dataset rather than computation time

That's generally true for DNNs, which is a good place to be if you have lots of data. This typically isn't true for tree based approaches, which is why they fell out of fashion in some problem domains; they don't generalize as well. This paper doesn't seem to change what we already know in this respect.

argonaut · on March 2, 2017

They achieve substantially worse results than state of the art on (the toy dataset) MNIST.

aerioux · on March 2, 2017

^ The authors time and effort they observe it takes to create state-of-the-art CNNs, but their point-of-comparison CNNs look to be fairly simple -- I don't see an AlexNet or something for some of these tasks either just as a point of comparison even if not a fully relevant one

curuinor · on March 2, 2017

Comparable to SOTA circa 1998.

jre · on Feb 1, 2017

I built a small plugin to do that, it's called vim cell mode and provide "send selection to ipython" and "send block to ipython". I find the block mode very useful when developing data analysis scripts.

https://github.com/julienr/vim-cellmode

jre · on June 8, 2016

What company are you working at ? Looks like you do some interesting geospatial/map rendering stuff. Are you hiring ? :-)

jre · on June 1, 2016

Location: Lausanne, Switzerland

Remote: Yes

Willing to relocate: No

Technologies:

• Scientific Python : numpy, scipy, theano, tensorflow, keras (deep learning), vispy, mayavi, osgeo, gdal

• General Python : django, google appengine, flask, PyQT, PyOpenGL, cython, boost.python

• C/C++: OpenCV, Point Cloud Library (PCL), Eigen, OpenGL, Ceres

• Android : Sensors API (GPS, Accelerometer), Camera API, OpenCV4Android, NDK, OpenGL, Project Tango

• Big Data: Hadoop, HDFS, Spark

• Others : HTML/Javascript/CSS, d3js, PHP, Kinect SDK

Résumé/CV: http://fhtagn.net/files/resume.pdf

Email: julien.rebetez@gmail.com

Github: https://github.com/julienr

Website: http://fhtagn.net/

I am looking for Machine Learning and/or Computer Vision opportunities. In my day job, I currently work on a python machine learning platform for satellite images analysis. On the side, I have been hacking computer vision stuff using Google's Project Tango devkit.