
Unless you need to multiply large matrices, where you need access to very large rows and columns, like in ML applications.



That's what the absurdly fast interconnect is for. You send the data to where the weights are.


Absurdly fast != Single cycle

It is physically impossible to access that much memory in a single cycle at anything approaching reasonable clock speeds. I suppose you could do it at 5Hz :)
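Quick back-of-envelope on why (the numbers here are illustrative assumptions, not anyone's spec):

    # Speed-of-light bound on single-cycle memory access.
    # Signals in copper/silicon propagate at very roughly c/2.
    C_EFFECTIVE = 1.5e8  # m/s, assumed effective propagation speed

    def max_single_cycle_clock_hz(distance_m: float) -> float:
        """Highest clock at which a signal can round-trip to memory
        distance_m away within one cycle (propagation delay only)."""
        round_trip_s = 2 * distance_m / C_EFFECTIVE
        return 1.0 / round_trip_s

    # Memory 10 cm away, e.g. across a large wafer or board:
    print(f"{max_single_cycle_clock_hz(0.10) / 1e6:.0f} MHz")  # ~750 MHz

And that's propagation alone; real decode and sense-amp latency pushes the achievable single-cycle clock far lower.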


A core receives data over the interconnect. It uses its fast memory and local compute to do its part of the matrix multiplication. It streams the results back out when it's done. The interconnect doesn't give you single-cycle access to the whole memory pool, but it doesn't need to.
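A toy sketch of that weight-stationary scheme, with the "cores" just modeled as column blocks of W in plain NumPy (names and the split strategy are illustrative, not the actual hardware dataflow):

    import numpy as np

    def distributed_matmul(X, W, n_cores):
        """Compute X @ W with W split column-wise across n_cores."""
        # Each "core" keeps one block of W resident in its local memory.
        w_blocks = np.array_split(W, n_cores, axis=1)
        # X is streamed to every core; each multiplies against its block
        # using only local memory and local compute.
        partials = [X @ wb for wb in w_blocks]
        # Results stream back out and are stitched into the full product.
        return np.concatenate(partials, axis=1)

    X = np.random.rand(64, 512)
    W = np.random.rand(512, 256)
    assert np.allclose(distributed_matmul(X, W, n_cores=8), X @ W)

No core ever needs single-cycle access to the whole pool; it only needs its own block plus the streamed inputs.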


I think it is telling that one sentence claims it is faster than Nvidia, and another claims it runs TensorFlow. I do not think this architecture could do both at once. It could not run TensorFlow fast enough (not enough fast local memory) to compete even with a moderate array of GPUs.



