this is so far from accurate it should be considered libelous; from the link
> PyTorch/XLA is set to migrate to the open source OpenXLA
so PyTorch's XLA backend is set to migrate to OpenXLA instead of XLA. but basically everyone has moved from XLA to OpenXLA because there is no standalone OSS XLA anymore. so that's it. in general, PyTorch has several backends, including plenty of homegrown CUDA and CPU kernels; in fact the majority of your PyTorch code runs through PyTorch's own kernels.
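To make the backend split concrete, here's a minimal sketch using plain `torch`; the XLA part is left as comments because `torch_xla` is a separate, optional install, and those lines are illustrative of the opt-in (per the torch_xla docs), not a claim about this particular migration.

```python
import torch

x = torch.randn(1024, 1024)
y = x @ x                      # dispatched to PyTorch's own CPU kernels

if torch.cuda.is_available():
    xc = x.cuda()
    y = xc @ xc                # dispatched to PyTorch's own CUDA kernels

# XLA/OpenXLA only enters the picture if you explicitly opt in via the
# separate torch_xla package and move tensors to an XLA device, e.g.:
#
#   import torch_xla.core.xla_model as xm   # optional dependency
#   dev = xm.xla_device()
#   y = (x.to(dev) @ x.to(dev))
```

The point being: unless you reach for an XLA device, ordinary PyTorch code never touches XLA or OpenXLA at all.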
Robotic perception is the one relevant to me. You want to do object recognition on an industrial x86 or Jetson-type machine, without having to use Ubuntu or whatever the one "blessed" underlay system is (either natively or implicitly because you pulled a container based on it).
1. use underpowered devices to perform sophisticated tasks
2. use code/tools that operate at extremely high levels of "abstraction"
don't be surprised when all the inherent complexity is tamed using just more layers of "abstraction". if that becomes a problem for your cost/power/space budget then reconsider choice 1 or choice 2.
Not sure this is worth an argument over semantics, but modern "embedded" development is a lot bigger than just microcontrollers and wearables. IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless... then yeah, you're starting to hit the requirements associated with embedded and need to seek established solutions for them, including using distros which account for those requirements.
> IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless
yes, and in those instances you do not reach for pytorch/tensorflow on top of ubuntu on top of x86 with a discrete gpu and 32gb of ram. instead you reach for C on a microcontroller, or some arm soc that supports bare metal or at most an rtos. that's embedded dev.
so i'll repeat myself: if you want to run extremely high-level code, then don't be "surprised pikachu" when the underpowered platform you chose due to concrete, tight budgets doesn't work out.
The hardware can be fast, actually. Here’s an example of relatively modern industrial x86: https://www.onlogic.com/ml100g-41/ That thing is probably faster than half of currently sold laptops.
However, neither containers nor Ubuntu Linux performs great in that environment. Ubuntu is for desktops, containers are for cloud data centers; an offline stand-alone device is different. BTW, end users typically aren't even aware that the thing is a computer at all.
Personally, I usually pick Alpine or Debian Linux for similar use cases, bare metal i.e. without any containers.
That is the moat they tried to cross. Imagine you have a PyTorch app and can run it on iOS, ARM-based, AMD-based, and Intel hardware … cloud, or embedded. Just imagine. You scale and embed according to your business case, not according to any one firm's current strategy.
Or at least you keep that option open, in case that heaven never comes. Or it comes in a form we are not aware of now, like the internet. Would you have wanted to need IBM running SNA to provide a token-ring-based network, in 1980 …
Not that I want to encourage gatekeeping, but you'll have more success if you have a clue what the other person is talking about in the first place (and some idea of what embedded looks like outside of tiny micros, and how the concerns about abstractions extend beyond how much computational power is available).
Clearly you've never used an Nvidia Jetson and have no idea what it is. You don't need a discrete GPU; it has quite a sophisticated GPU in the SoC. It's Nvidia's embedded platform for ML/AI.
Anyway I knew this thing was gonna tank - it's been in development for years with high turnover on the team (with most of the real work being done by contractors).
> AMD's software roadmap for AI/datacentre leans heavily on Vitis (for software) and AI Engines (as an execution platform).
This is incorrect along all 3 dimensions:
1. AMD has its own data-center class GPUs - I don't know how good they are because I don't work on them
2. Vitis is just a brand and will be taken out of the equation before the end of the year.
3. I don't know what "execution platform" means here, because an AI Engine is one core in a grid of such cores on the chiplets that are on the Phoenix platform (shipped with the new Ryzens) and the VCK boards.
> It's Xilinx technology, but you should expect it to look more like a GPU accelerator than a traditional LUTs-and-routing FPGA.
It is correct that there are no LUTs in the fabric, but there are "switchboxes" for data traffic between cores, and you do have to do the routing yourself (or rely on the compiler).
I'm not trying to be snarky but have you considered reading the code? Like I'll be honest I can't remember the last time I looked at docs at all instead of reading the code itself.
my guy what exactly are you expecting here? this is free-as-in-beer code (apache license). no one is forcing you to use this and no one is asking anything of you for using it. i fully support people releasing their code (that took enormous amounts of blood, sweat, and tears to get working) absolutely however they want to. if i'm interested enough i'll figure it out and thank them.
so as i see it you have like three options if you are unhappy with that:
1. close the tab
2. dig into the impl and learn as you go
3. do 2 but also write docs
i just really believe i've covered literally all the cases that any reasonable (not whiny, not entitled) person would concede.
> the first issue in the repo is a "Request for a more verbose README", which I agree with.
posted today - do you think it might have something to do with this post we find ourselves convening on? i.e. no one was so bothered about a lack of docs until now?
edit:
actually, i forgot something else you could do: email the author and ask nicely for some tips.
I don't see affordances for operating at multiple levels of abstraction. The single example of another level is ccall to an LLVM intrinsic - that's not any different from inline assembly in basically any other compiled language. Supporting multiple levels would mean you can do all (or most of) the same things with LLVM IR that you can do with Julia itself.
Do you have literally any proof of this?