this is so far from accurate it should be considered libelous; from the link
> PyTorch/XLA is set to migrate to the open source OpenXLA
so PyTorch's XLA backend is set to migrate to OpenXLA instead of XLA. but basically everyone has moved from XLA to OpenXLA because there is no standalone OSS XLA anymore. so that's it. in general, PyTorch has several backends, including plenty of homegrown CUDA and CPU kernels; in fact the majority of your PyTorch code runs through PyTorch's own kernels.
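To make the backend split concrete, here's a minimal sketch using plain `torch`; the XLA part is left as comments because `torch_xla` is a separate, optional install, and those lines are illustrative of the opt-in (per the torch_xla docs), not a claim about this particular migration.

```python
import torch

x = torch.randn(1024, 1024)
y = x @ x                      # dispatched to PyTorch's own CPU kernels

if torch.cuda.is_available():
    xc = x.cuda()
    y = xc @ xc                # dispatched to PyTorch's own CUDA kernels

# XLA/OpenXLA only enters the picture if you explicitly opt in via the
# separate torch_xla package and move tensors to an XLA device, e.g.:
#
#   import torch_xla.core.xla_model as xm   # optional dependency
#   dev = xm.xla_device()
#   y = (x.to(dev) @ x.to(dev))
```

The point being: unless you reach for an XLA device, ordinary PyTorch code never touches XLA or OpenXLA at all.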
Robotic perception is the one relevant to me. You want to do object recognition on an industrial x86 or Jetson-type machine, without having to use Ubuntu or whatever the one "blessed" underlay system is (either natively or implicitly because you pulled a container based on it).
1. use underpowered devices to perform sophisticated tasks
2. use code/tools that operate at extremely high levels of "abstraction"
don't be surprised when all the inherent complexity is tamed using just more layers of "abstraction". if that becomes a problem for your cost/power/space budget then reconsider choice 1 or choice 2.
Not sure this is worth an argument over semantics, but modern "embedded" development is a lot bigger than just microcontrollers and wearables. IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless... then yeah, you're starting to hit the requirements associated with embedded and need to seek established solutions for them, including using distros which account for those requirements.
> IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless
yes, and in those instances you do not reach for pytorch/tensorflow on top of ubuntu on top of x86 with a discrete gpu and 32gb of ram. instead you reach for C on a microcontroller, or some arm soc that supports bare metal or at most an rtos. that's embedded dev.
so i'll repeat myself: if you want to run extremely high-level code, then don't be "surprised pikachu" when the underpowered platform you chose due to concrete, tight budgets doesn't work out.
The hardware can be fast, actually. Here’s an example of relatively modern industrial x86: https://www.onlogic.com/ml100g-41/ That thing is probably faster than half of currently sold laptops.
However, neither containers nor Ubuntu Linux performs great in that environment. Ubuntu is for desktops, containers are for cloud data centers; an offline stand-alone device is different. BTW, end users typically aren't even aware that the thing is a computer at all.
Personally, I usually pick Alpine or Debian Linux for similar use cases, bare metal i.e. without any containers.
That is the moat they tried to cross. Imagine you have a PyTorch app and can run it on iOS, ARM-based, AMD-based, and Intel hardware … cloud, or embedded. Just imagine. You scale and embed according to your business case, not according to any one firm's current strategy.
Or at least you keep that option open, in case that heaven never comes. Or it comes in a form we are not aware of now, like the internet. Would you have wanted to need IBM running SNA to provide a token-ring-based network, in 1980 …
Not that I want to encourage gatekeeping, but you'll have more success if you have a clue what the other person is talking about in the first place (and some idea of what embedded looks like outside of tiny micros, and how the concerns about abstractions extend beyond how much computational power is available).
Clearly you've never used an Nvidia Jetson and have no idea what it is. You don't need a discrete GPU; it has quite a sophisticated GPU in the SoC. It's Nvidia's embedded platform for ML/AI.
Anyway I knew this thing was gonna tank - it's been in development for years with high turnover on the team (with most of the real work being done by contractors).
> AMD's software roadmap for AI/datacentre leans heavily on Vitis (for software) and AI Engines (as an execution platform).
This is incorrect along all 3 dimensions:
1. AMD has its own data-center class GPUs - I don't know how good they are because I don't work on them
2. Vitis is just a brand and will be taken out of the equation before the end of the year.
3. I don't know what "execution platform" means here, because an AI Engine is one core in a grid of such cores on the chiplets that are on the Phoenix platform (shipped with the new Ryzens) and the VCK boards.
> It's Xilinx technology, but you should expect it to look more like a GPU accelerator than a traditional LUTs-and-routing FPGA.
It is correct that there are no LUTs in the fabric, but there are "switchboxes" for data traffic between cores, and you do have to do the routing yourself (or rely on the compiler).
I'm not trying to be snarky but have you considered reading the code? Like I'll be honest I can't remember the last time I looked at docs at all instead of reading the code itself.
my guy what exactly are you expecting here? this is free-as-in-beer code (apache license). no one is forcing you to use this and no one is asking anything of you for using it. i fully support people releasing their code (that took enormous amounts of blood, sweat, and tears to get working) absolutely however they want to. if i'm interested enough i'll figure it out and thank them.
so as i see it you have like three options if you are unhappy with that:
1. close the tab
2. dig into the impl and learn as you go
3. do 2 but also write docs
i just really believe i've covered literally all the cases that any reasonable (not whiny, not entitled) person would concede.
> the first issue in the repo is a "Request for a more verbose README", which I agree with.
posted today - do you think it might have something to do with this post we find ourselves convening on? i.e. no one was so bothered about a lack of docs until now?
edit:
actually, i forgot something else you could do: email the author and ask nicely for some tips.
I don't see affordances for operating at multiple levels of abstraction. The single example of another level is ccall to an LLVM intrinsic - that's not any different from inline assembly in basically any other compiled language. Supporting multiple levels would mean you can do all (or most of) the same things with LLVM IR that you can do with Julia itself.
Do you have literally any proof of this?