I recently wanted to learn how to use tf.js, so I put together a couple of mini-projects. The first is a minimal working example showing how to train a model in python and then load it in javascript.
I have a hard time sorting through the ML logic vs the game logic; it would really help my learning (and others' too) if the game logic were decoupled from the ML logic into two separate files.
As a non-ML practitioner who has an interest in the field, I have a question I was hoping someone could answer.
Looking at the page this links to, it gives pretrained models for "common use cases", among other things. Can you actually do anything interesting or novel with those? I feel like any meaningful new use of ML will not be possible with such "beginner building blocks" and you would just wind up with a weaker version of existing offerings. Could someone give me any examples of meaningful ways this can be used?
You are correct that commoditized AI currently cannot do anything completely novel.
They ship a network to convert landscape photos to landscape paintings, for example. You can retrain that with limited effort and knowledge by using their Colab notebook. That way, you can make an AI to convert photos of cats to photos of dogs. But you wouldn't be able to create an AI to convert between cats and rabbits, for example, because those are too different for that AI architecture to tolerate.
Similarly, they ship a classifier to detect objects of some categories based on surface texture. You can retrain that to detect different categories of things. But if someone paints a chair to look like elephant skin, the AI will be confused, and short of inventing a new architecture yourself, there's nothing you can do to fix it.
Plus, in any case, you will need large, high-quality datasets for your retraining. The biggest cost in building a bird classification app isn't the one ML employee who creates the final AI, it's hiring biologists to correctly tag thousands of photos to train the AI on.
That's why much AI research nowadays tries to offload the data generation to random people on the internet. For example, recaptcha, or crawling Flickr for that faces dataset.
Edit: since it was brought up in the comments, the retraining that I mentioned can be done on only parts of the model, which is then called transfer learning and can save you some computing time.
But, you'll need to determine if the first feature layers of the original model are suitable for your new classification task. Even experts sometimes guess wrongly here, so as a beginner, I'd say you can only pray and hope.
Both of them massively reduce the complexity by working on position-aligned images of purely the head. I would estimate that even a picture of a cat from the side will already be too much for this network to handle.
These kinds of networks learn their feature space mapping by treating the input data set as continuous. So to learn a good mapping from cat to dog, they would need to also see photos of an animal that is half cat, half dog. If I had to train this case for work, I guess I'd try to go through baby pictures: dog -> baby dog -> baby cat -> cat. That might work if baby cats and baby dogs look similar enough.
I don't think pretrained models are "beginner building blocks". They provide a kind of "base model" that can be fine-tuned to your specific need. The advantage of not training the full model is that you save a lot of resources (both design time and computation).
There are many boring but meaningful tasks in which this can be used. For example, I'm sure many industries could benefit from image classification for very specific cases (e.g., fruit categorization). In those cases, you are not interested in the classification of "general" objects such as car, person, bike or horse (as provided by the MobileNet pretrained model), but you can use that model as a base to classify different categories of fruit.
Therefore, you are right that they might not be very useful to build new ML algorithms or network architectures. But they are useful to build specific (and novel) uses of current neural networks.
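To make that concrete, here is a minimal sketch of the fruit case in tf.js. The hosted MobileNet URL and the 'conv_pw_13_relu' cut point are the ones the official tf.js transfer-learning examples use, but treat them, and the little head network, as assumptions to adapt:

```js
// A sketch of the fruit example, assuming the Google-hosted MobileNet v1
// Layers model and the 'conv_pw_13_relu' cut point used by the official
// tf.js transfer-learning examples; the head is illustrative.
import * as tf from '@tensorflow/tfjs';

async function buildFruitClassifier(numFruitClasses) {
  const mobilenet = await tf.loadLayersModel(
      'https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json');

  // Truncate at an internal activation to get a frozen feature extractor.
  const cut = mobilenet.getLayer('conv_pw_13_relu');
  const featureExtractor = tf.model({inputs: mobilenet.inputs, outputs: cut.output});

  // Small trainable head; only this part ever gets fit, so the MobileNet
  // weights stay untouched.
  const head = tf.sequential({
    layers: [
      tf.layers.flatten({inputShape: cut.outputShape.slice(1)}),
      tf.layers.dense({units: 64, activation: 'relu'}),
      tf.layers.dense({units: numFruitClasses, activation: 'softmax'}),
    ],
  });
  head.compile({optimizer: tf.train.adam(1e-4), loss: 'categoricalCrossentropy'});
  return {featureExtractor, head};
}
```

You push each labelled fruit image through featureExtractor.predict() once and fit only the head on those features, which is what makes this cheap enough to do even in a browser.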
Re-training! The generalised models are... Generalised. They have learned to pick up details from a vast, varied data set.
You can reuse that, and fine-tune the model to your specific data set, which could save you days' worth of computation, and also decreases the data set size requirement on your end.
And some years back I used posenet in node.js, just for its head tracking, while exploring 3D UIs, because it was fairly robust, at least as long as an eye was visible. Even now, it degrades more gracefully with heads turned away than some face nets, though its positions are a bit noisy, which hurt multi-camera-parallax 3D pose estimation.
There is a difference between consuming models and training them yourself. I could be wrong, but tensorflow.js seems much more geared toward consuming models in the browser than training novel ones yourself. It looks like you can customize an existing model via transfer learning, which may or may not be good enough for your use case, but even then the training side of it feels second rate. You could, however, make _predictions_ with a really sophisticated/cutting-edge model in JS -- it's not the prediction part that takes enormous compute, it's the training part.
You can also build and train models in JS. This is one of the features I'm looking at. Currently, websites are throwing away a lot of information, or they're using centralized aggregation mechanisms to provide personalization, but TensorFlow.js changes that dynamic.
I think small AI models that run completely in the browser and provide personalization by learning from how the user interacts with a given website are the future. This empowers the user and puts them in charge of how their data is used. The example I mentioned previously to demonstrate this was about ranking new submissions on HN: https://news.ycombinator.com/item?id=23407549. I'll quote the relevant part:
> TensorFlow.js is a pretty nifty piece of software and it's underutilized. If the model parameters can be stored in IndexedDB then users could train TensorFlow.js based site augmentation to suit their own needs.
> For example, what if HN had a TensorFlow.js model for ranking new submissions based on the user's preferences? This model could be trained like a spam filter and would eventually learn the types of articles that someone likes to see, but they would be in charge of the model's evolution and so would be empowered to use it however they saw fit. Maybe I don't care about politics; then my model parameters would eventually converge on downgrading all political posts, and the more technical submissions would rise to the top based on how I upvoted new and front page submissions.
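The persistence piece of this already exists in tf.js via the indexeddb:// URL scheme, so a rough sketch of the ranker could look like the following; the feature vector, network shape and storage key are made up for illustration:

```js
// A rough sketch of the ranker idea. The feature vector, network shape
// and storage key are invented; the indexeddb:// scheme is real tf.js.
import * as tf from '@tensorflow/tfjs';

const STORAGE_KEY = 'indexeddb://hn-ranker';

async function loadOrCreateRanker(numFeatures) {
  let model;
  try {
    model = await tf.loadLayersModel(STORAGE_KEY); // previously saved params
  } catch (e) {
    model = tf.sequential({
      layers: [
        tf.layers.dense({inputShape: [numFeatures], units: 16, activation: 'relu'}),
        tf.layers.dense({units: 1, activation: 'sigmoid'}), // P(user cares)
      ],
    });
  }
  model.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});
  return model;
}

// One interaction = one tiny fit (1 = upvoted/clicked, 0 = ignored),
// then the weights go straight back into the browser's IndexedDB.
async function recordInteraction(model, featureVector, label) {
  const x = tf.tensor2d([featureVector]);
  const y = tf.tensor2d([[label]]);
  await model.fit(x, y, {epochs: 1});
  await model.save(STORAGE_KEY);
  x.dispose();
  y.dispose();
}
```

Nothing in this loop ever leaves the user's machine; the model parameters live entirely in the browser.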
This is a cool idea, but it requires some hooks. Otherwise, you also need API access and have to remake the UI. I guess you could intercept API calls with a browser extension, but having a hook for your personalization would be great!
Cool idea, but why not have that client side model operate across websites -- if I switch from hacker news to reddit why not carry over that data to figure out what I will like on the separate platform?
Excellent idea. Yes, you could do that as well. Nothing would prevent copying the parameters and models from one site and using them at another one, since at the end of the day it would all just be data controlled by the user.
One can even imagine a decentralized sharing mechanism where users can create ensembles of models by combining models trained by different users.
All the building blocks are there. Just requires mindshare and a few killer applications.
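The combining step itself would not be much code either. A sketch, assuming the shared models agree on input and output shapes (the URLs are placeholders):

```js
// A sketch of ensembling user-shared models by averaging predictions.
import * as tf from '@tensorflow/tfjs';

async function loadEnsemble(modelUrls) {
  return Promise.all(modelUrls.map(url => tf.loadLayersModel(url)));
}

function ensemblePredict(models, x) {
  return tf.tidy(() => {
    const preds = models.map(m => m.predict(x)); // each: [batch, numOutputs]
    return tf.stack(preds).mean(0);              // plain average of the scores
  });
}
```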
I'm sure that many people would not class this as interesting or novel, but it demonstrates that at least some of the building blocks can actually be used to make real things.
There is a process called transfer learning where you take a model that has been trained on a large data set and then modify the last few layers of the neural network by training them on your specific/smaller data set. Not sure if there are any good resources for beginners but I learned about the technique from fast.ai videos.
It depends on your definition of "novel." I've seen some of these base models for tfjs used in several neat hackathon projects ranging from music generating web apps to "hot dog, not a hot dog." I've seen tfjs run simple RL game agents in-browser, too. The models are re-trained, tweaked, or slightly augmented - but not too much, since you only have 24 hours.
Any example of this being used in the wild? I feel like it could save on backend costs (especially for hobby projects), but the bandwidth/loading time might make it infeasible?
Also, if you have a novel model, it will be trivial to reverse engineer. You've got to send the weights over to the client, and they can just run tensorflow.js themselves, right?
I used it in the wild for some weekend projects [1][2]. To get around the bandwidth issue, I used it as a browser extension. This means that weights are downloaded and installed only once, during setup. But I have also used it in the past on a website for real-time audio processing, when models are small enough [3].
Novel models will probably need some custom layers, which are quite painful to write. The weights for the web version will probably be a lower-quality version of what you achieve on a more powerful machine. And you don't need to provide the code for the training, nor the dataset used, so you will still have an edge over people copying you.
The main problem I have with tensorflow.js is that it's not production ready yet. It's probably 10 years ahead of its time. Most people don't have a recent GPU or recent video drivers. As it's doing some tricks with the GPU to get some acceleration, a fraction of your clients will encounter random bugs and crashes. I even got some bad reviews for causing reboots on Firefox :). For some people it will be so slow it will feel unresponsive, and others will find it unimpressive because you would have to use the minimal model. In a day and age where you need 99% satisfaction to exist, picking tensorflow.js is a mistake.
I was playing with tf.js last year and there were some core components still being converted to WASM, which gave me some big performance boosts once I switched. I had some problems with WebGL-related bugs too. It seems that they're currently in the process of writing WebGPU-based backends. I'm hoping that when WebGPU is on by default in most browsers, and there's been a year or two to sort out all the major teething problems, we'll see some widespread uptake.
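For anyone who hasn't tried it, switching backends is basically one call once the extra package is pulled in. A minimal sketch with the official packages:

```js
// A minimal sketch of the backend switch, using the official packages.
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend
// (you may also need setWasmPaths() from that package so the browser can
// find the .wasm binaries if they aren't served next to your bundle)

async function pickBackend() {
  const ok = await tf.setBackend('wasm');
  if (!ok) {
    await tf.setBackend('webgl'); // fall back if WASM isn't available
  }
  await tf.ready();
  console.log('using backend:', tf.getBackend());
}
```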
Internet speeds are still increasing exponentially, so hopefully the model size problems will become less of an issue - perhaps aided by some CDN-served (and thus cached across domains) "base" models that are fine-tuned with some parameters downloaded from the server. I think I could start playing with it seriously if I could get the model sizes under 30mb or so. In a few years (with increasing internet speeds) that might be 50mb. I think huggingface's distilled GPT-2 model is a couple of hundred megabytes, for reference, so we're certainly not going to be doing anything revolutionary in the browser, but I have a bunch of neat little ideas that I think would be useful.
For bringing the "big" stuff to the web we're probably stuck with APIs, like OpenAI's new GPT-3 offering. I think access to SOTA models on hard problems is going to be almost exclusively via APIs for the foreseeable future.
Yes, the idea is that the model is local to the user and runs in their browser (pushing the computation to the edge). Here's a project demonstrating pose estimation from Twilio's engineering blog: https://www.twilio.com/blog/pose-detection-video-tensorflow-....
Re: bandwidth. Do you mean the weights? If so then yes, you'd have to think about distilling large models to smaller ones. There are techniques for doing this. Here is an explanation of distillation from Floyd Hub's engineering blog: https://blog.floydhub.com/knowledge-distillation/.
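For a rough idea of what distillation looks like in code, here is a sketch in tf.js; `teacher` and `student` are hypothetical models that output raw logits, and the temperature and weighting are illustrative rather than tuned:

```js
// A sketch of knowledge distillation in tf.js. `teacher` and `student`
// are hypothetical LayersModels that both output raw logits (no softmax).
import * as tf from '@tensorflow/tfjs';

const T = 4;        // softening temperature (illustrative)
const alpha = 0.7;  // weight on the distillation term (illustrative)

function distillationLoss(teacherLogits, studentLogits, oneHotLabels) {
  return tf.tidy(() => {
    // Softened teacher/student distributions at temperature T.
    const softTeacher = tf.softmax(teacherLogits.div(T));
    const softStudent = tf.softmax(studentLogits.div(T));
    // Cross-entropy of the student's soft predictions under the teacher's.
    const soft = softTeacher.mul(softStudent.add(1e-7).log())
                            .sum(1).neg().mean();
    // Ordinary cross-entropy against the hard labels.
    const hard = tf.losses.softmaxCrossEntropy(oneHotLabels, studentLogits);
    // The T^2 factor is the usual scaling from the original distillation paper.
    return soft.mul(alpha * T * T).add(hard.mul(1 - alpha));
  });
}

// One training step: the teacher's forward pass happens outside the
// minimize() closure, so only the student's weights receive gradients.
const optimizer = tf.train.adam(1e-3);
function trainStep(xBatch, yBatch) {
  tf.tidy(() => {
    const teacherLogits = teacher.predict(xBatch);
    optimizer.minimize(() =>
      distillationLoss(teacherLogits, student.apply(xBatch), yBatch));
  });
}
```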
Super noob question,
I have an awesome use for image classification at my job. Can I make it in Teachable Machine, export it as TensorFlow.js, and make a simple page/app with a simplistic UI just to drop/choose an image for the model to compare?
What would be a roadmap or path to do this? (I started programming literally yesterday, with the intention of making what I wrote real.)
I might have misunderstood your comment, but if you've just started learning to code, I'd probably start with simpler tutorials and projects - just to get a handle on the basics. Depending on your background it could take a good 6 months of daily practice to get comfortable with the basics. It took me and my friends at least that long - similar time frame to learning a spoken language like Japanese, say, to a basic level. Learning to code is absolutely worth it though - 6 months is a very cheap price in my opinion. I'd probably start with a tutorial like this one: https://www.youtube.com/watch?v=yPWkPOfnGsw Or the Khan Academy computer science course.
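That said, the Teachable Machine part of what you described is genuinely small. A sketch of roughly what it looks like, assuming you use the TensorFlow.js export and host the generated model.json and metadata.json somewhere (the URL below is a placeholder):

```js
// A sketch, assuming a TensorFlow.js export from Teachable Machine hosted
// at a placeholder URL, using the @teachablemachine/image wrapper.
import * as tmImage from '@teachablemachine/image';

const MODEL_URL = 'https://example.com/my-model/'; // hypothetical location

async function classify(imgElement) {
  const model = await tmImage.load(MODEL_URL + 'model.json',
                                   MODEL_URL + 'metadata.json');
  const predictions = await model.predict(imgElement); // [{className, probability}, ...]
  predictions.sort((a, b) => b.probability - a.probability);
  return predictions[0]; // best guess for the dropped/chosen image
}
```

Wire an `<input type="file">` up to an `<img>` element and pass that element to classify(), and you more or less have the page you described; the general programming fundamentals are still the bigger hurdle, so the advice above stands.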
Here's something I made a while back that uses it to generate Bob Ross drawings live from your sketches. (Never wrapped it up into an app, but maybe I should):
I've been trying to tackle the topic of ML/AI for some time, but I don't want to just use an existing tool and go with the flow - what I would like to do is understand how everything works, including the underlying math.
Just to give you an example, I would like to build my own image recognition tool from scratch and understand every part of it.
Where can I start? Any book recommendations would be greatly appreciated.
For an image classification project I wanted to complete, I had to do a bunch of matrix/tensor manipulation, and I thought it would be interesting to do it all client side in JavaScript. Things like changing the color space, taking slices/patches of images, etc. I'd be happy to share some of that if people are truly interested!
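To give a flavour, here is a small sketch of that kind of client-side wrangling; the sizes and weights are just the usual illustrative values:

```js
// A small sketch of client-side image manipulation in tf.js: grab pixels,
// convert to grayscale, cut a patch. Values are illustrative.
import * as tf from '@tensorflow/tfjs';

function preprocess(imgElement) {
  return tf.tidy(() => {
    const rgb = tf.browser.fromPixels(imgElement).toFloat().div(255); // [h, w, 3]
    // RGB -> grayscale with the standard Rec. 601 luma weights.
    const gray = rgb.mul(tf.tensor1d([0.299, 0.587, 0.114])).sum(2);  // [h, w]
    // A 64x64 patch starting at row 10, column 20 (illustrative values).
    const patch = gray.slice([10, 20], [64, 64]);
    return patch.expandDims(0).expandDims(3);                         // [1, 64, 64, 1]
  });
}
```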
I will have to purchase a new PC soon. Provided I want to play around with Tensorflow.js, is there anything that I should avoid or urgently include in the spec? Any specific graphics card or processor?
If you can wait, wait as the other commenter stated. However, if you're just interested in playing with machine learning, you don't need to get the latest and greatest tech! Even the raspberry pi can do some machine learning experiments or inferences.
I'd wait for the RTX 3000 series if possible, which would be around September or October. Seems like it will have better ML performance, and more VRAM, too.
If you just want to play with Tensorflow.js in browsers then any video card will do. If you want to use Node then I think you can get accelerated performance using a NVidia video card.
If you want to train your own models then you'll need to use Python (or in theory Swift or something..) and you want an NVidia video card.
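Roughly, the difference in Node is just which binding you import; a sketch with the official package names (the GPU one needs CUDA/cuDNN installed):

```js
// In Node, acceleration mostly comes down to which binding you import.
// A sketch; the GPU package needs CUDA/cuDNN and an NVidia card.
// const tf = require('@tensorflow/tfjs');         // pure JS, same as the browser
// const tf = require('@tensorflow/tfjs-node');    // native CPU bindings
const tf = require('@tensorflow/tfjs-node-gpu');   // CUDA-accelerated bindings

console.log(tf.getBackend()); // the native bindings report 'tensorflow'
```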
I played around with tf.js a few weeks ago and learning to use python-trained models was the first hurdle. I made a minimal working example of this here, feel free to check it out:
See model/mnist_js.ipynb for how the model is trained and exported to a JS-readable format, and lines 13-15 of index.html for how the model is loaded.
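For anyone skimming, the loading side is essentially a single call. A sketch of the general shape, not the literal contents of index.html (the path and input shape are assumptions):

```js
// Roughly the shape of the loading step (a sketch; the model path and the
// 28x28x1 input are assumptions, not copied from the repo).
import * as tf from '@tensorflow/tfjs';

async function loadMnistModel() {
  // model.json plus its weight shards are served next to the page.
  const model = await tf.loadLayersModel('model/model.json');
  const example = tf.zeros([1, 28, 28, 1]); // one blank grayscale digit
  model.predict(example).print();           // ten class probabilities
  return model;
}
```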
Note: I have a decent amount of ML experience but almost no javascript experience, so YMMV
How is an ad defined? In the case of mining, they probably don't explicitly define it as an ad, no? Also, I guess using clients' computing power for AI is as valuable as mining Monero.
https://github.com/paruby/mnist
The second is the classic game of snake, controlled by pointing your head in the direction you want to steer the snake.
https://github.com/paruby/snake-face/
This uses the MediaPipe Facemesh model with the device camera to work out which direction your head is pointing in.
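If you're curious how head direction can be derived from the mesh, here is a sketch of one way to do it; it is not necessarily what the repo does, and the landmark indices are the commonly cited MediaPipe ones, so treat them as assumptions:

```js
// A sketch of turning the face mesh into a steering direction with the
// older @tensorflow-models/facemesh package. Landmark indices (1 = nose
// tip, 33/263 = outer eye corners) are assumptions.
import '@tensorflow/tfjs';
import * as facemesh from '@tensorflow-models/facemesh';

let model;
let baseline;

async function headDirection(video) {
  if (!model) model = await facemesh.load();
  const faces = await model.estimateFaces(video);
  if (faces.length === 0) return null;

  const mesh = faces[0].scaledMesh;     // 468 [x, y, z] points
  const nose = mesh[1];
  const leftEye = mesh[263];
  const rightEye = mesh[33];
  const offset = [
    nose[0] - (leftEye[0] + rightEye[0]) / 2,
    nose[1] - (leftEye[1] + rightEye[1]) / 2,
  ];

  // Calibrate against the first detected frame; later frames are compared
  // to that neutral pose. Left/right may need flipping for a mirrored feed.
  if (!baseline) { baseline = offset; return null; }
  const dx = offset[0] - baseline[0];
  const dy = offset[1] - baseline[1];
  if (Math.max(Math.abs(dx), Math.abs(dy)) < 10) return null; // deadzone (px)
  return Math.abs(dx) > Math.abs(dy) ? (dx > 0 ? 'right' : 'left')
                                     : (dy > 0 ? 'down' : 'up');
}
```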