Hey guys - Yangqing here. I've worked on Caffe and Caffe2 over the years, and I'm super excited to keep contributing to the OSS community. I am more than happy to answer questions if you are curious.
How difficult is it to set up custom recurrent networks? The only system I've ever seen that handles this well is CNTK - you can just say `x = PreviousValue(y)`.
I've tried to work out how to do similar things in Torch and TensorFlow, but all they really offer are pre-packaged layers like LSTM. If you want to make your own, it's difficult, undocumented, and not at all ergonomic.
I am a complete machine learning noob, so this could just be my lack of skill. Having looked at most of the popular ML frameworks, none of them seem to provide an easy, functional way to implement a custom 'neuron'. That is:
    def custom_neuron(some_input):
        output = do_stuff(some_input)
        return output
Which then gets passed on to the next layer. Again, it's quite possible I've just missed something, or there is some inherent ML limitation that makes this impossible. But if such a thing is possible, it would be a rather awesome feature from my POV.
Yeah tried that one a while ago but didn't have much success. I'm a bit of a dummy when it comes to ML. That said, it looks like they've updated the documentation or something, so might be time to give it another whirl. Thanks for the reply.
TensorFlow's py_func allows you to do that; you can also write your own C++ operators, and the API for that is pretty nice. In fact, you should be able to write pretty much any op you need by combining existing ops (maybe not as efficiently as you'd like, but it yields the results you want).
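To make that concrete, here's a minimal sketch of the py_func route (TF 1.x-era API; the function and tensor names are just illustrative):

    import numpy as np
    import tensorflow as tf

    # A custom "neuron" written as a plain Python/NumPy function.
    def my_neuron(x):
        return np.tanh(x) * 0.5

    inp = tf.placeholder(tf.float32, shape=[None, 8])
    # tf.py_func wraps the Python function as a node in the graph;
    # the output dtype has to be declared explicitly.
    out = tf.py_func(my_neuron, [inp], tf.float32)

    with tf.Session() as sess:
        print(sess.run(out, feed_dict={inp: np.random.randn(2, 8)}))

One caveat: py_func ops don't get gradients for free, so for training you'd usually prefer composing existing differentiable ops or writing a C++ op with a registered gradient.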
The recurrent network work is underway - we are revisiting the designs to consciously balance performance and API niceness. We'll share more tech details in the coming days.
Right now, this is an example RNN:
I know one of the biggest struggles of training RNNs in TensorFlow, for instance, is the fact that the graph is static. Other frameworks, like PyTorch, have dynamic graphs.
Will Caffe2 support dynamic graphs (perhaps down the line)? What is your reasoning behind whichever decision?
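To make the distinction concrete, here is a tiny sketch of the kind of data-dependent control flow a dynamic graph allows (current PyTorch API, made-up shapes):

    import torch

    # With a dynamic graph the control flow is plain Python and the graph
    # is rebuilt on every forward pass, so the number of recurrent steps
    # can differ from one example to the next.
    def forward(x, w, steps):
        h = x
        for _ in range(steps):
            h = torch.tanh(h @ w)
        return h

    x = torch.randn(1, 4)
    w = torch.randn(4, 4, requires_grad=True)
    out = forward(x, w, steps=3)       # could be steps=7 for the next sample
    out.sum().backward()               # autograd traces whatever ran this time

In a static-graph framework you have to express that loop with special graph constructs up front, which is exactly what makes RNN experimentation painful.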
A super inexperienced observer here, so just a very basic question - how does Caffe(2) differ from TF/Theano/Torch etc.? What are the obvious upsides and potential downsides? A tweet-sized answer will do, I'm just curious as to what the high level differences are.
> Caffe2 is built to excel at mobile and at large scale deployments. While it is new in Caffe2 to support multi-GPU, bringing Torch and Caffe2 together with the same level of GPU support, Caffe2 is built to excel at utilizing both multiple GPUs on a single-host and multiple hosts with GPUs. PyTorch is great for research, experimentation and trying out exotic neural networks, while Caffe2 is headed towards supporting more industrial-strength applications with a heavy focus on mobile. This is not to say that PyTorch doesn’t do mobile or doesn’t scale or that you can’t use Caffe2 with some awesome new paradigm of neural network, we’re just highlighting some of the current characteristics and directions for these two projects. We plan to have plenty of interoperability and methods of converting back and forth so you can experience the best of both worlds.
PyTorch definitely makes experimentation much better. For example, if you want to train a system that is highly dynamic (reinforcement learning, for example), you might want to use a real scripting language, namely Python, and PyTorch makes that really sweet.
Sometimes the line gets a bit blurred - for research that focuses on relatively fixed patterns, such as Mask RCNN, both PyTorch and Caffe2 work great. In fact, Mask RCNN is trained in Caffe2, and that also makes things much easier when we put it on mobile - what our CTO Mike Schroepfer showed in his keynote is a Mask RCNN model trained and then deployed onto mobile with Caffe2.
Re: performance - many frameworks nowadays build on high-performance libraries such as cuDNN for optimized runtime on different platforms. For example, we've been collaborating with NVIDIA on optimized distributed training on GPUs, and you can check out more details here: https://blogs.nvidia.com/blog/2017/04/18/caffe2/
I am not aware of that right now. Internally we measure performance by looking at the theoretical peak possible (like the scaling efficiency when you use distributed training). C2 has been doing pretty well; if you find performance degradations, file an issue with us and we are usually pretty good at tracking down perf problems.
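(To illustrate what that measurement looks like, a tiny sketch with made-up numbers; the 90% figure below is just an example, not a reported benchmark:)

    # Scaling efficiency: measured throughput over the ideal linear-scaling
    # throughput you'd get if N workers were N times as fast as one.
    def scaling_efficiency(per_worker_throughput, n_workers, measured_throughput):
        ideal = per_worker_throughput * n_workers
        return measured_throughput / ideal

    # e.g. one GPU does 200 images/s alone; 8 GPUs together measure 1440 images/s:
    print(scaling_efficiency(200, 8, 1440))  # 0.9, i.e. 90% of the theoretical peak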
Great work. I would like to know "how beta" the Windows version is. Is there a small inference library that we can integrate into an application?
I am currently exporting models from an existing Theano system running on a Linux server.
I'm currently running inference using tiny-dnn and an alternative direct implementation using Eigen (which is much faster) but in the long run it would be nice to have stuff like quantization and GPU available.
Thanks
Why do you people at Facebook use a patents clause and destroy the integrity of the BSD open source license?
Better to create a proprietary license which allows you to protect your predatory patent claims instead of acting benevolent and releasing with this kind of corporate malarkey.
Ah, it is because we do not want to create too much chaos for people migrating between codebases. Evan and I played out both scenarios and decided that it is cleaner to put the code in separate codebases. A lot of the runtime code is shared, and we are working on migration tools - such as model converters - that help with migration. If there are bugs in these tools, shoot us an issue on GitHub and we'll fix it.
Will there be any project management or stewardship of this project, unlike Caffe1 where every paper, project, or hypothesis yielded a mutually incompatible fork on GitHub?
I think Caffe2 is especially suited to machine learning that runs on mobile devices, so I wouldn't be surprised to see it become more popular as that mode of machine learning becomes more popular.
So we carefully made the core much smaller and also made the platform more modular, so that the dependencies can be minimal when you build on Android/iOS. With our build system (buck) we get very small binary footprints, which helps deliver the runtime to phones more easily.
We also did a lot of optimizations on the mobile side - like using NEON, the mobile GPU and so on for optimized speed.
Is ARM, and in particular the NVIDIA TK1/TX1/TX2, supported? Theano's ease of use on these platforms vs TensorFlow/PyTorch is the main thing keeping me on Theano.
Yep, if you look under the scripts/ folder we have example scripts that you can use to build on specific platforms. Let me know what you think and feel free to send issues/PRs!
Right now we're training models in a server-side environment and running highly optimized inference on pre-trained models in products (with a focus on inference optimization).
The framework itself allows fine-tuning and training of the model on the mobile device too, but more work is required to enable particular use cases.
A quick question: last year when Caffe2 first came out, you suggested sticking with Caffe because Caffe2 at that time was not mature. Now that Caffe2 is officially released, is there any reason we should still use Caffe?
There is no push to migrate from Caffe to Caffe2 for sure. After Facebook "dogfooding" our own implementation, I think it is safe to say that C2 is now much more stable. I would encourage you to try migrating, let us know if you run into problems, and stay tuned for the nice additional features that you may be able to enjoy from C2 - like optimized computation with MKLDNN, etc.
Another issue I am concerned about is visualization tools. As you know, Caffe lags behind TensorFlow on nice visualization tools such as TensorBoard. Will Caffe2 have a nice visualization tool like this?
Congrats! The site is really well made, and I can see a great deal of effort went into making the library usable.
Q: I've never used Caffe - based on the examples provided, I would say it's best for images and videos? I'm interested in NLP (e.g. seeing patterns in science papers) or in studying wearables data (GPS, heart rate, etc.) to predict user activity.
Haha yeah, I definitely feel your pain - as a Caffe developer it really makes me cry when things get so incompatible. I've made some improvements in Caffe2 to make it more modular - check out http://GitHub.com/caffe2/caffe2_bhtsne/; things like this will potentially be more maintainable than the old Caffe approach.
I think Caffe 1.0 made a fundamentally flawed design choice in using protocol buffers to represent the network without giving users a standardized way to add new operations. Since the network was represented as a protobuf, adding new layers meant recompiling the entire library or using Python layers. As a result, each project spawned a new Caffe fork that had to be compiled and maintained separately. Caffe 1.0 was ahead of its time in several aspects compared to Theano or Torch, but the dependency hell combined with the per-paper forks made it difficult to use. It should be a case study in the unintended consequences of design choices.
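For readers who never touched Caffe 1.0, this is roughly what the "Python layer" escape hatch looked like; the layer below is a made-up example, not from any real project:

    import caffe

    class ScaleLayer(caffe.Layer):
        """A toy Caffe 1.0 Python layer that multiplies its input by a constant."""

        def setup(self, bottom, top):
            # Layer options arrive as a raw string from the prototxt.
            self.scale = float(self.param_str) if self.param_str else 2.0

        def reshape(self, bottom, top):
            top[0].reshape(*bottom[0].data.shape)

        def forward(self, bottom, top):
            top[0].data[...] = bottom[0].data * self.scale

        def backward(self, top, propagate_down, bottom):
            if propagate_down[0]:
                bottom[0].diff[...] = top[0].diff * self.scale

You then referenced the class from the prototxt as a layer of type "Python", which worked, but anything heavier than this meant C++ changes and a full rebuild - hence all the forks.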
Can someone explain to me why they chose Caffe2 as the name? I think this is a problem for Caffe, as people will see Caffe2 as the newest version of that library, when they are not related (or are they?).
I would have preferred them to choose another name.