Live Computer Vision Coding (dinusv.com)
202 points by turrini on Feb 1, 2018 | 23 comments



There are a lot of rapid prototyping tools for computer vision, usually built around some dataflow interface:

http://imageplay.io/

https://www.adaptive-vision.com/en/software/

http://www.adrianboeing.com/improvCV/

https://github.com/utwente/parlevision

I don't use any of these, but I see a few of them reinvented every year with not much differentiating them from each other. I wonder why none of them has caught on yet?

I think they are useful compared to the tedious workflow I am stuck with (prototype in a Python notebook to dump visualizations and play with hyperparameters, then port to C++ later). But the fear of getting trapped in some unsupported edge case and having to deal with a messier port usually stops me from switching.
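Concretely, the notebook half of that workflow is roughly the following (a minimal sketch using OpenCV and matplotlib; the blur/Canny stage, file name, and parameter grid are just stand-ins for whatever is actually being tuned):

    import cv2
    import matplotlib.pyplot as plt

    def run_pipeline(img, ksize, low, high):
        # Placeholder pipeline stage: blur then edge-detect.
        blurred = cv2.GaussianBlur(img, (ksize, ksize), 0)
        return cv2.Canny(blurred, low, high)

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder input frame

    # Dump one visualization per hyperparameter combination, then eyeball the results.
    for ksize in (3, 5, 9):
        for low, high in ((50, 150), (100, 200)):
            edges = run_pipeline(img, ksize, low, high)
            plt.figure()
            plt.title("ksize=%d, canny=(%d, %d)" % (ksize, low, high))
            plt.imshow(edges, cmap="gray")
    plt.show()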


> I wonder why none of them has caught on yet?

Because the data pipeline is never the hard part in computer vision.


What is the hard part? What types of inputs/outputs are we talking about here? Fairly ignorant of this area :)


The hardest parts with computer vision, like with any advanced field, are

1) making the algorithmic bits in the first place (see opencv, vxl, itk, or insert any other vision library here where many of these bits are made for you)

2) knowing how to achieve the desired result using those bits

Many of these data flow tools show you boxes and arrows, and pretend that connecting bits with arrows instead of function calls is where the magic is.

But if one of those boxes is called, say, "ComputeOpticalFlowBetweenTheseTwoInputImages", how is that any different from calling the ComputeOpticalFlowBetweenTheseTwoInputImages function in the underlying algorithmic engine that they're building on top of?
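To make the comparison concrete, the direct call in OpenCV's Python bindings is roughly this (a minimal sketch; dense Farneback flow is just one of the optical-flow functions OpenCV offers, and the frame file names are placeholders):

    import cv2

    # Two consecutive grayscale frames; file names are placeholders.
    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # Dense Farneback optical flow. A "ComputeOpticalFlow" box in a dataflow UI
    # ultimately wraps a call like this, and its parameters (pyramid scale, levels,
    # window size, iterations, poly_n, poly_sigma, flags) still have to be understood.
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    print(flow.shape)  # (height, width, 2): per-pixel (dx, dy) displacement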

The new layer doesn't solve either of the two hard problems for you.

Giving a live result is cute, but it's not something that you can't already do with a Python IDE without having to learn some new arbitrary modeling language.


That's not so arbitrary, that's plain QML.


QML is 100% arbitrary. WTF does a UI markup language have to do with CV? WTF are they using a UI markup language for this? WTF thought that QML would be a good fit for this? (sorry for using WTF in three different ways)


> "ComputeOpticalFlowBetweenTheseTwoInputImages", how is that any different from calling the ComputeOpticalFlowBetweenTheseTwoInputImages function in the underlying algorithmic engine

There can be a ton of difference, at least in the theoretical possibility space. But the tools, especially the free, open ones, have not matured enough to show the difference yet.

To begin with, in a text-based environment you may need to look up the library's documentation to figure out the available operations. A visual environment can show you all of the operations directly. You could point out that some environments and editors provide autocompletion, especially for statically typed languages, but even that has several limitations.

First, the completions almost never have much context built into them, so they keep showing the same suggestions in many different situations and can be overwhelming; the user still has to go back to the documentation to figure out which operations apply in that context. A smart environment (visual or textual) can show completions for that context alone.

Second, a function call can have many interdependent parameters, especially when the parameters are long or complex objects that require studying the underlying documentation. For example, if you change one parameter, the valid value of another might depend on what you gave for the first. A properly constructed visual interface can show, hide, or restructure the parameters based on those rules, and if done well it can replace the documentation, saving users from constantly referring back to it and from unnecessary compile and runtime errors. It might seem like a small thing, but if you look closely, this is where a developer spends most of their time, apart from figuring out the sequence of operations to perform. A text-based environment can do most of these "smart" things too, except for showing the sequence flow.

Now, for the sequence of operations, a visual or dataflow language, if properly implemented, can arguably be better than text, because A) it can automatically show the available operations depending on the sequence (or the context), and B) more importantly, it can show the big picture at a glance through the visual connections between the stages. This can be a life changer, especially during debugging and when you have a large number of programs or files.

Now, in my experience there are no publicly available visual implementations that have successfully done both A and B, which is what reduces the general public's confidence in such platforms. A proper implementation could make developers' lives significantly better, with one catch: whether developers actually want it. It's almost like asking a truck driver whether he wants an AI assistant, when he senses that while it would make his life easier, it would also significantly lower the barrier of entry to truck driving :)

PS: I am building a visual environment that will hopefully meet many of the things said above.

Edit: fixed a few typos and added a disclaimer.


These are essentially people reinventing node-based compositing software, which has been used in the visual effects industry for 25 years, so that approach has caught on.


I've never seen a live CV environment before. It's pretty cool.

I have a flag that dumps out each stage of my CV pipeline when I need to debug something. I imagine this would be useful, but it's nice to be able to use my actual code base rather than a separate system.
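For what it's worth, the flag is nothing fancier than this kind of thing (a minimal sketch; the --debug-dump flag and the stage names are made up for illustration):

    import argparse
    import os
    import cv2

    parser = argparse.ArgumentParser()
    parser.add_argument("--debug-dump", metavar="DIR", default=None,
                        help="if set, write each pipeline stage as an image into DIR")
    args = parser.parse_args()

    def dump(name, image):
        # No-op unless the flag is set, so the normal code path is untouched.
        if args.debug_dump:
            os.makedirs(args.debug_dump, exist_ok=True)
            cv2.imwrite(os.path.join(args.debug_dump, name + ".png"), image)

    frame = cv2.imread("input.png")                 # placeholder input
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dump("01_gray", gray)
    edges = cv2.Canny(gray, 50, 150)
    dump("02_edges", edges)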


The first system coined as "live" (or at least identified as such) was actually a CV system. See Tanimoto, S., "VIVA: A visual language for image processing," Journal of Visual Languages and Computing, pp. 127-139, June 1990.


Forgive me if I'm wrong, but several vital plugins seem to be missing from the Linux download, Lcvcore for example. Compiling from source seems to populate the folders properly.


I had the perfect use case for something like this last year.

I wanted to make a (very) short movie with a similar feel to Waking Life, except instead of rotoscoped images I just wanted to apply varying degrees of edge detection to each frame. It came out looking pretty good, but it took a lot of tweaking of individual numbers in a Python script and repeatedly rendering the same clip before I could even get a set of choices to decide between.

Having something like this with me when I was filming it would have made a huge difference!
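In case anyone wants to try the same effect, the per-frame processing was essentially this (a minimal sketch with OpenCV; the thresholds, codec, and file names stand in for the numbers I kept tweaking and re-rendering):

    import cv2

    # Read a clip, run Canny on every frame, and write out the stylised result.
    cap = cv2.VideoCapture("clip.mp4")              # placeholder input clip
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter("edges.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    low, high = 80, 160  # the "degree" of edge detection I kept re-rendering to tune
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, low, high)
        out.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))

    cap.release()
    out.release()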


That would take about 10 minutes to set up in TouchDesigner, and it would work in real time.


This can never work in practice. First, it seems super simple to do CV in the video, but only because the author already knows what to type. It will never be that smooth in real life. Next, if you accept that you have to think about what you are doing, why not fall back to OpenCV in combination with the ImageWatch visualizer? The good news is that you save yourself one layer of abstraction that hides important details from you, not to mention learning yet another API.


Basically this, although I wouldn't say "it can never work in practice", just that "a boxes-and-arrows UI doesn't actually make CV (or anything else hard) significantly easier".

It's great to start out with and play around with but eventually you're going to want the GUI to get out of your way and just let you edit the code directly. At which point something like shadertoy is (IMO) the best of both worlds.


You are right, there is a good application for this tool in education or for people who want to quickly test something new. Shadertoy is great and I like it. It is at the right level of abstraction for my current CG knowledge. I can imagine that there are many people who find Live CV very useful and then move down the stack.


I wish this didn't force us to use pre-existing lego pieces by abstracting away the math. So so so many cool things could be built in the computer vision sphere if engineers weren't afraid to go a level deeper, ignoring all the opencv stuff, and just implemented the math themselves. There's no reason we can't get creative on a level further down. With all the clever math people that are enthusiastic about abstract algebra in security, you'd think there'd be some spillover of that math confidence into the CV world.


It's Qt-based; it's easy to create your own blocks with C++.


I wonder how hard it would be to create deep learning based plugins, for example YOLO or Faster R-CNN for object detection.


A lot of the groundwork has already been done on the OpenCV side.

OpenCV has a dnn[1] module that can import models from other frameworks like Caffe, TF, and DarkNet (YOLO's native framework) into OpenCV's own DNN object model and run inference on them. Both the YOLO and R-CNN families have been implemented as examples[2].

What's missing is that OpenCV does not abstract all of these behind a single object-detection interface, if that's what the Live CV side expects. So such a plugin would have to define that interface itself and provide an adapter per model.

[1]: https://github.com/opencv/opencv/tree/master/modules/dnn/src

[2]: https://github.com/opencv/opencv/tree/master/samples/dnn
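For reference, running one of the imported DarkNet models through the dnn module looks roughly like this (a minimal sketch assuming OpenCV >= 3.4.2; the cfg/weights/image paths are placeholders, and non-maximum suppression is left out for brevity):

    import cv2
    import numpy as np

    # Load a DarkNet (YOLO) model into OpenCV's dnn module; paths are placeholders.
    net = cv2.dnn.readNetFromDarknet("yolo.cfg", "yolo.weights")

    img = cv2.imread("input.jpg")
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    # Each detection row: [cx, cy, w, h, objectness, class scores...], relative to image size.
    h, w = img.shape[:2]
    for out in outputs:
        for det in out:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.5:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                print(class_id, confidence, (cx - bw / 2, cy - bh / 2, bw, bh))

In a real plugin you would also run cv2.dnn.NMSBoxes over the candidates and map class ids to names; the adapter-per-model point above is essentially about hiding loops like this behind one common interface.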


Prototyping YOLO or the other DNNs just means running the detector and looking at the boxes. Of course, training these systems is clearly out of reach for this kind of abstraction technology, not to mention modifying them or developing something new.


Any word on a macOS version?


They provide the source, and it's Qt-based, so it should be buildable for macOS.



