Visual Analysis of Binary Files (binvis.io)
223 points by h4x0rr 8 months ago | 48 comments



Ah, this is an old side-project of mine. Something I should probably make clearer is that files are not uploaded anywhere - the app is completely local, and all analysis is done in the browser.

This version is written in React but when time permits I plan to release an updated version written in Rust, along with a library of fast implementations of space-filling curves and related utilities.


This tool has been incredibly useful to me for getting a high level overview of compiled binaries and JS bundles: you can typically tell if something fishy is getting included by the unexpected changes in entropy/categories. Thanks!


I just wanted to say thanks, this is something I use regularly when looking at unknown firmware blobs or file formats, to look for compressed/encrypted data or other structures
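For anyone wondering what "entropy" means concretely here: a rough sketch, assuming plain Shannon entropy over fixed-size windows. This is not binvis's actual code, and the function names and the 256-byte window are my own choices. Compressed or encrypted regions tend to sit near 8 bits per byte, while padding and simple tables sit much lower.

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(data, window=256):
    """Entropy of each consecutive, non-overlapping window (window size is an assumption)."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


if __name__ == "__main__":
    # Synthetic blob: 256 bytes of zero padding followed by every byte value once.
    blob = b"\x00" * 256 + bytes(range(256))
    print(windowed_entropy(blob))
```

Plotting those per-window values along the file is usually enough to spot where a compressed or encrypted section starts and stops.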


Looks fantastic cortesi!

and binvis is the coolest name!!


Looks cool. What can you get out of this kind of visual analysis?


Getting a really quick, coarse view of what a file format looks like, easily picking out different types of data. Try uploading a SNES ROM or similar and you'll probably see a lot of distinct squares because of how the ROM banks work.

Super Metroid has a surprising amount of filler scattered all over.


> Super Metroid has a surprising amount of filler scattered all over.

It only really looks scattered because the visualizer uses a space-filling curve instead of laying out the bytes of the file linearly. If you change the curve to "scan" instead of "cluster", it becomes a little easier to see what the filler is: it's the unused space at the end of each ROM bank.

(As someone who's spent a lot of time in that particular binary, I think the use of the space-filling curve obscures a lot of details that would be a lot easier to spot otherwise -- such as the structure of the ROM banks, the location and layout of various pieces of compressed and uncompressed data, and repeated structures like a couple KB of common routines that are duplicated at the beginning of every bank containing enemy code.)
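To make the difference between the two layouts concrete, here is a small sketch. I'm assuming "cluster" is a Hilbert curve and "scan" is a plain row-by-row layout; the function names are mine, and the Hilbert mapping is the textbook index-to-coordinate algorithm rather than anything taken from the tool.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def scan_d2xy(width, d):
    """Plain row-major layout: byte d goes to column d % width, row d // width."""
    return d % width, d // width


if __name__ == "__main__":
    # Consecutive file offsets are always neighbours on the Hilbert curve...
    print([hilbert_d2xy(2, d) for d in range(16)])
    # ...while in scan order each row is one contiguous run of the file.
    print([scan_d2xy(4, d) for d in range(16)])
```

With the Hilbert layout, a fixed-size region such as a ROM bank becomes a square-ish blob, so padding at the end of every bank shows up as patches dotted around the image. In scan order the same padding lines up as horizontal bands at regular intervals.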


It'd be cooler if these were interpreted as sound waves. I'd love to go to sleep listening to notepad.exe.


Did you happen to catch the post about dumping the ROM of a Nintendo Gameboy (DS?) game by letting the crashed game play on until it eventually played the ROM contents? Just wondering if that's what gave you the idea here. But I agree, ASMr would be interesting!


I dunno, depends on whether it's raw sound or something gentle. Imagine listening to the modem connecting sound.


I've been trying to figure out the file format used for "POV" fan displays I've bought from AliExpress. This visualizer was a big help....


I’m fascinated by this but on mobile at least I can’t quite figure out what the colours mean. Is there a legend I can peek at?


There is, but I see it's not visible on smaller resolutions - I should fix that. The default color scheme just classifies bytes into black (0x00), white (0xff), blue (ascii), and low (green) and high (red).
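In code, that classification might look roughly like the sketch below. Only the five classes and their colours come from the comment above; the exact ranges are assumptions on my part (printable ASCII plus tab, LF and CR counted as "ascii", and the low/high split placed at 0x80), and `classify_byte` is a made-up name.

```python
def classify_byte(b):
    """Return the colour class for one byte value (0..255).

    The boundaries below are assumptions, not taken from the tool:
    "ascii" = printable ASCII plus tab/LF/CR, "low" = anything else
    below 0x80, "high" = anything else at or above 0x80.
    """
    if b == 0x00:
        return "black"
    if b == 0xFF:
        return "white"
    if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D):
        return "blue"   # ascii
    if b < 0x80:
        return "green"  # low
    return "red"        # high


if __name__ == "__main__":
    sample = b"\x00Hi\x01\x90\xff"
    print([classify_byte(b) for b in sample])
```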


Does it render data line by line or with something like Hilbert's curve to preserve locality of distinct features?


Hey what’s the clustering algorithm for the “cluster” vis mode?



This is where I miss | still pull out the original ERMapper geospatial viewer with the .ERS ASCII file "header" for viewing.

https://gdal.org/drivers/raster/ers.html

does not in any way do it justice.

What it allows you to do is create a text header description for a binary file .. pretty much any binary image (and not intended to be image) "raw" (ish) format that existed then and now.

Band interleaved, band separated, row interleaved, column orientated, etc. 8 bit, 12, 24, 32 bit, big or little endian, with or without header "junk" to skip at start of file and|or at the start of each "virtual row"

The handy hack was the ability to view as an image any file that had time series acquisition by sketching out how to block it, how to understand the binary, insert optional colour maps .. and then being able to rapidly visually scan 2GB or 16GB of binary blob with real time fly about zoom in zoom out capability.

Intended for multichannel satellite | aircraft | other instrument data with a variety of metadata (timing, position, instrument orientation, etc) - useful far beyond its core target.

Addendum: for the curious, somewhat better description of the .ERS layout from PDF page 33 onwards of https://www.aseg.org.au/sites/default/files/ER%20Mapper%20Cu...



both of these are 404 for me


Definitely curious, thank you for sharing this.


This reminds me of old ZX Spectrum computers that had 48K RAM of which ~4K was the buffer containing what was actually shown on the screen. Some software was so space-constrained that when loading, it would temporarily have to write into the screen buffer, so when you loaded something it may have filled the screen with images similar to those of this tool. Then as loading completed, it would shuffle things around in memory to be able to use the screen for its intended purpose.


This is a screenshot from a COSMAC ELF program that displayed a picture of Enterprise.

https://en.m.wikipedia.org/wiki/File:Iconic_COSMAC_Elf_space...

The machine was short enough on RAM by default that even at runtime, the screen buffer was shared. The random noise at the top of the screen is the program used to display the image on the bottom.


On the C64 some packers used screen memory to host the code. The first was The Screen Cruncher by 1001 Crew, a Dutch group. It's mentioned in this interview by its original author.

https://web.archive.org/web/20040902191610/http://www.c64hq....


Reminds me of a cool tool I made at a previous company. I combined a software validation ticket database with version control blame in order to paint the entire source code repo line-by-line to represent how "validated" it was. I generated an HTML listing with png thumbnails next to every source file, allowing you to quickly find areas of interest. Maybe that sounds boring, but it was flight code for a space ship.


Sounds like a fun project!




Thanks! Macroexpanded:

Visualizing binaries with space-filling curves (2011) - https://news.ycombinator.com/item?id=24813704 - Oct 2020 (15 comments)

Visualizing binaries with space-filling curves (2011) - https://news.ycombinator.com/item?id=14544191 - June 2017 (8 comments)

Binvis.io – Visual Analysis of Binary Files - https://news.ycombinator.com/item?id=11077222 - Feb 2016 (9 comments)

Binvis.io: visual analysis of binary files - https://news.ycombinator.com/item?id=9140249 - March 2015 (1 comment)

Visualizing binaries with space-filling curves - https://news.ycombinator.com/item?id=3449743 - Jan 2012 (20 comments)

Others?


https://codisec.com/veles/ This is also really cool, similar idea but in 3d space


Wait this is really cool!! Reminds me of a thing I did a few years ago that I never completed (sigh) which was about visualizing access patterns in database storage engines.

Here's the report with pics and videos if that sounds interesting!

http://akmanalp.com/static/memory_trace.pdf


This is awesome. I'm actually working on a project at the moment that uses DynamoRio to capture write instructions. Would be super interesting to use your tool to visualise the access patterns. Is your code available somewhere?


Gah, it's probably in some old laptop somewhere and I can see if I can dig it up this weekend. What does your project do?


We are looking at how you can take an existing in-memory datastore (e.g. Redis) and migrate it to use non-volatile memory (NVM) such as Intel Optane NVDIMMs for fast persistence. Similar pitch to this paper: https://www.usenix.org/conference/osdi20/presentation/zhang-... But better obviously :)


That sounds awesome! I can't seem to find it right now but I'm pretty sure I started out with one of the samples, I think memtrace_x86.c and that required only minimal modification to get it to do most of what I needed, minus eliminating addresses I don't care about, plotting and stuff which is not extra hard.

If you get stuck and / or end up doing something cool, shoot me an email at hn at my website on my profile!


I remember seeing a similar layout used for an article or blog post about reverse engineering a binary format. I’ve tried finding it later, but couldn’t. Anyone know of such a tool that lets you describe the various headers, fields, etc of a binary format with a similar visualization?


I recently did a very similar thing, but for visualizing DNS records on IPv4: https://reversedns.space/



The default view reorders the bytes as far as I can tell. It's mentioned in the help, but I don't really get how to make use of this feature. Maybe I tried it on the wrong kinds of files, but I found the default view confusing.

But it is a really cool tool; if you have to figure out a binary file format, looking for patterns is certainly very useful.


I would love to take a binary I worked on and feed it to this and get it printed and framed and hang it up somewhere.


I also recommend checking this: https://codisec.com/binary-data-visualization/ - visualization of binaries by mapping n-grams into n-dimensional space.


This reminds me of the old Amiga tool where one could visualize the RAM contents. As RAM wasn't cleared after a CTRL+A+A reset, one could find the graphics of the game one played before rebooting.

Good times, but I can't seem to remember the tool's name.


A video of someone implementing a project on a related topic, for identifying binary patterns based on visualization. Interesting stuff.

https://www.youtube.com/watch?v=AUWxl0WdiNI


there was a time on linux when you could cat /proc/kcore > /dev/video or something like that, some kind of old framebuffer device.


I remember someone made scarves/blankets with certain binary visualizations. Very cool stuff.


How does the mouseover animation of that topmost image work? It seems random yet deterministic.


The mouseover always shows a contiguous section of the underlying file. So you're seeing a constant-length section of the space-filling curve centered on the cursor as you move the mouse about. The hex to the right is that same contiguous piece of the file. Sort of neat, actually.
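A minimal sketch of that idea, in case it helps: map the cursor's grid cell back to a position along the curve, scale that to a file offset, and take a fixed-length slice around it. The Hilbert inverse mapping is the textbook algorithm. The function names, the one-cell-per-equal-share-of-the-file scaling, and the clamping at the file ends are all my assumptions, not the tool's code.

```python
def hilbert_xy2d(order, x, y):
    """Map grid cell (x, y) on a 2**order square to its index along a Hilbert curve."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, finer level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def window_around(data, center, length):
    """Return (start, slice) of `length` bytes centered on `center`.

    Assumption: near the ends the window is shifted, not shrunk, so it
    stays constant-length whenever the file is at least `length` long.
    """
    start = max(0, min(center - length // 2, len(data) - length))
    return start, data[start:start + length]


def hover_window(data, order, x, y, length):
    """Bytes to highlight when the cursor is over grid cell (x, y)."""
    cells = (1 << order) ** 2
    d = hilbert_xy2d(order, x, y)
    center = d * len(data) // cells  # each cell covers an equal share of the file
    return window_around(data, center, length)


if __name__ == "__main__":
    data = bytes(range(16))
    print(hover_window(data, 2, 2, 0, 4))
```

The slice returned here is what would be shown in the hex pane, and the cells whose offsets fall inside it are the ones to light up on the curve.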


I think maybe they're referring to the logo on the main page, not on a binary file? But that's an answer to another question I definitely had.


Oh, right! That's just a bit of fun - a random pattern (picked to be vaguely visually pleasing), which is laid out on the Hilbert curve, and then offset by a calculated amount depending on the mouseover. So I'm just shifting the same pattern forwards and backwards in a ring buffer, basically.



