In all seriousness, though, I wanted this so badly that I started (and failed) a startup with 12 employees nine years ago to build it. It was conceived for use in working with "big data" but the system essentially provided Etherpad-like scrubbable versioning of all common office document formats as a side-effect. All of it was structured in an environment more similar to the social aspects of Github than Dropbox, but you could sync up to your filesystem via a FUSE wrapper. That is, people could easily follow or fork your work in progress. If we'd continued, you'd have been able to accept the equivalent of PRs on your Word docs.
It was so awesome that we couldn't find anyone to pay for it, sadly. Armchair quarterbacks would fairly accuse us of failing to do proper customer development.
I can't speak to the technical limitations of Dropbox's versioning implementation, but given that they already have both viewing AND versioning running for a decade, I honestly can't believe it would take more than a few months for a small team to implement Etherpad-like editing functionality for the office suite document formats.
Just going to go out on a limb and say the IP is owned by an investor. Paying 12 people isn't cheap, so that's why I assume there was an investor involved.
That sounds pretty cool, though I'd have to think about how such a thing would function with the workflow of my team. I think it could be done though! It sounds like a more reliable way to do shared work on office docs.
I see some issues with it. No matter how smooth you make this, any software with pull requests is going to be considered "technical". Christ, people think basic Excel skills are "technical". So you have to get over that hurdle. But personally, if I've already gotten people over that hurdle, I might as well just use git and LaTeX documents.
I don't know, I think it sounds awesome, but I also think it might be tough to sell.
We tried to attack the market with a Github model: share your open source data and docs publicly (our true passion) and pay to have private repos.
Our mistake, as I said elsewhere, is that there was no market. We had many sales conversations and zero takers. For context, we were fully under the spell of The Great Big Data Hype of 2011 (see the current AI hype for reference) and convinced ourselves that there would be so many opportunities that we'd have our pick of which path to take.
In fairness to our past selves, for a while this seemed true; valuations were insane for companies with vague value props in the space. And we met with dozens of influencers in the data world and they all professed to be excited to use it. Most of them ultimately logged in once, realized they had no hair-on-fire problem for us to solve, and stopped returning our emails. It was frustrating in the extreme.
Also, we actually did make everything available via git alongside the web frontend, API access, downloadable formats and even a beta Google Docs live connection integration. We had it set up so that you could hit save in Excel and it would commit a new version for everyone following it to access.
git wasn't yet supporting large binary commits very well (not that it's amazing support today) but I remember us going very deep on this python library that fudged support for large binary repos with the glorified equivalent of symlinks. I'm not sure we ever really got this working well and my memory is honestly fuzzy enough that it's getting harder to sort what we did from what I desperately wanted to do before we ran out of cash and I got depressed enough to take two years off and run away to Europe with a crazy person.
> but you could sync up to your filesystem via a FUSE wrapper. ... follow or fork ... accept the equivalent of PRs on your Word docs
> It was so awesome that we couldn't find anyone to pay for it, sadly. Armchair quarterbacks would fairly accuse us of failing to do proper customer development.
I get the impression that it was extremely complicated and didn't fit into anyone's workflow. IE, if you approached someone using Dropbox, they'd have to change far too many habits just to switch to you.
It was partially this, although Dropbox was not nearly as entrenched in 2011 as it is today... which by extension means that cloud storage was not a given like it is today.
It's actually far more appropriate to say that we were competing with a culture where it's engrained in people to make those FINAL_FINAL2 versions on Samba shares. Or worse, to email them.
I am biased but I'd give our UX a 7.5/10, and if we'd have continued it would have gotten smoother. The FUSE wrapper was not the primary interface by a long shot, though... in fact, I'm not sure it was used by many people outside of our team, in the end.
The irony here is that Drew joked about how Dropbox was going to solve this ridiculous file-name versioning convention with their product in their famous YC application:
> Please tell us something surprising or amusing that one of you has discovered.
> (The answer need not be related to your project.)
> The ridiculous things people name their documents to do versioning, like "proposal v2 good revised NEW 11-15-06.doc", continue to crack me up.
And yet here we are a decade and change later and Dropbox, while having solved "a" problem, sits like a ridiculous behemoth, leaving its users hungry for so many other pain points to be addressed by another savior, including especially this one problem they said they were going to solve.
Dropbox still hasn't solved many of its core issues but it has been investing in Paper (which I personally have never seen anyone using) and all that design crap from a couple of years ago.
I introduced a lot of people to Dropbox like 8-9 years ago, and after using it to share files with other people I found out the hard way it's a terrible tool for that. I then used it for a couple more years to share files between my machines, but they have been introducing so much crap in their desktop app that I moved to sync.com.
Former dbx employee here— they always wanted to do it but it is technically challenging to build a fully functional product here accounting for things like formatting/comments/etc. when you have such a large enterprise user base there are often trade offs - ship a basic prototype and risk customer confusion/complaints or invest lots of resources and draw away from other projects
This is such a non-argument. They could've easily just started with text-diffs, then photo diffs. And later do doc diffs. Perhaps with some disclaimers. Heck, they don't do that for their main product, so why would they even do that for such a product. It's probably in the terms somewhere.
The reason they're not doing it is because they want a piece of the productivity pie.
It is challenging, very much so, but it can be done. I built a prototype for Word files based on Git (it can also use the GitHub API, so making it work with the Dropbox API should be doable). I implemented a sort of blame function as well: jump to the previous version of a paragraph with just one click.
As OP said, it took a lot of effort to get the UI ok. Probably takes even more effort to create a great UI, but I guess Dropbox has some resources, right? Shameless plug: Landing page at https://julesdocs.com
If anyone is interested in pushing this forward, I'd love to hear from you (mail address on the landing page)!
They could have released it to personal accounts? Given the rate at which they have embraced and catastrophically abandoned other vastly more fundamental features (Packrat? Photos?), this seems far easier to roll out slowly. Seems more like they've lost their way.
And I will never ever forgive y'all for what you did with mailbox! (Like seriously what did they do?)
> They could have released it to personal accounts?
Enterprise customers pay more money to have more features with checkboxes in the feature matrix. Telling them they don't get a feature because their needs are too complicated is a tough sell. (something, something, opens up opportunities for low-end disruption, something, something)
Google does it all the time with GSuite. I doubt customers want buggy features. Beta testing new functionality on your free/personal user accounts before rolling it out to larger business customers is pretty standard.
A feature like this is VERY application-specific for a lot of files, since you can't just take out the rendering engine; you would usually need a third party to make the software that renders to web views, whether open source or proprietary. It's not even as if first-party software is allowed to run as a server. Example: PSD rendering to web. AFAIK Photoshop has no server license. Pretty much all services that need to render PSD files use ImageMagick, afaik. I looked it up and, iirc, Photoshop's own API is pretty terrible to interact with, and licensing for servers is weird and expensive even if available.
EDIT: This comment is almost a word salad, I need to sleep lol.
While true, as others have said it could be rolled out piece-meal. For me, for example, simple text diffing would scratch a major itch, both for text and code.
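For what it's worth, the "simple text diffing" piece is small. Here is a minimal sketch using Python's standard `difflib`, assuming you can already fetch two stored versions of a file as strings; the function name and labels are made up for illustration and have nothing to do with how Dropbox stores versions.

```python
import difflib


def text_diff(old: str, new: str,
              old_label: str = "previous", new_label: str = "current") -> str:
    """Return a unified diff between two versions of a text file."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,   # labels are arbitrary; e.g. version timestamps
        tofile=new_label,
    )
    return "".join(diff)


if __name__ == "__main__":
    v1 = "Proposal\nBudget: 10k\nDeadline: March\n"
    v2 = "Proposal\nBudget: 12k\nDeadline: March\n"
    print(text_diff(v1, v2))
```

The hard part is not this step but everything around it: deciding which versions to compare and presenting the result to people who have never seen a unified diff.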
Right, but as the article points out - github already has the feature for some file formats, but it's not good enough for him because he wants it specifically for powerpoint and word.
In other words, someone wants a diff tool for Microsoft products but specifically wants Dropbox to implement it.
Yes. By default, every rsync.net account has 7 daily snapshots that are created and rotated automatically - no intervention on your end is required.
You may optionally set any arbitrary schedule you like (day/week/month/quarter/year) and you simply pay for the bits on disk that those (efficient, changes only) snapshots take up. Sometimes they take up almost nothing.
My favorite part of all of this is that the snapshots are immutable, or read-only. No matter who attacks your rsync.net login or what password you lose, the snapshots cannot be destroyed by any outside action.
This allows for some interesting insurance against ransomware / Mallory ...
Am I missing something? This already exists in Google Docs. Much easier to implement when the doc is database-backed (recording every keystroke for OT) vs file-based.
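To illustrate why an operation log makes this easier, here is a rough sketch, assuming each edit is stored as a timestamped insert/delete at a character offset. The `Op` record and `replay` function are hypothetical and are not how Google Docs actually stores or transforms operations; real OT also has to reconcile concurrent edits, which this ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    ts: float          # time the edit was recorded
    author: str        # who made the edit
    pos: int           # character offset the edit applies at
    insert: str = ""   # text inserted at pos
    delete: int = 0    # characters removed at pos (before the insert)


def replay(ops: List[Op], until: float) -> str:
    """Rebuild the document as it looked at time `until` by replaying the log."""
    text = ""
    for op in sorted(ops, key=lambda o: o.ts):
        if op.ts > until:
            break
        text = text[:op.pos] + op.insert + text[op.pos + op.delete:]
    return text


if __name__ == "__main__":
    log = [
        Op(1, "ann", 0, "Hello"),
        Op(2, "bob", 5, " world"),
        Op(3, "ann", 0, "Oh, h", delete=1),
    ]
    for t in (1, 2, 3):
        print(t, repr(replay(log, until=t)))
```

Scrubbing is then just calling `replay` with a different `until`, and since every op carries an author, "who put this here, when?" falls out of the same data.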
The Docs version history tooling is pretty weak, though: you have to repeatedly select a timestamp. You can’t scrub, and you can’t even click on text and get “who put this here, when?”
Yeah, the thing is, you can't do this in a file-format-agnostic way (you need to know what a Word doc or an Excel sheet is), which makes the file system layer the wrong level of abstraction to consider.
The diff when database-backed is still going to be file-format specific because a text-based document will not be stored in a DB identically to a spreadsheet.
Word already has a diff view implementation that is pretty robust - it’s very useful for figuring out what changed across manually-versioned documents. This is in addition to classical track changes feature.
Adobe Acrobat also has a diff (including visual diff) feature that can be used to do advanced comparisons if necessary.
Granted, author’s suggestion is more user friendly and integrated.
my solution is to use pandoc to generate the diffs. Combines the benefits of word formatting but allows me to see the changes in git. (I use it mainly for my resume)
Have you seen any options take advantage of the fact that docx files are just zipped xml files? I can see the git repo ballooning if you have a few images and you commit frequently!
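On the zipped-XML point, a minimal sketch of what taking advantage of it could look like: open the archive with the standard library, read `word/document.xml`, and pull out the text of each paragraph. The function names are hypothetical, `fake_docx` exists only so the example runs without a real Word file, and the sketch ignores formatting, images, comments, and tracked changes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source) -> List[str]:
    """Return the plain text of each paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def fake_docx(paragraphs) -> io.BytesIO:
    """Build a stripped-down archive holding only word/document.xml (demo only)."""
    body = "".join(f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" for text in paragraphs)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_paragraphs(fake_docx(["Summary", "Experience"])))
```

This only addresses readability of the text; it does nothing about repo size, since embedded images still live in the archive as binary files.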
> ...but this is useless. Timestamps??? Tell me what changed! Let me see the changes over time. Word has a change tracking feature, but my PhD in computer science isn't enough for me to figure it out.
> But but but Austin, you should be using a proper version control system! Just use Git and GitHub!
Found that aside curious, as track changes in Word is a first class versioning implementation with word processing and editors savvy, just as Git is a first class versioning implementation that's code lines and commits savvy.
Surely headspace around track changes is less "PhD" than git.
Track changes works if you load the correct files yourself, it doesn't put all your versions in a timeline for you. It's literally just the diff visualization part of the pie, nothing else.
I can see a company like github or dropbox developing visual versioning and promoting it to make users dependent upon it. It would be an extremely sticky feature that made it hard for users to like competing products.
Imagine how github could push for MS Office integration and become a versioning powerhouse for non-code-stuff.
But I can't see it standing as a stand-alone product that people would really pay for. It has to be part of something else.
Version handling is built into Office 365, and many comments here indicate it's even in the relatively crappy Google Docs, but I'm sure there's a market for pretending it isn't and selling incredibly shitty half-baked attempts as a B2B SaaS offering (this is not a sarcastic "I'm sure", I know about this market space and it disgusts me on a deep level)
I’m a lawyer who uses Word’s track changes as an integral component of my work. I haven’t seen a single meaningful improvement in that feature in 20 years. Right down to the fact that I have to open “compare...” from within a document, but then have to go hunt the same document down in the file system to set it as the original. Don’t get me started on every other reason that Word has failed to innovate on this front.
The solution is to dump Office and use text files, if you can get away with it.
What’s wild to me is that a third-party product, Workshare Compare, actually does a better job of this than Microsoft does with its own product. Workshare is used widely in the AmLaw 100.
We use Office 365 at my workplace and I have found the version handling to be lacking. AFAIK it can’t diff two versions of a document in a convenient way. Another gripe I have is that it is not, AFAIK, possible to tag/name versions.
Huh, Etherpad was acquired by Google in 2009 and a fork of Etherpad, Hackpad, was acquired by Dropbox in 2014 [1]. Both projects got folded however: Etherpad into Google Wave, and Hackpad into Dropbox Paper.
This looks similar to redline and blackline document comparisons[1]. We do this on our site[2] where we display large financial documents that average 100 pages. Identifying what text and tables were removed, added and changed from one year to another is useful information for predicting future company earnings[3]
Sorry, I just updated the comment with a direct link to the site which is a freemium SaaS. The other site link is an animated gif that shows how to toggle between the redline and blackline views.
Modern MS document files are zipped XML. To do this comparison they would need to unzip each file, run it through a rendering engine and hold it in memory, and then do version comparison. For this to be feasible you would need to use a file format that supports this sort of comparison in a way that isn't very resource intensive.
It's not that, it's not like 100% of your users will be diffing documents 100% of the time. The real reason is that office formats are super, super complex and diffing them is a hard problem, even more so for the proprietary Microsoft formats.
The "zipped XMLs" you mention are basically XML dumps of the former binary format that evolved organically from the 1980s, when resources were scarce and they had to hack together a working office solution.
If you just want a content-aware diff (never mind formatting), it's not actually that difficult to diff; read the stylesheet so you can understand the style refs, then parse the workbook sheets and look up style refs on demand.
AFAIK there are no ready-made solutions for that so far. Would be very useful![2]
[1] It would be interesting to dive further in to this subject but personally I can’t currently find the time for that.
[2] Now that I think of it, this might be an interesting project for someone participating in Google Summer of Code. Not sure if the Git project will participate this year or not.
The proper way to diff .docx documents would be for Microsoft to release a diff tool for .docx documents. If they released a three-way merge tool as well, then it could be used in git too. git supports 3rd party diff and merge tools for specific file formats.
They already got the functionality to diff between two documents in Word. I use it all the time to see if legal made any changes while "forgetting" track changes.
Maybe libreoffice should work on it for .odt then, together with various VCS plugins (mainly git and maybe hg). It could be an interesting differentiator feature.
Sure, it's hard to diff and merge tree data structures, but it doesn't have to be perfect. Text diffing and merging is already imperfect anyway, yet it's very useful.
Not all of them. I believe Microsoft uses a special format for Office documents in OneDrive. (These files are converted to xml when you access them with non-Microsoft software)
I’d also like to add on a different note, I don’t really get why git can’t support docx, pptx, and xlsx. They’re open standards not binary blobs. Basically just zipped xml.
Can you explain how to do that? I must be missing something because all I could find was merge tool support but no mechanism to tell Git globally that all files with extension "xyz" should use this specific merge tool.
According to this StackOverflow post [1], you'll need to write merge driver. I was looking at this when trying to hack git to use conflict-free replicated data types (CRDTs) as a fallback for a specific document type.
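For reference, a sketch of the general shape, as I understand the git docs: a line like `*.xyz merge=linewise` in `.gitattributes` maps the extension to a driver name, and `git config merge.linewise.driver "python3 linewise_merge.py %O %A %B"` tells git what to run (with `--global`, plus a global attributes file set via `core.attributesFile`, it should apply across repos). Git passes the ancestor, current, and other versions as temp files, expects the result written back to the `%A` file, and treats a non-zero exit as a conflict. The driver name `linewise`, the script name, and the toy merge rule below are all made up for illustration; a real driver for a document format would replace `merge_lines` with format-aware logic.

```python
import sys
from pathlib import Path
from typing import List, Optional


def merge_lines(base: List[str], ours: List[str],
                theirs: List[str]) -> Optional[List[str]]:
    """Toy three-way merge: only handles in-place line edits.

    Returns the merged lines, or None when the versions conflict
    (different lengths, or both sides changed the same line differently).
    """
    if not (len(base) == len(ours) == len(theirs)):
        return None
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)      # same on both sides, or only we changed it
        elif o == b:
            merged.append(t)      # only they changed it
        else:
            return None           # both changed it differently
    return merged


def main(argv: List[str]) -> int:
    # git substitutes %O %A %B with paths to ancestor, current, and other versions
    base_path, ours_path, theirs_path = (Path(p) for p in argv[1:4])
    merged = merge_lines(
        base_path.read_text().splitlines(keepends=True),
        ours_path.read_text().splitlines(keepends=True),
        theirs_path.read_text().splitlines(keepends=True),
    )
    if merged is None:
        return 1                  # non-zero exit tells git the merge conflicted
    ours_path.write_text("".join(merged))  # result must be left in the %A file
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```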
If you write VBA in Excel, there is a free Git extension called Git XL that we maintain which is able to properly Git merge your code directly in Excel: https://github.com/xlwings/git-xl
I don't like the example. Unless I'm missing something, all three of these are exactly equivalent, so you could accept any of them as the result of a merge.
But the problem with that idea is that two different people explicitly made a change that looks meaningless. That tells us that we're evaluating "equivalent" incorrectly, which means we don't actually have any remaining justification for picking one over another, and the conflict is hopeless without further input.
Yep, I wish GitHub made it easier to do diffs between a specific commit and many other commits, like with a timeline. Would be great for visually tracking down when a change was introduced or how the code has evolved over time.
That was an inspiration for a tool I built called Yestercode [1] (though it uses undo history, not version control).
Cloud word processors like Zoho Writer & Google Docs already have version comparison features. But this idea of a sliding time traveler for documents is very intuitive!
Also Zoho Writer has a combine feature, that lets you upload a docx and combine it with another docx - with the changes highlighted as tracked-changes. Pretty handy for comparing docx files.
I've always wanted to be able to right-click on a file that is synced in Dropbox and either have a submenu with versions to select or an option that pops up a window with the file's version history. Without having to open the Dropbox web app.
What if it’s time to add features to the undo/redo construct as a whole? Maybe not discarding redo history when a modification is made in the past for example. Computers have improved a giant amount since undo was designed (clipboard too, for that matter). We should be redesigning these common* features to keep up with the times.
*often (not in MS Office probably) the undo buffer is managed by the OS. It’s conceivable that some rethinking could happen at the OS level.
vim 7+ keeps an ‘undo tree’ that permits this (it can also be instructed to keep it in a file to persist across sessions). It’s helpful to install a visualization extension to seek around easier (https://stackoverflow.com/questions/1088864/how-is-vims-undo...)
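A minimal sketch of the undo-tree idea, to make it concrete: every edit creates a child of the current node instead of truncating the redo stack, so editing after an undo starts a new branch and the old one stays reachable. The `UndoTree` class is hypothetical and stores whole states for simplicity; it is not how vim implements it internally.

```python
class UndoTree:
    """Undo history that branches instead of discarding redo states."""

    class _Node:
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = []

    def __init__(self, initial):
        self._current = self._Node(initial)

    @property
    def state(self):
        return self._current.state

    def edit(self, new_state):
        """Record a new state as a child of the current one."""
        node = self._Node(new_state, parent=self._current)
        self._current.children.append(node)
        self._current = node

    def undo(self) -> bool:
        """Step to the parent state; returns False at the root."""
        if self._current.parent is None:
            return False
        self._current = self._current.parent
        return True

    def redo(self, branch: int = -1) -> bool:
        """Step to a child state; defaults to the most recently created branch."""
        if not self._current.children:
            return False
        self._current = self._current.children[branch]
        return True


if __name__ == "__main__":
    tree = UndoTree("a")
    tree.edit("ab")
    tree.edit("abc")
    tree.undo()            # back to "ab"
    tree.edit("abX")       # new branch; "abc" is kept
    tree.undo()
    tree.redo(0)           # follow the older branch
    print(tree.state)
```

A conventional undo stack would have thrown "abc" away at the `edit("abX")` step; here it costs one extra child pointer to keep it.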
Wasn’t the timeline concept a core part of Google Wave? If I recall correctly you were able to scrub through the entire history of a document in a similar way.
I remember ~4 years ago, I needed to retrieve an old file, and I discovered that DropBox does write a diff-based repo, which you can restore at different points. I don't remember the details, but I needed to use some sort of CLI to access/navigate it on the host system.
In short, it is probably possible now, using what DropBox already exposes.
> So why isn't this built directly into Dropbox, Google Docs, and Microsoft Office?
macOS has included a built-in version history since OS X Leopard. Sadly, the flashy version UI with 3D effects is not the best for finding differences, and many programs don’t use the native frameworks that bring this feature (e.g. MS apps, Adobe apps).
Since Snow Leopard MacOS has also had a local versioned file system (separate from Time Machine) but not all apps use its API and since it auto-saves and screws up "Save As..." capability I find it annoying and usually just turn it off and pretend it doesn't exist. It's one of those features Apple added that could have been useful but since they never got the UI right, most people don't know it exists.
I believe the grandparent is referring to Time Machine. Some of its versioning features may have been integrated into iCloud storage as well (not sure).
Yes, I was referring to the file version history. Sorry I couldn’t recall if it was introduced in Leopard or Snow Leopard.
I totally agree with the comment from dreamcompiler, the file history feature is a great idea but the execution needs a lot of improvement.
The main issue to me is not the change to Save as.. behavior. To me the problem is that most of the apps didn’t adopt it. In particular, cross-platform apps ignore it.
So you never get used to the behavior change.
The version history UI is also too “heavy”. It has a slider to go back in time, but it is surrounded by a faux app window simulating traveling in time with your app state... sounds cool, but it is distracting and not so useful for finding differences.
Wrike (project management software) actually does this, but it is for only text descriptions which makes it easier. Wrike also color codes changes by user.