Hacker News new | past | comments | ask | show | jobs | submit login
VFS shim that allows a SQLite database to be appended to another file (sqlite.org)
176 points by blacksqr on June 8, 2018 | hide | past | favorite | 50 comments



For a bit of context, in the Tcl ecosystem there's the concept of "Starkits" which are executable scripts which contain a stub loader which mounts a virtual filesystem archive (which contains all the files the actual script needs to run).

Typically this has been either a Metakit database or a Zip file, but they both have disadvantages when compared to SQLite and we've been wanting to move to SQLite for a while but it didn't support being appended to things, so we couldn't. At the most recent Tcl Conference DRH agreed to fix this and with this change it looks like he has done it. Expect an SQLite-backed Starkit soon !

Also, and unrelated, Starkits can be converted to "Starpacks" which replace the script-based stub loader with a native executable, so you can distribute Tcl-based scripts as a single file.


Tcl has so many amazing ideas. I just wish it had higher adoption and everything about it wouldn't feel so... old. Look at the wiki for example. It should really be an actual wiki à la Wikipedia not some sort of strange Wikipedia discussion page masquerading as documentation.


> Look at the wiki for example. It should really be an actual wiki à la Wikipedia not some sort of strange Wikipedia discussion page masquerading as documentation.

You can complain about it looking "old" if you want, but it's hardly fair to complain that it's not an "actual wiki". It follows the style of the original WikiWikiWeb which predates Wikipedia by a number of years.


Expectations change. Scotsmen change.


But where else would you find gems like this?

https://wiki.tcl.tk/9832


FINALLY, someone actually defined -- in practice -- what "(E)nterprise ready" means.

Best definition yet, that I've seen.


On a related note, if you love Tcl but want a different syntax there's Little, which is driven by Tcl, but has a C/Perl/awk syntax:

http://www.little-lang.org/


Do you think it's an actually good language though? I only tinkered around with Tcl (long ago when writing eggdrop bots and more recently while messing around with FPGA and ASIC build scripts) and while it's serviceable it never struck me as a particularly interesting or elegant scripting language. When given the choice I'll always pick Python or Perl over Tcl. Am I wrong? What did I miss?


I think Tcl is incredibly elegant. It's got a set of very simple syntax rules that compose together to govern everything. You can use Tcl to create your own keywords that are used exactly like the built-in ones and your caller won't ever tell the difference (well, unless they inspect an error backtrace). For example, a try/catch syntax was proposed as TIP 89¹. And many many years ago, as part of the MacPorts project, I implemented the entirety of TIP 89 in pure Tcl. The TIP was proposing actual Tcl keywords, but I implemented exactly the proposed syntax and semantics without a single keyword. I have no idea if that code exists in any form anymore (I haven't been involved with the project in something like a decade), but I'm pretty sure the only observable difference between that and TIP 89 is error backtraces² that cross a frame involving try/catch.

Also, virtually every type³ in Tcl can be expressed as a string, and so you can model the entire language mentally as if every value was in fact a string. If you wanted to write your own Tcl interpreter it would be pretty easy to actually implement it this way, though obviously performance would suffer quite a lot.

¹Later superseded by TIP 329 in response to the error options dictionary in Tcl 8.5.

²And FWIW I believe there's no actual spec for what error backtraces look like anyway, I think it's implementation-defined, though don't quote me on that.

³The only type I know of offhand that can't be expressed this way is an array (which is distinct from a list, that is a fundamental building block of Tcl and can absolutely be expressed as a string)


anyone interested in starting to play around with languages would do well to write out the 'everything is a string' version. its very short and understandable.

plus the result is really amenable to inclusion in another program as an extension language (tcl is too, but it has a much bigger footprint)


I think it's a good language. It has simple syntax (only 13 syntax rules), great documentation for the API, the language, core commands, and all the packages shipped with the core release. It's very flexible so it can be used in a lot of situations: Embedded into a program as a domain-specific language (DSL), scripting engine in its own right with very complex software written in many different styles, evaluating untrusted code (with Tcl Safe Interpreters).

Really the only reason to pick Python over Tcl is that more people have picked Python over Tcl in the past.


I don't think Tcl is in competition with Python or Perl. It is in competition with bash, PowerShell, etc. Its syntax is quicker to type which is good for interactive (REPL) use.

Sometimes I'm thinking about a simple build system which is very much like Make but uses Tcl to be cross-platform and scales better to advanced uses.

Tcl could also work well in Jupyter notebooks.

Once you think about functions, classes, and modules, you better use Python.


You can help improve adoption by using Tcl in your next project -- there's no reason not to, right ? :-)

P.S. We just setup a Slack interface to the Tcl'ers Chat, so now all you Slackers can join in to the hundreds of other Tcl'ers.

https://slack.tcl-lang.org/


I like the quirky feel of Tcl'ers wiki. It gives it an unique playful character. I would like it to stay that way.


For general style improvements to the Wiki there's progress on moving the Wiki to "Nikit":

https://nikit.tcl-lang.org/


If your database is readonly after being appended, you could also use memvfs. I've used that on mmapped files to support similar archive ideas: https://www.sqlite.org/src/file/ext/misc/memvfs.c


These sound a lot like Erlang escripts as well.


Heh, this reminds me of a trick I was working on right about the ruby 1.8->1.9 transition: import the tree of .rb files (including gems) into a sqlite database, mess with the `require` mechanism to read from it instead of the actual filesystem, and make a static build of ruby with sqlite itself, the ruby code database packaged as a .text area, and any binary extensions all linked up into a single file. Ta-da, single file static ruby apps and trivial deployments: a truly useful thing back when Capistrano was the latest and greatest.

Unfortunately, 1.9 changed the ruby build system enough that I couldn't quite get it working, and I've not got round to looking at it since.


For what it's worth, you can realize your dream in Tcl, since this has generally been possible (and utilized) for a long time !

Tcl supports the concept of "Virtual Filesystems" that an interpreter can "mount" wherever it chooses, and these can be backed by anything. So you can mount a zip file or an HTTP resource, or stack them and mount a zip file over a mounted HTTP resource, etc.

One of the virtual filesystems I've written is called "C-VFS" which takes files and converts them into object code in a shared object or archive (.a) and you can then link against 1 or more of those and mount them at runtime from the running process.


That is very neat! I always thought about expanding that idea to most programming languages. Most of the startup time is probably spend stating and reading files.

Just from a quick experiment

    time python -c ''; time python -S -c ''

    real	0m0.079s
    user	0m0.021s
    sys 	0m0.030s

    real	0m0.017s
    user	0m0.007s
    sys 	0m0.007s
From the manpage:

    -S     Disable the import of the module site and the site-dependent manipulations of sys.path that it entails.

I also considered doing the same thing for a linux distro. Just out of pure curiousity to what would happen to the boot up times. Wish I could have time to try it one day :(


What about going all the way and using an unikernel (or something approaching one)? Statically link suitable readonly data/code into a single global address space (to make use of optimizations possible by combining binaries, as apparent in e.g. busybox, and remove the overhead of code duplication and dynamic linking, if possible even skipping some overhead with syscalls, by allowing the compiler to reduce spilling registers to stack in an effort to prevent clobbering), and just make sensitive and modifiable data unavailable to or at least not writable by other/unprivileged processes.

This would probably require some modification to GCC or LLVM, if the latter even supports building a normal Linux.


Madness! Or actually really similar to an idea that Joe Armstrong presented in Strange Loop one time (just the idea no demo or anything)

https://www.youtube.com/watch?v=lKXe3HUG2l4


There exists an Erlang unikernel (of sorts): http://erlangonxen.org/


Well yeah, but that's like saying there exists hardware that eats java bytecode. Yes, it does, but it brings _large_ restrictions with it.


https://github.com/regularfry/carton is where I got to. Might give someone some ideas.


It seems like the intended use-case would be for the database to be appended after the executable that is using it. But writing to a file that's being executed seems (to my naive view - I'm not terribly experienced) like it could cause issues.

What am I missing?


I'm going to have to try this, but I suspect such writing won't work on Windows, which aggressively locks executables. Maybe it just locks the in-use range and allows appending.

On Linux this probably works fine because writing past the executable part of the file does not change any offsets beforehand.

(On NFS systems you can definitely overwrite in-use executables at any point, which works until the relevant bit gets paged out then back in)

Reading, on the other hand, ought to work fine everywhere. This continues "SQLite is a replacement for fopen()" - you can use it in similar cases to self-extracting ZIP files or self-executing JAR files.


I expect that even if it works on 'naked' Windows, many antivirus systems will take a dim view of it.


Many Windows programs embed resources (such as icons) in the executable; perhaps this access pattern would be indistinguishable from what common programs already do?

I'm honestly not sure - for instance maybe the embedded resources get loaded into RAM not read from the file. But if that is the case couldn't the embedded SQLite file be marked as a resource and loaded by the same mechanism?


I'd expect reading to be fine, but probably not writing. I could be wrong though, I'm just speculating.


In DOS days AVG's heuristic detection engine had flag for programs that did write into their own binary (it was reported as +i AFAIK), but it didn't have much weight as essentially anything compiled with Borland Pascal was flagged in this way (IIRC including AVG itself).


I suspect the intent would be an extension of the idea of using SQLite as the base for a custom binary file format. One use case might be to embed non-code resources (graphics, audio, localization strings, etc.) such that they can be easily queried and loaded via SQLite instead of some other method.


I confirm your suspicions.

People have been appending ZIP archives to executables, in order to hold non-code resources, for time out of mind. The appendvfs extension makes the same thing possible for SQLite databases. An SQLite database has advantages over a ZIP archive in that SQLite supports a far richer data model, is far faster for random access, and has a query language. Both ZIP and SQLite are well-defined and very widely deployed formats. Many developers are more familiar with ZIP, but there are more than 1 trillion SQLite database files in circulation, and SQLite is a recommended storage format according to the US Library of Congress (https://www.sqlite.org/locrsf.html) so SQLite should not be dismissed as being too unfamiliar.

Example use cases:

(1) We have experimented with (but not published) putting all the SQLite documentation into an SQLite database and appending it to a special webserver app. Download the "EXE" and double-click on it and the SQLite docs automatically pop up in your web browser. This is better than a pile-of-HTML-files in that it can use server-side computing for things like the "Search".

(2) Not yet published or documented, but you can do "make sqltclsh" from the SQLite source tarball and generate a TCL interpreter with SQLite built in. If you also append an SQLite database to this interpreter, it reads its scripts from the database. Use this to build stand-alone Tcl/Tk/SQLite applications.

(3) By 3rd-party user request: the Fossil version control system allows a Fossil repository (which is just an SQLite database) to the end of the "fossil.exe" binary. This is being used (I am told) to provide a rich package of read-only but versioned content to non-technical users. The non-techies just put the "document.exe" file on there windows desktop and double-click, and a webserver pops up showing the reports they need, with complete historical versioning provided by Fossil.


"Not yet published or documented, but you can do "make sqltclsh" from the SQLite source tarball and generate a TCL interpreter with SQLite built in. If you also append an SQLite database to this interpreter, it reads its scripts from the database. Use this to build stand-alone Tcl/Tk/SQLite applications."

Holy shit. This will be a godsend. I've been searching for an easy and robust way to package Tcl scripts into executables independent of a full-blown Tcl installation (I've tried a couple different approaches, but they all tend to have issues on some platform or other). This answers my prayers rather thoroughly, provided it can be documented and made relatively easy to do.


That's a neat application for this. Don't use it as a live data store, but as a (much) better way to embed and access static assets and default data, maybe even embed the default schema for a new installation where it copies the data to a normal writable file on first run.

Many binaries are signed, and just having data appended to them arbitrarily would break signatures. But if it's only used for static data anyways that wouldn't be a problem.


Localization strings might make sense. But graphics and audio? Seems a little cumbersome to have to re-blob the graphics resources into the database each time you make an update.


This would be more for read-only assets embedded into the executable itself (e.g. the little icons on buttons and stuff in a GUI). You probably shouldn't be updating these at runtime, and at compile-time I'd assume there would be tools doing it for you.


The executable file have defined areas, as long as you append/write to that executable, you're working on an area of the executable file that doesn't interfere with anything else


And about virus protection? Am I missing something too, but this seems impractical in desktop apps or even production environment - you always can whitelist, but this a concern and trust doesn't come cheap...


Writing would be scary but reading should be fine. That’d allow importing an entire database worth of data into an app packaged in a single executable. Rather nifty!


So, resource fork 2.0?


I could use this! While converting someone's Win32 C program to C++, it struck me that most of the magic numbers, window class names, arrays of GUI control data etc. which are defined in code (in both windows.h and the user source code) could be defined in a read-only sqlite database instead. And with this link here, it wouldn't even have to be a separate file.


On the user side, most of that can be stored in the resource section (for PE format executables, so Windows only for all practical intents and purposes). It has annoyed me many times that no such thing exist in Unix land (ELF, basically), and instead I have to rely on 'bin2hex' style tools and store that data in the data section.


Elf has the concept of sections, and you can make as many of them as you want. The standard gnu Linker also automatically creates labels that you can use to get ahold of those sections. For example for a section titled XYZ, it'll automatically create to extern variables named __xyz_start and @__xyz_end sorry some such thing. You can place things into sections from source code, or add them later using objcopy


Well yes, let me rephrase then: it's a shame there is no standard API to categorize and access data in other sections.


Anything SQLite I upvote. ^_^


I encourage everybody to check out sqlite's ext/misc directory. There are lots of interesting extensions there, and most people don't know about it.


that's so cool, I don't understand if this is available in the standard distribution. it seem so, but when i tried to use the command-line tool i get "no such VFS", maybe i am missing something. Is there a tutorial/introduction on how to use it? is exactly what i needed for a couple of project

edit: found! i need to use the '--append' option


VFSes just a c object created as part of the sqlite runtime and are registered and activated ^1 with the c api. You can recompile the sqlite command line to open with a given vfs object, or enable the URI filenames feature ^2 which allows opening a file with a given vfs. For the appendvfs in th original post, you probably need to use a new version of sqlite with it in the sourcecode

^1 https://www.sqlite.org/c3ref/vfs_find.html

^2 https://www.sqlite.org/uri.html


That's the kind of amazing emergent use cases that you get by sticking to known, old technologies and not doing anything supposedly clever.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: