
That's great to hear! The clustered index sounds really cool. Since SQLite tells you about ORDER BY terms in xBestIndex (via the nOrderBy/aOrderBy fields [0]), it would be super cool to get super-fast ORDER BYs out of that.
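
Roughly what I have in mind, as a sketch (the function name is made up, and it assumes column 0 of the vtab is the clustered column):

    #include <sqlite3.h>

    /* Sketch only: assumes column 0 is the clustered/sorted column. */
    static int clusteredBestIndex(sqlite3_vtab *pVtab, sqlite3_index_info *pInfo){
      if( pInfo->nOrderBy==1
       && pInfo->aOrderBy[0].iColumn==0
       && !pInfo->aOrderBy[0].desc
      ){
        /* Rows will already come back in this order, so SQLite can skip its sorter. */
        pInfo->orderByConsumed = 1;
      }
      pInfo->estimatedCost = 1000.0;   /* placeholder */
      return SQLITE_OK;
    }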

Very interested to see how xFindFunction works for you. One limitation I've run into: inside xBestIndex (i.e. at query-planning time) you only find out that a user is calling a function you overloaded via xFindFunction if 1) it appears in the WHERE clause and 2) it's called with exactly two arguments, the first being a column of your virtual table and the second a literal. I've found that limiting in the past, since it leaves you only one argument to work with in that narrow case. But I'm sure there are clever tricks there!
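
For reference, that narrow case looks roughly like this (a sketch; my_match, myMatchFunc, and MY_MATCH_OP are made-up names for illustration):

    #include <sqlite3.h>

    /* Sketch of the narrow case above. */
    #define MY_MATCH_OP  (SQLITE_INDEX_CONSTRAINT_FUNCTION + 1)

    /* Fallback scalar implementation, used when the call can't be pushed down. */
    static void myMatchFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv){
      sqlite3_result_int(ctx, 0);   /* stub */
    }

    static int myFindFunction(
      sqlite3_vtab *pVtab, int nArg, const char *zName,
      void (**pxFunc)(sqlite3_context*,int,sqlite3_value**), void **ppArg
    ){
      if( nArg==2 && sqlite3_stricmp(zName, "my_match")==0 ){
        *pxFunc = myMatchFunc;
        *ppArg = 0;
        return MY_MATCH_OP;   /* >= SQLITE_INDEX_CONSTRAINT_FUNCTION, so the call
                                 can show up as a constraint in xBestIndex */
      }
      return 0;               /* not overloaded */
    }

    static int myBestIndex(sqlite3_vtab *pVtab, sqlite3_index_info *pInfo){
      int i;
      for(i=0; i<pInfo->nConstraint; i++){
        if( pInfo->aConstraint[i].op==MY_MATCH_OP && pInfo->aConstraint[i].usable ){
          /* iColumn is the column passed as the 1st argument; the 2nd argument's
             value only arrives later, as an argv[] entry in xFilter, because we
             request it via argvIndex. */
          pInfo->aConstraintUsage[i].argvIndex = 1;
          pInfo->estimatedCost = 10.0;
        }
      }
      return SQLITE_OK;
    }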

One trick I've noticed: you can kind of detect a COUNT(*) via the `colUsed` field in xBestIndex. In that case `colUsed` will be 0 (i.e. no columns are requested), so you can use that as a signal to just iterate N times without touching the underlying data. Still not free, but you can probably manage something like ~1 million rows/sec, which beats accessing the data that many times!
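
A sketch of the check (countAwareBestIndex and MYIDX_COUNT_ONLY are made-up names):

    #include <sqlite3.h>

    /* Sketch: colUsed is a bitmask of the columns the statement will read. */
    #define MYIDX_COUNT_ONLY 1

    static int countAwareBestIndex(sqlite3_vtab *pVtab, sqlite3_index_info *pInfo){
      if( pInfo->colUsed==0 ){
        /* e.g. SELECT COUNT(*): no columns needed, so xFilter/xColumn can skip
           loading any row data and just step a counter. */
        pInfo->idxNum = MYIDX_COUNT_ONLY;
        pInfo->estimatedCost = 100.0;
      }else{
        pInfo->idxNum = 0;
        pInfo->estimatedCost = 100000.0;
      }
      return SQLITE_OK;
    }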

[0] https://www.sqlite.org/vtab.html#order_by_and_orderbyconsume...




I may be missing something, but I'm not sure you need to detect COUNT(*) specifically.

For a query that's just SELECT COUNT(*) FROM… SQLite will simply call xNext/xEof in a loop to count rows. As long as you don't actually load any data until a column is requested via xColumn, what's the point of special-casing it?
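
A sketch of what I mean (my_cursor and fetch_row_column are made-up; the real extension's cursor will look different):

    #include <sqlite3.h>

    typedef struct my_cursor {
      sqlite3_vtab_cursor base;   /* required base class */
      sqlite3_int64 iRow;         /* current position */
      sqlite3_int64 nRow;         /* total number of rows */
    } my_cursor;

    static int myNext(sqlite3_vtab_cursor *cur){
      ((my_cursor*)cur)->iRow++;          /* just advance; no I/O here */
      return SQLITE_OK;
    }

    static int myEof(sqlite3_vtab_cursor *cur){
      my_cursor *c = (my_cursor*)cur;
      return c->iRow >= c->nRow;
    }

    /* Made-up helper standing in for however the extension fetches values. */
    extern sqlite3_int64 fetch_row_column(sqlite3_int64 iRow, int iCol);

    static int myColumn(sqlite3_vtab_cursor *cur, sqlite3_context *ctx, int i){
      my_cursor *c = (my_cursor*)cur;
      /* Only here do we pay for data access -- never called for SELECT COUNT(*). */
      sqlite3_result_int64(ctx, fetch_row_column(c->iRow, i));
      return SQLITE_OK;
    }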

And the above (lazily loading only the requested columns) is exactly what I'd expect this extension to be doing already.


That is a great trick for COUNT(*), thank you!

That's disappointing about xFindFunction. Once I start digging into it more, I will let you know if I find any other clever tricks that you might be able to use in your extensions as well.

Have you ever reached out to the SQLite team about the virtual table limitations you've encountered? I'm curious how open they are to extending what's possible with virtual tables.


One note on the incremental BLOB I/O API that you might want to consider (I wasn't aware of this and found it unintuitive): blobs larger than the page size are stored as a linked list of overflow pages, and there is no true random access; reading a large blob at a large offset touches every one of its pages up to that offset.
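
To make it concrete, something like this (table/column/function names are made up) still pays for every overflow page before iOffset on the final read:

    #include <sqlite3.h>

    /* Sketch; "t" / "data" / read_tail() are made-up names. */
    static int read_tail(sqlite3 *db, sqlite3_int64 rowid){
      sqlite3_blob *pBlob = 0;
      char buf[4096];
      int rc = sqlite3_blob_open(db, "main", "t", "data", rowid, 0 /* read-only */, &pBlob);
      if( rc!=SQLITE_OK ) return rc;

      int nTotal  = sqlite3_blob_bytes(pBlob);
      int iOffset = nTotal>(int)sizeof(buf) ? nTotal-(int)sizeof(buf) : 0;
      rc = sqlite3_blob_read(pBlob, buf, nTotal-iOffset, iOffset);   /* not O(1) in iOffset */

      sqlite3_blob_close(pBlob);
      return rc;
    }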



