SQLite-HTTP: A SQLite extension for making HTTP requests (observablehq.com)
65 points by b_mc2 on Aug 10, 2022 | 21 comments



Author here, happy to answer questions!

Direct link to the project: https://github.com/asg017/sqlite-http

A few other projects of interest:

- sqlite-html: https://github.com/asg017/sqlite-html

- sqlite-lines: https://github.com/asg017/sqlite-lines

- various other extensions: https://github.com/nalgeon/sqlean

And past HN threads:

- https://news.ycombinator.com/item?id=32335295

- https://news.ycombinator.com/item?id=32288165


Side note: I love the attention that SQLite is getting now. It's so refreshing after the dark ages of misapplied technology: microservices, Docker, NoSQL, and countless front-end frameworks.


On a side note, Observable is stupendous: interactive network graphs, charts, maps, animations. Not sure why we don't see it used more.

Mike Bostock himself is constantly adding his latest work at https://observablehq.com/@mbostock. It's a great learning resource.



I picked up some very neat new SQLite tricks from this post independent of the extension itself (which is very cool) - I didn't know about the "define" module which lets you create new table-valued functions that are defined as SQL queries themselves.
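
For anyone curious, with the sqlean "define" extension linked in the top comment it looks roughly like this (untested, going from memory of its docs, so the exact syntax may be off): a scalar function is a saved expression, and a table-valued function is a saved query exposed as a virtual table.

  -- scalar function defined as a SQL expression
  select define('sumn', ':n * (:n + 1) / 2');
  select sumn(5);  -- 15

  -- table-valued function defined as a query over its parameters
  create virtual table split2 using define((
    select
      substr(:str, 1, instr(:str, :sep) - 1) as head,
      substr(:str, instr(:str, :sep) + 1) as tail
  ));
  select head, tail from split2('key=value', '=');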


Would be interesting if this can do HTTP/1.1 pipelining. Need to take a closer look.

What makes me hesitate to look closer is that I am almost always doing some text processing after the HTTP response and before the SQL commands. Due to resource constraints, I do not want to store HTML cruft.

I am mostly doing

HTTP response --> text processing --> SQLite3

rather than

HTTP response --> SQLite3

However, I also have a need to store different combinations of HTTP request headers and values in a database. I currently use the UNIX filesystem as a database (one header per file; djb's envdir loads the headers into the environment from a selected folder), but maybe I could use SQLite3. For unprocessed text, e.g., from pipelined DoH responses, I use tmux buffers as a temporary database. Then I do something like

   HTTP responses --> tmux loadb /dev/stdin
   tmux saveb -b b1 /dev/stdout|textprocessingutility --> ip-map.txt (append unique, i.e., "add unique")
The ip-map.txt file gets loaded into memory of a localhost forward proxy.

Databases such as NetBSD's db, djb's cdb, kdb+, or sqlite3 can help with the "append unique" step if the data gets big.
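
With sqlite3, that step would be something like this (untested; "new-entries.txt" stands in for the textprocessingutility output, and lines_read is from the sqlite-lines extension linked upthread, whose exact signature I may be misremembering):

  create table if not exists ip_map (entry text primary key);

  -- duplicates are silently skipped, so re-running the import is idempotent
  insert or ignore into ip_map (entry)
    select line from lines_read('new-entries.txt');

  -- dump back out for the forward proxy to load
  select entry from ip_map order by entry;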

Note: Any JSON I retrieve is "text", not binary data. Most HTTP responses are HTML, sometimes with embedded JSON. Pipelined DoH is binary but I still use drill to print the packets as text. (When I finally learn ldns I will stop using drill.)


Re pipelining: unless Go's network package does this magically under the hood, sqlite-http doesn't take advantage of it! If you think it would be a nice feature, file an issue and I'll see what can be done.

How complex is your text processing? If I'm working with a small dataset, I typically just save the "raw" responses into a table and create several intermediate tables that extract the data I need. Something like:

  create table raw_responses as select * from http_get('https://....');
  create table intermediate1 as select xxxx(response_body) as message from raw_responses;
  create table final as select message from intermediate1;
(some CTEs[0] will be very useful here)
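
e.g. roughly the same thing as a single statement, keeping only the final table (untested):

  create table final as
    with raw as (
      select response_body from http_get('https://....')
    ),
    intermediate as (
      select xxxx(response_body) as message from raw
    )
    select message from intermediate;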

The "lambda" pattern described about half-way through this post might be useful as well, if you need to do some complex text processing.

[0] https://www.sqlite.org/lang_with.html


Go's HTTP client doesn't support pipelining for HTTP/1.1; it will do keep-alive / socket re-use. HTTP/2+ supports request multiplexing, which Go implements. Your code appears to create a new client on each request, though, so I'm not sure whether the underlying transport will be re-used or whether any multiplexing will take place. I would be surprised if it did.


Wouldn't you be doing JSON processing instead of text processing?

I myself have never had to do text processing, only JSON processing, and that was convenient to do directly in Postgres. I assume it's just as convenient in SQLite, given the recent advances in its JSON capabilities.
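
Something like this in SQLite, I'd imagine (untested, with a made-up URL and JSON paths):

  -- pull a single field out of a JSON response body
  select json_extract(response_body, '$.result.ip') as ip
  from http_get('https://....');

  -- or fan an array out into one row per element
  select json_extract(j.value, '$.name') as name
  from http_get('https://....') as r,
       json_each(r.response_body, '$.items') as j;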


djbdns instead of ldns

Also look at the possibility of using libpcap for printing catenated DNS packets.


While this is really cool (and a feat of engineering, no less), I'm always really concerned when someone suggests making an HTTP call from their database server.

Many years ago my boss told me to set up a scheduled task in Windows to execute a SQL query that made an HTTP call, and I asked him why we couldn't just use crontab/cURL. His response: "cURL? Like from the 90s?"

Anyhow I didn't last very long. Got fired shortly thereafter.


Yeah, "dont make network requests from the database" is a common antipattern, but it doesn't quite apply with sqlite-http. It's mainly meant for data processing scripts that use SQL, which is typically on a local database file with pretty low risk. You probably wouldn't want/need sqlite-http on a SQLite db that a live application uses, unless you use the no network[0] option

[0] https://github.com/asg017/sqlite-http/blob/main/docs.md#no-n...


How to get hacked: 2022 edition.

SSRF and lambdas directly in your db engine. What other antipatterns could you possibly want?


On SSRF: there is a "no network"[0] option for the extension, if you don't trust SQL input. That's what the example Datasette server[1] uses.

Also, this is mainly meant for local data processing; I doubt there's a need for sqlite-http in a SQLite DB used by regular applications. If you're just wrangling some data locally with these extensions, there isn't much of a security risk.

[0] https://github.com/asg017/sqlite-http/blob/main/docs.md#no-n... [1] https://sqlite-extension-examples.fly.dev/data?sql=select+*+...


So you're basing your security model on the assumption that developers won't use it for things you never intended it to be used for?


I think that's absolutely fine.

The creators of Python still released Python, even though people could use it to build insecure software.


Just because you can do it, doesn't mean you should do it.


... but I'm so damn excited they did!

Engineers doing engineering for the sake of engineering.


Soon: SQLite-OS


SQLite-based APIs: you ship your code as an SQLite extension and users interact with it via a virtual table.


osquery legitimately fits that description, and I wouldn't mind seeing more like it.



