SQLite-HTTP: A SQLite extension for making HTTP requests (observablehq.com)
65 points by b_mc2 on Aug 10, 2022 | 21 comments



Author here, happy to answer questions!

Direct link to the project: https://github.com/asg017/sqlite-http

A few other projects of interest:

- sqlite-html: https://github.com/asg017/sqlite-html

- sqlite-lines: https://github.com/asg017/sqlite-lines

- various other extensions: https://github.com/nalgeon/sqlean

And past HN threads:

- https://news.ycombinator.com/item?id=32335295

- https://news.ycombinator.com/item?id=32288165


Side note: I love the attention that SQLite is getting now. It's so refreshing after the dark ages of misapplied technology: microservices, Docker, NoSQL, and countless front-end frameworks.


On a side note, Observable is stupendous: interactive network graphs, charts, maps, animations. Not sure why we don't see it used more.

Mike Bostock himself is constantly adding his latest work at https://observablehq.com/@mbostock. It's a great learning resource.



I picked up some very neat new SQLite tricks from this post independent of the extension itself (which is very cool) - I didn't know about the "define" module which lets you create new table-valued functions that are defined as SQL queries themselves.
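
For anyone curious, with the sqlean "define" extension linked in the top comment it looks roughly like this (untested, going from memory of its docs, so the exact syntax may be off): a scalar function is a saved expression, and a table-valued function is a saved query exposed as a virtual table.

  -- scalar function defined as a SQL expression
  select define('sumn', ':n * (:n + 1) / 2');
  select sumn(5);  -- 15

  -- table-valued function defined as a query over its parameters
  create virtual table split2 using define((
    select
      substr(:str, 1, instr(:str, :sep) - 1) as head,
      substr(:str, instr(:str, :sep) + 1) as tail
  ));
  select head, tail from split2('key=value', '=');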


Would be interesting if this can do HTTP/1.1 pipelining. Need to take a closer look.

What makes me hesitate to look closer is that I am almost always doing some text processing after the HTTP response and before the SQL commands. Due to resource constraints, I do not want to store HTML cruft.

I am mostly doing

HTTP response --> text processing --> SQLite3

rather than

HTTP response --> SQLite3

However, I also have a need to store different combinations of HTTP request headers and values in a database. I currently use the UNIX filesystem as a database (one header per file; djb's envdir loads the headers into the environment from a selected folder), but maybe I could use SQLite3. For unprocessed text, e.g., from pipelined DoH responses, I use tmux buffers as a temporary database. Then I do something like

   HTTP responses --> tmux loadb /dev/stdin
   tmux saveb -b b1 /dev/stdout|textprocessingutility --> ip-map.txt (append unique, i.e., "add unique")
The ip-map.txt file gets loaded into memory of a localhost forward proxy.

Databases such as NetBSD's db, djb's cdb, kdb+, or sqlite3 can help with the "append unique" step if the data gets big.
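
With sqlite3, that step would be something like this (untested; "new-entries.txt" stands in for the textprocessingutility output, and lines_read is from the sqlite-lines extension linked upthread, whose exact signature I may be misremembering):

  create table if not exists ip_map (entry text primary key);

  -- duplicates are silently skipped, so re-running the import is idempotent
  insert or ignore into ip_map (entry)
    select line from lines_read('new-entries.txt');

  -- dump back out for the forward proxy to load
  select entry from ip_map order by entry;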

Note: Any JSON I retrieve is "text", not binary data. Most HTTP responses are HTML, sometimes with embedded JSON. Pipelined DoH is binary but I still use drill to print the packets as text. (When I finally learn ldns I will stop using drill.)


Re pipelining: unless Go's network package does this magically under the hood, sqlite-http doesn't take advantage of it! If you think it would be a nice feature, file an issue and I'll see what can be done.

How complex is your text processing? If I'm working with a small dataset, I typically just save the "raw" responses into a table and create several intermediate tables that extract the data I need. Something like:

  create table raw_responses as select * from http_get('https://....');
  create table intermediate1 as select xxxx(response_body) as message from raw_responses;
  create table final as select message from intermediate1;
(some CTEs[0] will be very useful here)
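
e.g. roughly the same thing as a single statement, keeping only the final table (untested):

  create table final as
    with raw as (
      select response_body from http_get('https://....')
    ),
    intermediate as (
      select xxxx(response_body) as message from raw
    )
    select message from intermediate;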

The "lambda" pattern described about half-way through this post might be useful as well, if you need to do some complex text processing.

[0] https://www.sqlite.org/lang_with.html


Go's HTTP client doesn't support pipelining for HTTP/1.1; it will do keep-alive / socket re-use. HTTP/2+ supports request multiplexing, which Go implements. Your code appears to create a new client on each request, though, so I'm not sure whether the underlying transport will be re-used or whether any multiplexing will take place. I would be surprised if it did.


Wouldn't you be doing JSON processing instead of text processing?

I myself have never had to do text processing, only JSON processing, and that was convenient to do directly in Postgres. I assume it's just as convenient in SQLite, given the recent advances in its JSON capabilities.
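
Something like this in SQLite, I'd imagine (untested, with a made-up URL and JSON paths):

  -- pull a single field out of a JSON response body
  select json_extract(response_body, '$.result.ip') as ip
  from http_get('https://....');

  -- or fan an array out into one row per element
  select json_extract(j.value, '$.name') as name
  from http_get('https://....') as r,
       json_each(r.response_body, '$.items') as j;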


djbdns instead of ldns

Also look at the possibility of using libpcap for printing catenated DNS packets.


While this is really cool (and a feat of engineering, no less), I'm always really concerned when someone suggests making an HTTP call from their database server.

Many years ago my boss told me to set up a scheduled task in Windows to execute a SQL query that made an HTTP call, and I asked him why we couldn't just use crontab/cURL. His response: "cURL? Like from the 90s?"

Anyhow I didn't last very long. Got fired shortly thereafter.


Yeah, "dont make network requests from the database" is a common antipattern, but it doesn't quite apply with sqlite-http. It's mainly meant for data processing scripts that use SQL, which is typically on a local database file with pretty low risk. You probably wouldn't want/need sqlite-http on a SQLite db that a live application uses, unless you use the no network[0] option

[0] https://github.com/asg017/sqlite-http/blob/main/docs.md#no-n...


How to get hacked: 2022 edition.

SSRF and lambdas directly in your db engine. What other antipatterns could you possibly want?


On SSRF: there is a "no network"[0] option for the extension, if you don't trust SQL input. That's what the example Datasette server[1] uses.

Also, this is mainly meant for local data processing; I doubt there's a need for sqlite-http in a SQLite DB used by regular applications. If you're just wrangling some data locally with these extensions, there isn't much of a security risk.

[0] https://github.com/asg017/sqlite-http/blob/main/docs.md#no-n... [1] https://sqlite-extension-examples.fly.dev/data?sql=select+*+...


So you're basing your security model on the assumption that developers won't use it for things you never intended it to be used for?


I think that's absolutely fine.

The creators of Python still released Python, even though people could use it to build insecure software.


Just because you can do it, doesn't mean you should do it.


... but I'm so damn excited they did!

Engineers doing engineering for the sake of engineering.


Soon: SQLite-OS


SQLite-based APIs: you ship your code as an SQLite extension and users interact with it via a virtual table.


osquery legitimately fits that description, and I wouldn't mind seeing more like it.



