The way this analysis and the original dataset were created makes no sense. This is, in part, not the author's fault, since the original data [1, 2] is flawed.
First, the original data was constructed like this: "...The next step was to format the raw HTML files into the full chord progression of each song, collapsing repeating identical chords into a single chord (’A G G A’ became ’A G A’)..."
Already this makes no sense - the fact that a chord is repeated isn't some sort of typo (though maybe it is on UltimateGuitar). For example, a blues might have a progression C7 F7 C7 C7 - the fact that C7 is repeated is part of the blues form. See song 225 from the dataset, which is a blues:
A7 D7 A7
D7 A7
E7 D7 A7
Should really be:
A7 D7 A7 A7
D7 D7 A7 A7
E7 D7 A7 A7
With these omissions, it's a lot harder to understand the underlying harmony of these songs.
The second problem is that we don't really analyze songs by the chords themselves so much as by the relationships between chords. A next step would be to convert each song from chords to roman numerals, so we can understand common patterns of how songs are constructed. Maybe a weekend project.
This is probably a silly question, but how do I use loops? For example, if my backend returns an array of TODO items, how can I iterate through that and display it on the frontend?
I interned at IBM writing mainframe software in 2008 or so. One thing I remember them saying - there used to be TV commercials - was that a single mainframe could replace a room's worth of commodity hardware.
I would have assumed that someone would have started a cloud provider with Linux VMs running on mainframes instead of racks of pizza boxes. What was missing - are the economics of mainframes really that bad?
Mainframes are the polar opposite of commodity hardware. Those pizza boxes are commodity because they're plentifully available and you can mix and match them if needed; there's nothing cheaper to run your VMs on. Running them on a mainframe would put IBM as a middleman between you and your business.
Also, mainframes/midranges are all about stability, security, and robustness, not performance. For example, IBM i (the OS, not the machine) has a hardware-dependent layer and a hardware-independent one. This allows for drastic hardware changes without affecting high-level applications. A monolith would arguably be more efficient, but it matters more that the hardware-independent layer stays rock-solid.
Ultimately, what keeps us going is we want these services to exist for our own side-project development and it's an extra boost of motivation when others use our services.
All of our marketing is through HN/lobsters/reddit since that's our target demo.
I am building an image gallery as a side project to play with AI coding tools. I got really far in just 2 days and plan to open source it soon. It:
- Uses an encrypted badgerdb to keep track of metadata
- Uses rclone (with an encrypted backend) for file storage in s3 or any backend rclone supports
- Automatically indexes and generates video thumbnails and transcodes to webm to be streamed in the browser
- Supports slideshows
- Has a fairly decent UI that doesn’t look like it was developed by a backend engineer
The goal was to be able to attach my own s3 storage and keep all data encrypted at rest. It’s written in go and deploys as a single binary.
But for real, I've also been working on large-scale photo/video management, and the question of where to keep additional metadata has bothered me. It seems right to want to keep it "in" the file, and so I want to write it into EXIF, or even steganography, or something.
This is really cool - something I've been looking for with Flask. Cleanest implementation with just the decorator that I've seen.
(As an aside, is there an open-source UI for docs that actually looks good - professional quality, or even lets you try out endpoints? All of the decent ones are proprietary nowadays.)
For a clean, documentation UI, the best open-source options right now are probably Swagger UI and ReDoc. FastOpenAPI uses both by default:
- Swagger UI: interactive, lets you try out endpoints live.
- ReDoc: more minimalist and professional-looking but static.
If you're looking for something different, you might check out RapiDoc, which is also open-source, modern, customizable, and supports interactive API exploration.
(Someone could write an actually open source UI extension for duckdb, but that would require a lot of investment that so far only motherduck has been able to provide.)
I've looked at quite a few options, and this one (the product of a single person) is a great base, and MIT licensed:
https://github.com/caioricciuti/duck-ui
I’ll never understand why any UI project wouldn’t include an actual screenshot of itself as the first thing on its landing page. It seems so obvious.
I find SQL Lab in Apache Superset to be very good, and I have DuckDB as a data source (anything that supports SQLAlchemy works). It works very well. To be honest, when I first saw the screenshot, I thought it was SQL Lab. I haven't actually tried the DuckDB UI, though.
That is really interesting. I am thinking of toying with the same combination for my small project. Can you share a bit about your use case? Would love to know more.
Super interested in your approach to error wrapping! It’s a feature I haven’t used much.
I tend to use logs with line numbers to point to where errors occur (but that only gets me so far if I’m returning the error from a child function in the call stack.)
Simply wrap with what you were trying to do when the error occurred (and only that, no speculating what the error could be or indicate). If you do this down the call stack, you end up with a progressive chain of detail with strings you can grep for. For example, something like "processing users index: listing users: consulting redis cache: no route to host" is great. Just use `fmt.Errorf("some wrapping: %w", err)` the whole way up. It has all the detail you want with none of the detail you don't need.
I used ULIDs for a time until I discovered snowflake IDs. They are (“only”) 64 bits, but incorporate timestamps and randomness as well. They take up way less space than ULIDs for this purpose and offer acceptably rare collisions for the things I’ve worked on.
The original snowflake ID developed at Twitter contains a sequence number, so they should never collide unless you manage to overflow the sequence number in a single millisecond.
Also, you can store them as a BIGINT, which is awesome. So much smaller than even a binary-encoded UUID. IIRC the spec reserves the right to use the sign bit, so if you’re concerned, use BIGINT UNSIGNED (natively in MySQL, or via extension in Postgres).
I wish more people cared about the underlying tech of their storage layer – UUIDv4 as a string is basically the worst-case scenario for a PK, especially for MySQL / InnoDB.
[1] https://arxiv.org/pdf/2410.22046 [2] https://huggingface.co/datasets/ailsntua/Chordonomicon/blob/...