
The lack of self-awareness with this post…

Never mind the shameless self-promotion, doing it while decrying self-promotion means I’m never looking at your app, full-stop.


Literally `jj undo`

There's a whole operation log (`jj op log`), as another sequence of actions, and you can undo them. It gets crazier from there, but I've also been enjoying jujutsu lately and I had to RTFM a good couple of times to get comfortable with it.
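For anyone who hasn't tried it, the basic loop is roughly this (a from-memory sketch; exact output varies by version):

    jj op log    # lists every operation that mutated the repo, newest first
    jj undo      # rolls back the most recent operation
    jj op log    # the undo is itself recorded as a new operation, so it too can be undone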


Also cool: jujutsu remembers every previous state of all your changes. So you make some edits, run `jj status` (or anything), and keep working.

Two hours later you realize your experiment is a total failure and you wish you had the half-working changes from earlier. You don’t need to have actively committed anything; it’s all in the obsolete log and you can retrieve it.

I want to repeat that last bit. Unlike git, there was no point where you needed to finalize your changes with a commit in order for them to be persisted and archived for later retrieval.


You can even undo operations that aren't the most recent one.


Huh this comment thread may have convinced me to finally try it. I’m comfortable with git, and fixing mistakes in git, but not being able to trivially reverse any and every transaction is annoying.


I made a mess of a jj repo recently. A big mess. Ran `jj op log`, found the operation I wanted, and just `jj restore`d my way back to that exact prior state. Then started what I was planning afresh and got it right. Felt magical.

And I didn't have to learn anything like the hell of git refs. The UI is self-explanatory as long as you know the feature exists.
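For the curious, the recovery looked roughly like this (a sketch; if I recall correctly, the restore-to-an-operation form is spelled `jj op restore` rather than plain `jj restore` in current versions):

    jj op log                # find the id of the operation just before the mess
    jj op restore <op-id>    # put the whole repo, working copy included, back to that state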

Another very nice thing is that conflicts aren't roadblocks. They're legit changes, first-class jj citizens. So, when I do something that ends in a conflict, I can just move over to a different change, do the work I'm already trying to do, and then go back and resolve the conflict when I want.


Ah yes, I’ve heard it makes the hell of huge, conflict-ridden rebases a bit less annoying. Always wished I could just get to the end of the rebase to see it all in context, _then_ resolve conflicts, but with the option to resolve the obvious ones along the way too.


Yep, this was the thing that encouraged me to look at jj. I am so done with rebases where I’m stuck in some meta-state where I need to figure out exactly what needs to be done—linearly, right now, and without making any mistakes—but all the regular tooling to jump around and explore is unavailable until I’m finished with the whole thing.

I would honestly use jj for just this even if that was all it offered.


I hear ya. I've done some crazy things with `git reflog` (which I always pronounce git re-flog instead of ref-log) but it is not fun.

There's the usual cognitive shift, and in my case a bit of a best-practice shift I had to go through to get comfortable with `jj` -- specifically, keeping the repo directory pristine (and putting my temp output in a git-ignored dir, or outside the repo) because /everything/ is tracked. Flip side being, I've forgotten to `git add` new files at least a few times a year, and now that won't be an issue.

But yeah, can highly recommend, and I'm excited to start to jump between multiple open branches^Wbookmarks at will and learn more about the intricacies of conflict management (and the original link is a good glimpse at that!)


Are new files tracked by default? If I forget to ignore a bunch of build outputs, will that make my repo huge immediately?


There’s a configurable cap on max file size to auto-add, IIRC. It defaults to something “reasonable”.

If you do somehow add a terabyte of small files by accident, it’s still just git under the hood so you can make sure nothing active points at them and GC them.
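If memory serves, the knob is `snapshot.max-new-file-size`, so something like this (setting name and value syntax from memory, so double-check the docs):

    jj config set --user snapshot.max-new-file-size 10MiB
    # files over the cap are left untracked and jj warns about them instead of snapshotting them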


I'm thinking of something like node_modules, sounds like I'll need to not forget.


Those are usually in your gitignore, right?

Also, this behavior (automatically tracking) is configurable. I thought I would hate it but I actually really like it.


Yeah, they usually are. I'm sure it's great, since I add files more often than I ignore them, but I need to be careful with the temporary files I write left and right in the repo dir during development.


Why? You can easily remove them if they’re accidentally added.


Just because I might not remember/notice, and I don't want sensitive data to be committed by accident if I put an env var on disk for some reason.


If you check `jj status` regularly and/or use `jj split` to build up your commits (like `git add -p`) then you’ll notice. They might end up in your local repo (until a gc) at worst.
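Concretely, the habit is something like (a sketch, not exact output):

    jj status    # every file the working-copy change picked up, new files included
    jj split     # interactively pick only the hunks/files you want in this change
    # a stray .env or temp file stands out here long before it gets pushed anywhere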


Ah, that's not too bad then, thank you!


Oof. I’m sorry.

From someone acquired by ServiceNow… I hope your stock options are worth it


Oh, dreidel as a drinking game sounds both horrific and hilarious. Here’s how I thought of it:

Everyone starts with infinite gelt — or at least a liver — and ante/paying into the pot is either taking a drink or paying a drink token. The four actions are then:

- Nothing

- Acquire a newly created token

- Everybody else: drink, or exile a token

- You: drink, or exile a token

Don’t need to run PRISM to figure how quickly that devolves.


“One, two, three, four

Who’s punk what’s the score?”

Get outta here with the gatekeeping


The Go one is the more mature implementation; it's generally a lot easier to refactor Go as you're figuring things out, and then you can build the Rust version (which is a good bit faster).

Sharing e2e test suites (realistically, two different test binaries to run at CI time) is something I'm cleaning up right now.


Wow, so many haters :(

I love Rye. It does what it says on the tin. It makes the whole venv/Python-version/packaging process actually pleasant, and it’s invisible to someone used to Python-official usage (pyproject.toml et al). And it makes Python feel like Cargo, which is a great thing to work with too.


If, like me, you've ignored poetry and friends and stuck with pip-tools (congrats!), uv (used by rye internally) is a drop-in replacement.

IMHO pip-tools was always the far nicer design than poetry, pipenv etc as it was orthogonal to both pip and virtualenv (both of which have been baked into Python for many years now). I would argue Rye is the iterative, standards compliant approach winning out.

Beyond the speedups from Rust, it's nice to have some opinionated takes on where to put virtualenvs (.venv) and how to install different Python versions. It sounds small, but since wheels fixed numpy installs, sane defaults for these and a baked in pip-tools is basically all that was missing. Talking of which, what has been the point of anaconda since binary wheels became a thing?


> what has been the point of anaconda since binary wheels became a thing?

When you need python + R + some linked or CLI binary in an isolated environment. Also you will use the same tool to manage this environment across multiple OSs (e.g. no OS specific `apt`, `brew`, etc).


I still love miniconda for DS work. If you want to set up a project to process some videos using some Python libraries, you can use conda to install a specific version of ffmpeg into the project without worrying about your system installation.

Lots of random C/C++/Fortran libraries can be used directly from conda and save a massive headache.
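For example, the kind of per-project pin I mean (environment name and version are just placeholders):

    conda create -n video-proj -c conda-forge python=3.11 "ffmpeg=6.*"
    conda activate video-proj    # ffmpeg now resolves to the env's copy, not the system one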


On Linux, binary wheels are unreliable and sometimes segfault.


As somebody who tried to pick up Python after hearing there was one way to do everything…the installation and environment management experience was a train wreck.

Glad to hear it’s getting better finally.


What you heard is from the Zen of Python, a short text meant to express core ideas behind the design of the Python language. You can read it by typing `import this` in the Python interpreter. The exact sentence is:

    There should be one-- and preferably only one --obvious way to do it.

This sentence was coined as an answer to a catchphrase used to describe the Perl programming language: There Is More Than One Way To Do It. Giving programmers more freedom to express themselves in different ways was presented as a good thing by the Perl community.

Python was partly marketed as a replacement for Perl, and the sentence from the Zen of Python expresses a difference from Perl. The idea is that having different ways to do things leads to confusion and code that is harder to maintain, problems that Perl was supposed to suffer from according to its critics.

The sentence was true to a certain extent when it came to the Python language. I don't think it has ever been true for the Python ecosystem. For example, during the early 2000s, there were a plethora of web back-end frameworks for Python. As the Python language has since gained a lot of features, I'm not even sure that this is true for the language itself.

Regarding package management, this has always been a weak point of the Python ecosystem. Python developers often joke among themselves about it. Unfortunately, I would be very surprised if this project were to put an end to this issue.

Despite all this, I encourage you to learn Python because it's a very interesting and powerful language with an extremely rich ecosystem. Yes, there are many ways to do the same thing with it. But on the other hand, there is a way to do pretty much anything with it.


> Python feel like cargo

I am sold. Was thinking of trying out Pixi after Poetry took a whole day and still couldn't resolve deps.

Looks like there are more Python package managers than chat apps from Google?


> poetry took a whole day and still couldn't resolve deps.

I hate doing this, but the solution is to reduce the search space for poetry to find a compatible version.

Verbosely install with poetry (-vvv) and note the package it gets stuck on. Find the currently installed version from the lock file and specify it as the minimum in your pyproject.toml file.

The time to find a solution went from 2-4 hours to <90 seconds when I did this a few months ago for a large complex work project.
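Roughly, the workflow (the package name below is purely an illustration):

    poetry install -vvv    # watch the solver; note the package it keeps backtracking on
    # then, in pyproject.toml, raise that package's floor to the version from poetry.lock:
    #   [tool.poetry.dependencies]
    #   somepackage = ">=2.31.0"
    poetry lock            # re-resolve with the much smaller search space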


Pixi is limited in focus to the Conda ecosystem within Python's wider ecosystem. Rye is not quite what Cargo is to Rust; it's more like a faster Poetry. Both Rye and Pixi use uv, which aspires to close the gap and become the Cargo of Python. Rye will likely fold into uv at some point in the future.


I was going to complain, but I’ll ask you/yall instead: what do you mean “makes it actually pleasant”? Is it too hard to summarize? Because I don’t think I ever identified anything about Anaconda or Poetry that felt like a particular choice, at least UX-wise. And curation-wise, it seems like a hard sell to trust a smaller org over the larger established group.

In other words: what does it say on the tin?? All I can read is “good package manager is good because good and fast and good”. Maybe there’s a comparison or ethos page…


A lot of data people use Anaconda. Anaconda is sooo slow. Even on a very beefy workstation, Anaconda often needs > 10 mins to solve an environment, and often fails. I would be excited to try something without these issues.


Mamba fully replaces Anaconda and uses a SAT solver written in C++. IIRC, conda now uses libmamba under the hood as well. If you post a list of dependencies, I can time it on my box and post the timings here. (Not saying conda/mamba are best or perfect, but the last time I saw 10-minute resolve times was a very long time ago.)


Everyone using Anaconda should switch to Mamba or Pixi, if not for speed, then for Anaconda's licensing switcheroo. Their legal department will chase you to the ends of the earth to get their money.

Really horrific experience with the folks at Anaconda. Stay far away.


Speed, for one thing. Rye also manages your Python version by downloading a prebuilt one, with a less finicky setup than the pyenv/pipenv virtualenv shell scripts (which take longer and are less reliable because they compile Python from source instead of downloading it).

As someone who has had to deal with his team's Python setup: installing Poetry and pipenv and compiling Python automatically on every user's machine is a lot more finicky in practice. Plus, Poetry wasn't just much slower; sometimes locking took many minutes to finish, appearing to lock up.

There's also `rye install`/`rye tool install`, which works like pipx: it installs tools in a siloed virtualenv, with a run file in the Rye dir you've already added to $PATH (it also has parameters to pass in extras, such as installing DB driver packages for sqlacodegen, and optionally exposing their executables on your path). It bundles other tools from Astral, e.g. ruff, which is the new hotness for Python linting/auto-formatting/import sorting and is also fast/written in Rust.

I feel that with rye/uv/ruff, Astral is finally making progress towards a fast, useful, all-in-one Python package manager/tool like Cargo. And they keep adding a lot of useful features; for example, ruff is slowly working towards implementing both flake8 and pylint rules, among other lints.


> They are binary vectors with 768 dimensions, which takes up 96 bytes (768 / 8 = 96).

I guess I’m confused. This is honestly the problem that most vector storage faces (“curse of dimensionality”) let alone the indexing.

I assume that you meant 768 dimensions * 8 bytes (for an f64), which is 6144 bytes. Usually these get shrunk with some (hopefully minor) loss, to something like an f32 or f16 (or smaller!).

If you can post how you fit 768 dimensions in 96 bytes, even with compression or trie-equivalent amortization, or whatever… I’d love to hear more about that for another post.

Ninja edit: Unless you’re treating each dimension as one-bit? But then I still have questions around retrieval quality


Author here - ya "binary vectors" means quantizing to one bit per dimension. Normally it would be 4 * dimensions bytes of space per vector (where 4=sizeof(float)). Some embedding models, like nomic v1.5[0] and mixedbread's new model[1], are specifically trained to retain quality after binary quantization. Not all models do tho, so results may vary. I think in general for really large vectors, like OpenAI's large embeddings model with 3072 dimensions, it kind of works, even if they didn't specifically train for it.

[0] https://twitter.com/nomic_ai/status/1769837800793243687

[1] https://www.mixedbread.ai/blog/binary-mrl
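To make the arithmetic concrete, a minimal numpy sketch of the 1-bit quantization (illustrative only, not our exact pipeline):

    import numpy as np

    emb = np.random.randn(768).astype(np.float32)  # stand-in 768-dim embedding: 3072 bytes as float32
    bits = emb > 0                                  # one bit per dimension via a sign test
    packed = np.packbits(bits)                      # 768 bits -> 96 bytes
    print(emb.nbytes, packed.nbytes)                # 3072 96

Search then typically ranks by Hamming distance on the packed bytes, usually with a full-precision re-rank on the top results.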


Thank you! As you keep posting your progress, and I hope you do, adding these references would probably help ward off crusty fuddy-duddies like me (or at least give them more to research either way) ;)


Binary, quantize each dimension to +1 or -1

You can try out binary vectors, in comparison to quantizing every pair of dimensions to one of four values, and a lot more, by using a FAISS index on your data with Product Quantization (like PQ768x1 for binary features in this case): https://github.com/facebookresearch/faiss/wiki/The-index-fac...
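A rough sketch of what that could look like (the factory string is my reading of the FAISS wiki; if your build rejects 1-bit PQ, plain sign-binarization with `faiss.IndexBinaryFlat` gives a similar comparison):

    import faiss
    import numpy as np

    d = 768
    xb = np.random.randn(10000, d).astype(np.float32)  # placeholder database vectors

    index = faiss.index_factory(d, "PQ768x1")   # 768 sub-quantizers, 1 bit each ~ binary
    index.train(xb)
    index.add(xb)
    D, I = index.search(xb[:5], 10)             # top-10 neighbours for the first 5 vectors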


Appreciate the link; but would still like to know how well it works.

If you have a link for that, I’d be much obliged


It depends on your data and your embedding model. For example, I was able to quantize embeddings of English Wikipedia from 384-dimensions down to 48 7-bit dimensions, and the search works great: https://www.leebutterman.com/2023/06/01/offline-realtime-emb...


BTW, the "curse of dimensionality" technically refers to the relative sparsity of high-dimensional space and the need for geometrically increasing data to fill it. It has nothing to do with storage. And typically in vector databases the data are compressed/projected into a lower dimensionality space before storage, which actually improves the situation.


The article is right; I do not like ddevault, and refuse to use his software on my machines as a result. I will never pay him money.

The article is also right that Redis is kinda 'complete' and merely staying the course and slowly maintaining bugs/CVEs/etc. is a valid type of fork.

If Redict happens to become the standard 'open' alternative (e.g., OpenSearch, OpenTofu, etc.), I'll suck it up and use his fork. We need less SSPL BS and more just 'complete' software. I may not like that jerk, but I dislike the jerks who made the re-licensing decision much, much more.


This is my problem too. I love Ubiquiti's stuff when it works, but my home network switches 10Gig at the moment, and 10Gig from my ISP is coming soon. Right now I run a full-on 1U firewall/main server/router that can do that, but I'd drop it in a heartbeat if Ubiquiti had a reasonably priced (sub-$1K) 10Gig version of exactly this product.

