jehna1's comments | Hacker News

I've been experimenting with Claude Code and different code-to-CAD tools, and the best workflow so far has been with Replicad. It allows real-time rendering in a browser window as Claude makes changes to a single code file.

Here's an example I finished just a few minutes ago:

https://github.com/jehna/plant-light-holder/blob/main/src/pl...
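
For anyone curious, a minimal sketch of what such a single-file Replicad model looks like (this follows the general pattern from Replicad's workbench docs; the shape and dimensions here are made up and are not the linked project):

    // Replicad workbench scripts define a main() that returns the model.
    // Claude edits this one file, and the browser re-renders the result.
    const { drawCircle } = replicad;

    const main = () => {
      // Made-up example: a 60 mm tall cylinder with a 40 mm radius
      return drawCircle(40).sketchOnPlane("XY").extrude(60);
    };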


You need to use another tool to do the actual renames, like HumanifyJS does:

https://github.com/jehna/humanify


Author of HumanifyJS here! I've created an LLM-based tool specifically for this, which uses LLMs at the AST level to guarantee that the code keeps working after the unminification step:

https://github.com/jehna/humanify


Would it be difficult to add a 'rename from scratch' feature? I mean a feature that takes normal code (as opposed to minified code) and (1) scrubs all the user's meaningful names, then (2) chooses new names based on the algorithm and the remaining names (i.e. the built-in names).

Sometimes when I refactor, I do this manually with an LLM. It is useful in at least two ways: it can reveal better (more canonical) terminology for names (eg: 'antiparallel_line' instead of 'parallel_line_opposite_direction'), and it can also reveal names that could be generalized (eg: 'find_instance_in_list' instead of 'find_animal_instance_in_animals').


Yes, I think you could use HumanifyJS for that. The way it works:

1. I ask the LLM to describe the meaning of the variable in its surrounding code

2. Given just the description, I ask the LLM to come up with the best possible variable name

You can check the source code for the actual prompts:

https://github.com/jehna/humanify/blob/eeff3f8b4f76d40adb116...


More tools should be built on ASTs, great work!

I'm still waiting for AST-level version control tbh


Unison supposedly has an AST-aware version control system: https://www.unison-lang.org/


content-addressed too, I think!


Wow this looks so cool.


Smalltalk's ENVY source control


What kind of question does it ask the LLM? Giving it a whole function and asking "What should we rename <variable 1>?" repeatedly until everything has been renamed?

Asking it to do it on the whole thing, then parsing the output and checking that the AST still matches?


For each variable:

1. It asks the LLM to write a description of what the variable does

2. It asks for a good variable name based on the description from 1.

3. It uses a custom Babel plugin to do a scope-aware rename

This way the LLM only decides the name, but the actual renaming is done with traditional and reliable tools.
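
Roughly, per variable, it boils down to something like this (a simplified sketch with a made-up `askLlm` helper, not HumanifyJS's actual code):

    // Two LLM calls per variable; the LLM only ever picks a name.
    // `askLlm(prompt)` stands in for whatever chat-completion client is used.
    async function suggestName(variableName, surroundingCode, askLlm) {
      // Step 1: describe what the variable does in its scope
      const description = await askLlm(
        `Describe the purpose of the variable "${variableName}" in this code:\n\n${surroundingCode}`
      );
      // Step 2: turn the description into a concrete, descriptive identifier
      const newName = await askLlm(
        `Suggest one descriptive JavaScript identifier for a variable that: ${description}\nReply with the identifier only.`
      );
      return newName.trim();
    }
    // Step 3 (the actual rename) is then done with a scope-aware Babel plugin,
    // so the model can't break the code's behaviour.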


This answer is reassuring.

Based on it, I went and read the readme. The readme was also excellent, and answered every question I had. Great job, thank you, I'll be trying this.


Does it work with huge files? I'm talking about something like 50k lines.

Edit: I'm currently trying it with a mere 1.2k-line JS file (OpenAI mode) and it's only 70% done after 20 minutes. Even if it theoretically works with a 50k LOC file, I don't think you should try.


It does work with files of any size, although it is quite slow if you're using the OpenAI API. HumanifyJS processes each variable name separately, which keeps the context size manageable for the LLM.

I'm currently working on parallelizing the rename process, which should give orders of magnitude faster processing times for large files.


It has this in the README

> Large files may take some time to process and use a lot of tokens if you use ChatGPT. For a rough estimate, the tool takes about 2 tokens per character to process a file:

> echo "$((2 * $(wc -c < yourscript.min.js)))"

> So for reference: a minified bootstrap.min.js would take about $0.5 to un-minify using ChatGPT.

> Using humanify local is of course free, but it may take more time, be less accurate, or not be possible with your existing hardware.


This only talks about the cost.

I'm more concerned about whether it can actually deobfuscate such a large file (context) and generate useful results.


Looks useful! I will update the article to link to this tool. Thanks for sharing!


Super, thank you for adding the link! It really helps people find the tool.


Finally someone else using ASTs while working with LLMs and modifying code! This is such an under-utilized area. I am also doing this with good results: https://codeplusequalsai.com/static/blog/prompting_llms_to_m...


Super interesting! Since you're generating code with LLMs, you should check out this paper:

https://arxiv.org/pdf/2405.15793

It uses smart feedback to fix the code when the LLM occasionally hiccups. You could also have a "supervisor LLM" that asserts that the resulting code matches the specification, and gives feedback if it doesn't.
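
A rough sketch of that kind of loop, with `askLlm` and `runChecks` as placeholders (the checks could be a compiler, tests, or a second "supervisor" LLM judging the code against the spec):

    // Generate -> check -> feed problems back, up to a few rounds.
    async function generateWithFeedback(spec, askLlm, runChecks, maxRounds = 3) {
      let code = await askLlm(`Write JavaScript code that does the following:\n${spec}`);
      for (let round = 0; round < maxRounds; round++) {
        const problems = await runChecks(code, spec); // e.g. lint/test errors, supervisor verdicts
        if (problems.length === 0) return code;
        code = await askLlm(
          `This code has the following problems:\n${problems.join("\n")}\n\n` +
          `Fix them without changing the intent of the code:\n${code}`
        );
      }
      return code; // best effort after maxRounds
    }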


It's a shame this loses one of the most useful aspects of LLM un-minifying - making sure it's actually how a person would write it. E.g. GPT-4o directly gives the exact same code (+contextual comments) with the exception of writing the for loop in the example in a natural way:

    for (var index = 0; index < inputLength; index += chunkSize) {
Comparing the ASTs is useful, though. Perhaps there's a way to combine the approaches: have the LLM convert, compare the ASTs, have the LLM explain the practical differences (if any) in the context of the actual implementation, and give it a chance to make any changes "more correct". Still not guaranteed to be perfect, but the resulting code would be significantly more "natural".
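
The "compare the ASTs" part is fairly mechanical with @babel/parser; a rough sketch (you'd also want to ignore identifier names when only renames are expected, which this doesn't do):

    const { parse } = require("@babel/parser");

    // Strip positions, comments and other metadata so only structure is compared.
    const strip = (node) => {
      if (Array.isArray(node)) return node.map(strip);
      if (node && typeof node === "object") {
        const out = {};
        for (const [key, value] of Object.entries(node)) {
          if (["loc", "start", "end", "range", "extra",
               "comments", "leadingComments", "trailingComments"].includes(key)) continue;
          out[key] = strip(value);
        }
        return out;
      }
      return node;
    };

    const sameStructure = (codeA, codeB) =>
      JSON.stringify(strip(parse(codeA).program)) ===
      JSON.stringify(strip(parse(codeB).program));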


Depends on how many tokens you want to spend.

Making the code, fully commenting it and also giving an example after that might cost three times as much


As someone who has spent countless hours and days deobfuscating malicious JavaScript by hand (manually and with some scripts I wrote), your tool is really, really impressive. Running it locally on a high-end system with an RTX 4090 and it's great. Good work :)


how do you make an LLM work on the AST level? do you just feed a normal LLM a text representation of the AST, or do you make an LLM where the basic data structure is an AST node rather than a character string (human-language word)?


The frontier models can all work with both source code and ASTs as a result of their standard training.

Knowing this raises the question: which is better to feed an LLM, source code or ASTs?

The answer really depends on the use case; there are tradeoffs. For example, keeping comments intact possibly gives the model hints to reason better. On the other hand, it can be argued that a pure AST has less noise to confuse the model.

There are other tradeoffs as well. For example, any analysis relating to coding styles would require the full source code.


It looks like they're running `webcrack` to deobfuscate/unminify and then asking the LLM for better variable names.


I'm using both a custom Babel plugin and LLMs to achieve this.

Babel first parses the code to AST, and for each variable the tool:

1. Gets the variable name and surrounding scope as code

2. Asks the LLM to come up with a good name for the given variable, based on the scope where it's used

3. Uses Babel to make a scope-aware rename in the AST, based on the LLM's response
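
Step 3 is essentially Babel's built-in `scope.rename`; a minimal standalone sketch of the idea (not the actual HumanifyJS plugin):

    const { transformSync } = require("@babel/core");

    // Rename a single top-level binding; scope.rename() updates the declaration
    // and every reference, and respects shadowing in nested scopes.
    const renameBinding = (oldName, newName) => () => ({
      visitor: {
        Program(path) {
          if (path.scope.hasOwnBinding(oldName)) {
            path.scope.rename(oldName, newName);
          }
        },
      },
    });

    const result = transformSync("const a = 1; console.log(a + a);", {
      plugins: [renameBinding("a", "retryCount")],
    });
    console.log(result.code); // const retryCount = 1; console.log(retryCount + retryCount);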


How well does it compare to the original un-minified code if you compare it against minify + humanify? Would be neat if it could improve mediocre code.


On a structural level it's exactly 1:1: HumanifyJS only does renames, no refactoring. It may come up with better names for variables than the original code, though.


Can it guarantee 1:1? Doesn't JavaScript allow looking up fields using a string name? That string could be computed in a complex manner.


It does in fact change the structure, but only via safe-ish AST transformations that revert minifier tricks (e.g. `void 0` to `undefined`):

- https://github.com/jehna/humanify/blob/eeff3f8b4f76d40adb116...

- https://webcrack.netlify.app/docs/concepts/unminify.html

Properties and strings aren't renamed.
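
The `void 0` → `undefined` case makes a nice tiny example of what such a safe transformation looks like as a Babel plugin (my own sketch; webcrack's actual rule set is much larger):

    const { transformSync } = require("@babel/core");
    const t = require("@babel/types");

    // Minifiers emit `void 0` because it's shorter than `undefined`.
    // Rewriting it back is cosmetic and behaviour-preserving
    // (assuming nobody has shadowed `undefined`, which minified input typically doesn't do).
    const voidToUndefined = () => ({
      visitor: {
        UnaryExpression(path) {
          if (
            path.node.operator === "void" &&
            t.isNumericLiteral(path.node.argument, { value: 0 })
          ) {
            path.replaceWith(t.identifier("undefined"));
          }
        },
      },
    });

    console.log(transformSync("var x = void 0;", { plugins: [voidToUndefined] }).code);
    // -> var x = undefined;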


Is it possible to add a mode that doesn't depend on API access (e.g. copy and paste this prompt to get your answer)? Or do you make roundtrips?


There is a fully local mode that does not use ChatGPT at all – everything happens on your local machine.

API access is needed for the ChatGPT mode, as there are many round trips and it uses advanced API-only tricks to constrain the LLM output.


Thanks for your tool. Have you been able to quantify the gap between your local model and chatgpt in terms of ‘unminification performance’?


At the moment I haven't found good ways of measuring the quality between different models. Please share if you have any ideas!

For small scripts I've found the output to be very similar between small local models and GPT-4o (judging by a human eye).


Thanks for creating this megafier, can you add support for local LLMs?


Better yet, it already does have support for local LLMs! You can use them via `humanify local`


Came here to say Humanify is awesome both as a specific tool and in my opinion a really great way to think about how to get the most from inherently high-temperature activities like modern decoder nucleus sampling.

+1


An educated guess, but I'd think WiFi-signal object detection works by recording a baseline (empty-room signals) and then diffing against that baseline when some object moves. If that's the case, you could not use the WiFi signal to map the static room itself.


You could have a monitor-mode WiFi card that moves along an axis.


Polar asks you to grant "act on your behalf" permissions to your GitHub account. As someone who has both professional and open-source projects on GitHub, that's a definite deal breaker.


Unfortunately, this is GitHub's default OAuth prompt and not something we can change. See https://github.com/orgs/community/discussions/37117

Most of our API integration is purely about synchronising repositories, issues & PRs (read-only), with the exception of two things. 1) Injecting the Polar badge: we edit the issue body to append it at the bottom. 2) You can explicitly post a comment as yourself via Polar when you badge an issue, but this is entirely optional and merely a convenience feature if you want it; it requires a manual and explicit action.


I came here to post the same reaction. I saw the OAuth prompt with "act on your behalf" and immediately closed the browser tab. When I want to take a look at something new, I do not want to give it what looks like nearly full access to my account.


Here's one for React:

https://github.com/jehna/nosx


Glad to see JSX-less approaches to declaring the DOM!

Very similar to: https://github.com/jehna/longwood
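
The core of these JSX-less approaches is just nested function calls producing DOM nodes; a tiny self-contained illustration of the pattern (not nosx's or longwood's actual API):

    // Each helper returns a real DOM element; structure is plain function nesting.
    const el = (tag) => (props = {}, ...children) => {
      const node = document.createElement(tag);
      Object.assign(node, props);
      node.append(...children);
      return node;
    };

    const div = el("div");
    const h1 = el("h1");
    const p = el("p");

    document.body.append(
      div({ className: "card" },
        h1({}, "Hello"),
        p({}, "Declared without JSX or a build step.")
      )
    );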


I think a big moment in a programming language's evolution is when it becomes self-hosted: you use the language to write a compiler/interpreter for the language itself.

Now I want to see Smol Developer develop itself. Then you can call me a believer.


i am having that exact conversation with the modal cofounders themselves: https://twitter.com/swyx/status/1658147687408238593 but my subjective judgment is that it is a bit too early for a quine smol developer.

i think self hosting/quines are a nice curiosity, but not really a practical need since this isn't a PL. i made this to build smol apps, let's keep it practical


Thank you! I added them both to Mastofeeder's readme

For a positive reference I love how simple https://bird.makeup (a Twitter to Mastodon bridge) is; that's the UX that I wanted to replicate with Mastofeeder.

IMO Mastofeeder is as easy to abuse as any other Mastodon server (or spinning up your own), so I wanted to make the service as easy as possible for legitimate end users.

And there are always technical measures I can implement if I start seeing abuse: throttling requests, limiting the maximum number of feeds per user, etc.


Thanks for adding the references!

I am also thrilled by how nice bird.makeup's UX is, although it's a bit unnerving to go to an account from your Mastodon client, click "open in browser", and land on the homepage rather than the Twitter user's page. I'd hope it would show me the last few tweets, and the same goes for Mastofeeder.

> IMO Mastofeeder is as easy to abuse as any other Mastodon server (or spinning your own)

That's the thing: reading RSS should be extremely simple, but the ActivityPub way means there is a mandatory gateway, and that gateway is then too easy to abuse. My issue is not necessarily with Mastofeeder per se but with the high burden such a gateway imposes, and the power imbalance that comes from it.

But even though I may sound negative, I really like your project! Thanks for doing this for the community!


Thanks, good point! Makes sense to not flood the federated timeline.

