This reflects the fact that a lot of working with LLMs is just organizing text. Prompts become a real engineering problem when you are orchestrating pipelines of dozens or more files, with completions at various points and context windows of 100K tokens or more.
I've not found a satisfying framework yet; I generally find raw Python best. But I still spend too much time on boilerplate and on tweaking formatting, samplers, and chunking for context windows.
If anyone knows of a better tool for abstracting that away (LangChain is not it IMO) please let me know.
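To make the chunking complaint concrete, this is the kind of helper I end up rewriting every time (a minimal sketch; it assumes the tiktoken package, and the budget and encoding are model-dependent — real pipelines usually also want overlap and per-file boundaries):

    # Pack text into chunks that fit a token budget.
    # Assumes: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding varies by model

    def chunk_by_tokens(text: str, budget: int = 100_000):
        """Yield slices of `text`, each at most `budget` tokens long."""
        tokens = enc.encode(text)
        for i in range(0, len(tokens), budget):
            yield enc.decode(tokens[i:i + budget])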
But sending whole files isn't always optimal. I'm thinking there has to be a better way, like picking workspace symbols and pulling in only the code they depend on from other files. Something something LSP/tree-sitter-based.
This is what aider does: it uses tree-sitter to extract the AST from each source file, uses the ASTs to build a call graph, and then runs a graph optimization to identify the most relevant parts of the code base given the current state of the LLM chat.
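Roughly like this — my sketch of the idea, not aider's actual code (the real repo map is more elaborate, e.g. it weights symbols already mentioned in the chat). Assumes the tree_sitter_languages and networkx packages:

    # Use tree-sitter ASTs to find function definitions and call sites,
    # link files into a call graph, and rank files with PageRank to pick
    # the most relevant context.
    # Assumes: pip install tree_sitter_languages networkx
    from pathlib import Path
    import networkx as nx
    from tree_sitter_languages import get_parser

    parser = get_parser("python")  # one grammar per language in a real tool

    def defs_and_calls(source: bytes):
        """Return (names defined, names called) in one file."""
        defs, calls = [], []
        stack = [parser.parse(source).root_node]
        while stack:
            node = stack.pop()
            if node.type == "function_definition":
                name = node.child_by_field_name("name")
                defs.append(source[name.start_byte:name.end_byte].decode())
            elif node.type == "call":
                fn = node.child_by_field_name("function")
                calls.append(source[fn.start_byte:fn.end_byte].decode())
            stack.extend(node.children)
        return defs, calls

    def rank_files(paths):
        """Edge file -> file whenever one file calls a name the other defines."""
        defined_in, called_by = {}, {}
        for p in map(Path, paths):
            defs, calls = defs_and_calls(p.read_bytes())
            for name in defs:
                defined_in.setdefault(name, p)
            called_by[p] = calls
        graph = nx.DiGraph()
        for p, calls in called_by.items():
            for name in calls:  # crude: dotted calls like obj.method won't match
                target = defined_in.get(name)
                if target and target != p:
                    graph.add_edge(p, target)
        scores = nx.pagerank(graph)
        return sorted(scores, key=scores.get, reverse=True)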
Not anymore in the subscription LLM offerings. Claude seems to allow 70k tokens or more in its paid UI; ChatGPT seems to be about half of that, while custom GPTs allow well over 100k.
function code2prompt() {
    # Wrap the code2prompt command in a function that sets a number of default excludes.
    # https://github.com/mufeedvh/code2prompt/
    # Usage: code2prompt [-t <template>] [-nn] <path> [further code2prompt options]
    local arguments excludeFiles excludeFolders templatesFolder excludeExtensions
    templatesFolder="${HOME}/git/code2prompt/templates"
    excludeFiles=".editorconfig,.eslintignore,.eslintrc,tsconfig.json,.gitignore,.npmrc,LICENSE,esbuild.config.mjs,manifest.json,package-lock.json,\
version-bump.mjs,versions.json,yarn.lock,CONTRIBUTING.md,CHANGELOG.md,SECURITY.md,.nvmrc,.env,.env.production,.prettierrc,.prettierignore,.stylelintrc,\
CODEOWNERS,commitlint.config.js,renovate.json,pre-commit-config.yaml,.vimrc,poetry.lock,changelog.md,contributing.md,.prettierrc.json,\
.prettierrc.yml,.prettierrc.js,.eslintrc.js,.eslintrc.json,.eslintrc.yml,.eslintrc.yaml,.stylelintrc.js,.stylelintrc.json,.stylelintrc.yml,.stylelintrc.yaml"
    excludeFolders="screenshots,dist,node_modules,.git,.github,.vscode,build,coverage,tmp,out,temp,logs"
    excludeExtensions="png,jpg,jpeg,gif,svg,mp4,webm,avi,mp3,wav,flac,zip,tar,gz,bz2,7z,iso,bin,exe,app,dmg,deb,rpm,apk,fig,xd,blend,fbx,obj,tmp,swp,\
lock,DS_Store,sqlite,log,sqlite3,dll,woff,woff2,ttf,eot,otf,ico,icns,csv,doc,docx,ppt,pptx,xls,xlsx,pdf,cmd,bat,dat,baseline,ps1,diff,bmp"
    echo "---"
    echo "Available templates:"
    ls -1 "$templatesFolder"
    echo "---"
    echo "Excluding files: $excludeFiles"
    echo "Excluding folders: $excludeFolders"
    echo "Run with -nn to disable the default excludes"
    # array of arguments to build up
    arguments=("--tokens")
    # if -t and a template name are provided, append the template flag with the full path to the template
    if [[ $1 == "-t" ]]; then
        arguments+=("--template" "$templatesFolder/$2")
        shift 2
    fi
    if [[ $1 == "-nn" ]]; then
        command code2prompt "${arguments[@]}" "${@:2}" # drop the -nn flag and skip the default excludes
    else
        command code2prompt "${arguments[@]}" --exclude-files "$excludeFiles" --exclude-folders "$excludeFolders" --exclude "$excludeExtensions" "$@"
    fi
}
If you need to feed multiple files to ChatGPT or another LLM, this makes it way easier than manually copying and pasting.
This app shows you a file-picker modal. Use the Shift or Option keys to select multiple text files across one or more directories.
All of the selected files will be concatenated for easy select all / paste into your LLM conversation.
Output format of selected files is:
### `[filepath]`
[file contents]
### `[filepath]`
... and so on.
- Output is in a text field for easy copy-pasta.
- File path starts at the common parent of all selected files
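For reference, here's a minimal script that produces the same format (my sketch, not the app's actual code; it assumes UTF-8 text files and at least two inputs for a meaningful common parent):

    # Concatenate files under "### `path`" headers, with paths relative
    # to the common parent directory of all selected files.
    import os
    import sys
    from pathlib import Path

    def concat(paths):
        files = [Path(p).resolve() for p in paths]
        root = Path(os.path.commonpath([f.parent for f in files]))
        parts = [f"### `{f.relative_to(root)}`\n\n{f.read_text()}" for f in files]
        return "\n\n".join(parts)

    if __name__ == "__main__":
        print(concat(sys.argv[1:]))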