Show HN: FileKitty – Combine and label text files for LLM prompt contexts (github.com/banagale)
69 points by bredren 8 months ago | 21 comments



This reflects the fact that a lot of working with LLMs is just organizing text. Prompts can become a real engineering problem when you are orchestrating pipelines of dozens of files or more, with completions at various points and context windows of 100K tokens or more.

I've not found a satisfying framework yet and generally find raw Python best. But I still spend too much time on boilerplate and on tweaking formatting, samplers, and chunking for context windows.

If anyone knows of a better tool for abstracting that away (LangChain is not it IMO) please let me know.
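For the chunking part, a minimal raw-Python sketch might look like the following. It is only an illustration: `chunk_text` and its character budget are hypothetical names and choices, and a real pipeline would count tokens with the model's tokenizer rather than characters.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the budget.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                # Still within budget: keep growing the current chunk.
                current = candidate
            else:
                # Budget exceeded: close the current chunk and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines first keeps paragraphs together where possible; the hard split is only a fallback for oversized paragraphs.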



DSPy?


Pretty nice, made a CLI app for this as well, seems like a common need: https://github.com/3rd/promptpack

But sending whole files isn't always optimal. I'm thinking there has to be a better way, like picking workspace symbols and pulling in only the code they depend on from other files. Something something LSP/tree-sitter-based.
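As a rough sketch of the "pick symbols instead of whole files" idea, here is a Python-only version that uses the standard library `ast` module in place of LSP or tree-sitter. The function name `extract_symbols` is hypothetical, and it only pulls the requested top-level definitions; it does not follow their dependencies into other files.

```python
import ast


def extract_symbols(source: str, names: set[str]) -> dict[str, str]:
    """Return the source text of top-level functions/classes whose names are requested."""
    tree = ast.parse(source)
    found: dict[str, str] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name in names:
                # get_source_segment slices the original text using the node's
                # recorded start/end positions.
                segment = ast.get_source_segment(source, node)
                if segment is not None:
                    found[node.name] = segment
    return found
```

Calling `extract_symbols(open("module.py").read(), {"main"})` on a hypothetical `module.py` would return only the text of `main`, which can then be pasted into a prompt instead of the full file.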


This is what aider does: it uses tree-sitter to extract the AST from each source file, uses the ASTs to build a call graph, and then runs a graph optimization to identify the most relevant parts of the codebase, given the current state of the LLM chat.

There are more details in this article:

https://aider.chat/2023/10/22/repomap.html
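To make the call-graph idea concrete, here is a toy sketch. It is not aider's implementation: it uses Python's built-in `ast` module instead of tree-sitter, handles a single file, and ranks functions by how many other functions call them as a simple stand-in for the graph optimization. The names `build_call_graph` and `rank_by_callers` are hypothetical.

```python
import ast
from collections import Counter


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls by bare name."""
    tree = ast.parse(source)
    funcs = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    graph: dict[str, set[str]] = {name: set() for name in funcs}
    for name, node in funcs.items():
        for child in ast.walk(node):
            # Only direct calls like foo() are tracked; obj.foo() is ignored.
            if (
                isinstance(child, ast.Call)
                and isinstance(child.func, ast.Name)
                and child.func.id in funcs
            ):
                graph[name].add(child.func.id)
    return graph


def rank_by_callers(graph: dict[str, set[str]]) -> list[str]:
    """Order functions by number of distinct callers, most-called first."""
    counts = Counter({name: 0 for name in graph})
    for caller, callees in graph.items():
        for callee in callees:
            if callee != caller:  # ignore direct recursion
                counts[callee] += 1
    return sorted(graph, key=lambda name: (-counts[name], name))
```

The ranked list gives an order in which to add definitions to the prompt until the context budget runs out.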


Ah wasn't aware, nice work!


aider is super limited. Solid approach, but it needs a lot of work to make it usable.


Cool, thanks for sharing.

That's a great point.

A tree-based file path browser with ability to select all or individual functions or classes would be cool.

Jetbrains IDEs have a good interface for symbols via the refactoring UI. Maybe I'll look there for some inspiration.


This is nice. I created something similar, https://github.com/jimmc414/1filellm

It converts papers, repositories, PRs, YouTube transcripts, and web docs into one text file in the clipboard for LLM ingestion.


This stuff sounds cool, but doesn't it quickly run into token/context limits on the models?


Not anymore in the subscription LLM offerings. Claude seems to allow 70k tokens or more in their paid UI, ChatGPT seems to be about half of that while custom GPTs allow well over 100k.
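A quick way to sanity-check whether a combined file will fit is to estimate its token count before pasting. The sketch below assumes roughly 4 characters per token, which is only a rule of thumb and varies by tokenizer and content; `estimate_tokens` and `fits_budget` are hypothetical helpers, and the limit you pass in is whatever your model or UI actually allows.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_budget(text: str, token_limit: int, reserve: int = 0) -> bool:
    """True if the estimated size plus a reserve for the reply fits within token_limit."""
    return estimate_tokens(text) + reserve <= token_limit
```

For example, `fits_budget(combined, 70_000, reserve=4_000)` would check a combined prompt against a 70k limit while leaving some room for the response.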


I use code2prompt (https://github.com/mufeedvh/code2prompt) with the following zsh wrapper:

  function code2prompt() {

    # wrap the code2prompt command in a function that sets a number of default excludes
    # https://github.com/mufeedvh/code2prompt/

    local arguments excludeFiles excludeFolders templatesFolder excludeExtensions
    
    templatesFolder="${HOME}/git/code2prompt/templates"
    # built up with += because a backslash-newline inside double quotes keeps the next line's leading spaces
    excludeFiles=".editorconfig,.eslintignore,.eslintrc,tsconfig.json,.gitignore,.npmrc,LICENSE,esbuild.config.mjs,manifest.json,package-lock.json,"
    excludeFiles+="version-bump.mjs,versions.json,yarn.lock,CONTRIBUTING.md,CHANGELOG.md,SECURITY.md,.nvmrc,.env,.env.production,.prettierrc,.prettierignore,.stylelintrc,"
    excludeFiles+="CODEOWNERS,commitlint.config.js,renovate.json,pre-commit-config.yaml,.vimrc,poetry.lock,changelog.md,contributing.md,.prettierrc.json,"
    excludeFiles+=".prettierrc.yml,.prettierrc.js,.eslintrc.js,.eslintrc.json,.eslintrc.yml,.eslintrc.yaml,.stylelintrc.js,.stylelintrc.json,.stylelintrc.yml,.stylelintrc.yaml"
    excludeFolders="screenshots,dist,node_modules,.git,.github,.vscode,build,coverage,tmp,out,temp,logs"
    excludeExtensions="png,jpg,jpeg,gif,svg,mp4,webm,avi,mp3,wav,flac,zip,tar,gz,bz2,7z,iso,bin,exe,app,dmg,deb,rpm,apk,fig,xd,blend,fbx,obj,tmp,swp,"
    excludeExtensions+="lock,DS_Store,sqlite,log,sqlite3,dll,woff,woff2,ttf,eot,otf,ico,icns,csv,doc,docx,ppt,pptx,xls,xlsx,pdf,cmd,bat,dat,baseline,ps1,diff,bmp"

    echo "---"
    echo "Available templates:"
    ls -1 "$templatesFolder"
    echo "---"

    echo "Excluding files: $excludeFiles"
    echo "Excluding folders: $excludeFolders"
    echo "Run with -nn to disable the default excludes"

    # array of build arguments
    arguments=("--tokens")

    # if -t and a template name is provided, append the template flag with the full path to the template to the arguments array
    if [[ $1 == "-t" ]]; then
      arguments+=("--template" "$templatesFolder/$2")
      shift 2
    fi

    if [[ $1 == "-nn" ]]; then
      command code2prompt "${arguments[@]}" "${@:2}" # remove the -nn flag
    else
      command code2prompt "${arguments[@]}" --exclude-files "$excludeFiles" --exclude-folders "$excludeFolders" --exclude "$excludeExtensions" "$@"
    fi
  }
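If you would rather do the same filtering in Python before handing files to any tool, the exclude logic can be sketched as below. This is not how code2prompt implements its excludes; `is_excluded` is a hypothetical helper that mirrors the three lists in the wrapper (file names, folder names, extensions).

```python
from pathlib import Path


def is_excluded(path: Path, files: set[str], folders: set[str], extensions: set[str]) -> bool:
    """Return True if path matches an excluded file name, folder, or extension."""
    if path.name in files:
        return True
    # Check every directory component, not just the immediate parent.
    if any(part in folders for part in path.parts[:-1]):
        return True
    # Extensions are compared without the dot, case-insensitively
    # (the extensions set is expected to be lowercase).
    return path.suffix.lstrip(".").lower() in extensions
```

A typical use would be `[p for p in Path(".").rglob("*") if p.is_file() and not is_excluded(p, files, folders, extensions)]`.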



"Isn't this just a GUI for the cat command" "Oh. That's the joke."


Nice! I made something similar but for the browser recently: https://files2prompt.com

I think there are some CLI tools out there as well.


Thank you for sharing, I found it really useful and well done! You did a really great job!


Thanks, I appreciate it! I hope to keep adding features. If there are missing features that you'd use, feel free to leave them in a reply.


This is sweet!! Organizing source docs manually is so tedious


If you need to feed multiple files to ChatGPT or another LLM, this makes it way easier than manually copying and pasting.

This app shows you a file modal. Use Shift or Option keys to select multiple text files across one or more directories.

All of the selected files will be concatenated for easy select all / paste into your LLM conversation.

Output format of selected files is:

  ### `[filepath]`
  [file contents]
  ### `[filepath]`
  ... and so on.

  - Output is in a text field for easy copy-pasta.
  - File path starts at the common parent of all selected files
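For anyone who wants the same output from a script, the format described above can be approximated in a few lines of Python. This is a sketch of the described behavior, not FileKitty's actual code; `combine_files` is a hypothetical name.

```python
import os
from pathlib import Path


def combine_files(paths: list[str]) -> str:
    """Concatenate files, each under a '### `relative/path`' header."""
    resolved = [Path(p).resolve() for p in paths]
    if not resolved:
        return ""
    # Common parent of the files' directories, so a single file still shows its name.
    root = Path(os.path.commonpath([str(p.parent) for p in resolved]))
    sections = []
    for p in resolved:
        rel = p.relative_to(root).as_posix()
        contents = p.read_text(encoding="utf-8", errors="replace").rstrip("\n")
        sections.append(f"### `{rel}`\n{contents}")
    return "\n".join(sections)
```

The result is a single string you can print or copy to the clipboard and paste into the conversation.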


It looks very useful. Are you going to release a binary or put it up on brew?


Thanks. I'll take feature requests!

I added a compiled version targeting macOS arm64 here, though it is not signed.

https://github.com/banagale/FileKitty/releases/tag/0.1.0

I'll see if I can get a brew version up.



