jalammar's comments | Hacker News

Hi HN,

This is the first in a series of articles I'm writing to introduce devs to practical applications of large NLP language models (text generation models like GPT and language understanding models like BERT).

I have been connecting the dots between the capabilities of these models and their business applications. I believe we're still only beginning to grasp how much value can be extracted from these models. Happy to share these articles as I learn from my exposure to the problem space.

Some of the key visual language I'm aiming to simplify is that of "prompts" and how they are used to shape model output (which leads to practical applications). In this post, a key visual is [1], which shows an example of a summarization prompt, and [2], which shows a high-level view of the "prompt engineering" process.
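To make the idea concrete, here is a minimal sketch of what such a summarization prompt could look like in code. The wording is illustrative (not taken from the article's figures), and `generate` is a hypothetical placeholder for whichever LLM API you call, not a specific library's function:

    # A minimal, illustrative summarization prompt (not the exact wording
    # used in the article's figures)
    prompt = (
        "Summarize the following passage in one sentence.\n"
        "\n"
        "Passage: Large language models can be adapted to many tasks simply\n"
        "by changing the text they are given as input.\n"
        "\n"
        "Summary:"
    )

    # `generate` is a hypothetical placeholder for whichever LLM API you call
    # summary = generate(prompt, max_tokens=40)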

Would appreciate your feedback!

[1] https://docs.cohere.ai/img/intro-llms/language-model-prompt.... [2] https://docs.cohere.ai/img/intro-llms/prompt-engineering-and...


There is a real gap between the development of these models and techniques for how to use them practically and where to use them (use cases). Thanks for filling in the gap.


It's widely used in recommenders based on embeddings. See:

https://github.com/spotify/annoy

https://github.com/facebookresearch/faiss
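For a sense of how these libraries fit into an embeddings-based recommender, here is a minimal sketch using Annoy. The embedding size and the random vectors are assumptions for illustration; in practice you would index learned item embeddings:

    from annoy import AnnoyIndex
    import numpy as np

    dim = 64  # embedding size (an assumption for this sketch)
    index = AnnoyIndex(dim, "angular")

    # Index item embeddings (random here; in practice, learned embeddings)
    for item_id in range(1000):
        index.add_item(item_id, np.random.rand(dim).tolist())

    index.build(10)  # number of trees; more trees = better recall, slower build

    # "Similar items" recommendation: the 5 nearest neighbours of item 0
    print(index.get_nns_by_item(0, 5))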




Hi HN, I created this cheat sheet and video as high-level guidance to the major categories of ML explainability research and techniques. It's an area I've been working in for the last year, and this is how I synthesized the lay of the land in my head, based on the excellent references listed on the page. I hope you find it useful. It's non-exhaustive and will be updated, so please let me know if you have any feedback for future revisions. Thanks!


I wouldn't trust any model to generate text for customers yet, not even the largest GPT-3. There are no guarantees on what they will output, and that could be damaging to your business.

You're better off either:

1- Defining common "intents" that a lot of customer queries fall into, and having a model map the incoming message to the appropriate canned response. Look at Rasa for an example of this. (A toy sketch of this approach follows below.)

2- If you insist on generating the text, have it be a recommendation to a human agent who either chooses to send it or writes their own response.
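Here is a toy sketch of option 1, using TF-IDF and cosine similarity as a stand-in for whatever embedding model, classifier, or framework (e.g. Rasa) you'd actually use. The intents, example queries, and URLs are made up for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Made-up intents and canned responses, purely for illustration
    intents = {
        "reset_password": "You can reset your password at example.com/reset.",
        "refund_request": "I've opened a refund ticket for you.",
        "shipping_status": "You can track your order at example.com/track.",
    }
    example_queries = {
        "reset_password": "I forgot my password and can't log in",
        "refund_request": "I want my money back for this order",
        "shipping_status": "where is my package",
    }

    intent_names = list(example_queries)
    vectorizer = TfidfVectorizer()
    intent_vectors = vectorizer.fit_transform(
        [example_queries[name] for name in intent_names]
    )

    def respond(message):
        # Map the incoming message to the closest known intent,
        # then return that intent's canned response
        sims = cosine_similarity(vectorizer.transform([message]), intent_vectors)[0]
        return intents[intent_names[sims.argmax()]]

    print(respond("cant sign in, lost my password"))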


Thanks for the advice.


Hugging Face has that service

https://huggingface.co/pricing


I haven't personally come across one yet.


Hello HN, author here. Language models are absolutely fascinating tools. I believe it would pay for software engineers to have a sense of their capabilities and how they function. The article showcases a few views that expose the inner workings of the model, as well as a simple UI for interacting with a language model to get a sense of how it works and generates words.

If you prefer video, I have also recently released a video [1] with PyData that provides an intro to language models and their applications, and shows how we're trying to make Transformer-based ones more transparent with Ecco [2]. Contributors are welcome!

[1] https://www.youtube.com/watch?v=rHrItfNeuh0

[2] https://www.eccox.io/ and https://github.com/jalammar/ecco

Thanks mods for merging submissions. Happy to get feedback, thoughts, or questions.


Nice article, thanks for posting :-)


I actually started with PCA. But NMF proved more understandable since negative dimensions in PCA are hard to interpret. I didn't consider UMAP, but would be interested to see how it performs here.

It should be easy, yeah. For NMF, the activations tensor is reshaped from (layers, neurons, token positions) down into (layers × neurons, token positions), and we present that to sklearn's NMF model. I would assume UMAP would operate on that same matrix. That matrix is called 'merged_act' and is located here: https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...
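A rough sketch of that reshape-and-factor step with sklearn (the tensor here is random and its shape is just an assumption for illustration; Ecco's actual code is at the link above):

    import numpy as np
    from sklearn.decomposition import NMF

    # Hypothetical non-negative activations: (layers, neurons, token positions)
    activations = np.random.rand(6, 3072, 20)

    # Collapse layers and neurons into one axis: (layers * neurons, positions)
    merged_act = activations.reshape(-1, activations.shape[-1])

    # Factor into a handful of components over the token positions
    nmf = NMF(n_components=8, init="random", random_state=0, max_iter=500)
    W = nmf.fit_transform(merged_act)  # (layers * neurons, components)
    H = nmf.components_                # (components, token positions)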


Interesting. Thanks for sharing your notes on the higher layers. Allow me to repost that to the discussion board on GitHub.

I do get your point on interpretation. This work is just a starting point. I'm curious to arrive at ways to automatically select the appropriate number of factors for a specific sequence. Kind of like the elbow method for K-means clustering.


These are AI explanation methods. They belong to the same toolbox as LIME, Shapley values, etc. Input saliency is a gradient-based explanation method.
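For a flavour of what gradient-based input saliency looks like in code, here is a minimal gradient × input sketch on GPT-2 with PyTorch and Hugging Face transformers. This is a simplified illustration, not Ecco's exact implementation:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    text = "The capital of France is"
    input_ids = tokenizer(text, return_tensors="pt").input_ids

    # Embed the tokens ourselves so we can take gradients w.r.t. the embeddings
    embeds = model.transformer.wte(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits

    # Backprop from the score of the most likely next token
    logits[0, -1].max().backward()

    # Gradient x input: one saliency score per input token
    saliency = (embeds.grad[0] * embeds[0]).sum(dim=-1).abs().detach()
    for token, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()),
                            saliency.tolist()):
        print(f"{token:>12}  {score:.4f}")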

