Phind Model beats GPT-4 at coding, with GPT-3.5 speed and 16k context (phind.com)
891 points by rushingcreek on Oct 31, 2023 | 347 comments
Hi HN,

We’re excited to announce that Phind now defaults to our own model that matches and exceeds GPT-4’s coding abilities while running 5x faster. You can now get high quality answers for technical questions in 10 seconds instead of 50.

The current 7th-generation Phind Model is built on top of our open-source CodeLlama-34B fine-tunes that were the first models to beat GPT-4’s score on HumanEval and are still the best open source coding models overall by a wide margin: https://huggingface.co/spaces/bigcode/bigcode-models-leaderb....

This new model has been fine-tuned on an additional 70B+ tokens of high quality code and reasoning problems and exhibits a HumanEval score of 74.7%. However, we’ve found that HumanEval is a poor indicator of real-world helpfulness. After deploying previous iterations of the Phind Model on our service, we’ve collected detailed feedback and noticed that our model matches or exceeds GPT-4’s helpfulness most of the time on real-world questions. Many in our Discord community have begun using Phind exclusively with the Phind Model despite also having unlimited access to GPT-4.

One of the Phind Model’s key advantages is that it's very fast. We’ve been able to achieve a 5x speedup over GPT-4 by running our model on H100s using the new TensorRT-LLM library from NVIDIA. We can achieve up to 100 tokens per second single-stream while GPT-4 runs around 20 tokens per second at best.

Another key advantage of the Phind Model is context – it supports up to 16k tokens. We currently allow inputs of up to 12k tokens on the website and reserve the remaining 4k for web results.

There are still some rough edges with the Phind Model and we’ll continue improving it constantly. One area where it still suffers is consistency — on certain challenging questions where it is capable of getting the right answer, the Phind Model might take more generations to get to the right answer than GPT-4.

We’d love to hear your feedback.

Cheers,

The Phind Team




I just spent a few minutes doing a comparison between Phind and GPT-4 for a very high-level question on a distributed job queue. I gave them both the same fairly vague sketch of a kind of system I would like to build. Here are my impressions:

In the positives of Phind:

* Phind was able, even eager, to recommend specific libraries relevant to the implementation. The recommendations matched my own research. GPT-4 takes some coaxing to get it to recommend libraries. Phind also provided sample code using the libraries it recommended.

* Phind provides copious relevant sources including github, stackoverflow and others. This is a major advantage, especially if you use these AI assistants as a jumping off ground for further research.

* Phind provides recommendations for follow-on questions that were very good. One suggestion to the Phind team: don't remove the alternate follow-on questions once I select one. A couple of times it recommended a few really good follow-up questions, but as soon as I selected one the others disappeared.

In the positives of GPT-4:

* GPT-4 gave better answers. This is my subjective opinion (obviously), but if I were interviewing two candidates for a job and using my question as the basis for a systems-design interview, then GPT-4 was just overall better. In many cases it added context beyond my question, recommending things like logging and metrics, for example. It seemed to intuit the "question behind the question" in a much better way than the literal interpretation of Phind. This is probably highly case-dependent; sometimes I just want an answer to my explicit question. But GPT-4 seemed to understand the broader context of the question and replied with that in mind, leading to an overall more relevant response.

* GPT-4 handled follow-up questions better. This is similar to the previous point - but GPT-4 gave me the impression of narrowing down the scope of the discussion based on the context of my follow-up question. It seemed to "understand" the direction of the conversation in a way that felt like it was following context.

NOTE: this was not a test on coding capability (e.g. implementing algorithms) but on using these AI coding assistants as sounding boards for high-level design and architecture decisions.


This is a good point about GPT-4; it can intuit the "question behind the question" really well compared with other models. And it's been profoundly useful for me with the most random tasks I knew nothing about beforehand (like fixing a wall in my house), etc.


That's probably because OpenAI can train on the (successful) conversations we have with ChatGPT!


> * Phind provides copious relevant sources including github, stackoverflow and others. This is a major advantage, especially if you use these AI assistants as a jumping off ground for further research.

Did you find them to be correct?


I don't use Phind for coding, except occasionally, but I like it best for generalised tech search because each para has a reference and there's a list of references down the side -- often the references would really be sufficient for me on their own.

I've had one glaring error. I can't quite remember the details, but it switched the names/characteristics of two different processes (i.e. it said exactly the opposite of what is true); it was something to do with instruction caching and TLB, IIRC. I assumed it was a problem with the input corpus not allowing antonyms to be disambiguated.

Anyway, for me it's the best of the LLM tools I have access to and it has mostly replaced search engines (Google, Dukgo) for my tech-related work.

I've only used chat.openai.com (free), bing chat, HuggingChat.


I don't think "correct" is the right word since these were open ended systems design type questions. There are many ways to accomplish the same task.

I also spent about 20 minutes on this which is why I mentioned this is a first impression. I'll leave it to researchers to develop a "relevancy" metric and objectively apply it.

In my experience, the sources were sufficiently relevant based on its responses. They were about as relevant as equivalent Google queries. There were some tiny, tiny niggles: for example, I was explicit that I wanted it to recommend approaches in Go, yet for one reference I recall, related to distributed locking mechanisms, it provided a link to an implementation in Java. However, that is completely fine for me since the context was more about the locking on the database side and not really the implementation in a specific language.


And the sources actually existed? i.e. there weren't any made-up ones?


The sources are urls to the cited page (e.g. stackoverflow.com, pkg.go.dev). In the side-bar next to the answer is a more standard search-result style link list with pulled quotes from the pages (like a Google search).

I didn't click every single link (as I mentioned, the citations are copious) but the few I did follow went to relevant articles. I just went back and randomly clicked several more and they all went to pages that exist and mostly relate to the content of the answer. The inline citations seem a bit more on-topic compared to the side bar which does seem more like the links were lifted directly from a search engine.

To be fair there are some lower-quality blog-spammy kinda stuff - more or less the same kind of thing you would get out of Google. But compared to GPT-4, which provides no sources whatsoever, it is an advantage IMO.


Do you have custom instructions? Everyone needs to mention and post their prompts, else it's entirely anecdotal.


We support custom instructions at https://phind.com/profile.


I’m trying to get it to answer only in executable Python. I used the template with instructions I use for my system prompt on gpt4. And I tried using the additional context field for the same.

It gets to writing the expected code but it still wants to include formatted headings instead of commenting those out so the entire response is executable Python.

As a follow up I provided an example heading with the hash out front. It didn’t work.

Any ideas on how to get it to do this? FWIW, GPT-4 also often ignores this request, but only about half the time, and when it does, the extra text is typically a single block of explanation.

For that, I include prose detection and commenting as part of my post processing.
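
As a rough illustration, a minimal sketch of that kind of post-processing, assuming the response mixes prose with fenced Python blocks (the real heuristics are more involved):

  def make_executable(response: str) -> str:
      """Comment out everything that isn't inside a fenced code block,
      so the whole LLM response can run as a Python script."""
      out, in_code = [], False
      for line in response.splitlines():
          if line.strip().startswith("```"):
              in_code = not in_code          # toggle on fence markers, drop the fence itself
              continue
          out.append(line if in_code else "# " + line)
      return "\n".join(out)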

Also, I don’t see it easily, but do you have an API for this or is it intended to be run by the user?


Getting it to not output additional text is not something that it can do super well at the moment, unfortunately. We'll work on that.


My trick for this has been one-shot training + regex. I tell the model to produce executable code within triple backticks suffixed by a keyword, like:

```keyword // code ```

and then I just ignore anything outside of those blocks.
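
The extraction side is only a few lines; a rough sketch, assuming the model fences its code as ```keyword ... ``` (the keyword itself is arbitrary):

  import re

  KEYWORD = "keyword"   # whatever sentinel you told the model to use
  BLOCK_RE = re.compile(r"```" + re.escape(KEYWORD) + r"\s*(.*?)```", re.DOTALL)

  def extract_code(response: str) -> str:
      """Keep only the contents of the keyword-tagged blocks; ignore everything else."""
      return "\n\n".join(m.strip() for m in BLOCK_RE.findall(response))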


I did not have custom instructions for either assistant. You can see the full conversation logs which I posted as a reply to another comment.


The “give context” part has a lot to do with prompting well for the particular model. To have a fair comparison, the prompt should be just code, and then see what each model comes up with.


mind providing some of the prompts you use to question them?


Here are the conversation logs:

https://chat.openai.com/share/867ff0c4-d4cf-4af9-a785-31a599...

https://www.phind.com/search?cache=ej8pn1dfjjwfr1tgc6ybwhlg

NOTE: there are a few more question/answer blocks in the phind conversation since I was testing out the follow up question feature.


would you be able to share your prompt(s)?

Edit: they are already posted as comment in this thread.


I tried my standard "trick" question I use for LLMs:

"Give me five papers with code demonstrating the state of the art of machine learning which uses geospatial data (e.g. GeoJSON) as both input and output."

There is no such state of the art. My hand-wavey understanding is that GIS data is non-continuous, which makes it useless for transformers, and also contextual, which makes it useless for anything else. Will defer to actual ML people for better explanations.

Point is, LLMs invariably give five papers with code that don't actually exist - it's a guaranteed hallucination.

Phind was able to give me five links that do in fact exist, as well as contextual information as to why these five links were not papers with code doing ML with GIS data. This is by far the best answer to this question from an LLM I've received yet.


I don't see how this would be relevant for a code model?

The code model isn't trained to retrieve papers/articles; it's meant to complete code. Whether or not you find hallucination in an unrelated task isn't particularly interesting.


Damn, this is how I learn that HN doesn't have a block function. What a shame.

My friend, can you do me a favour and actually click the link and have a play with the app? If you do, you will discover that what you're dealing with there is an LLM. That's literally why it's being compared to other LLMs.

No idea what you were trying to achieve with this comment. "The code model isn't trained to retrieve articles." a) neither is any other LLM, what's your point? and b) the app on the other end of that URL retrieves articles - it's not even tangential to the app, it's key functionality.


Lmao



ChatGPT 4 seems to be better than it was when I was using it (mere months ago)!


Yeah, all of OpenAI's stuff gets better much quicker than I'm used to. So does the general performance ceiling of all open source models, even if individual models don't improve as much.


> the state of the art of machine learning which uses geospatial data (e.g. GeoJSON) as both input and output

> There is no such state of the art

Some GIS work uses vector data: points/lines/polygons representing features (e.g., the location of roads or the outlines of buildings), which can be stored in formats like GeoJSON or WKT. But other work uses remote sensing data/satellite imagery that can be stored in raster formats like GeoTIFF - essentially TIFF image files with additional information stored to georeference them.

You can totally do machine learning on satellite imagery where both the input and output are geospatial data (e.g. to categorise land use - the inputs are multispectral images and the outputs can be images where the value of each pixel represents the identified land use).

You can also use machine learning for tasks like building footprint detection/delineation (e.g., [1]) based on satellite imagery. The output from such a pipeline can be a set of polygons, which could be saved as GeoJSON.

I'd consider either of these to be examples of "machine learning which uses geospatial data (e.g. GeoJSON) as both input and output".

[1]: https://azure.microsoft.com/en-us/blog/how-to-extract-buildi...



I'm glad there's growing competition, but GPT4 still outperforms. Here's a recent question I asked:

Could you please provide me with a query for my PostgreSQL table? It consists of three columns: 'id,' 'teaser,' and 'full_text,' all of which are of type 'text.' Most 'full_text' entries are articles around 1000 words in length. I'd like to update the 'teaser' column for each row with the first approximately 200 words from the 'full_text' column. How can I achieve this?

Phind:

  CREATE OR REPLACE FUNCTION get_first_n_words(text, integer)
  RETURNS text AS $$
  DECLARE
     word_count integer := 0;
     word_start integer := 1;
     word_end integer;
  BEGIN
     WHILE word_count < $2 AND word_start <= length($1) LOOP
         word_end := position(' ' in $1 from word_start) + 1;
         IF word_end = 0 THEN
             word_end := length($1) + 1;
         END IF;
         word_count := word_count + 1;
         word_start := word_end;
     END LOOP;
     RETURN substring($1 from 1 for word_start - 1);
  END;
  $$ LANGUAGE plpgsql;
GPT4:

  UPDATE your_table_name
  SET teaser = (
      SELECT STRING_AGG(word, ' ') 
      FROM (
          SELECT SPLIT_PART(full_text, ' ', i) AS word
          FROM generate_series(1, 200) AS i
      ) AS words
      WHERE word <> ''
  )
  WHERE full_text IS NOT NULL;


Running "Ignore Web Context" enabled can improve performance for design tasks like this. I just got a more plausible answer: https://www.phind.com/search?cache=f0fkv5mxscwvagxgkuwnwgtl. Consistency is something we're working on.


Thanks for sharing, you're right - that does improve performance!


How do you enable "Ignore Web Context"? I don't see that option anywhere on the page you linked, am I just being blind?


It's in the model dropdown under the search bar.


You mean "Ignore Search Results" ?


One example is not enough for performance conclusions


Obviously not. Perfectly reasonable to share anecdotes though.

Also, I ran a few different tests, and every GPT-4 response was superior, but I didn't want to clutter my comment with queries and code.


There is a performance conclusion in the title though.


That conclusion is based on a benchmark with many examples across different tasks.


That conclusion is based on their benchmarks. I'm not interested in those. I'm interested in community benchmarks, like those we're seeing in the comments. Lo and behold, GPT-4 is still king. The claims of any company should be taken with exactly a pinch of salt.


That benchmark (HumanEval) is a public benchmark built by others.


That kind of benchmark is a lot more reliable for models published before the benchmarks; models published afterwards have more opportunity to "study to the test". That's especially a concern when a company explicitly uses its score on that benchmark as a marketing point.


sure, but it is the best thing we have.


Well no, we have the anecdotes of all the HN folks, which I trust many, many times more than a benchmark.


lol, you can continue trusting anecdotes from the internet. Industry prefers more scientific methods.


So Paul Graham posted that Phind is better and got absolutely destroyed in the comments

https://twitter.com/paulg/status/1719657855240815026

No, I do not take these benchmarks seriously and for good reason. They're benchmarks. The only thing that matters is the user's direct experience of the product. And Phind isn't there.


> got absolutely destroyed in the comments

by Twitter trolls?..


AFAIK they haven't released the dataset they fine-tuned on, so we can't be 100% sure there wasn't benchmark contamination. I agree that we definitely need more than N=1 to challenge the performance claims, but I still think it's valid to call it out given how much benchmark-gaming we've seen in this space.


I think you can bring a contamination claim against every public benchmark result nowadays: models are trained on TBs of data crawled from the internet, and there is no guarantee the benchmark hasn't leaked in some way.


With respect to the pretraining data, it's true that we're probably SOL there in terms of verification. But for fine-tuning, they could still publish the dataset and let others reproduce their results as well as audit for contamination.

If we're comparing benchmark deltas between different fine-tuned variants that share the same base models, that seems like the bare minimum we should expect to come along with performance claims.


I think both the pretraining and fine-tuning datasets are essential secret information for commercial models/services.


In the case of Phind though, they also publish their models on HF with similar bold performance claims without publishing the datasets: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2

Even if I grant that their subscription product has some secret sauce they want to keep close to the chest (ignoring for a moment that their paid product is GPT-4 based), not doing the same for the models they release to the open source community free of charge with a commercially permissive license seems suspect.

I realize this sort of open source contribution is mostly for marketing purposes, but being critical of the performance claims I think is still valid nonetheless.


From what I understand it's a single test suite? Of course I don't really mind the clickbait title that much, it's hard to attract attention otherwise.


I think it is valid criticism that the HumanEval benchmark is not completely representative; they also say so in the post.


Depends on the claims made.


With some simple clarification I got this

UPDATE your_table SET teaser = substring(full_text from '(\S+\s*){1,200}')


I really dislike article teasers and "read more" buttons. Now I know it's intentional clipping of the corresponding articles.


> We can achieve up to 100 tokens per second single-stream while GPT-4 runs around 20 tokens per second at best.

Is that with batching? If so, that's quite impressive.

> certain challenging questions where it is capable of getting the right answer, the Phind Model might take more generations to get to the right answer than GPT-4.

Some of this is sampler tuning. Y'all should look at grammar based sampling (https://github.com/ggerganov/llama.cpp/pull/1773) if you aren't using it already, as well as some of the "dynamic" sampling like mirostat and dynatemp: https://github.com/LostRuins/koboldcpp/pull/464

I think these should work with nvidia's implementation if you just swap the sampling out with the HF version.

BTW, all this is a great advantage of pulling away from OpenAI. You can dig in and implement experimental features that you just can't necessarily do through their API.


We leverage Flash Decoding (https://crfm.stanford.edu/2023/10/12/flashdecoding.html) in TensorRT-LLM to achieve 100 tokens per second on H100s.


is that impressive? I was thinking 100 tok/s on an H100 is really slow considering LMDeploy claims 2000+ on an A100 and a large batch size.


We get 100 tokens a second with batch size 1. Those 2000+ figures are for large batches.


Ah, that's fair, and faster than any of the LMDeploy stats for batch size 1; nice work!

Using an H100 for inference, especially without batching, sounds awfully expensive. Is cost much of a concern for you right now?


I don't think they're saying they're doing a batch size of 1, just giving expectations of user-facing performance.


Yeah, and this is basically what I was asking.

100 tokens/s on the user's end, on a host that is batching requests, is very impressive.


I think they _are_ saying batch size 1, given that rushingcreek is OP.


Yes they are saying batch size 1 for the benchmarks, but they aren't doing batch size 1 in prod (obviously).


I don't think that is obvious. If your use case demands lowest latency at any cost, you might run batch size 1. I believe replit's new code model (announced about a month ago) runs at batch 1 in prod, for example, because code completions have to feel really fast to be useful.

With TensorRT-LLM + in-flight batching you can oversubscribe that one batch slot, by beginning to process request N+1 while finishing request N, which can help a lot at scale.


I'm not sure about TensorRT, but in llama.cpp there are separate kernels optimized for batching and single-use inference. It makes a substantial difference.

I suppose one could get decent utilization by prompt processing one user while generating tokens for another.


Without batching, I was actually thinking that's kind of modest.

ExllamaV2 will get 48 tokens/s on a 4090, which is much slower/cheaper than an H100:

https://github.com/turboderp/exllamav2#performance

I didn't test codellama, but the 3090 TI figures for other sizes are in the ballpark of my generation speed on a 3090.

100 tokens/s batched throughput (for each individual user) is much harder.


I am a heavy user of GPT4, and Phind was surprisingly able to match GPT4 on several initial programming tasks I gave it. Given the large context window of Phind, it will likely be able to outperform GPT4 for some tasks.

That is quite an accomplishment, I am impressed


FWIW The default context window of GPT-4 via ChatGPT is about to change to 32k.


Given the number of times it just fails with large prompts on 32k contexts, I'm not sure they're ready for this. In my experience, if you're consuming 20k+ tokens, the failure rate is more than 50%.


Well over 50%, at least via the api, for me.


This would be great if true. Any source for this?



that would put them significantly ahead again, for my use cases


We will eventually increase the Phind Model to 100K tokens -- the RoPE embeddings in Code Llama were designed for this.


> the RoPE embeddings in Code Llama were designed for this.

The RoPE embeddings were not "designed" for that. The original RoPE was not designed with length extrapolation in mind. Subsequent tweaks to extrapolate RoPE (e.g. position interpolation) are post-hoc tweaks (with optional tuning) to an entirely vanilla RoPE implementation.
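
For what it's worth, position interpolation really is a small post-hoc rescaling; a rough numpy sketch of the idea (just the angle computation, attention omitted):

  import numpy as np

  def rope_angles(positions, dim, base=10000.0, train_len=None, target_len=None):
      """Vanilla RoPE rotation angles; with position interpolation the positions
      are simply rescaled by train_len / target_len so they stay in the trained range."""
      positions = np.asarray(positions, dtype=np.float64)
      if train_len and target_len:
          positions = positions * (train_len / target_len)   # the post-hoc tweak
      inv_freq = base ** (-np.arange(0, dim, 2) / dim)       # theta_i for each pair of dims
      return np.outer(positions, inv_freq)                   # shape: (seq_len, dim // 2)

  # e.g. running a 4k-trained model at 16k while keeping angles in the trained range
  angles = rope_angles(np.arange(16384), dim=128, train_len=4096, target_len=16384)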


100k tokens and good IDE support would be great. Copy-pasting back and forth between the browser and the IDE is kinda annoying and you always miss some context. I think the model is now good enough, but what is kinda missing is a good developer experience, e.g. deciding what to load into that context window and how the model integrates into the IDE. But this is kinda missing with Copilot and ChatGPT-4 as well.


Is it “100k” or really 100k? There are so many ways to do context. I remember seeing 100k before, but it was doing some cheap trick to get there.


What about ALiBi and Sliding Window Attention?

Additionally Apple researchers seem to be playing with "Attention Free" variants.


Source?


I love that Phind cites what it scrapes. This should be an obligation for all LLMs. I always suggest people use it over ChatGPT.


What they're citing isn't what the LLM "scraped"; it's what the retrieval model fed to the LLM. You're not guaranteed that it's what the model actually used to give you the output, and it's also definitely not all the text it drew on for the knowledge behind the answer, since that is spread over millions of training examples for the language (and for human language generally) in a way that isn't human-understandable.


A couple of times I've had the reference not include the detail being mentioned in the foregoing paragraph; the citations are still highly relevant, but it wasn't quite what I expected.


I've heard this cold take before, but OpenAI's source code isn't open to academic scrutiny. So I don't understand why some people are so confident about how it works. It's certainly not magic, and Phind seems to be capable of citation.


It's transformer-based language-modelling 101, not really a take, just stating facts. It's highly unlikely that Phind has single-handedly, in a purely novel way, fundamentally solved the exact same problems that the whole field is working on simultaneously. It's just how transformers work.


Phind appears to be doing it, though. LLMs are stochastic parrots; I don't see a radical difference. Input goes in, output comes out. Neural networks aren't magic, they're a complex function. One node or a billion, we can track the data that's changing the weights inside the network.


I do this for a living


As a user, I prefer getting the right response over the thing spitting out a link (not saying Phind is bad). Let's focus on getting LLMs right before nerfing them in their baby stages.


Who said anything about nerfing? Citation is just additive, no?


In fact, I'd argue that citation makes LLMs better. Kind of a "think carefully" indicator. When LLMs are able to verify those citations independently, it's going to level up again by skyrocketing the objective truthiness.


Interestingly, I'd say that _not_ being able to give citations helps protect the LLM from copyright issues. That being said, I'd much prefer if the LLM could provide citations for every piece of information it was trained on and uses to provide an answer.


Citations are essential for me as I'm using Phind for work and can't rely on "trust me bro". It needs to conform to my expectations or be confirmed by a couple of the citations that have trustworthy sources (e.g. ones from known domains, well-cited journals, etc.).


I've found great sites and devs using Phind.


Yeah, I prefer the context provided by the original creator. If I'm writing code and I need to reference someone else's work I put their name in my comments. I was digging through Box2D for polygon vs ray intersections and in the comments of the source code Erin Catto cites Collision Detection in Interactive 3D Environments by Gino van den Bergen. It makes me respect him even more.


I find it often makes the responses worse when it's being pre-fed these search results. That was the case when I tried GPT-4 with web browsing enabled, and it seems to be the case with this too, since even the person from the Phind team in this thread pointed out that turning this feature off improves performance for some tasks:

https://news.ycombinator.com/item?id=38089888

https://news.ycombinator.com/item?id=38090442


Nerf is the wrong word; it's more like regulatory capture. If all LLMs had to quote their sources, then along with all the other human-oriented changes we want to make, only the big players would be able to comply, effectively making it hard to enter and compete. Judging by the AI executive order released yesterday, the current big players want launching a new LLM product to be more like opening a new bank than opening a lemonade stand.


Give me the citations every day of the week. The source of information matters. For example, I don't rely on any ZFS info or opinions I find online if I can't verify it came from a contributor or highly reputable person that has a lot of experience with ZFS.

If you want to show the warts of all these LLMs, ask it about ZFS if you know enough to spot the commonly parroted misinformation that plagues the internet.

IMHO, these systems look super useful if they're citing sources and they're worthless without.


Transparency is paramount. If OpenAI doesn't want to make its proprietary software open to academic scrutiny, I completely understand. However, if their app is going to play an educational role, then sources and citations are mandatory in academic content.


Funny you bring up ZFS specifically. I embarrassed myself a couple weeks ago by parroting something GPT-4 told me about ZFS to someone on reddit, which turned out to be completely wrong.


why not both?


I asked it to write a program that I've written before, to compare with GPT-4. It didn't really get what I was asking for; GPT-4 understood it perfectly and is ready to continue prompting toward completion.

https://www.phind.com/agent?cache=cloeowfla000dl1084ermly3c vs https://chat.openai.com/share/4147da33-3669-4657-88fa-3a9dfc...

Might not be representative of the whole thing, but it went on about random things I didn't ask about and gave basic information I already knew.


The Pair Programmer mode currently either uses GPT-4 or GPT-3.5 (if you've run out). Please try again in the default search mode to use the Phind Model.

Using the Phind Model in the default search seems to work well: https://www.phind.com/search?cache=ln6dpdtv5auwn4cq1ofg3gs9


https://www.phind.com/search?cache=z5odlx0o9lspzpfm4sfpp131

Way way better, I'm stunned. Congratulations on this


Even though the phind model is selected? Is there a technical reason Phind doesn't do pair programming yet?


It's because we haven't updated the Phind Model to support function calling yet but we're working on it.


Can you share what your long term monetization model is? I'm noticing Phind is free to use right now.


We have a Pro plan where you can get (virtually) unlimited GPT-4 and soon, an even faster Phind model. https://phind.com/plans


Is there something you're doing with GPT-4 that would make me want to use it through you vs just using it myself?


The problem is that it's doing a search for your relatively niche problem, and probably getting pretty poor results. The text from the search is then weighted more heavily than the base model, but it's relative junk, so the model performs better without the additional (unhelpful) context.

You see this with Bing search on ChatGPT as well, and I’ve seen it in my own projects.


> it supports up to 16k tokens

> Llama 1 supports up to 2048 (2K) tokens, Llama 2 up to 4096 (4K), CodeLlama up to 16384 (16K). [0]

This is wild to me.

The token window is one of the limiting factors for having an AI that can actually remember you and past conversations. Having a large window is key for future AI applications that involve long running conversations (weeks, months, years). The tech is already very impressive, but imagine it as it becomes more like an actual pair programmer and remembers all the various things it's learned and worked on with you in the past.

[0] https://huggingface.co/docs/transformers/main/model_doc/llam...


640k is enough for anyone


Extending that analogy, imagine what one could do with 128B tokens.

On cast off/cheap workstation/server hardware.


Token window size is being virtualized with the likes of MemGPT, so its effect will diminish.


Still waiting for the day that medium-term memory (token average pooling, like in sentence transformers) becomes used for this. It's staring all of these companies in the face and apparently no one thinks to implement it.


I've been thinking along the same lines. The token window IMO should be a conceptual inverted pyramid, where the most recent tokens are retained verbatim but earlier context is compressed/pooled more and more as the conversation grows. I'm sure there's some effort/research in this direction; it seems pretty obvious.
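
Roughly the shape I have in mind, as a toy numpy sketch (the chunk sizes and pooling-by-averaging here are just placeholders for whatever real compression scheme you'd use):

  import numpy as np

  def pyramid_pool(token_embs, keep_recent=512, base_chunk=64):
      """Keep the most recent token embeddings verbatim; mean-pool older ones
      into chunks that double in size the further back in the context they are."""
      recent = token_embs[-keep_recent:]
      older = token_embs[:-keep_recent]
      pooled, chunk = [], base_chunk
      while len(older) > 0:
          piece, older = older[-chunk:], older[:-chunk]
          pooled.append(piece.mean(axis=0))   # one vector summarises the whole chunk
          chunk *= 2                          # older history gets compressed harder
      return np.vstack(list(reversed(pooled)) + [recent]) if pooled else recent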


But some of the earlier tokens are also the most important ones, right? Like the instructions and rules you want it to follow.


Phrase embeddings could bring a 32x reduction in sequence length because:

> Text Embeddings Reveal (Almost) As Much As Text. ... We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/abs/2310.06816


They are. Moreover, the idea that AI companies are missing and/or not implementing this “obvious” tactic is hilarious. Folks, these approaches have profound consequences for training and inference performance. Y’all aren’t pointing out some low hanging fruit here, lol


Actually, yes I am pointing out low hanging fruit here. These approaches do not have "profound consequences" for inference or training performance. In fact, sentence transformer models run orders of magnitude more quickly. Performance penalties will be small.

Also, I actually have several top NLP conference publications, so I'm not some charlatan when I say these things. I've actually physically used and seen these techniques improve LLM recall. It really actually works.

Here's more examples of low hanging fruit. The proof in that they work is in the implementations which I provide. You can run them, they work!: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

Check yourself before you try to check others.


> In fact, sentence transformer models run orders of magnitude more quickly. Performance penalties will be small.

They do not. Sentence transformers aren't new, and have well-known trade offs. What source or line of reasoning misled you to believe otherwise?

> Here's more examples of low hanging fruit. The proof in that they work is in the implementations which I provide. You can run them, they work!: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

This...is your blog about prompt engineering. What do you believe this "proves"? How have you blown away current production encoding or attention mechanisms?


Concur. LLM are still very young. We’re barely a year out from the ChatGPT launch. Everyone is iterating like mad. Several stealth companies working on new approaches with the potential to deliver performance leaps.

You ain’t seen nuthin’ yet…


Out of curiosity, why do you think the answer would be so simple and also completely untested?


Too much money is being thrown around on BS in the LLM space, and hardly any of it is going to the places where it matters. Ignorance on the part of investors.

For example, the researchers working hard on better text sampling techniques (i.e. https://arxiv.org/abs/2202.00666), or on better constraint techniques (i.e. like this https://arxiv.org/abs/2306.03081), or on actual negative prompting/CFG in LLMs (i.e. like this https://github.com/huggingface/transformers/issues/24536) are doing far FAR more to advance the state of AI than dozens of VC backed LLM companies operating today. They are all laboring in relative obscurity.

HN and the NLP community have some serious blind spots when it comes to knowing how to exploit their own technology. At least someone at Andreessen Horowitz got a clue and gave some funding to Oobabooga - still waiting for Automatic1111 to get any funding.


Another curiosity: what do we estimate (if it's even possible to estimate) the context window of a human to be? Obviously an extremely broad question, and of course it must have some sort of decay factor... but it would be interesting to get a rule-of-thumb number in terms of token count. I can imagine it's massive!


Human memory, in my limited understanding, doesn’t have the bifurcation of weights and context that LLMs do. It’s all a bit blurrier than that.

Something interesting that I heard from people trying to memorize things better is that memory “storage space” limits for people are essentially irrelevant. We’re limited by our learning and forgetting speeds. There’s no evidence of brains getting “full”.

Think of it like a giant warehouse of plants, with one employee. He can accept shipments (learning). He can take care of plants (remembering). Too long without care and they die (forgetting). The warehouse is big enough that it is not a limiting factor in how many plants he can keep alive. If it was 10x bigger it wouldn’t make a bit of difference.


I don't think it's massive. In fact, since it's roughly equivalent to working memory, I suspect it's on the order of 100 tokens at most.

It's just that, unlike these AIs, we're capable of online learning.


I know it isn't popular, but I wish there was a way to use this inside Emacs. Or, vim. I just don't want to use VS Code anymore.


The standardizing on VS code is one of the saddest developments over the last several years IMHO. I think it's great that VS Code exists, but we're headed for a world where you have to use VS Code if you want the best tooling because it won't support other options. The same thing happened with Java dev and IntelliJ, and IMHO it has been extremely unhealthy for the ecosystem. I'm immensely glad that Copilot supports vim, but I'm fearful that it soon won't.


Didn't VS Code standardise language servers, making it much easier for all the other close-to-IDE text editors to integrate? Is it really that sad?


Very fair point. Vim has benefited tremendously from that effort.


Same could have/could be said about Jetbrains products. People are likely always going to use vim/emacs and create tooling around whatever new hotness exists for them. And honestly? VS Code is just a new iteration on how vim/emacs work in a lot of ways: Providing a place to edit text and then a bunch of plugins that do things with that text.

And if you want vim/emacs to keep living, then you should spend time helping! Create your own extensions, maintain/contribute to existing ones, etc. They will only die out when the last person actively contributing to them stops, so keep the chain of people going :)


> The same thing happened with Java dev and IntelliJ, and IMHO it has been extremely unhealthy for the ecosystem.

While I agree, at the very least IntelliJ stood up on its own as a good IDE. I cut my baby teeth on Eclipse, and as soon as I realised how good IntelliJ is, I jumped ship without looking back. The same can barely be said about VS Code.


If only the depth of our feelings for Emacs counted for more in the market.

There's an argument that music and the arts are dumbed down by the fact that, for instance, making an album worth $10 to millions of people pays way better than making an album worth a million dollars to tens of people, since the album is going to get priced at $10 one way or the other. It only just now occurred to me that the same phenomenon applies to tools.


In Vim, I tried to assign a shortcut to send the selected text to Phind (or any other LLM) and came up with this:

:'<,'>y|call system('firefox <url>?q='.shellescape(@*).' &')

The only problem left is that the text is not urlencoded.

There probably is some elegant way to urlencode it. But I did not come up with one yet.
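
One workaround might be to shell out to python3 for the encoding (e.g. replacing shellescape(@*) with system('python3 urlencode.py', @*) in the mapping - untested); the helper itself is tiny:

  # urlencode.py -- read text on stdin, print it percent-encoded for use in a URL query
  import sys
  from urllib.parse import quote

  print(quote(sys.stdin.read()), end="")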


https://stackoverflow.com/a/76488059 claims to have one, though it's not explained.


I've hacked together a basic Emacs ollama api integration that does simplistic code completion against a local LLM from someone else's copilot example. It's slower than I want (about 7 seconds per inference on my M1 mac, typically) and very stupid about what context it sends, but nevertheless: it's just, and only just, enough to be useful. Hadn't considered publishing it because it relies on a python façade to convert copilot-style requests and responses back and forth to ollama, but if there's interest I'll spruce it up and get it out.


From downthread, just use ellama. They're further ahead than me by the looks of things.


Pretty sure GitHub Copilot has emacs/vim integration.


It does, although not the most recent features. I use the compatible features in Vim and I really like it. Not enough to switch editors though.


I have been a vs code power user and switched to pycharm two years ago and will never go back because of the features for working with multiple environments and projects in pycharm.

Working with Phind needs to be available in PyCharm for me to consider switching from GPT-4 to Phind. Chatting with Phind about my local files is the feature I am looking for.


Maybe ellama[1] would work? It doesn't support Phind yet, but a provider could be created for the underlying connection package llm[2].

[1]: https://github.com/s-kostyaev/ellama

[2]: https://github.com/ahyatt/llm



You and me both brother. LSP integration seems the way forward.



Awesome model, from a quick run-through comparison: it's comparable in results to GPT-4, with web search and references as a plus, and it runs faster. Two small nitpicks:

- Dark mode is hard to read; the answer text font has too much weight and brightness, which makes long paragraphs of non-code text hard to read. Light mode is obviously too bright overall, but it's already nighttime where I'm at, so maybe tomorrow at noon I'll have another opinion. I'd prefer gray (dark, i.e. OpenAI) and sepia (light, i.e. HN) as backgrounds when long lines of text are involved.

- Pricing page and ties to GPT-4: what does "500+ best model uses per day (GPT-4)" mean? What's the "GPT-4" part for? I saw I can pick GPT-4 as a model on the landing page, but I just don't get the best model/GPT-4 thing. Is Phind announcing it's a competitor while also proxying GPT-4? Sorry, I'm not up-to-date on GPT-4 "resellers" and the story behind Phind; it's just weird when it announces it "beats GPT-4" and then the pricing is about GPT-4 usage.


Thanks for the feedback. We also support GPT-4 as an answering model so users can pick and choose what's best for their use case, but we recommend the Phind Model for the majority of users.


Why is there an 8x difference in price-per-search between Plus and Pro?

I always shy away from stuff like this because I view it as one of two things. Either I'm getting ripped off if I pay for Plus, because 8x the cost to me means your margin is huge, or I'm getting subsidized by you with the Pro version which means I can't rely on it lasting long term.

I also dislike daily limits for search. My search usage isn't uniform day-to-day. I might go most of the month without searching for anything and then do a ton of searching over 2-3 days when I'm trying to learn something. So I'll be idle most of the month and then not have enough searches on the days I actually want to use it.

I prefer the model used by a lot of pre-paid services. Let me deposit a chunk of money (ex: $20-50 minimum) and charge me per search until my money is gone. That way I'm not "losing out" if I don't use it every day and I can "burst" as high as I want when I'm trying to learn something.

If the pricing is based on a certain amount of loss (on my side) from the use-it-or-lose it model, I don't like that. I want simple, fair pricing, not a complex pricing scheme where the primary purpose is to get me to overpay for my usage.


Plenty of people know their upper limit. The ability to pay 50% less if that limit applies is a feature, not a bug. (This applies to any service -- I am not affiliated with phind except as an occasional user).


Phind Plus is $15/month and Phind Pro is $30/month. There is a 2x price difference, not an 8x difference. And Phind Pro comes with (virtually) unlimited GPT-4 uses.

We understand that the incentives of setting daily limits for search aren't great, which is why the Phind model is unlimited for free. GPT-4, however, is unfortunately too expensive for us not to charge past a certain usage threshold.


Plus costs $0.016 per search and Pro costs $0.002 per search.

https://www.phind.com/search?cache=wgyz13tg4jkbl9pklptmpds5


To me the $15/mo plan is just bait so users pick the target $30/mo plan. Why would you pay $0.016/search when you can pay 8x less and feel smart about making that choice?

edit: looking at it again, I think the $15/mo plan is actually just for people who want Phind "private", so that their data is not used for training.


Cost per search isn't really a great metric. For me, I hit the cap of 30 searches/day pretty easily, but 500 is pretty hard to hit. It's just a question of which tier matches my volume.


> You can now get high quality answers for technical questions in 10 seconds instead of 50.

ChatGPT 4 does not take 50 seconds to answer, so I don't understand this comparison.


Recently I've used gpt 4 and yes it does take up to a minute even for easy questions.

I've asked it how to scp a file on Windows 11 and it'll take a minute to tell me all the options possible.

If this takes 1/5th the time for equivalent questions, I'd consider switching


Not my experience at all. Are you counting the entire answer in your time?

If so, consider adding one of the “just get to the point” prompts. GPT4’s defaults have been geared towards public acceptance through long-windedness which is imo entirely unnecessary when using it to do functional things like scp a file.


LOL, it’s not just for “public acceptance”. Look up Chain of Thought. Asking it to get to the point typically reduces the accuracy.


> LOL, it’s not just for “public acceptance”. Look up Chain of Thought. Asking it to get to the point typically reduces the accuracy.

Just trying to provide helpful feedback for you: this would have been a great comment, except for the "LOL" at the beginning, which was unnecessary and demeaning.


You are being snarky but you're right. I have scripts set up to auto-summarise expansive answers. I wish I could build this into the ChatGPT UI though.


I know this is silly, but I've had great success asking chatgpt to summarise chatgpt's answers.


Try the custom instructions feature


The words "briefly" or "without explanation" work well.

By keeping the prompt short, it starts generating output quicker too.


Yeah, I would say this is a prompting problem and not a model problem. In a product area we're building out right now with GPT-4, our prompt (more or less) tells it to provide exactly 3 values and it does that and only that. It's quite fast.

Also, use case thing. It is very likely the case that for certain coding use cases, Phind will always be faster because it's not designed to be general purpose.


This isn't a fair comparison because I have custom instructions that mention being brief but complete, but I did "how to scp a file on Windows 11"

ChatGPT4: 14 seconds

phind with "pair programmer" checked: 65 seconds

phind default: 16 seconds


Take a look at the AutoExpert custom instructions: https://github.com/spdustin/ChatGPT-AutoExpert

It lets you specify verbosity from 1 to 5 (e.g. "V=1" in the prompt). Sometimes the model will just ignore that, but it actually does work most of the time. I use a verbosity of 1 or 2 when I just want a quick answer.


> I've asked it how to scp a file on Windows 11 and it'll take a minute

https://imgur.com/a/iqxOJUV was 6.5 seconds.

https://imgur.com/a/pQFfWli was 15.

You can tell they're GPT-4 because the logo is purple (the logo is green when using 3.5).


ChatGPT4 is more often than not noticeably slow enough that I question why I pay for it.


Sometimes it's insanely quick - like GPT-3.5 Turbo or a cached answer or something.


We find that it takes around a minute for a 1024-token answer. Answers to less complex questions will take less time, but Phind will still be 5x faster.


That really depends on the complexity of your request and any prompt engineering techniques in use for that request. Especially with "think step by step" in certain contexts, it can improve answer quality at the expense of generation time (because more tokens are emitted).


Ran a quick test with a Rust async code snippet that contains an error. Compared with GPT-4, it gives a far clearer solution, with linked sources to learn more! Super impressive!


Amazing, that's great to hear.


Is it possible to output all steps of solutions in a single copyable block? I don't want to copy 4 separate blocks.


When I use it I often give a final prompt like "Now combine the above answers together into a function that accept the following arguments...". This has worked well for my use cases.


You can tell it that in a followup. Or, configure an answer profile and tell it to use that style: https://phind.com/profile.


Well, neither GPT4 nor this Phind model was able to answer my torture test: "Write amaranth code that can be used to control the readout of a frame from a kodak CCD with 4096 columns and 2048 rows."

Which, yes, is missing a lot of detail (you could feed in a datasheet - I have).

But Phind goes off on using pyserial (?!), and GPT4 assumes amaranth is a hypothetical CCD control library and writes a useless CCD control class using that hypothetical library.

Edit - Phind at least acknowledged that amaranth exists, unlike GPT4 with this prompt: "Write amaranth code that can be used to control the readout of a frame from a kodak CCD using an lattice FPGA with 4096 columns and 2048 rows. Assume the design will be hooked up to a larger litex SoC "


That’s torture for humans as well. The key to LLMs is communicating clearly to the information cloud.


Sure, but it's a good example of how far certain domains still have to go. These datasheets should be in the models' training data - at least one CCD datasheet - and Verilog & (migen | nmigen | amaranth) certainly are.

Controlling a CCD is actually pretty easy; I built (very simple, but working) controllers for several CCD chips in undergrad doing research for the ATLAS detector. You basically just clock rows out, N column clocks per row. Reset first. I'd expect a senior undergrad EE student to be able to design a simple core in a few class projects.
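
As plain-Python pseudocode (not Amaranth, and the chip interface here is a made-up stand-in), the readout sequencing is roughly:

  class FakeCCD:
      """Stand-in for the real chip interface -- entirely hypothetical."""
      def reset(self): pass              # global reset pulse
      def parallel_clock(self): pass     # shift one row into the serial register
      def read_pixel(self): return 0     # serial clock + sample the output amplifier/ADC

  def read_frame(ccd, rows=2048, cols=4096):
      """Conceptual readout: reset, then for each row shift it into the serial
      register and clock every pixel of that row out through the ADC."""
      ccd.reset()
      frame = []
      for _ in range(rows):
          ccd.parallel_clock()
          frame.append([ccd.read_pixel() for _ in range(cols)])
      return frame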


I have no idea what that means (even after googling it) lol. This is how my local WizardLM-70B responded to your prompt.

https://pastebin.com/BCAthV8y


Will you be offering the model as an API service? The product my team is working on would benefit from a significantly faster and possibly better performing model than GPT-4. If you're planning on keeping pace with competitive models we'd love to integrate the use of your model into our service.


If we get enough demand that's definitely something we'll consider. We're still a small team, however, and we do everything in our power to not get distracted from our main mission.


Please consider releasing an API. Having a faster alternative to GPT-4 would be amazing for so many use cases.

Especially for agents that do function calling.


If you offer an API then it can be used with tools like https://aider.chat/, which is the best way to use LLMs for coding. But if it's only available via the web, that's not possible. BTW, this is the main reason I pay for the OpenAI API.


Makes sense, we're also very small (pre-seed) so definitely no cash cow for you guys yet. We probably shouldn't be prematurely optimizing our prompting performance as it's not really a bottleneck, but a 4x improvement just by swapping an API would be too good not to act on.


If you offer an API you don't have to maintain a Visual Studio plugin. Trying to compete with tools like Cursor would be the real distraction.

And Cursor is just the start - there will be innovative workflows built on top of APIs you can't predict. You're missing out not having developers build an ecosystem for you.


just as a point to consider, NOT having an api (and thus no integrations into my editors of choice) is the main reason i haven’t given y’all a fair test run. i’d almost rather not know what i’m missing (though the threads here have convinced me to give it a shot.)


In my experience, Phind is not as good as GPT4, but it's by far the second best LLM for programming. I find that tremendously impressive considering they are competing against the whole world for that title right now.

I agree with the assessment about consistency being its major flaw. While with GPT-4 I can continue a conversation for quite a long time, Phind easily loses the required context. Perhaps it has to do with summarization capabilities, or messing with the context window has these types of side effects.


Have you tried clicking the model selection dropdown and enabling "Ignore web results"? That can help with keeping context for complicated design tasks.


Been using Phind for a bit now and started paying for pro

They're smashing it and can't do enough if you report an issue. They've also started a weekly voice call with senior devs to discuss algos and such, like a surgery; only 10 people join at the moment.

Don't think I've ever recommended anything as much as I have these guys in the last couple of months.


I use Phind daily, including the VSCode extension, and I love it. Much better than anything ChatGPT is able to come up with, and the code it generates requires little-to-no modification to work properly. Very big fan!


Far as I can tell it isn't possible to hook up the VSCode extension to the Phind model, only GPT-4. Do you know any different?


First off, congrats on building such a cool product. I love that I can just "jump into it" which is great.

Note that I'm not really a power user of these GPT style tools- here are my questions:

Is it possible to get right to the code without the ELI5 and general information?

Do you guys offer an API? I was browsing on my small iphone so maybe I missed this info.

Could you give an overview, for someone like me, of how something like Phind works technically? You mentioned those H100s, but at a very high level, without revealing any "secret sauce", how does this GPT work, from my input to getting a response?

Good luck!


Could you open source these great models? OK yes you need a competitive advantage. So maybe open source them when you are say 2 models ahead in production?

In any case I am happy there is some competition and that it has come from a more pragmatic scrappy space than one of the multiple billion dollar funded places.


Can we have a larger discussion about the tradeoffs that come with open sourcing a model?

When fb released Llama they obviously gained a huge amount of developer goodwill but it also required them to invest a serious amount of their own developer time to engage with the community.

I'm asking what the community can offer the company in return. Or is this just self-abnegation by the company that releases the model?


I question the word "required". They, or anyone else releasing an open source product into the world, don't owe anyone anything, least of all support. As long as there are enough instructions to run the thing, you are perfectly within your rights to let the community sort out the rest between themselves.


I've noticed that even though "They don't owe anyone anything," the community doesn't actually adhere to it. If they shove code over the wall like FAANG companies do now, it appears to upset the community, who will then treat them with hostility.


I don't know what model runs on Phind's site right now, but in August Phind published a fine tune of CodeLlama 34B

https://huggingface.co/Phind/Phind-CodeLlama-34B-v2


I gave it two tries; GPT-4 was much better in both cases. I tried it with two Leetcode questions. It came back with an empty response for one, and provided worse code (an O(n^2) solution when it can be done in linear time) for the other.

GPT-4, on the other hand, provided a good answer to both questions. Also, I guess the UI is buggy w.r.t. code formatting: it thinks the following line is code and switches to a code block.

``` You are given an array prices where prices[i] is the price of a given stock on the ith day. ```

The only downside of GPT-4 for me right now is its slowness.


GPT-4 has ingested all of Leetcode, you can literally just type "leetcode 100 python" and it will regurgitate a response for you.

Only exception I found is with some of the Leetcode Premium questions, you might have to actually type in the problem statement, but it's still very likely that multiple solutions have been ingested from GitHub and elsewhere.


I suggest you try enabling "Ignore search results" from the model dropdown for these types of questions. The web results can be distracting for the model for Leetcode-type questions.


I see you've had to suggest this a few times in this thread, and in my experience I would agree with the suggestion. I wonder if you could have a simple GPT model decide when ignoring search results would improve the result, and do it automatically.


Interesting idea.


I tried with that option enabled and now it can't generate code at all. Here's my prompt:

``` You are given an array prices where prices[i] is the price of a given stock on the ith day.

Find the maximum profit you can achieve. You may complete at most two transactions.

Note: You may not engage in multiple transactions simultaneously (i.e., you must sell the stock before you buy again).

Write Python code to solve this: def maxProfit(self, prices: List[int]) -> int: ```

Output:

``` It seems like you want to find the maximum profit that can be achieved by buying and selling stocks, with the constraint that you can only make at most two transactions. Is that correct?

Could you please provide some example input and output to help me better understand your requirements? ```

I also tried a more basic prompt, but the output is not what I'd consider good code.

Can you maybe share some examples where we can see how it exceeds GPT-4's capabilities? Thanks!
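For reference, the two-transaction problem quoted above does have a well-known linear-time dynamic-programming solution. A minimal standalone sketch of the standard approach (not output from either model):

    from typing import List

    def maxProfit(prices: List[int]) -> int:
        # Best cash position after each of the four states:
        # first buy, first sell, second buy, second sell.
        buy1 = buy2 = float("-inf")
        sell1 = sell2 = 0
        for p in prices:
            buy1 = max(buy1, -p)          # spend p on the first purchase
            sell1 = max(sell1, buy1 + p)  # sell the first holding at p
            buy2 = max(buy2, sell1 - p)   # reinvest the profit in a second purchase
            sell2 = max(sell2, buy2 + p)  # sell the second holding at p
        return sell2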



In my own RAG implementations in the industrial sector, I've found it effective to first have the AI decide whether it needs to search at all. If it doesn't, the answers are much better.
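A minimal sketch of that routing step, assuming a generic `ask_llm(prompt)` helper and a `search_documents(question)` retrieval step (both hypothetical stand-ins for whatever model API and index you actually use):

    def ask_llm(prompt: str) -> str:
        # Hypothetical helper: call your LLM API of choice and return its text.
        raise NotImplementedError

    def search_documents(question: str) -> str:
        # Hypothetical helper: your retrieval step (vector store, keyword index, ...).
        raise NotImplementedError

    def answer(question: str) -> str:
        # First let the model decide whether external search is needed at all.
        decision = ask_llm(
            "Answer strictly YES or NO: does the following question require "
            f"searching external documents to answer well?\n\n{question}"
        )
        if decision.strip().upper().startswith("YES"):
            context = search_documents(question)
            return ask_llm(f"Using this context:\n{context}\n\nAnswer: {question}")
        # Otherwise answer directly, without distracting retrieved snippets.
        return ask_llm(question)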


Hmm, I wonder what kind of code quality could be achieved by looping from Phind to GPT to Copilot for multiple iterations: asking for criticisms of the code, then asking for code that addresses those AI-generated criticisms, and so on, until it produces something better than I would, in a fraction of the time.


You could cook something like this up using Microsoft AutoGen. It lets you daisy-chain models.
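A rough sketch of that critique loop, using a generic `ask_model(name, prompt)` helper (hypothetical; AutoGen, or any thin API wrapper, could fill that role):

    def ask_model(name: str, prompt: str) -> str:
        # Hypothetical helper: route the prompt to the named model's API.
        raise NotImplementedError

    def refine(task: str, rounds: int = 3) -> str:
        models = ["phind", "gpt-4", "copilot"]
        code = ask_model(models[0], f"Write code for this task:\n{task}")
        for i in range(rounds):
            author = models[i % len(models)]
            critic = models[(i + 1) % len(models)]
            critique = ask_model(critic, f"List concrete problems with this code:\n{code}")
            code = ask_model(
                author,
                f"Rewrite the code to address these criticisms:\n{critique}\n\nCode:\n{code}",
            )
        return code

In practice you'd also want a stopping condition, e.g. when the critic stops finding substantive issues.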


I tried this question and GPT-4 did way, way better at getting to a final answer. Phind was horribly wrong. I can't help but think something is off with your eval, given just how badly Phind did on this.

I want to make an interactive plot in Colab where I can show

X axis is interest rate of a 15 year mortgage. Y axis is the relative advantage of buying a house vs. renting in terms of total net worth at 15 years.

Assume a monthly budget for renting + investing or buying a house of 10k

Plot different lines for a few different market returns.

Make a slider that controls the total size of the loan.


Seemed to give plausible results for me: https://www.phind.com/search?cache=lswmiuewv2l33jt337dgrsho


    def calculate_relative_advantage(interest_rate, loan_size, market_return):
        # Your calculation logic here
        pass

ChatGPT actually implements it.


Just prompt it to implement the function


I did and it was wrong. I was responding to the claim that they got a plausible result.


This model clearly makes a much better search engine than google/kagi/bing/etc.

I've been searching for an obscure connector -- the 8-pin connector you'd find on the cable that delivers power to a GPU, but in a form that can be wave-soldered. I've spent hours searching all the big electronics distributors -- no luck. This thing found it in seconds.

https://www.phind.com/search?cache=a7e9u5l5aw1r8ufls0icpb63

This is a very common connector but in a highly unusual form-factor. Molex refuses to make wave-solderable versions of it.

Edit: the first link does not lead directly to the obscure connector, but to the website of a company that does sell it. Here is the obscure connector: https://www.moddiy.com/products/Special-Mini-Low-Profile-ATX... Maybe it just got lucky.

On the other hand, it hallucinated the crap out of a very straightforward question "how do i connect the wake# pins when bifurcating a pcie port?" -- the answer is that it's an open-drain pin so (unlike the clock pins which need a buffer chip) you just wire them both together:

https://www.phind.com/search?cache=zf9witr85q740l4s3vjwzf01

Then it tried to write a bunch of code for an obviously-not-coding question. Not so great.


The first reference returned was a plug intended to be used with wire connections, not wave-solderable at all. The other two were a site search that returned nothing relevant and a large section of a manufacturer's website, respectively.


Correction, the first link was to a vendor who does in fact sell the obscure connector:

https://www.moddiy.com/products/Special-Mini-Low-Profile-ATX...

I'm just elated I finally found out where to buy these damn things from.


Phind co-founder here! Here's a link to our blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast


hi; great work. so is this more fine-tuning on Phind-CodeLlama-34B-v2?

will there be api access soon?

also: will it be open-source at some point? thanks


Thank you. Yes, it is the 7th iteration that started with our open-source models. We do plan to open source this model as well down the road, once we've released a few more generations.

API access is on the roadmap but we have no time estimates for when we will build it. We're trying to not get distracted from our main mission :)


I see. Thanks!

So Phind's main mission is to overtake Google right? ;)


"I have a table called `onboardings` with the state field. I want to return how many people we have in each state. The Postgres query should return the state, how many people count, and what percentage do those people represent."

Claude-2: correct response, and it rounds the percentage, which is a nice assumption to make for me:

    SELECT state, 
           count(*) AS people_count,  
           round(100.0 * count(*) / (SELECT count(*) FROM onboardings), 2) AS percent
    FROM onboardings 
    GROUP BY state
    ORDER BY people_count DESC;
Phind, correct response as well! Really fast too!:

    WITH state_counts AS (
     SELECT state, COUNT(*) as count
     FROM onboardings
     GROUP BY state
    ),
    total_counts AS (
     SELECT COUNT(*) as total
     FROM onboardings
    )
    SELECT sc.state, sc.count, (sc.count::decimal / tc.total::decimal) * 100 as percentage
    FROM state_counts sc, total_counts tc
    ORDER BY sc.count DESC;


I've been a pretty heavy user of phind and have been very satisfied! Haven't been using it to write code for me but to ask about features and docs and it's been pretty incredible.


I was just discussing using ChatGPT to make deploying serverless code easier.

I gave this as an example

“create a CDK typescript app that deploys a lambda + API Gateway where the lambda works with Get request and a dynamodb table. The lambda should have permission to read and write to the Table”

It wrote the code perfectly. I wanted to see if it was trained on the AWS APIs.


What's the best way to use an LLM with a large codebase that isn't RAG? Ideally we could have the full source in the context or already trained into the model... I was thinking I could set something up to fine-tune a model overnight so that every morning I'd have a fresh one ready. Any ideas?


I don't use LLMs in my workflow frequently. When I do, I have a hard time making sense of the very specific and long answers to my questions. Especially if I don't know the answer, it is hard to figure out whether the model's long answer points in the right direction or misses my point completely.

Maybe I'm not knowledgeable enough. But asking questions I already know the answer to has no real-life use case other than testing the model. Which, of course, might be a valid use case for some.

Having a way to let users specify their own level of knowledge might help produce answers that are better tailored to the person asking the question.


Have you tried custom instructions? I use this:

My dad always used to say: Everything that can be said, can be said simply. I prefer top-down structured, short and thoughtful responses.


So I gave it this prompt:

> I need a typescript function which takes in an object with an id string property and a string message property, and also takes an array of search strings, and returns a mapping of search strings to matching message ids

The response I got was close, but it assumed that each search string would match only one message, so it returned Record<string, string>. I fed this to GPT-3.5 and it answered 10x faster with the correct return type.

This is a slightly tricky example, because it requires the model to infer that multiple message matches are possible. But I think that it’s interesting that ChatGPT nailed it despite not using any chain of thought.


> I need a typescript function which takes in an object with an id string property and a string message property, and also takes an array of search strings, and returns a mapping of search strings to matching message ids

Your prompt is wrong. You want a function that takes an array of id/message objects, not an object.

It's quite impressive that GPT is just able to correct for that. As a human, I would first ask what you actually mean, because your prompt appears to be unclear.


Actually it was meant to be an object mapping of ids to messages, but yes, I phrased it weirdly. Both LLMs understood that part, though.


The results I get are so-so. The rubric I use to evaluate coding LLMs is to ask them to create a Python script that determines whether the contents of a given directory have changed since the last time the script was run. This should be done recursively, handle files being added, removed, or modified, and be based on the contents of the files rather than timestamps.

When I asked it as one statement it performed OK, but when I added more specifications in follow-up statements, it kept trying to go down one path even though I told it to do it a different way. A solid start, but it definitely needs some improvements, IMO.
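For reference, a minimal version of that rubric task (recursive, content-based via SHA-256, detecting added/removed/modified files by comparing against a saved manifest) might look like this sketch:

    import hashlib
    import json
    import os
    import sys

    MANIFEST = ".dirstate.json"

    def snapshot(root: str) -> dict:
        """Map each file's relative path to a hash of its contents."""
        state = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root)
                if rel == MANIFEST:
                    continue  # don't hash the manifest itself
                with open(path, "rb") as f:
                    state[rel] = hashlib.sha256(f.read()).hexdigest()
        return state

    def main(root: str = ".") -> None:
        manifest_path = os.path.join(root, MANIFEST)
        old = {}
        if os.path.exists(manifest_path):
            with open(manifest_path) as f:
                old = json.load(f)
        new = snapshot(root)
        added = sorted(set(new) - set(old))
        removed = sorted(set(old) - set(new))
        modified = sorted(p for p in new if p in old and new[p] != old[p])
        print("changed" if (added or removed or modified) else "unchanged")
        print("added:", added, "removed:", removed, "modified:", modified)
        with open(manifest_path, "w") as f:
            json.dump(new, f)

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else ".")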


This is a problem that human programmers screw up… regularly.

E.g.: the efficient and robust way to monitor file changes on Windows is to read the NTFS change journal. Within a single process lifetime there are other change-notification APIs as well. Most software does neither, and is either very slow or misses changes…


Thanks for the feedback. We're working on improving consistency and precise instruction following in followups.


If you can make this best in class for code beyond just HumanEval, wow, that's the differentiator. Add Cursor, Replit, and VS Code support after. If it's best in class for code, it would be my daily driver.


I can't wait to see this open sourced; there are a lot of sampling strategies that help with coding.

And I also can't wait to see how much Phind will improve further if the Glaive dataset is added onto it.

Edit: Contrastive search, dynamic temperatures.


To the folks in this thread comparing the model with GPT-4, are you comparing it with GPT-4 in ChatGPT, or with GPT-4 on Phind? Because it should be the latter for a fair comparison. The Phind response seems to be heavily based on the top search results, which may affect the quality of the response.

(An even more interesting question would be to compare ChatGPT GPT-4 with Phind GPT-4, i.e. GPT-4 with relevant web results in context.)


I've been using GPT-4 for coding since it launched. I did try Phind when you launched it with GPT-4 and found it useful, but I use VS Code and hate having to switch. I see you do have a VS Code extension now (great job!). I tried the Phind model and at first glance it definitely looks better than GPT-4 w.r.t. coding. Do you have any plans to provide this model as an API?


Pretty big jump on the Java eval. What is the reason for Java being so notoriously difficult for LLMs? Never mind, I asked Phind[1] and it listed all the complexity... but do you have any tips or tricks for working with that language in your model?

[1] https://www.phind.com/search?cache=u3mnj3iwmjvgqlyf60bnbqo1


It failed for me at a much more basic level.

I asked 5 different, and increasingly explicit, variations of the following question: "Can you generate HTML and CSS for a JPG mockup I'm going to give you?"

Each time it answered along the following lines: "Sure, here is how you can create HTML and CSS from a JPG mockup. Follow this process..."

In my experience this never happens with GPT-4.


I've not seen that anywhere, ChatGPT does image input now? Do you have examples of the output from feeding it a JPEG?


Yep, ChatGPT did a pretty impressive job when I tested it a few days ago. Just grab a mockup from a Google search and prompt ChatGPT-4 to generate a web page. I'm sure your mileage may vary.

However, my point was that Phind's answer was worse than a No, or a hallucinated attempt would've been. By saying "Yes, here is how YOU can do it...", it left the impression that it didn't even understand the question.


Thanks for expanding on your answer.


> Show HN: Phind Model beats GPT-4 at coding

Does it? I don't see any evidence of this strong claim in your post, and I think it's quite deceptive how the only link is to a benchmark of open source models (which doesn't include GPT-4). I've tried Phind a few times in the past when it made equally strong claims and been somewhat unimpressed. (To be fair, comparing anything to GPT-4 is tough!) I think it would strengthen your position significantly to simply say that you're the best of all open-source models.

To be honest though I've been completely ruined by https://cursor.sh/; copying and pasting results back and forth from a web UI to my IDE is so painfully slow when you do it tens or hundreds of times that I don't think I would be able to go back. I'd be happy to try out a Phind extension that has similar UI/UX if you ever make one.


We do have a Phind extension and you can even use it inside Cursor :)


That is true, the extension inside Cursor is awesome.


"Python script to extract a list of all Elastic IP's from all regions, from multiple AWS accounts."

ChatGPT-4 gave me a solid answer hitting all the points I wanted. Phind didn't get the account handling correct, didn't address regions, and didn't handle pagination.

"Write a python based script that uses boto3 to query AWS Route53. It should print a list of every record for a given hosted zone ID."

ChatGPT4 did exactly as requested with pagination, and even smartly decided to use "input" so I could give it a zone ID at run time. Phind didn't handle pagination, or do ANY error handling. It was also slower than ChatGPT4 to generate currently, and it wasn't in a single block of copy/pasteable code.

ChatGPT's solution worked without modification. Copy-Paste. Run.
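For what it's worth, the record-set pagination the second script needs is essentially a one-liner with boto3's paginator. A minimal sketch of the Route53 request (assuming AWS credentials are already configured; alias records have no ResourceRecords, hence the .get default):

    import boto3

    def print_zone_records(zone_id: str) -> None:
        client = boto3.client("route53")
        # The paginator transparently follows the paged API responses.
        paginator = client.get_paginator("list_resource_record_sets")
        for page in paginator.paginate(HostedZoneId=zone_id):
            for record in page["ResourceRecordSets"]:
                values = [r["Value"] for r in record.get("ResourceRecords", [])]
                print(record["Name"], record["Type"], ", ".join(values))

    if __name__ == "__main__":
        print_zone_records(input("Hosted zone ID: ").strip())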


Just worked well for me: https://www.phind.com/search?cache=g9y2uizgjwcn378aovb65v92.

We do have issues with consistency sometimes -- please try regenerating if that is the case.


"We do have issues with consistency sometimes" That's a strange statement. Having issues with consistency means that sometimes the output is wrong. What does it mean to have issues with consistency sometimes ? You're either consistent or you're not.


There's a difference between models that are incompetent and aren't capable of getting the right answers ever and models that are capable of getting the right answer but may not do so every time. The Phind Model is in the latter camp.

Consistency issues can be caused by a wide range of factors from inference hyperparameters to prompting.


I meant that saying "something is inconsistent sometimes" is weird because inconsistency implies "sometimes"


Your example didn’t include pagination.


I tried this, but I have yet to get any LLM to give me a working answer to a programming question I actually want to solve.

Basically:

"How can I send network control commands to an AppleTV in C#"

They always make up some nonexistent library or give an example using some nonexistent API.


I’d guess the intersection of the two technologies has little training content, so it starts dreaming. If you break the question up into “AppleTV API” (or whatever the primary terms are), then use that context for the C# part, it might work better. Isolate the Apple bit so it draws on more specific parts of the training data.


That’s because you’re asking it something too obscure that I would have at first assumed wasn’t even possible.

“Make me a billionaire… I’m still poor! Bad AI!”

You need to collaborate with the AI, use it to help with each small step of the problem, with input references provided.

To a degree Phind can do the reference chasing for you, but it’s not magic.


It's definitely not impossible at least.

Someone is doing it in python here:

https://pyatv.dev/

GPT-4 actually sent me here:

"Here is an example of a C# library that implements the HAP: CSharp.HomeKit (https://github.com/brutella/hkhomekit). You can use this library as a reference or directly use it in your project."

Which, to no surprise based on my experience with LLMs for programming, does not exist and doesn't seem to have ever existed.

I get that they aren't magic, but I guess I am just bad at trying to use LLMs to help in my programming. Apparently all I do are obscure things or something. Or I am just not good enough at prompting. But I feel like that's also a reflection of the weakness of an LLM in that it needs such perfect and specific prompting to get good answers.


In a sense you’re asking it the wrong questions. It’s a bit like asking Google “my PC crashed, how do I fix!?” and then expecting something specific to a rare issue in the first hit.

Assuming a C# library even exists for what you’re doing (maybe not!) then still the best use of AI is to troubleshoot specific issues given an almost working piece of code as input.

Ask it to explain why something doesn’t work instead of asking it to do your job for you wholesale.

PS: GPT 4 (you are using the best coding AI, right? Right?) can get you going quickly:

“There are several libraries available for controlling Apple HomeKit from C#. One such library is *HapSharp* ². It is a .NET implementation of the HomeKit Accessory Server that allows you to create your own custom HomeKit accessory on a Raspberry Pi, Mac computer, or any other platform that can run Mono ².

Another option is *HomeKit* ¹. It is a native C# library for Apple's HomeKit Accessory Protocol. However, it is not a complete implementation and does not work ¹.

I hope this helps!

Source: Conversation with Bing, 31/10/2023 (1) netonjm/HapSharp: HomeKit Accessory Server .Net bridge! - GitHub. https://github.com/netonjm/HapSharp. (2) GitHub - ppumkin/HomeKit: Native C# Libary for Apple's HomeKit .... https://github.com/ppumkin/HomeKit. (3) homekit-accessory-protocol · GitHub Topics · GitHub. https://github.com/topics/homekit-accessory-protocol?o=asc&s...


Even all of that is on the wrong track. There is nothing that I can see anywhere about controlling an ATV with the homekit accessory protocol.


Then you asked the wrong question.

AFAIK Apple generally does not allow arbitrary remote control (headless mode) for security reasons — it could be used for spam automation!


They do though. Pyatv can do it (and home assistant is using pyatv since HA is python based) and commercial home automation systems like Crestron and Control4 can do it too.

Really I just need to get an LLM to port pyatv to C# for me I guess.


> Or I am just not good enough at prompting.

Or you're good enough at using your tools that you can do all the low-hanging fruit. LLMs excel at working around inadequate tooling, but (at least at the moment) they can't help you if you're trying to do something actually tricky and get stuck enough that no rubber duck can save you.


I’m working on an open source, terminal-based AI coding tool that is designed specifically for more complex, multi-iteration tasks and features. I think it could likely do a good job on this task.

I’m using it personally every day and while it still needs more work and polish, I’m finding it much better than ChatGPT or any other tools I’ve tried for bigger and more difficult tasks.

Please let me know if you (or anyone else reading this) would be interested to try a late alpha/early beta version: dane@envkey.com


Interesting, I seem to have gotten a decent answer: https://www.phind.com/search?cache=avbridtm69ejk8pdqpx8hcnf


Unfortunately there’s nothing correct about that answer. There’s no tcp service listening for requests like that on port 7000 on an AppleTV.


It's quoting that from a StackOverflow post: https://stackoverflow.com/questions/11857130/tcpclient-or-ht....


Yeah, that port 7000 service is the AirPlay protocol, and they are sending photos and videos to an ATV with it.

But I want to control a unit: send navigation commands like the four directions, back, and select.

The only app I know that can do it is pyatv, (a python app) but I want to do it in my C# app.

It would be nice if an LLM could port pyatv to C# for me as I don't really know python at all.
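For reference, the pyatv side of this is only a few lines of Python. A rough sketch based on pyatv's scan/connect/RemoteControl interface (method names from memory, so double-check against the current pyatv docs; pairing/credentials may also be required depending on the device):

    import asyncio
    import pyatv

    async def press_select() -> None:
        loop = asyncio.get_running_loop()
        # Discover Apple TVs on the local network and connect to the first one.
        confs = await pyatv.scan(loop, timeout=5)
        if not confs:
            raise RuntimeError("No Apple TV found")
        atv = await pyatv.connect(confs[0], loop)
        try:
            # RemoteControl exposes up/down/left/right/menu/select-style commands.
            await atv.remote_control.select()
        finally:
            atv.close()

    asyncio.run(press_select())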


Re: "We're excited to announce" - when did this get deployed? I was on Phind Pro ... a month ago or something, and curious if i already experienced this or not.

Phind was really good, but it still had a difficult time with library versions. Notably, a lot of the search results it saw seemed to pollute it with incorrect assumptions about which methods are available in specific library versions. The web results seemed to make the LLM worse at some things. In the end I switched back to ChatGPT, though I expect I'll retry Phind at some point; I tend to ping-pong on each respective release.

Does this version tackle that any better in your eyes?


Thanks for the feedback and I'm sorry to see you go. The new version should be better at library versions. If you're in our Discord, I'd be happy to help you one-on-one -- please send me a DM.


I'm sure I'll be back soon; the overall experience was good. There are so many competing products that it's difficult to pay for them all at once.


A few of Phind's models are open/available

https://huggingface.co/Phind/Phind-CodeLlama-34B-v2


Hi, is there any plan to improve the UI? About a third of the vertical space is used by the Phind logo and the search box below it. I sincerely believe the UI needs a more professional/business touch.


Just a reminder GPT-4 is almost 1.5 years old at this point (from before they started internal safety testing), and even the one we have is diminished from the first, uncensored version.


Phind can be very good for general tech searches too. I spent a long time looking for a way to stop my Pixel 4a from auto-updating to Android 14 (which it had already downloaded). With Google I only found one solution, which disables the update on restart. I asked the same thing on Phind and in one query got around 4 solutions [1].

[1]: https://www.phind.com/search?cache=vtzigjx3rnruc9ltocv9gqi1


Playing on the other AI prompt thread (https://news.ycombinator.com/item?id=38089247) here on HN I tried:

"write me an angry birds clone using JavaScript and the matter physics library"

And I really enjoyed the answer; in particular that it shows its sources, making it obvious how close its answer is to openly available tutorials.

This is much better than a black box pretending to do black magic.


It didn't work well when I asked it a design question: the code and API it used are not correct. GPT-4 did a better job.

https://www.phind.com/search?cache=ay8rx37gq8oy3z7uixftlqkt

https://chat.openai.com/share/a3a91dcc-a91a-4b04-8afd-40bd1a...


The GPT-4 answer is only better in so far as it uses RunTransaction. I don't know why it's trying to loop through the stores and then running the i'th operation on that store when it could have just had the store referenced in the operation instead of passing it as a parameter. And then it's also creating a new client for each transaction which seems wrong (to be fair I'm not familiar with Firestore so maybe this is idiomatic).


It's not idiomatic. I agree that the ChatGPT implementation is not very good, but at least it probably works (not tested) and uses correct APIs. I tried several iterations after that, and it came up with a better design.


Not looking deeply at the technical side of the answers, but the tone of GPT-4's answer is very casual/conversational (it starts with "Alright, listen up." and keeps that tone throughout).

I think you might get a better answer if you rewrote your prompt using full sentences and more formal language.


Thanks for sharing the links, we'll investigate this example.


I straight away asked it a stackoverflow question in which input and expected output samples were given. Phind didn't do well. ChatGPT though, [kissing hearts emoji]


This is awesome. Are you planning to open-source the V7 model?


Thanks! We generally plan to open-source our previous models once they're no longer cutting-edge, so yep :)


This is great work, but HumanEval is an extremely limited benchmark and I don’t think you can seriously claim to beat GPT-4 at coding based only on that metric.


Fifth sentence:

> However, we’ve found that HumanEval is a poor indicator of real-world helpfulness.


Thank you. You're right -- which is why we rely on feedback we've received from our own users for that claim. Many of our users who have the choice to use either GPT-4 or the Phind Model on Phind choose the Phind Model.


You likely know this, but keep in mind the kind of selection bias in taking feedback mostly from your own users. The number of times I've heard product designers claim that their users prefer some aspect of how their application already works, ignoring the fact that the users who didn't prefer it have left and hence are likely not available to survey.


Of course. We do our best to talk to churned users as well, but we're doing this Show HN to get even more diverse feedback.


I understand, but big claims require big evidence and so it’s still IMHO not rhetorically a strong position. I’m glad people find it more useful!


How have you liked using TensorRT-LLM? Did you come from FasterTransformer, vLLM, LMDeploy, TGI, or something else?

We started migrating to it the day it came out, very glad to have it, but lots of little annoyances along the way. Biggest one has been loading our model repository; having to hardcode the location of the engine file means we can't use the built-in ways Triton has for downloading from GCS!


Have you tried your model on this new benchmark from Princeton NLP? https://www.swebench.com/

It's more "real-world" than the toy-like problems in HumanEval, it would be interesting to see if you can crack some of those.


The speed is really impressive! I tried it with a moderately challenging task and it failed pretty spectacularly, hallucinating class methods and missing a bunch. It seemed like the UI struggled with my code too, breaking in and out of markdown somewhat randomly. I was impressed enough I may try again with some simpler stuff, but I'm not quite ready to switch away from GPT4.


Would you mind sharing the link? I'd also suggest trying to enable "Ignore search results" from the model dropdown for inputs with lots of specific details.


I like that it provides sources, but I have to check them EVERY time because too often it hallucinates bogus solutions or protocols for me. I'm asking network questions, though, not asking for code snippets. I've had it hallucinate PowerShell modules as well. If you're willing to check its work, then it's useful, maybe.


Thanks for the feedback. Do you have any cached links you can share? It'd be massively helpful.


I tried to remember specific examples, plugged them into Phind, and it seems to have improved since I used it last.

We had asked it if Cisco LISP supports map server database replication. It doesn't, and maybe this is a bad use case, asking it about something that doesn't exist.. But it basically said "yeah it does, here's how" and spat out some irrelevant stuff (how to do manual database mapping, etc). Again, probably a bad example considering the prompt was kind of garbage, but I did expect it to "know" the difference between replication and manual data entry.

Had a procedure in a powershell script to query a name server for A records. It used a deprecated system class and I believe missed some syntax. Ended up getting what we wanted out of it by re-wording the prompt a few times. I can't replicate this now, I don't even have the original script, so sorry I can't help more.

Has Phind improved SINCE going to Gpt-4? It seems to have?

Anyways.. I still use Phind. Hopefully I haven't undersold it, because it's usually great and I recommend it to everyone at work. Like everything else AI, the prompt really matters, and any frustration I had with it was undoubtedly due to expecting too much of the backend (actual ChatGPT) and not the Phind interface.


Phind has been pretty nice to use for some rubber-ducking with C#. The only disadvantage is walling off communication behind Discord.


> "Am I talking to real person?" >> "Yes, you are talking to a real person..."


If it’s trained on data (particularly docs) after 2021, that’s an automatic win over ChatGPT in some situations!


Gpt is 2023 now


Not generally, at least not for me: I still get the 2021 notice unless I select Bing in the dropdown.


I'd been subscribed to Phind for 3 months at 30€ per month, and constant outages made me unsubscribe this month. I did compare Phind and GPT-4 in the past, whenever Phind came out with these kinds of articles, and after the first question it was obvious Phind was nowhere near.


Sorry to hear that you didn't have a great experience. I'd love to chat further, my email is founders(at)phind.com


I was just testing to see the comparison and ran into a message saying I was out of GPT-4 queries, despite having deliberately selected Phind as my model.

Now I'm confused if the results I was seeing really were from a different model than GPT-4 or not.


Ah you were likely using the Pair Programmer. The Phind Model is not yet supported in the Pair Programmer, only the default search mode. Please try again using that.


If this is the case, the UX is off, as Pair Programmer is on and the model clearly says Phind. I'd recommend making the model clearer before and after a search, and instead of adding a bubble tag to a search box in the prompt response list, changing the background color of the search query boxes.


Oh okay, that makes sense, thank you for clarifying.


Your About page is really lacking in detail. https://www.phind.com/about I wouldn't feel comfortable using your service without a lot more detail about the founders and company etc.


This is amazing, kudos to the team


I found Phind V7 to be closer to GPT-3.5. The first answer was great, but it quickly started repeating mistakes from previous prompts. I also felt GPT-4 understood the constraints of the problem better.

Still a massive thumbs up to the phind team. Impressive stuff!


The speed and quality seem good to me. Will try it on some real scenarios this week.


https://www.phind.com/search?cache=hnqqc3fo3o3n61blb6bfh69b

It's not generating the wrong answer. It's quoting the wrong answer


So on Firefox with normal protections I get a blank page in reply to a Phind query, for whatever reason. On Chrome, Phind does seem to give some interesting answers (and it is a bit cheaper than GPT to begin with, for sure ;-) )


I tried “Draw a cone with tkz-euclide”. The result is not quite right: it outputs code that draws a circle and two vertical lines, and that’s it.

Just curious how well it works with niche languages.


Suggestion: When your title makes a claim like "Phind Model beats GPT-4 at coding, with GPT-3.5 speed and 16k context", link to a blog entry that explains the claim.



Will you open source anything newer than the v2 currently on HF?


What about more realistic benchmarks, like SWE-bench [1]?

[1] https://www.swebench.com/


I went over to my GPT-4 history and pasted some problems and refactor requests in verbatim; the GPT-4 outputs were of much higher quality.


Is the search part of this shelling out to something like bing? Or is this a novel internet search engine as well as an AI model?


Some small feedback/bug: (Mobile, Firefox, using pair programmer mode)

The text box gets hidden after the conversation exceeds the page height


Thanks, we'll take a look.


It would be great to have more clarity on the Plans page re: why I need GPT-4 in the context of Phind. I'm already paying for ChatGPT Plus, Copilot, and Kagi Search. It would also be great to have a reference point: is an input length of 8,000 tokens good for a web app, an iOS view, a Unix util, a Go server? It seems like the value-add is the Phind model, but you advertise GPT-4.


"Service is unavailable in this region"

It loses to GPT-4 because it apparently geo-gates its infrastructure, which GPT-4 does not.


In some of my coding questions, Google Bard seems to be slightly quicker and both may provide similar replies.


How recently was this LLM seeded with data? In the context of golang, this is easily a generation ahead of GPT-4.


I had a bug in my flutter app. The Phind model nailed it straight away. GPT-4 gave a working but awful solution.


Such an AWESOME time to be a programmer!!


Is the Phind model available outside of phind.com as an api or are weights available for fine tuning?


Wow they really simplified the front end


@rushingcreek

My primary role in IT is not as a programmer; however, I do program in Python, PowerShell, and batch files to automate some of my administrative duties, both to save time and to ensure the accuracy of my work. I am self-taught, and a lousy programmer. I do know what questions to ask and what is possible, so a tool like ChatGPT or Phind is truly helpful in my hands.

I am writing this comment after directly comparing Phind and ChatGPT. I did an experiment and Phind far exceeded my expectations. I provided an obscure prompt to Phind, not expecting it to intuitively provide the exact answer I was looking for. The prompt was simply 'Look at this batch file' and then I provided a copy of the file. I was looking to rewrite the code to use a for loop and a text file for the destination paths. Phind did not require a further prompt to assume this and provide me the code and explanation needed to understand the code. The latter is very useful for a crappy programmer like myself because it easily expands my knowledge.

The same prompt to ChatGPT got the response I anticipated: it simply reiterated, line by line, what I had submitted.

Thank you for providing this tool.

TLDR; Phind is pretty awesome. Useful to get work accomplished and useful for learning.


The headline seems a little disingenuous: “beats GPT-4 at coding”

The results are impressive and things have been really progressing quickly, so kudos.

But even by your own description in this post, something like “rivals GPT-4 at coding” seems a more accurate appraisal.


I tried a more niche language, Scilab, and it did considerably *worse* compared to GPT-4.

What are your experiences?


It's so fast ... and accurate.


Licensing and privacy details pls?


Looks nice, but it's quite pricy compared to OpenAI's API pricing or ChatGPT.


The vscode integration seems cool, but why do I have to have an account to use it?


Are the weights for the 70B version of the model available?


Is there a noscript/basic (x)html prompt somewhere?


That bot of yours is the second chatbot that has claimed it can program, or can help with programming. And it is the second one that utterly failed to provide me with an implementation of blocked clause decomposition in Haskell. I needed something; even the slowest version would do.

Your bot also tried to bullshit me about the validity of its answer, just like the other one.

The difference? Your bot mentioned a paper on arXiv about the problem. But the paper (and I read it a long time ago, of course) does not provide even pseudocode implementations for most of the algorithms mentioned there.

Color me not impressed.

As usual, bots like yours are not for when you need something new. If I have an idea, I cannot use any AI, including yours, for prototyping work.

It is expected, as neural networks are interpolators, not extrapolators, and for them to "extrapolate" one needs to train them over the "extrapolation" area quite well.


This is why I don't even bother reading the 'evals' or the claims about how it's better. I test it out myself, and it almost always turns out not to be true.


Does it respond in correct JSON?


Are you going to launch a VS Code extension? That would provide a better UX.


What‘s the cutoff date?


October 2023


are you planning on open sourcing the model eventually?


What data did you use to train and how do you evaluate your model for overfitting? I ask due to the issues with the HumanEval dataset.

-------------

For those that are unfamiliar with the issues, allow me to elaborate. You can find the dataset in the parent's link or here[0] and you can find the paper here[1].

I'll quote from the paper. First is page 2 right above the github link and second is page 4 section 2.2 (note, this paper has 58 authors... 58)

> To accurately benchmark our model, we create a dataset of 164 original programming problems with unit tests. These problems assess language comprehension, algorithms, and simple mathematics, with some comparable to simple software interview questions.

> It is important for these tasks to be hand-written, since our models are trained on a large fraction of GitHub, which already contains solutions to problems from a variety of sources. For example, there are more than ten public repositories containing solutions to Codeforces problems, which make up part of the recently proposed APPS dataset

So we take from this that the problems are simple, leetcode-style questions, and that the authors "verified" the data is not in the training set simply by virtue of writing the code from scratch. If you aren't laughing now, you should be. So let's look and see if there are in fact samples of code, exact or near matches to those in the test set, that existed in public GitHub repos prior to May 2020, their cutoff date.

Now let's look at some of the test questions and see if we can find them on GitHub. GitHub search is total garbage, so I'm going to pull results from the last time I looked (search my comment history for "godelski human eval"). I apologize in advance for the formatting.

HumanEval/4:

Prompt:

    from typing import List

    def mean_absolute_deviation(numbers: List[float]) -> float:
        """ For a given list of input numbers, calculate Mean Absolute Deviation
        around the mean of this dataset.
        Mean Absolute Deviation is the average absolute difference between each
        element and a centerpoint (mean in this case):
        MAD = average | x - x_mean |
        >>> mean_absolute_deviation([1.0, 2.0, 3.0, 4.0])
        1.0
        """

canonical_solution:

    mean = sum(numbers) / len(numbers)
    return sum(abs(x - mean) for x in numbers) / len(numbers)

Found on GitHub[2], commit date Oct 5, 2019:

    if reduction == "median":
        return np.median(scores)
    mean = sum(scores) / len(scores)
    if reduction == "mean":
        return mean
    return sum(abs(x - mean) for x in scores) / len(scores)

This is a solution that is functionally equivalent: swap numbers and scores and remove the if statements. This constitutes a near collision, and ML models perform very well on near collisions. If you look at the testing method for the evaluation you will also see that this code would pass the test. Thus an LLM could very easily just copy-paste this code and pass, no problem. I'm not saying that's what happened, only that we cannot rule it out. What actually happened is an open question, and we're far from ready as a community to call LLMs fuzzy copy machines.
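As a concrete check on just how "near" this collision is, Python's standard difflib is enough to show the canonical two lines sitting verbatim inside the larger snippet once the single differing identifier is renamed (a rough heuristic, not a formal contamination test):

    import difflib

    canonical = (
        "mean = sum(numbers) / len(numbers)\n"
        "return sum(abs(x - mean) for x in numbers) / len(numbers)"
    )
    found = (
        'if reduction == "median":\n'
        "    return np.median(scores)\n"
        "mean = sum(scores) / len(scores)\n"
        'if reduction == "mean":\n'
        "    return mean\n"
        "return sum(abs(x - mean) for x in scores) / len(scores)"
    )

    # Rename the one differing identifier, then check the overlap.
    renamed = found.replace("scores", "numbers")
    print(all(line in renamed for line in canonical.splitlines()))    # True
    print(difflib.SequenceMatcher(None, canonical, renamed).ratio())  # high similarity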

I also have this search query bookmarked, which still seems to work[3], but you'll have to check the date manually.

You can repeat this process for many examples in the HumanEval dataset. Or simply look at the HumanEval questions and answers and ask yourself, "Have I written those exact lines of code?" The answer is: probably.

But note here that overfitting is perfectly okay in certain circumstances. HumanEval simply measures how good an LLM is at solving short leetcode-style questions. It does not measure an LLM's ability to write code in general, and certainly not non-leetcode code. The model may very well do those things, but this benchmark does not measure them. It can still provide utility to people, and these LLMs still learn a lot more than what HumanEval tests. My issue is with the metric and the claims about what the results indicate, rather than with the product itself. There is also a danger in chasing benchmarks like these, as you will not be able to disentangle overfitting from desired training outcomes. I am not critiquing OP's network nor the work they did to create this. I'll explicitly state it here: well done, OP. This took a lot of hard work and you should feel very proud. I hope this question and context do not come off as pejorative or overly cynical. I think your work is, without a doubt, something to be proud of and useful to our community.

This is a warning to all HN readers to help you avoid snake oil (I expect every ML person to already know this): scrutinize your metrics and know exactly what they measure. I mean that precisely: there are no metrics that directly measure abstract things like "image quality", "performance in language", "code generation performance", and so on. With generative models it is exceptionally difficult to determine which model is better, and we are unfortunately at a point where many of our metrics are weak proxies (remember: metrics are proxies for more abstract goals; metrics are models, and all models are wrong, just some more wrong than others), so you must do far more investigation to come to even a fuzzy answer to this question. Nuance is necessary.

[0] https://huggingface.co/datasets/openai_humaneval

[1]https://arxiv.org/abs/2107.03374

[2] https://github.com/danielwatson6/hate-speech-project/blob/8e...

[3] https://github.com/search?q=abs%28x+-+mean%29+for+language%3...


From my test of the open-source model last week, it kept repeating itself and gave broken outputs, using Q4 quantization.


This V7 model is much better than the V2 model that we previously open-sourced. And Q4 quantization would also likely have a large detrimental impact.


Are there plans to open source V7?


Great work boys


Wait, can't you use this to develop chemical weapons? Where's your 20-person government-mandated safety team?



