It will be fun to see how they stand up to Google and Perplexity. I feel they ar...

7thpower · 2024-10-31T17:01:45 1730394105

I have learned to seriously question my instincts about when something is too late, as there are many niches to fill and this is likely a building block for broader functionality.

That being said, for all the talk about how bad Google has become, I still prefer it to an unbroken Bing.

toomuchtodo · 2024-10-31T16:52:42 1730393562

Anyone can compete as long as they have a sufficiently robust crawl dataset as a foundation, no?

baby_souffle · 2024-10-31T17:00:47 1730394047

> Anyone can compete as long as they have a sufficiently robust crawl dataset as a foundation, no?

There are some sticking-power, network-effect, and sticky-default effects, too, though.

It's _trivial_ to do a Google search from anywhere on an Android device with at most a tap or two. A 3rd party could probably get close with a well-integrated native app, but that would require work on the user's part to make it the default (where possible).

Same goes for the default search engine for browsers/operating systems ... etc.

I will absolutely be firing off queries to Google and GPTSearch in parallel and doing a quick comparison between the two. I am especially curious to see how well queries like "I need the PCI-e 4 10-gig SFP+ card that is best supported / most popular with the /r/homelab community" go. Google struggles to do anything other than link to forums where people are already asking similar questions.

vineyardmike · 2024-10-31T16:59:08 1730393948

Anyone can compete as long as they have a functional URL and web page. Doesn’t make them good competition, and doesn’t mean users will use it.

The issue is that “AI search” has been a hot topic for a while now. Google (the default everywhere) just rolled out their version to billions of users. Perplexity has been iterating and acquiring customers for a while. Obviously OpenAI has great potential and brand recognition, but are there still enough interested people who haven't already switched?

jsheard · 2024-10-31T16:57:43 1730393863

A fossilized snapshot will only get them so far, and sites are increasingly opting to block AI-related crawlers. Apparently about a quarter of the top 1000 sites already block GPTBot: https://originality.ai/ai-bot-blocking
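For anyone wondering what "blocking GPTBot" means mechanically: it is just a robots.txt group that names that user agent, and compliance is voluntary on the crawler's side. A minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt contents and the example.com URL are made up for illustration, not copied from any real site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative (hypothetical) robots.txt: blocks GPTBot everywhere,
# allows every other user agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Bingbot", "Googlebot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, "https://example.com/article"))
```

Nothing here stops a crawler that simply never runs this check, which is why "blocking" in this sense only works against bots that choose to honor it.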

I guess they could be using Bing as their search backend, which would mostly get around the blocking issue (except for searching Reddit, which now blocks Bingbot).

toomuchtodo · 2024-10-31T17:00:52 1730394052

Certainly, countermeasures against crawler blocking will be a necessary component of effective search corpus aggregation going forward. Otherwise, search will balkanize around whoever pays the most for access to public content. Common Crawl is ~10PB; this is not insurmountable.

Edit: I understand there is a free-rider/economic issue here; unsure how to solve that as the balance between search engines/gen AI systems and content stores/providers becomes more adversarial.

jsheard · 2024-10-31T17:02:41 1730394161

AFAIK OpenAI currently respects robots.txt, so we'll have to see if they change that policy out of desperation at some point.

andrethegiant · 2024-10-31T18:59:02 1730401142

> AFAIK OpenAI currently respects robots.txt

I wonder to what degree. For example, do they respect the Crawl-delay directive? HN itself has a 30-second crawl-delay (https://news.ycombinator.com/robots.txt), meaning that crawlers are supposed to wait 30 seconds before requesting the next page. I doubt ChatGPT will delay a user's search of HN by up to 30 seconds, even though that's what robots.txt instructs them to do.
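To put numbers on it, here's a rough sketch of what honoring Crawl-delay looks like for a sequential fetcher, again with the stdlib parser. The robots.txt below is a hypothetical one that only mirrors the 30-second delay mentioned above (it is not a copy of HN's actual file), and `polite_fetch` is a made-up helper; `sleep` is injectable so it can be exercised without actually waiting:

```python
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a 30-second Crawl-delay for every agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_delay_seconds(robots_txt: str, user_agent: str) -> float:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    # crawl_delay() returns None when no Crawl-delay applies to this agent
    return float(delay) if delay is not None else 0.0


def min_seconds_for_pages(robots_txt: str, user_agent: str, pages: int) -> float:
    """Minimum wall-clock time to fetch `pages` pages one after another
    while waiting the full Crawl-delay between consecutive requests."""
    if pages <= 0:
        return 0.0
    return (pages - 1) * crawl_delay_seconds(robots_txt, user_agent)


def polite_fetch(urls: Iterable[str], fetch: Callable[[str], str], delay: float,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch URLs sequentially, sleeping `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    print(min_seconds_for_pages(SAMPLE_ROBOTS_TXT, "ChatGPT-User", 5))
```

With these assumptions, a single answer that pulls 5 pages from the same site would take at least 4 × 30 = 120 seconds, which is the tension with interactive search.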

echoangle · 2024-11-01T01:10:53 1730423453

Would ChatGPT, when interacting live with a user, even have to respect robots.txt? I would think robots.txt only applies to automatic crawling. When directed by a user, one could argue that ChatGPT is basically the user agent the user is using to view the web. If you wanted to write a browser extension that shows the reading time for all search results on Google, would you respect robots.txt when prefetching all pages from the results? I probably wouldn't, because that's not really automated crawling to me.

claudiulodro · 2024-11-01T14:37:21 1730471841

They do respect robots.txt (supposedly), but they also introduced a new user agent that nobody would yet have in their robots.txt as part of this feature[1], and looking at my server logs it's already crawled a bunch of sites.
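The reason a new user agent slips through: a robots.txt that names only GPTBot leaves any other OpenAI agent under the `*` group. A sketch with the stdlib parser; both robots.txt files are hypothetical, and `OAI-SearchBot` is assumed here to be the newer search-specific agent (check the linked OpenAI crawler docs for the authoritative names):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt written before the newer agent existed.
OLDER_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Same file after the site owner adds the (assumed) new agent to the blocked group.
UPDATED_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def access_report(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "OAI-SearchBot"]
    url = "https://example.com/post/1"
    print(access_report(OLDER_ROBOTS_TXT, agents, url))
    print(access_report(UPDATED_ROBOTS_TXT, agents, url))
```

Under the older file, only GPTBot is refused; the other agent falls through to `*` and is allowed until the site owner explicitly names it.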

[1] https://platform.openai.com/docs/bots/overview-of-openai-cra...

StableAlkyne · 2024-10-31T17:05:45 1730394345

If it ends up anywhere near as popular as Google, those sites will have a financial incentive to allow the crawlers.

The average person just does not discover content without the search engine recommending it.

jsheard · 2024-10-31T17:12:21 1730394741

The whole issue that site owners have with these AI search engines is that there isn't a financial incentive for them to cooperate, since the summarization largely replaces the need for users to click through to the site the information came from. No click-through, no ad impressions, no possibility of the user being converted into a recurring visitor or paid subscriber, just pure freeloading by the search engine.

maeil · 2024-11-01T11:18:50 1730459930

> I feel they are a bit late in the search game

It doesn't matter. Google was late to the browser game, yet now the whole world runs Chrome. Analogy applies 1:1 to OpenAI.

joshdavham · 2024-10-31T17:37:32 1730396252

> excited to see what they cook

Me too! I've really started to dislike Google search recently and am super excited we now have more viable options!