As noted in the article, apparently Google is presenting the top four search engines used in a given country. So presumably this means they're seeing a lot more DuckDuckGo searches in the data they're collecting from Chrome users.
It's also a solid choice for them to hedge against antitrust claims, if they can point to having just added them to their browser, regardless of the fact that Google is the default and they do not present a choice screen like Microsoft had to in the EU.
At its simplest, in the West we have a thing for threes. Three bits of God, three little pigs, three branches of government (in the USA at least), "things come in threes", three books/movies in a series (a trilogy), stories that have a "beginning/middle/end".
Bottom line, the West tends toward organizing and thinking of things in threes. Some might even be superstitious about threes (perhaps a Pythagorean influence).
In China the number 4 plays a similar role. I don't know much about 'numerology' in China, save to say that recently the number 4 (which apparently sounds like 'death' in Chinese) has been considered bad luck. Here's a better explanation than I could give: https://www.quora.com/In-Chinese-culture-why-is-the-number-4...
4 seasons, the 4 corners of the world, the 4 cardinal directions, the 4 bodily humors (https://en.wikipedia.org/wiki/Humorism), you can find a ton of 4's in the West.
In Hong Kong, "Chinese" buildings skip floor 4, 14, ..., while "Western" buildings skip floor 13. More recent culturally inclusive buildings skip floors 4, 13, 14, ... The most prestigious floors are 8 and 88.
To add a complication, Chinese tends to use the US convention (with the ground floor being floor 1), while the English convention is the British one (with the ground floor being floor 0).
By happy coincidence, then, the 13th floor is also 十四樓, ie the 14th floor, so you only need to skip one floor, rather than two. That explains why HK skyscrapers are so high.
3 is important in decision making because it makes it easier to form a consensus. If I disagree with your idea, you have another entity to act as an arbiter.
It was an odd thing to me but Chrome would not list DuckDuckGo until after you had visited DuckDuckGo.com manually. Once on DDG it became an option. That's been around for a while, as I've had DDG as my Chrome default for a couple years. I presume it's now an option even if you've never visited.
Similarly, how many different options are available for similar classes of items found at Costco?
Say, frozen chicken, napkins, instant noodles, paper cups, etc. In some cases there is only 1 option offered, sometimes 2, rarely are there ever more than 4 options offered at Costco for a single type of item. When you trust that you are being offered the best choice or a top choice, well, we know what happens at Costco. People buy pallets in that warehouse.
Yes, but (1) Costco has nowhere near a monopoly on grocery, clothing, and electronics and (2) not even Costco customers do 100% of their grocery, clothing, and electronics shopping at Costco.
Actually consented, as in understood the implications and freely decided that Google should have this data, probably none. That would take a lot of generosity, especially to pay that team of lawyers and technical experts, so that you have any chance of actually understanding the implications.
Unwillingly consented, that's the vast majority of Chrome Sync users. Unless you enable the end-to-end encryption (for which they require a second passphrase, so probably less than 0.1% actually use that), they will use your data for ad profiling etc. Yes, that is on page 1312 of the Chrome Sync privacy statement. (They're only required to write it in there if they actually do it, so it is quite certain that they didn't just want the bad PR for nothing.)
Is consent required? Assuming they actually do collect this data from their Chrome Sync data or through similar personally identifiable ways, consent would be required in many jurisdictions, especially the EU.
However, if they cared enough, it would be possible for them to collect this particular data point without personal identification.
You could for example create a UUID per installation that's only associated with this one data point.
Or you could have a time-based solution where each Chrome instance goes out to "vote" for its default search engine, e.g. every 4 weeks. If you then look at the statistics on a weekly basis, you can just multiply these values by 4 to get roughly correct numbers. It's certainly going to be representative enough; you don't need every browser instance to have its vote in every week's statistic.
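To make the "vote every 4 weeks, multiply by 4" idea concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the engine names, the made-up market shares, and the way the per-installation ID picks a reporting week. It is not how Chrome actually reports anything.

```python
import random
import uuid


def simulate_weekly_votes(num_installs, period_weeks=4, seed=0):
    """Simulate installs that each report their default engine once per period."""
    rng = random.Random(seed)
    engines = ["google", "duckduckgo", "bing"]
    weights = [0.90, 0.06, 0.04]  # made-up shares, only for the simulation
    votes_by_week = {week: {} for week in range(period_weeks)}
    truth = {}
    for _ in range(num_installs):
        # Random per-installation ID, tied to nothing but this one data point.
        install_id = uuid.UUID(int=rng.getrandbits(128))
        engine = rng.choices(engines, weights)[0]
        truth[engine] = truth.get(engine, 0) + 1
        # The ID decides in which week of the period this install "votes".
        week = install_id.int % period_weeks
        bucket = votes_by_week[week]
        bucket[engine] = bucket.get(engine, 0) + 1
    return votes_by_week, truth


def estimate_totals(week_votes, period_weeks=4):
    """Scale one week's votes up to an estimate for the whole population."""
    return {engine: count * period_weeks for engine, count in week_votes.items()}


votes_by_week, truth = simulate_weekly_votes(40_000)
print("estimate from week 0:", estimate_totals(votes_by_week[0]))
print("true counts:         ", truth)
```

Because each install lands in a week more or less at random, any single week is a roughly representative quarter of the population, which is why scaling by 4 gets close to the true counts.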
These metrics are from UMA stats. They are collected from everyone who ticks the box to report stats when installing Chrome.
They only get histograms of counts of visits to search engines, not the entire URL, and not search engines or other sites not in the list of things they track (which is at the bottom of the file).
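As a rough illustration of "histograms of counts, not URLs", here is a small sketch. The tracked-host list and bucket names are hypothetical and this is not Chrome's UMA code; it only shows that a counter keyed by an allow-list never has to store the query or any untracked site.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical allow-list: host name -> histogram bucket.
TRACKED_ENGINES = {
    "www.google.com": "google",
    "duckduckgo.com": "duckduckgo",
    "www.bing.com": "bing",
}


def record_visits(urls):
    """Count visits per tracked engine; everything else is dropped."""
    histogram = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        engine = TRACKED_ENGINES.get(host)
        if engine is not None:
            # Only the bucket name is kept, never the path or query string.
            histogram[engine] += 1
    return dict(histogram)


print(record_visits([
    "https://duckduckgo.com/?q=something+private",
    "https://www.google.com/search?q=x",
    "https://example.com/not-in-the-list",
]))
```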
Well, the first question is why are the pages rendering slow to begin with?
One way to make the pages I visit load faster is to disable Javascript. Another is to remove (or block) advertising. Another is to put DNS data for these sites into local hosts or zone files.
Those actions are how I prefer to approach the problem.
However as far as I can tell, those are not actions Google wants to take. They have their own preferred approach.
It is possible there are users who are aligned with Google in terms of how they want to approach the problems created by misuse/overuse of Javascript and advertising.
It is also possible there are some users who have no idea why pages are slow to load.
Those groups might want to send usage data to Google.
However I am not in either group. I dislike the web advertising business that Google depends on and therefore must nourish and support.
As such, there is no reason I can think of why I would want to send data to Google.
Also, I have not checked but I wonder if Google is restricted in how they can use the collected diagnostic data. Are they prohibited from using it for the purposes of selling advertising?
Usage data helps us make UI changes. For example, if not a ton of people are using some functionality, we might prioritize modifying or removing it. When we make a change, seeing how it affected usage is an important part of verifying we did the right thing.
So if Chrome's ever made a UI change you disagreed with, then you're in a group that would have benefitted from sending Google usage data.
Having grown tired of graphical software back in the 90's I have little interest in graphical user interfaces and interactive use. Chrome has never made a UI change I disagreed with because I do not care about the popular graphical browsers.
I care about command line programs, less-interactive and non-interactive use. Truly, the best interface is no interface.
The whitepaper.html appears to explain how usage data is utilised in ways that help Chrome improve but does not appear to contain any restrictions on use of the data to help further Google's ad sales business, whether directly or indirectly.
It is the business model that I do not wish to support.
Producing software such as Chrome is just something the company is doing in the course of selling advertising and collecting maximal amounts of data from users, whether the data is anonymised or not.
> So presumably this means they're seeing a lot more DuckDuckGo searches in the data they're collecting from Chrome users
There's a lot to unpack in that statement... Is there any recent analysis on the usage stats that chrome is reporting back that someone could point to?
No clue at all why politician was downvoted for this.
Isn't it well known that Google scoops up web history from the browser or have they stopped doing/never done this? In the latter case any pointers would be appreciated.
Dunno about currently, but about three years ago, if you opened your own site in Chrome, a couple minutes later you'd get a visit from Google bot on the same url.
== most of the time. The majority of people don't care enough to change the defaults. Most of the time, they don't even think about whether there even is something to change. The overwhelming majority thinks roughly like this: "Ooo, Computer just knows all of this about me? Neat!".
Source: I work in education - even in a highly educated area in a developed EU country, young and old alike think like this.
In the default config the sync data is encrypted end-to-end with the user's Google account password. However, there is also an option to share browser history with Google for telemetric reasons, and it's on by default (regardless of sync encryption settings)
Isn't Chrome Sync opt-out at this point? I seem to recall some small controversy about that a while ago, and setting a passphrase seems like something that it's unlikely most people will do.
A default open browser history synced across devices seems like exactly the sort of thing that would show that DDG has increased its market share.
> Dunno about currently, but about three years ago, if you opened your own site in Chrome, a couple minutes later you'd get a visit from Google bot on the same url.
Not only is Google's indexing infrastructure not that fast, but they deliberately don't do that because some poorly designed sites have passwords or unique keys in the URL that should not be used to retrieve content for the public search index.
Regarding "data they're collecting": The list here is based on popularity of search engines in different locales, determined using publicly available data.
I think you're misinterpreting the chart you quoted.
Google search growth rate is always positive in that page. It just decelerated. Growth rate being negative means you're actively losing more users than you gain.
Also, in 2000, the entire Internet had about 10% the number of users it has today (~300M vs around 4B). Still a very impressive number, but the comparison isn't very meaningful.
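The difference between a decelerating growth rate and a negative one is easy to see with numbers. The totals below are made up purely for illustration and are not taken from the chart being discussed.

```python
def growth_rates(yearly_totals):
    """Year-over-year growth rate for each consecutive pair of totals."""
    return [
        (current - previous) / previous
        for previous, current in zip(yearly_totals, yearly_totals[1:])
    ]


# Made-up yearly totals: the total keeps rising every year...
totals = [100, 150, 195, 234]
rates = growth_rates(totals)

# ...while the growth rate shrinks from year to year but stays above zero.
for year, rate in enumerate(rates, start=1):
    print(f"year {year}: {rate:.0%}")
```

A negative rate would only show up if a year's total were lower than the previous year's, i.e. if more users were lost than gained.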
One argument to be made is that Google Search can only go downwards from here, as it is currently a clear market leader, and the remaining segments are not easy for them to break into.
For example, Baidu has a stranglehold on search in China, and that's not likely to change drastically, with Google facing internal opposition to entering China.
This is true, but Google is also adding more search surfaces (e.g., google home, assistant etc.). So, it's possible that they might attract proportionately more users using these surfaces.
I don’t have any Google home devices, but I’d be curious to know how many of those queries are simply commands “Google play my music” and how many are actual internet searches “Google what is the capital of Alabama”.
How important to Google is their web search product?
I know it was their first product, but I would imagine they get much of their revenue from other avenues, such as Android's built-in totally-not-antitrust web search app, and YouTube and Gmail and web ads...
I haven’t looked at their annual report recently, but back in 2016, advertisement made up a majority of their revenues and profits — around 90% if I recall correctly. I’d be willing to bet that keyword advertisements on search make up a larger portion of that traffic than that through YouTube videos.
Web Search is still Google's unicorn, but it is not as profitable as it was a few years ago, mostly because there are now better advertising channels like social networks and online video. Many people today also use price comparison apps instead of web search.
I don't think this really changes anything. It's more important to Google that DuckDuckGo users don't disable Chrome's prediction service, that way they can still collect search data on them. Adding DuckDuckGo as a search engine option whilst they leave the prediction service option intact means that this is nothing more than a publicity stunt. It's actually quite deceiving for many users who do not realise they are still sending data to Google.
Eh? "Use a prediction service" is about whether you send data as you type _to your default search engine_, not to Google. If you change to DuckDuckGo as your default search engine, toggling "use a prediction service" on and off will not send any more or less data to Google, because omnibox typing is never sent to Google in that case regardless.
Source: I am the former Chrome omnibox owner. You can find the relevant code for this starting at https://cs.chromium.org/chromium/src/components/omnibox/brow... ; look for how GetDefaultProviderURL() works and when that query is sent. You can also watch packets with your favorite network analyzer.
Was playing around with the omnibox debug tool (chrome://omnibox) the other day, pretty cool how it inter-ranks literal search, search suggests, history into one.
Hold up, are you saying that users who use DDG are still sending _all_ their searches to Google? I'm not disagreeing but I'd love to see a source for this. It seems to me that if you switch, Chrome should use the DDG autosuggest API [0].
Correct; if you are using DDG as your default search engine, and you enable "use a prediction service", suggest queries are sent to the "suggest_url" configured for that engine. For DDG, that URL is here: https://cs.chromium.org/chromium/src/components/search_engin...
Thank you both for pointing this out. Traditionally this has not been the case. When adding a search engine manually, "suggest_url" is not available as an option, so the prediction service would always send data to Google. It seems like a good thing that they've fixed it for the new DuckDuckGo option, but it's a shame that you still cannot configure this manually.
You're correct that manual configuration of the suggest_url is unavailable ( http://crbug.com/8395 ), but incorrect that the prediction service sends data to Google in the case where that's not configured. In that case, the prediction service is inactive. If you add a manual engine and change your DSE to it, then effectively "use a prediction service" has little effect. (It will only kick in if you do keyword-triggered searches, a.k.a. tab-to-search, on engines that do have a configured suggest_url.)
I agree it would be nice to let people configure this (see comment 12 on that bug), but we're pretty careful about getting privacy right (despite wide-ranging internet claims to the contrary) and "falling back" to Google in that case would be a pretty major gaffe.
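For readers following along, the routing rule described in this sub-thread can be sketched roughly as below. This is not Chromium's code: the `SearchEngine` class, the `suggest_request` helper and the example URL are all hypothetical, and the keyword-triggered (tab-to-search) case mentioned above is left out.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class SearchEngine:
    name: str
    # URL template containing "{query}"; None when no suggest URL is configured.
    suggest_url: Optional[str] = None


def suggest_request(default_engine, typed_text, prediction_enabled):
    """Return the URL that would be queried for suggestions, or None."""
    if not prediction_enabled:
        return None
    if default_engine.suggest_url is None:
        # No configured suggest URL: the feature is simply inactive.
        # There is deliberately no fallback to some other provider.
        return None
    return default_engine.suggest_url.replace("{query}", quote(typed_text))


builtin = SearchEngine("builtin-engine", "https://suggest.example/ac?q={query}")
manual = SearchEngine("manually-added-engine")

print(suggest_request(builtin, "red panda", prediction_enabled=True))
print(suggest_request(manual, "red panda", prediction_enabled=True))
```

The point of the sketch is the second `return None`: with no suggest URL on the default engine, nothing is sent anywhere.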
I'm not sure that the behaviour you describe has always been present. I may be mistaken, it's been a long time (years) since I've added a search engine to Chromium, but I seem to remember having to manually disable the prediction service. In any case, I'm glad that the behaviour is now sane and that there are privacy-minded folks like yourself working on Chromium. Thank you. It's very reassuring.
I've worked on Chrome since the beginning (I'm a founding team member) and I designed and built the omnibox and wrote most of this code. I'm confident we haven't ever done what you're describing.
We haven't always been perfect. When we first launched, for example, we didn't exclude some cases from suggest querying that we should have, and that was my oversight. I can't remember the specifics (things like https:// URLs or input while in incognito mode, IIRC) but I landed a patch a couple days after the 2008 launch to clean it up.
The Chrome team as a whole is very privacy focused. There's a lot of people in public (including in this article's comments) who think Chrome is some sort of Google data collection device, but having seen things from the inside, I would trust Chrome with my data over any other browser. It makes reasonable tradeoffs by default (e.g. not enabling features like sync or server-side spellcheck that tend to send more data), and what stuff is there that people might not want (e.g. omnibox server-side suggestions) can all be easily disabled.
I have seen the server-side infrastructure, and can say that the data, if it arrives on a Google server, is typically very carefully handled. Claims like "your browser history is available to every employee and sold to partner companies" are categorically wrong.
I just want to point out that you're making a false equivalency. "I would trust Chrome with my data over any other browser" - you don't _have_ to trust other browsers with your data. You can run them without any data collection at all.
False dichotomy. You provide all browsers with data by using them; the question is what they do with that data. Chrome is not materially different than other browsers in the level of control you're able to have over what gets sent elsewhere. You can very easily set it so the only thing the server sees is a "check if an update is available." If you're using Chromium instead of Chrome, then you don't have the updater, so even that is not present.
It's the complete opposite to that, and you said it yourself. Their aim is to quietly retain/recapture users while keeping antitrust at bay, and they did well precisely in not publicizing it.
I'm on my second attempt to use DDG instead of Google. As time goes on, the percentage of searches I use Google for ticks higher and higher. I'm starting to intuitively recognize when search results will be garbage with DDG. It's tough because I really want to take back my privacy, but it seems that for 50% of searches, DDG just doesn't get me anywhere near what I'm looking for.
The other day I searched for the website to check a restaurant gift card balance. All of DDG's results were obvious scam webpages. I often search for ElasticSearch documentation. DDG always returns very old versions of these docs, while Google returns the most recent version.
1. Hmm, I rarely switch back to Google, and the most recent time I did, it did not deliver better results. It might be that Google has so much information on you that it gives better results (while it, fortunately, has not much information on me, so it has to compete with DDG on an equal footing).
2. I don't use ElasticSearch, but I can tell you that searching the python docs is quite simple in DDG, just throw a !py3 in there to directly search the latest Python 3 docs. Apparently, there's a comparable bang for ElasticSearch, !elastic. But I don't know how well it works (and it's a bit long, really).
DDG is my default search engine, and I really want to use it for privacy reasons. However, I have developed a habit of querying with "!g" to switch the search over to Google.
This has happened because, firstly, I, too, can instantly recognise when results are garbage and so immediately type "!g". Secondly, I know when certain types of searches will be garbage - usually anything related to programming is useless using DDG. So, for work, my default search engine is just Google.
Sometimes, I just query with "!g" without even thinking about it, and at one point I realised I hadn't even been using DDG for several weeks except as a redirect.
Curious to know whether someone has made a website to compare DDG and Google search results side by side. Anyone on HN want to take up that challenge? This story is definitely not the first DDG against Google story in the last few months.
You could do it pretty trivially with a pair of iframes and a text input
EDIT: I tried to do just this, and both of them blocked it :(
Refused to display 'https://www.google.com/?q=cheese' in a frame because it set 'X-Frame-Options' to 'sameorigin'.
Refused to display 'https://duckduckgo.com/?q=cheese' in a frame because an ancestor violates the following Content Security Policy directive: "frame-ancestors 'self'".
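For anyone wondering what the browser is checking in those two errors, here is a simplified sketch of the decision. It is an approximation written for this thread, not a real browser's algorithm: it only understands `deny` and `sameorigin` for `X-Frame-Options`, and only `'none'`, `'self'` and `*` for `frame-ancestors` (any other source list is treated as blocking). Real browsers also ignore `X-Frame-Options` when `frame-ancestors` is present, which this sketch does not model.

```python
def may_be_framed(headers, same_origin):
    """Decide whether a response may be shown in a frame.

    headers: dict mapping lowercased header names to values.
    same_origin: True if the embedding page has the same origin as the framed one.
    """
    xfo = headers.get("x-frame-options", "").strip().lower()
    if xfo == "deny":
        return False
    if xfo == "sameorigin" and not same_origin:
        return False

    csp = headers.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = [source.lower() for source in parts[1:]]
            # Simplification: host sources are not matched, only '*' and 'self'.
            allowed = "*" in sources or (same_origin and "'self'" in sources)
            if not allowed:
                return False
    return True


# Header values as quoted in the two error messages above.
print(may_be_framed({"x-frame-options": "sameorigin"}, same_origin=False))
print(may_be_framed({"content-security-policy": "frame-ancestors 'self'"}, same_origin=False))
```

Both calls return False for a third-party page, which matches the two "Refused to display" errors.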
Iframes are pretty much dead on the web for this reason. It's kinda lame, because it means the web platform is incapable of making a web browser, which is sort of the 'turing test' for a platform/programming language.
Not true - people have made browsers in Electron. The permission-denial above would have to be respected by the browser itself; it'd be easy to tell the server your iframe doesn't exist in a page from a different domain. The reason it exists at all, I'd assume, is for the security of the person using the browser. A malicious site could embed a legitimate site within itself, for purposes of misleading the user or scraping information. So if your app is using a highly-controlled iframe within itself, and it has the authority to overrule these blockers (which presumably Electron gives it), then you can do whatever you want.
Also FWIW, iframes can still be useful on the regular web for third-party widgets, as well as same-domain pages.
The fact you have to use Electron kind of makes my point...
If the web is a 'turing complete platform', then it should be possible to run a web browser in a web browser. So Chrome inside Chrome. That could be anywhere between the level of 'webassembly to run the whole thing', or it could be at the level of 'iframes give all the necessary functionality'. Today the first isn't viable because webpages can't make raw TCP sockets. The latter isn't viable because of the way sites can differentiate between iframes and the top level window.
If I have to use a fork/modified copy of Chrome for the outer copy with slightly different rules, then it isn't capable of implementing itself.
Imagine if gcc couldn't compile gcc - you needed to use a separate compiler-compiler. It's the same thing.
Even better is the !sp bang, which is for StartPage (a Google search proxy). This way you can get the Google results and still retain some privacy from Google. I also use the !w (Wikipedia) and !so (Stack Overflow) bangs regularly. Finally, the search doesn't have to be prefixed with these bangs; the bang just needs to be in the search somewhere. Personally, I find it quicker to just append it to the end.
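The "bang can be anywhere in the query" behaviour is simple to picture in code. This is only a sketch of the idea, not DDG's implementation; the `BANGS` table is a hypothetical subset built from the bangs mentioned in this thread.

```python
# Hypothetical subset of bangs, limited to ones mentioned in this thread.
BANGS = {
    "g": "Google",
    "sp": "StartPage",
    "w": "Wikipedia",
    "so": "Stack Overflow",
    "py3": "Python 3 docs",
}


def split_bang(query):
    """Return (target, remaining_query); target is None if no known bang is found."""
    target = None
    remaining = []
    for token in query.split():
        key = token[1:].lower()
        # Only the first recognised bang is used; unknown "!words" stay in the query.
        if target is None and token.startswith("!") and key in BANGS:
            target = BANGS[key]
        else:
            remaining.append(token)
    return target, " ".join(remaining)


print(split_bang("!g rust lifetimes"))
print(split_bang("rust lifetimes !so"))
print(split_bang("what does !important mean"))
```

Prefix, suffix or middle all work the same way, because the bang is found by scanning the tokens rather than by looking only at the start of the query.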
I use this regularly, and still find it frustrating. It gets old fast to just keep trying a search in DDG only to find it not working and do it all over again with !g
I've been using DDG for the past few years and I think I've lost my Google-fu. I used to be able to get the result I was after in a couple of searches with a few carefully chosen keywords. Now when I strike out on DDG and search Google, I get a bunch of popular stuff with similar words in it, rather than what I'm looking for. Whether that's my fault or Google's, I dunno.
I have started doing this as well, except I'm using KeePassXC and using Dropbox to distribute the file everywhere. Would Bitwarden work behind a company firewall at a company that doesn't allow Dropbox?
I'm the same with KeePassXC but use Resilio Sync for redundancy between devices. There's also SyncThing as a free alternative to Resilio Sync, but I haven't dived into it much.
Bitwarden is pretty nice, using it at work. I do still like my control that KeePass gives, though.
I have Chrome for one purpose, and one purpose only: vSphere installations that have yet to be upgraded to include the HTML5 version. Chrome is the "Flash browser".
Over the last few years DuckDuckGo have become so good at handling my queries that I only occasionally use Google. That typically happens when DuckDuckGo doesn't find what I expect, but it always turns out that neither does Google.
wow, this is totally different than my experience. i'm a privacy advocate who defaults to DDG, but I find myself forced to use Google for many technical queries because DDG rarely handles anything except obviously answered searches. side by side comparisons with Google have been devastating, so i switch back and forth depending on technical workload and privacy-interestedness
@wintorez, I started using Brave browser[1] and DuckDuckGo for work and personal. It's based on Chrome but with privacy in mind.
Also, I currently use 1Password, but have been thinking about using Enpass[2] because you can sync with any cloud drive. I like the idea of syncing to a third party cloud drive in case my password service is compromised.
My concern with Brave is them being beholden to the direction of Chromium - am I wrong about that?
I used Brave for awhile then switched to Firefox+uBlock Origin, hoping to do my teensy part in decreasing the market share of Chromium-based browsers while still being privacy-focused.
At the same time, having the same base as Chrome means you won't be left behind when people start only developing for Chrome (which is a problem right now).
Often vulnerabilities go unpatched for days, which is pretty bad when the exact vulnerability exploit code is already visible in the chromium bug tracker!
Heard good things about Brave. I like to experience the web through more than one browser, though, just to see if there are any discrepancies that I'm not aware of.
Curious — why not Firefox and Startpage for work stuff or Firefox and DDG for work stuff? You can always resort to bang commands if DDG results aren’t great for particular searches. You can use the Multi-Account Containers extension (and related container extensions) to have Firefox work for multiple “profiles” of usage.
Or you could even use Chrome and DDG or Chrome and Startpage for work.
Anything where Chrome and/or Google are avoided is a good thing, IMO.
In that case you can probably at least use Chrome and Startpage at work. Startpage.com pays Google for the right to use their search results, so you'll still find your pages about obscure error messages, and Startpage doesn't track you.
I started doing the same. Bitwarden is the best! I urge HN users who haven't checked it out to give it a try and please support the project so it lives on.
Initially I was doing the same, but then switched to using Firefox profiles - 1 for work + google search; and 2 for personal with DDG and ublock origin.
Try the Multi-Account Containers extension on Firefox. It helps isolate sites across tabs and helps avoid the normal need to create multiple profiles. There are several other container extensions (the first and most famous one being Facebook container).
Disclaimer: @yegg, if you're reading this, I'm posting this rant with love.
I am so disappointed with DDG recently, it has adopted Google's strategy of returning searches that have nothing to do with your query if not enough results were found [0], and dialed it up to 11. If "I" "don't" "put" "each" "word" "in" "quotes," the results I get have nothing to do with my search... but if I do that (apart from the inconvenience of it all) it means (presumably?) that stemming isn't done on the search terms.
Maybe I'm old school, but I expect search results to match the search terms. Fuzzy matching (stemming, synonyms) is an added bonus, but silently dropping words which don't appear is decidedly not. Moreover, a search result returning "only" two results should be taken as a good thing for someone with confidence in their dataset (DDG naturally doesn't have that, because their coverage is far from 100% of the web) - it means the search terms were extremely precise and the results are highly relevant, with irrelevant results filtered out. Decreasing the signal-to-noise ratio by willfully ignoring my search terms may increase the quantity of search results but - and I don't know about you - for me I don't care about quantity and would choose relevance as the more appropriate metric to benchmark against.
(All that said, I still use DDG as my main search engine even if I am turning to appending !g far more than I ever used to because I firmly prefer DDG's respect for my privacy and person over Google's treatment of the same. But I'm disgruntled and, frankly, very disappointed. Sorry, @yegg!)
Edit: actually the situation is even worse. DDG doesn't seem to even always respect "quoted" terms. Here's literally the first search I did after posting this [1]. The quoted term "CFF2" doesn't even appear in the majority of the results DDG pulls in - not just not in the page summary displayed, but literally not on the result page at all. For comparison, here's the Google equivalent:
>Moreover, a search result returning "only" two results should be taken as a good thing for someone with confidence in their dataset
I completely agree with you here but in my experience it's not anything new with DDG, that's always been a problem as far as I'm concerned.
As a hobby I sometimes have to reverse engineer electronic circuits, when I'm not sure what a chip does I try to search the inscriptions on the package to see if I can find a datasheet online. Sometimes you end up with very cryptic strings like "xardc10-egh" or whatever. If you input this string on Google it gives you no results:
That being said DDG improved slightly, when I did searches like those a couple of years ago I'd often end up with results containing completely broken encodings, binary dumps as ascii and other obviously erroneous content that got indexed by mistake. Here the results at least appear to link towards proper pages.
Seriously: Where do you get the idea that DuckDuckGo is just Bing? I can't find a single source for that claim. What I can find is a post from Gabriel Weinberg that says that DuckDuckGo is not Bing.
It comes up EVERY time DuckDuckGo is mentioned, yet there's not a single source that suggests that DuckDuckGo is just a frontend for Bing.
So again, please provide a source! I can't find one.
It's clear that DuckDuckGo used Bing for some results, but not to what extent. Are all results from Bing? Does Bing only provide results when DuckDuckGo's own crawler fails? Are the results mixed? I very much get the impression that results are mixed, but that's not completely clear either.
Their own crawler is only used for fluff like widgets. All organic search results are from Bing and Oath:
> In fact, DuckDuckGo gets its results from over four hundred sources. These include hundreds of vertical sources delivering niche Instant Answers, DuckDuckBot (our crawler) and crowd-sourced sites (like Wikipedia, stored in our answer indexes). We also of course have more traditional links in the search results, which we also source from a variety of partners, including Oath (formerly Yahoo) and Bing.
I've seen the ddg bot in my home webserver (with a .com) logs in the past month. I even bothered to check to make sure its IP matched the ones on the bot about page.
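If you want to automate that kind of check, a minimal sketch looks like this. The ranges below are placeholders from the reserved documentation address blocks, not DuckDuckBot's real addresses; you would paste in whatever the crawler's about page currently lists.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real crawler addresses.
PUBLISHED_CRAWLER_RANGES = ["192.0.2.0/28", "198.51.100.17/32"]


def is_published_crawler_ip(address, ranges=PUBLISHED_CRAWLER_RANGES):
    """True if the address from the access log falls inside a published range."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)


for logged_ip in ["192.0.2.5", "203.0.113.9"]:
    print(logged_ip, is_published_crawler_ip(logged_ip))
```

Matching on the IP rather than the user-agent string matters because anyone can send a request that merely claims to be the bot.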
It looks like in Jan 2019, Google Analytics finally started classifying DDG as an Organic Search engine instead of lumping it into "Referrals" category.
Although the change has the awkward effect of splitting ddg reporting into the two groups based on date of traffic.
For those that don't know, 'quietly' is newspeak for something that happened without a press release being issued. [1]
[1] Because in the world of the press everything should be announced so they can broadcast it and sell advertising by running stories. And not have to find it out by other more laborious methods.
It is also often followed with a statement that the real authority on the issue didn't even bother responding with any comments, when in all likelihood the journalists also didn't try very hard to reach anyone. As it is in this case.
> For those that don't know, 'quietly' is newspeak for something that happened without a press release being issued.
This is a deliberate Orwell reference? Vernor Vinge speculated in Rainbows End that everything which couldn't be searched for in a search engine would effectively become invisible. In 2019, that manifests as, "everything which can't be searched for in a search engine, which is backed up by crosslinked mainstream news sites and which isn't warded by words meant to scare casual readers away."
My impression is that Google helps DDG to become a popular alternative to its own search engine. Why? Just a funny fact: in 2018 Google transferred ownership of the domain name Duck.com to DuckDuckGo.
Look, you know that Google in an act of benevolence gave them duck.com last year, that’s PR.
Antitrust angle is obvious. They want to appear they aren’t the only game in town. Esp when you have people like Warren making (hollow) antitrust campaign noise.
Except for all that privacy and functionality and stuff that Bing doesn't have (like bangs, or cursor down + enter to directly go to search result without using your mouse, etc. etc.)
> DuckDuckGo gets its results from over four hundred sources. These include hundreds of vertical sources delivering niche Instant Answers, DuckDuckBot (our crawler) and crowd-sourced sites (like Wikipedia, stored in our answer indexes). We also of course have more traditional links in the search results, which we also source from a variety of partners, including Oath (formerly Yahoo) and Bing.
How does that work with their privacy stance? Do Yahoo/Bing get to keep and use that search data and it's just anonymized, or does DDG pay to keep it untracked?
Kind of disheartening regardless. I assumed they had their own scrappy, independent tech stack.
Headless Chrome is available and widely-used, and commonly you can get around the JS thing by simply waiting a few seconds before scraping. I'd assume the crawling itself isn't the hard part (aside from maybe just the raw compute time it takes).
For the most part yes. They could be getting search results from other paid search engine APIs but you have to balance cost of providing results with ad/affiliate revenue.
You can get paid API search results from Google and Yandex for example just like with Bing (similar prices, different limits). And you can even use Wolfram Alpha API for certain types of queries ("what is apple's average revenue per employee?").
Doing all of these would allow you to surface better results than using just one. But it comes at a cost.
Google and Bing are the only ones that matter and you can’t compete with Google by paying them for their search results.
Yandex is Russian and has pretty poor results. And you’d be naive to think that those of us concerned about privacy would ever touch something built on top of it.
In other words there aren’t paid search engines that DuckDuckGo could turn to. Unless they build their own crawler, the only game in town is Bing.
There is no difference between building something on top of Bing or Yandex, as your private data never touches their servers. All they get is an anonymized stream of queries, in this case from DuckDuckGo.
And yes you can compete with Google (for a certain target group) by paying them for their search results. Results are just a distribution channel, it is what you do with them that matters. For example Google and DuckDuckGo both choose to show you ads and affiliate links but that is hardly the only option.
They're desperate to collect user click data because they know that's the only way they'll have any chance of success. Even anonymized, that's very valuable data.
With some of the huge antitrust fines levied against Google by the EU, this seems to me like Google trying to support the claim that they are not a monopoly in search.
Purely from a search quality and end user experience standpoint, I'd choose Google or Bing over DDG.
I gave DDG a shot for a couple of months, but I found myself using other search engines more often than not for the lack of quality results.
Not the parent, but many searches on technical topics have better quality results on Google (or Startpage, which is a proxy to it).
DDG also doesn’t support searching by time for longer than the last month, whereas when I look for technical stuff I often want results from the last year, which Google and Startpage provide.
Here's one I encountered today with an obscure debugging tool. Google gives me a bunch of relevant and useful links, all in the first few results. DDG has one semi-relevant page that links the actually relevant page and the rest are useless. That single result isn't even the first link.
I started to use Chrome only for Google services (Gmail, YouTube, Maps, etc.) and Firefox with DDG for everything else. With this setup Google can send home only the data they already know.
As someone who supports Firefox, I would say that it’s important to signal Google that there are Firefox users using its services. People have been reporting about issues with some of Google’s services on Firefox. Skype from Microsoft was recently discovered as not supporting Firefox. Every signal users send to these companies matters.
If you don't send back crash reports and statistics to a company, you shouldn't get upset when they stop supporting your use case or won't fix bugs you encounter.
A hash prefix list gets downloaded locally; Chrome checks locally against the prefix list. If a URL hits, Chrome will send the hash prefix (not the full hash and not the URL) to the server, the server will send back all full hashes that match that prefix, and then the client will complete the check locally.
In theory, if the server had a small number of matching full hashes, it could guess about what URL a client might be hitting, but in practice the system is designed as much as possible to avoid ever leaking data about what you're visiting to Google servers.
Clients download a database of partial hashes of malware URLs. If they get a hit on one of those partial hashes, they make a request for the full list of hashes with that prefix.
Google knows when a client makes one of those requests, but the exact URLs (or hashes) they're looking up are never revealed. The partial hash is 32 bits long, so there's enough collisions that making a request isn't especially revealing.
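For anyone who wants to see the flow described above in code, here is a minimal sketch. It assumes SHA-256 hashes and a 4-byte prefix (the 32 bits mentioned above); `ToyServer`, `ToyClient` and the example URLs are hypothetical names made up for illustration, not Chrome's actual code, and a real client would also canonicalize URLs and keep its prefix list updated.

```python
import hashlib

PREFIX_BYTES = 4  # 32-bit prefix, as described in the comments above


def full_hash(url: str) -> bytes:
    """SHA-256 of the raw URL string (assumption: no canonicalization here)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


class ToyServer:
    """Stand-in for the remote service; it only ever receives hash prefixes."""

    def __init__(self, bad_urls):
        self._full_hashes = {full_hash(u) for u in bad_urls}
        self.seen_prefixes = []  # everything the server learns from clients

    def prefix_list(self):
        # The list that clients download ahead of time.
        return {h[:PREFIX_BYTES] for h in self._full_hashes}

    def full_hashes_for(self, prefix: bytes):
        # Called only when a client gets a local prefix hit.
        self.seen_prefixes.append(prefix)
        return {h for h in self._full_hashes if h.startswith(prefix)}


class ToyClient:
    def __init__(self, server):
        self._server = server
        self._prefixes = server.prefix_list()  # downloaded once, checked locally

    def is_flagged(self, url: str) -> bool:
        h = full_hash(url)
        prefix = h[:PREFIX_BYTES]
        if prefix not in self._prefixes:
            return False  # common case: nothing leaves the machine
        # Prefix hit: fetch the matching full hashes, finish the check locally.
        return h in self._server.full_hashes_for(prefix)
```

In this sketch the server sees a 4-byte prefix only on a local hit, and never the URL or the full hash, which is the property both comments above describe.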
I seem to recall reading it can be a mix of both, though generally the way you mentioned. A Bloom filter that filters locally, and if it's a hit then it sends over the URL to double-check. Would be nice if someone could confirm though.
Older versions were Bloom filters, but newer versions have moved away from that (and to a list of hash prefixes) because Bloom filters are hard to update.
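A rough illustration of why a Bloom filter is awkward to update, as a textbook-style sketch rather than anything taken from Chrome. The class name, sizes and hashing scheme are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: k bit positions per item in a fixed bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item: str):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # True can be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

There is deliberately no `remove` method: different items can share bit positions, so clearing the bits for one entry could make the filter wrongly report another entry as absent. Dropping an entry therefore means rebuilding the filter, whereas a plain list of hash prefixes can be patched entry by entry.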
So if you're web based (like me) then activities such as sending an email, checking out YT, reading HN, watching Twitch, and jerking off, all end up as entries in that log file.
Chrome Sync is so sweet though. It's probably pretty close to impossible to live w/o Chrome Sync. It's my favorite tech feature. The USP for me is "As a user I want to switch devices while I'm browsing, so whatever I was reading prior to having to go to the bathroom I can seamlessly continue reading."
Yes. As you have pointed out, the ISP can only log a host name [0]. Well, if the user story is porn, then as it happens, host names are pretty darn telling. Also, looking at a list of host names in chronological order might help them to classify me as "closet this" or "closet that", as I find myself less and less inhibited by society's rules the closer I come to crescendo.
> So if you're web based (like me) then activities such as sending an email, checking out YT, reading HN, watching Twitch, and jerking off, all end up as entries in that log file.
Do you really think Google would have trained an AI to determine that last activity? How would they have trained the AI?
How graphic do you want me to be? But isn't the real question: is there utility for an ad network to know about your preferences in porn? If there is such utility, you're best to believe Google implemented a way to get them.
If I was an ad network I would love to hear about your porn habits. I would absolutely love it.
If you know, you are obviously one of the people who has that data. If you don't know, you aren't.
> If I was an ad network I would love to hear about your porn habits. I would absolutely love it.
Such networks don't have to know whether or not and the exact moment a given user jerks it. Though, it would actually be better if you were actively browsing around and not jerking it currently. I guess that's why discovery and AI are so bad on porn sites. It's actually better for their ad revenue!
Since I've already outed myself as a porn consumer I might as well get this off my chest: the recommendation engines of the video sites I visit (all the big ones) have poor, very poor AI. It's like they don't know me as a customer. I'm out here busting my balls to find "The Perfect Clip" but it's a jungle and they are not making it easy on me.
Presumably you aren't visiting porn sites for the articles. Voila, no AI needed.
Sloppy. How do they know I'm not preparing for actual intercourse? How do they know I'm not downloading porn for later? I could also be watching gun videos, since they've been hosted there.
They have designed the whole Android interface to scatter the settings around so it is very difficult to turn off many of the privacy-invasive features. And even if you manage to turn them off, somehow they always end up turning themselves on again.
Everybody working for this company should be ashamed.
> They need it in order to properly train their ad-network.
Given that they scoop up all this data, I'd appreciate it if their ad network actually improved. Just the other day the dating site scams were back.
"We'll try not to show it again," they say. Well, for a company vacuuming the market for the best and brightest, they either don't try very hard or they are very dysfunctional, because they fail as a group.
Imagine that the only thing that is reported to the Chrome backend is a client ID and a URL, which would be the only thing needed to render this app: https://myactivity.google.com
Would you think of Google as trustworthy because they only gave their backend two pieces of data? I myself would not, because I'm pretty sure the actual request and response messages are looked up by client ID (in their Google Analytics data store).
Give Google some privacy of what is being sent home :)
Chrome (and Chromium) opens SSL connections to Google at startup and keeps them open. It is not easy to sniff what is being sent. Even if you MITM it, like in enterprise transparent proxies, Chrome will throw an error because of cert pinning. Google domains should be whitelisted: "we recommend that you avoid the use of transparent proxies." https://support.google.com/chrome/a/answer/3504942?hl=en
Chrome has the all powerful "omnibox" that still sends stuff to Google. Since searches or URLs go through the omnibox there's a good chance Google gets (some of) the data.
It's probably not as sinister as I make it out to be, but I can think of a few items of data that differ between using DDG's native UI and that of the Chrome search bar: the headers Chrome sends to DDG are different, and the autocomplete results that come back from DDG can now be monitored.
Good point, I hadn't thought of the autocomplete. I figured you were referring more to the actual query itself, which Chrome already sends home to Google via its cloud-synced web history. In which case it does not matter whether you use the omnibar or visit the website directly, as long as you have history sync enabled.
I would expect to see a small bump in the stats [1], which, given this is DDG's main source of revenue, is absolutely a good thing.
"DuckDuckGo is one of our main rivals" is a bit of a self-fulfilling prophecy for Google. They need to amp up DDG's legitimacy to ward off accusations of antitrust. Credibility, legitimacy and awareness are really the only things DDG needs to reach a wider audience and gain greater adoption.
Weinberg explained the beginnings of the name with respect to the children's game duck, duck, goose. He said of the origin of the name: "Really it just popped in my head one day and I just liked it. It is certainly influenced/derived from duck duck goose, but other than that there is no relation, e.g., a metaphor."
Most likely they have found another way to get the same data. If not, it will be interesting to see how long it takes until there is a "bug" that reverts the search engine back to Google once they've lobbied enough that they feel safe against antitrust claims.
You can. In fact, Chrome automatically creates search engines for any site you search on. I can search Amazon by typing "Am<tab><query><enter>" in the address bar for example, and Chrome learned how to do that automatically despite not having any knowledge of how Amazon's search system works when I first installed it.
I guess the only difference is that with this change, DDG is available as a search engine by default with a blank install, even before you've actually used it.
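The keyword search described above boils down to a URL template with a placeholder for the query. Here is a minimal sketch, assuming the `%s` placeholder style that Chrome shows in its search engine settings; the `ENGINES` table, the short keywords and the `expand` helper are hypothetical and only illustrate the idea.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table: each entry maps a keyword to a search URL
# template, with %s marking where the escaped query goes.
ENGINES = {
    "am": "https://www.amazon.com/s?k=%s",
    "ddg": "https://duckduckgo.com/?q=%s",
}


def expand(keyword: str, query: str) -> str:
    """Turn 'keyword<tab>query' into the URL the browser would load."""
    template = ENGINES[keyword.lower()]
    return template.replace("%s", quote_plus(query))


print(expand("Am", "usb c cable"))
print(expand("ddg", "rainbows end"))
```

Shipping DDG by default then just means that its entry is already in the table on a fresh install, rather than being learned after the first visit.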
I'm pessimistic enough now that my first thought wasn't antitrust, but rather whether Chrome itself tracks my DDG searches and sends them to Google anyway.
Nice try G. I still won't have a byte of your code on anything I use. No data for you. I wish websites would stop using your fonts and analytics and captcha too.
Is this the first of a wave of anti-anti-trust moves by big tech? It's a play that I certainly would advise. It makes sense to trade marginal revenue with low hanging fruit gestures like these to take the air out of folks like Warren and the European Competition Committee.
Imagine if there was no "default" search engine or list of suggested search engines in the settings.
I keep a running list of "alternative" search engines -- not directly Google or Bing, not Yahoo -- that work without JS or session cookies. There are thirteen in the list at the moment.
> It's also a solid choice for them to hedge against antitrust claims, if they can point to having just added them to their browser, regardless of the fact that Google is the default and they do not present a choice screen like Microsoft had to in the EU.