A paper [1] we wrote in 2015 (cited by the authors) uses more sophisticated data structures (compressed suffix trees) and Kneser–Ney smoothing to get the same "unlimited" context. I imagine that with better smoothing and the larger corpus sizes the authors use, this approach could improve on some of the results they report.
Back then neural LMs were just beginning to emerge and we briefly experimented with using one of these unlimited N-gram models for pre-training but never got any results.
The formulation of a succinct full-text index as an "infinite n-gram model" is genius in connecting with the ML literature, but also blinkered, in that it ignores all the work on using exactly this data structure (well, the FM-index) over DNA. bwa mem is the OG ∞-gram aligner.
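To make the "any k" part concrete, here is a minimal sketch of the counting core of the idea, using a plain suffix array over tokens instead of a compressed suffix tree or FM-index, and maximum likelihood instead of Kneser–Ney smoothing (toy corpus, my own naming):

```python
# Toy "infinite n-gram" lookup: a suffix array over the token stream lets you
# count an arbitrarily long context in O(|context| * log n), so the model
# order never has to be fixed in advance. Real systems use an FM-index /
# compressed suffix tree plus proper smoothing; this is just the bare idea.
# (Requires Python 3.10+ for the `key` argument to bisect.)
from bisect import bisect_left, bisect_right

corpus = "the cat sat on the mat the cat ate the rat".split()
# Suffix array: positions of all suffixes, sorted lexicographically.
sa = sorted(range(len(corpus)), key=lambda i: corpus[i:])

def count(pattern):
    """Number of occurrences of `pattern` (a list of tokens) in the corpus."""
    key = lambda i: corpus[i:i + len(pattern)]
    return bisect_right(sa, pattern, key=key) - bisect_left(sa, pattern, key=key)

def prob(context, token):
    """Maximum-likelihood P(token | context) for a context of any length."""
    c = count(context)
    return count(context + [token]) / c if c else 0.0

print(prob(["the", "cat"], "sat"))  # 0.5 -- "the cat" is followed by "sat" or "ate"
```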
The idea is far older than BWA. The entire point of the Burrows-Wheeler transform is to create a compact order-k "language model" for every value of k simultaneously. And then use that for data compression.
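A toy sketch of what that means: each character of the BWT output is the symbol preceding its sorted context, so symbols with similar right-contexts of any length end up adjacent, which is exactly what a compressor (or later an FM-index) exploits. Real implementations build this via suffix arrays rather than materializing all rotations.

```python
def bwt(text, eos="$"):
    """Burrows-Wheeler transform via sorted rotations (toy version).

    Each output character is the one preceding its rotation's context,
    so characters sharing similar right-contexts cluster into runs.
    """
    text += eos  # sentinel assumed absent from the input and smaller than any character in it
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # 'annb$aa' -- note the run of 'n's, both preceding contexts that start with 'a'
```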
David Wheeler reportedly had the idea in the early 80s, but he rejected it as impractical. Then he tried publishing it with Michael Burrows in the early 90s, but it was rejected from the Data Compression Conference. Algorithms researchers found the BWT more interesting than data compression researchers, especially after the discovery of the FM-index in 2000. There was a lot of theoretical work done in the 2000s, and the paper mentioned by GP cites some of that.
The primary application imagined for the FM-index was usually information retrieval, mostly due to the people involved. But some people considered using it for DNA sequences and started focusing on efficient construction and approximate pattern matching instead of better compression. And when sequencing technology advanced to the point where BWT-based aligners became relevant, a few of them were published almost simultaneously.
I find it ridiculous to require working on a laptop, without being allowed an external monitor, to view potentially very long documents (half a screen!) for hours on a tiny display.
Funnily enough, "learning characters/words in the context of the vocabulary they are in" is exactly how NLP machine learning models learn "rich" word/text representations, based on the "distributional hypothesis", which states that words occurring in similar contexts tend to have similar meanings.
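A toy illustration of that hypothesis with a plain co-occurrence matrix and cosine similarity (made-up corpus; real models like word2vec or BERT learn dense vectors, but the signal they exploit is the same):

```python
import numpy as np

sentences = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
    "the car needs new tires",
]
vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

window = 2  # symmetric context window
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[idx[w], idx[words[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "cat" and "dog" appear in near-identical contexts, "car" does not.
print(cosine(counts[idx["cat"]], counts[idx["dog"]]))  # ~1.0
print(cosine(counts[idx["cat"]], counts[idx["car"]]))  # ~0.54
```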
From the first link: "Comparing OFDM to LTE today we find a better scalability to a much lower latency (an order of magnitude lower round-trip time [RTT] than LTE today) in OFDM."
Doesn't LTE already have quite good latency properties?
As I understand it, not good enough for "vehicular communication, industrial control, factory automation, remote surgery, smart grids and public safety applications" [1]. A lot of the improvement is down to the frame structure (i.e., how user and training data are distributed in time and frequency). See "Frame structure" in [1].
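As a rough back-of-envelope (my own illustration, not taken from the article): the useful OFDM symbol duration is roughly the inverse of the subcarrier spacing, so widening the spacing shrinks the minimum scheduling unit the frame structure can offer, which is one ingredient of the lower air-interface latency.

```python
# Illustrative only: useful OFDM symbol length ~ 1 / subcarrier spacing
# (cyclic prefix ignored). LTE is fixed at 15 kHz; newer numerologies scale it up.
for spacing_khz in (15, 30, 60, 120):
    symbol_us = 1e3 / spacing_khz          # symbol duration in microseconds
    slot_ms = 14 * symbol_us / 1e3         # a 14-symbol slot, the basic scheduling unit
    print(f"{spacing_khz:>3} kHz spacing: symbol ~ {symbol_us:5.1f} us, slot ~ {slot_ms:.2f} ms")
```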
One data point: LTE gives me 65-90ms ping RTT in reasonable non-crowded indoor conditions. Subjectively, working interactively via LTE is totally fine for me. Compare that to UMTS (3G T-Mobile DE) where I think RTT was more like 350ms and working interactively is a pain.
I thought the NVIDIA drivers for the fancier cards (Titan etc.) were the same as for the GeForce cards. Wouldn't this restriction apply to those cards as well? Doesn't make much sense to me...
They can't price-differentiate on FP64 compute, since ML uses FP32 or even FP16. They tried discriminating on FP16 performance, but frameworks switched to using the FP32 units and downconverting to FP16 afterwards. They can't kill FP32 performance, since that's used for gaming. They tried killing virtualization, they tried differentiating based on clustering, they tried every reasonable technical measure. So now they're falling back on legal means to defend an artificial price distinction that has no reflection in card features anyone cares about.
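A rough sketch of that workaround as I understand it (not any framework's actual code): keep tensors in FP16 for memory and bandwidth, but route the arithmetic through the FP32 units and downconvert afterwards.

```python
import torch

# Half-precision storage (saves memory and bandwidth)...
w = torch.randn(1024, 1024).half()
x = torch.randn(64, 1024).half()

# ...but the actual multiply runs on the full-speed FP32 units,
# with the result cast back down to FP16 afterwards.
y = (x.float() @ w.float()).half()
```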
In the future, Teslas will have Tensor Cores and GeForces won't, so deep learning will be much faster (but also much more expensive, so it kind of cancels out) on Tesla cards.
Yeah, doesn't it seem a little weird that they seem to want to enforce the distinction between gaming and industrial cards, but they also went ahead and included tensor cores in that new Titan card?
Did they not think that through and have an 'oh shit' moment when they saw all the news articles or something? Or is this a 'the first hit is free' sort of deal where they want people to learn about using the product on a startup budget, without giving up the ability to squeeze if someone has a good idea and wants to scale?
Titan V is so expensive that it's not cannibalizing anything. Once the GeForce 1180 comes out... they'll make it just slow/expensive enough that it still doesn't cannibalize anything.
They have just released the new Titan V, which has Tensor Cores, I believe. That would indicate that they do want to include them in non-workstation/dedicated-ML cards, no?
The Titan V is a compute card, the same way the Titan Xp and Titan Xs were. The differentiating factors between them and the xx80 [Ti] of the corresponding generation were memory bandwidth and capacity; gaming performance was nearly identical to the GeForce. Titans are entry-level compute.
They want to sell two fungible products while enforcing a pricing hurdle so that certain types of customers have to pay much more for the same performance.
I was at the SIGIR'17 presentation of this paper (it won the best paper award, btw) and have some general comments:
- They mentioned (from what I remember) that they now use BitFunnel as the core of the complete Bing search engine, not just the fresh parts.
- When I read the paper and looked at the code, it looks like their index doesn't include frequency information, whereas your PEF code does. It is unclear what was counted in the experiments.
- If you look at the code, they are actually doing much more complicated stuff than regular Bloom filters, by "bin packing" the hash positions for each term to reduce false positive rates (see https://github.com/BitFunnel/BitFunnel/issues/278 ; a toy sketch of the underlying bit-sliced idea follows after this list). I'm not sure if it is "fair" to compare a system developed by 10+ engineers over many years to a "PhD student" code base developed over a short period of time. I think the PEF code is excellent; my point is more that engineering effort can have a large impact on performance.
- I'm fairly sure you are right regarding the lack of URL-sorting. However, there may be a reason for that choice. Consider Figure 4 in the paper, which shows how "higher ranking rows" group documents together to allow faster intersection. URL sorting causes clusters in document ids: in the example of Fig. 4, there might be a cluster for that specific term covering documents 0, 1, 2, 3. This would mean the "higher ranking" row approach becomes worse (more false positives) when clustering occurs in the collection. So while URL-sorting helps PEF, it will most likely make BitFunnel worse.
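For readers who haven't seen the underlying structure, here is that toy bit-sliced signature sketch (the textbook idea BitFunnel refines with row ranks and bin-packed hash positions, not their actual layout), mainly to show where false positives come from:

```python
# Toy bit-sliced signature index: each term hashes to K rows of a bit matrix
# with one column per document; a query ANDs its terms' rows, and every column
# left standing is a candidate. Some candidates are false positives, which is
# what the density / row-rank machinery in the paper is about controlling.
import random

NUM_DOCS, NUM_ROWS, K = 64, 128, 3
matrix = [[0] * NUM_DOCS for _ in range(NUM_ROWS)]

def rows_for(term):
    rng = random.Random(term)                    # stand-in for K hash functions
    return [rng.randrange(NUM_ROWS) for _ in range(K)]

def add(term, doc):
    for r in rows_for(term):
        matrix[r][doc] = 1

def candidates(terms):
    """Documents whose column has a 1 in every row of every query term."""
    docs = set(range(NUM_DOCS))
    for t in terms:
        for r in rows_for(t):
            docs = {d for d in docs if matrix[r][d]}
    return docs

postings = {"cat": {1, 5, 9}, "dog": {5, 17}, "fish": {9, 33}}
for term, docs in postings.items():
    for d in docs:
        add(term, d)

# True intersection is {5}; anything extra would be a false positive that a
# real engine has to filter out against the actual postings.
print(candidates(["cat", "dog"]))
```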
> They mentioned (from what I remember) that they now use BitFunnel as the core of the complete Bing search engine, not just the fresh parts.
I find it hard to believe this. Their main index is certainly not all-RAM (there must be some flash and maybe even disk), and the throughput would just not be enough for something like BitFunnel.
> When I read the paper and looked at the code, it looks like their index doesn't include frequency information, whereas your PEF code does. It is unclear what was counted in the experiments.
In PEF the frequencies are not interleaved with the postings, so if you don't read them you don't pay any computational overhead (they mention this in the paper). However, it's not clear whether they included them when measuring the space.
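To spell out the layout point (my own toy contrast, not PEF's actual encoding):

```python
# Interleaved layout: (docid, freq) pairs -- a docid-only traversal still has
# to step over the frequency values.
interleaved = [(3, 1), (7, 4), (19, 2)]

# Non-interleaved layout (conceptually, as described above): two parallel
# sequences, so intersecting on docids never touches the frequency data.
docids = [3, 7, 19]
freqs  = [1, 4, 2]

# Frequencies are only looked up for documents that survive the intersection:
result = [d for d in docids if d in {5, 7, 9}]        # pretend intersection
scores = {d: freqs[docids.index(d)] for d in result}  # touch freqs only now
```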
> I'm not sure if it is "fair" to compare a system developed by 10+ engineers over many years to a "PhD student" code base developed over a short period of time.
I'm not trying to compare the code :) On the contrary, I'm mostly concerned about the behavior as the collection size grows. Gov2, especially if split into 5 pieces, is relatively small.
> So while URL-sorting helps PEF, it will most likely make BitFunnel worse.
That's possible, but I don't see why they could not use different docid orderings for BitFunnel and PEF. If they use the one that is better for BitFunnel, that's not fair to PEF.
> I find it hard to believe this. Their main index is certainly not all-RAM (there must be some flash and maybe even disk), and the throughput would just not be enough for something like BitFunnel.
From looking at the GitHub repo, it does look like the system runs entirely in main memory.
[1] https://aclanthology.org/D15-1288.pdf