For a newer list I would add the Mapbox API as well.
So I work in data analytics, not so much web-mapping. For those applications, IMO local solutions, like ESRI, are good options if you are limited to addresses in the US, https://crimede-coder.com/blogposts/2024/LocalGeocoding.
Google's TOS says you can't even cache the results, https://cloud.google.com/maps-platform/terms. So saving them to a database and doing analysis of your data is not allowed, AFAICT.
Care to share the contexts in which someone needs a zero-shot model for time series? I have just never come across one in which you don't have some historical data to fit a model and go from there.
That said, I think it is possible that my refusal to make cheeky PowerPoint slides with SmartArt, and my insistence on filling them with graphs of real data instead, has stunted my career growth into management.
One problem with both of these takes on PowerPoint is that they assume it will be presented in person. That's less often the case now. People present more often via Teams or Zoom, so a lot of the usual advice (don't expect people to read and listen simultaneously) is no longer accurate: half your viewers are audio only, and more people get copies of the slides than make it to the original presentation. Remote vs. in person are totally different beasts.
IMO if you are doing this, you should avoid text in the charts entirely, as I think the title can sometimes lead the models astray; a clustering title, for example, will likely bias the model to find clusters even if none exist. Presuming you are the one making the chart and not just prompting with an existing image.
My company's looks similar to the recent screenshot, but it is a hellscape of a billion options and poor search functionality, to the extent that I just have to ask a person for the right link, or search the tree by hand, whenever I need to actually use it.
I don't envy developers who need to work on this, but IMO the best systems I have worked with have a very shallow tree and then a "human will work out the appropriate team to route to".
> a hellscape of a billion options and poor search functionality, to the extent that I just have to ask a person for the right link, or search the tree by hand, whenever I need to actually use it.
I did some development work for a customer a couple of years ago: I had to take screenshots and bookmarks to even have a chance of finding something a second time!
Once I got to the built-in code editor for the right script it was fine though, I had no trouble with their programming docs.
A tell for fake firms in my local newspaper is that they ask for a snail-mail resume. These appear to me to be shell companies submitting multiple H-1Bs, though, not legitimate firms claiming they cannot hire any US workers.
I agree that understanding KM is a very good place to start with survival analysis. In many of the examples I have for KM in my business, the censoring is due to certain events (auditing healthcare claims) taking a long time to resolve.
When I first learned survival analysis, my professor had me construct life tables first, and only then did I learn KM. You can often do quite a bit with discrete-time tables. So if you have data:
ID TimeRange Outcome
A 4 1
B 3 0
You can then explode the data into the form:
ID Time Outcome
A 1 0
A 2 0
A 3 0
A 4 1
B 1 0
B 2 0
B 3 0
If you group this table by Time and count the events (numerator) and the rows at risk (denominator), that is what you need to calculate the life table, and the discrete version of the KM plot.
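For anyone who wants to play with this, here is a rough pandas sketch of that same explode-and-group step, using the toy data above (the column and variable names are just my own placeholders):

    import pandas as pd

    # One row per subject: how long they were observed and whether the event happened.
    df = pd.DataFrame({"ID": ["A", "B"], "TimeRange": [4, 3], "Outcome": [1, 0]})

    # Explode to person-period form: one row per subject per time period.
    long = (
        df.loc[df.index.repeat(df["TimeRange"])]
          .reset_index(drop=True)
          .assign(Time=lambda d: d.groupby("ID").cumcount() + 1)
    )
    # The event is only recorded in the subject's final period.
    long["Outcome"] = (long["Time"].eq(long["TimeRange"]) & long["Outcome"].eq(1)).astype(int)
    long = long[["ID", "Time", "Outcome"]]

    # Life table: rows at risk (denominator) and events (numerator) per period.
    life = long.groupby("Time").agg(at_risk=("ID", "size"), events=("Outcome", "sum"))
    life["hazard"] = life["events"] / life["at_risk"]
    life["survival"] = (1 - life["hazard"]).cumprod()  # discrete KM step function
    print(life)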
Care to elaborate on this? The post does not save the resulting weight, so you don't use it in any subsequent calculations; you would just treat the result as a simple random sample. So it is unclear to me why this critique matters.
Yes, rhymer has hit on the primary tradeoff behind Algorithm A. While Algorithm A is fast, easy, and makes it possible to sample any dataset you can access with SQL, including massive distributed datasets, its draws from the population are not independent and identically distributed (because each draw makes the remaining population one item smaller). Nor does it let you compute the inclusion probabilities that would let you use the most common reweighting methods, such as a Horvitz–Thompson estimator, to produce unbiased estimates from potentially biased samples.
In practice, however, when your samples are small, your populations are large, and your populations' weights are not concentrated in a small number of members, you can use Algorithm A and just pretend that you have a sample of i.i.d. draws. In these circumstances, the potential for bias is generally not worth worrying about.
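For concreteness, here is a small Python sketch of the weighted-key idea behind this kind of sampling (a sketch, not necessarily the post's exact SQL): each row gets an Exponential(weight) key and you keep the n smallest, which in SQL is the familiar ORDER BY -LN(RANDOM()) / weight LIMIT n pattern.

    import numpy as np

    rng = np.random.default_rng(0)

    def weighted_sample_indices(weights, n):
        # Each row gets key = -ln(U) / weight, an Exponential(weight) draw;
        # the n smallest keys form a PPS-without-replacement sample, drawn
        # sequentially in key order (the SQL version sorts by the same key).
        weights = np.asarray(weights, dtype=float)
        keys = -np.log(rng.random(len(weights))) / weights
        return np.argsort(keys)[:n]

    # Toy population: 10 rows with unequal weights.
    w = [1, 1, 1, 1, 1, 5, 5, 5, 10, 10]
    print(weighted_sample_indices(w, 3))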
But when you cannot play so fast and loose, you can produce unbiased estimates from an Algorithm A sample by using an ordered estimator, such as Des Raj's ordered estimator [1].
You could alternatively use a different sampling method, one that does produce inclusion probabilities. But these tend to be implemented only in more sophisticated statistical systems (e.g., R's "survey" package [2]) and thus not useful unless you can fit your dataset into those systems.
For very large datasets, then, I end up using Algorithm A to take samples and (when it matters) Des Raj's ordered estimator to make estimates.
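And here is a rough sketch of Des Raj's ordered estimator of the population total, written from the textbook formula (treat it as a sketch and check it against [1] before relying on it). Here y holds the observed values in draw order, and p holds each drawn unit's original single-draw selection probability (its weight divided by the total weight of the population).

    import numpy as np

    def des_raj_total(y, p):
        # Ordered estimator of the population total for a PPS-without-
        # replacement sample taken draw by draw:
        #   t_1 = y_1 / p_1
        #   t_i = (y_1 + ... + y_{i-1}) + (y_i / p_i) * (1 - p_1 - ... - p_{i-1})
        # and the estimate is the mean of the t_i.
        y = np.asarray(y, dtype=float)
        p = np.asarray(p, dtype=float)
        prior_y = np.cumsum(y) - y  # sum of earlier observations at each draw
        prior_p = np.cumsum(p) - p  # sum of earlier selection probabilities
        t = prior_y + (y / p) * (1 - prior_p)
        return t.mean()

    # Toy example: three draws, with each unit's original selection probability.
    print(des_raj_total(y=[12.0, 7.0, 30.0], p=[0.02, 0.01, 0.05]))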
(I was planning a follow up post to my blog on this subject.)
Very nice. Another pro tip for folks: you can set the weights to get approximate stratified sampling. Say group A has 100,000 rows and group B has 10,000 rows, and you want each to make up approximately the same proportion of the resulting sample. You would set the weight for each A row to 1/100,000 and for each B row to 1/10,000.
If you want exact counts I think you would need to do RANK and PARTITION BY.
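A small Python sketch of both points, using made-up group sizes: the first part weights each row by 1 over its group size and keeps the n smallest exponential keys, so the split is only approximately 50/50; the second part samples a fixed count within each group, which is roughly what the RANK/PARTITION BY idea does in SQL.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)

    # Made-up groups (smaller than 100,000/10,000 so it runs instantly).
    df = pd.DataFrame({"grp": ["A"] * 1000 + ["B"] * 100})

    # Approximate stratified sample: weight each row by 1 / group size,
    # then keep the n smallest exponential keys. Expected split is ~50/50.
    df["w"] = 1.0 / df.groupby("grp")["grp"].transform("size")
    df["key"] = -np.log(rng.random(len(df))) / df["w"]
    approx = df.nsmallest(200, "key")
    print(approx["grp"].value_counts())

    # Exact counts per group: sample a fixed number within each group
    # (the pandas analogue of ranking within a partition).
    exact = df.groupby("grp", group_keys=False).sample(n=100, random_state=1)
    print(exact["grp"].value_counts())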