OP here: I ran 580 model-dataset experiments to show that, even if you try very hard, it is almost impossible to know that a model is degrading just by looking at data drift results.
"In my opinion, data drift detection methods are very useful when we want to understand what went wrong with a model, but they are not the right tools to know how my model's performance is doing.
Essentially, using data drift as a proxy for performance monitoring is not a great idea.
I wanted to prove that by giving data drift methods a second chance and trying to get the most out of them. I built a technique that relies on drift signals to estimate model performance and compared its results against the current SoTA performance estimation methods (PAPE and CBPE) to see which technique performs best."
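For readers unfamiliar with confidence-based performance estimation, here is a minimal sketch of the intuition behind CBPE for binary classification. This is my own toy illustration, not the author's implementation: assuming the model outputs calibrated probabilities, a prediction thresholded at 0.5 is correct with probability `max(p, 1 - p)`, so averaging that quantity estimates accuracy without needing labels. The function name is made up.

```python
import numpy as np

def estimated_accuracy(proba):
    """Estimate accuracy from calibrated positive-class probabilities.

    CBPE-style intuition: if p is calibrated, the thresholded (0.5)
    prediction is correct with probability max(p, 1 - p), so the mean
    of that quantity is an unlabeled estimate of accuracy.
    """
    proba = np.asarray(proba, dtype=float)
    return float(np.mean(np.maximum(proba, 1.0 - proba)))

# Example: mostly confident predictions -> high expected accuracy.
probs = [0.9, 0.8, 0.6, 0.95, 0.3]
print(round(estimated_accuracy(probs), 3))  # → 0.79
```

The key assumption is calibration: if production probabilities are miscalibrated, the estimate is biased, which is why methods like PAPE extend this idea.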
The world is a dynamic mess. So, it is natural for things to change. And, don't get me wrong, data drift methods are good tools for detecting those changes.
But in the context of ML monitoring, data drift methods, which are often presented as the go-to solution for detecting performance degradation in ML models, are not the right tool.
My findings show that data drift doesn't always imply a decline in the model's performance.
Some reasons might be:
- The drifted feature may have low importance for the model's predictions, meaning that changes in that specific feature have minimal impact on overall performance.
- The model might be able to correctly extrapolate or generalize from the available data, even in the presence of drifting features.
- Even if multiple features exhibit drift, the relationship between these features and other relevant features may remain unchanged, resulting in stable model performance.
So, because of these limitations, drift methods often generate lots of false alarms. This makes them somewhat noisy solutions for ML monitoring.
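To make the first reason concrete, here is a small synthetic sketch (my own toy setup, not from the experiments above): a feature the model effectively ignores drifts heavily, a two-sample Kolmogorov-Smirnov test raises a drift alarm, yet accuracy is untouched.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000

# Reference (train-time) data: the label depends only on x1;
# x2 is a low-importance feature the model ignores.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)

# A stand-in "model" that thresholds x1 and never looks at x2.
def predict(a, b):
    return (a > 0).astype(int)

# Production data: x2 drifts strongly; x1 and the concept are unchanged.
x1_prod = rng.normal(0, 1, n)
x2_prod = rng.normal(3, 1, n)  # heavy covariate shift in x2
y_prod = (x1_prod > 0).astype(int)

ks = stats.ks_2samp(x2, x2_prod)  # drift detector fires on x2
acc = (predict(x1_prod, x2_prod) == y_prod).mean()

print(f"KS p-value for x2: {ks.pvalue:.2e}")  # tiny -> drift alarm
print(f"Production accuracy: {acc:.2f}")      # unchanged
```

The drift alarm here is a false positive from the monitoring perspective: the detector is correct that the distribution moved, but performance never degraded.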
I'm interested in reading your opinions about it and knowing if any of you have experienced something similar.
On 14 June 2023, the European Parliament adopted the AI Act, with 499 votes in favor, 28 against, and 93 abstentions.
So, it is likely that the EU AI Act will be a reality in the near future. I wrote a guide to understand the implications of this new regulation from the position of a data scientist.
In the article, I explain in simple words what the Act is, its risk-based approach, and how to comply with it. And, probably the $1M question: how will this affect your day-to-day job as a data scientist?