>San Francisco-based Twitter also disclosed that it had discovered an error in h...

yellowbeard · on Oct 26, 2017

Monthly actives is generally the primary metric and it's not so easy to calculate.

whatok · on Oct 26, 2017

Quote from http://money.cnn.com/2017/10/26/technology/business/twitter-...

"These third-party applications used Digits, a software development kit of our now-divested Fabric platform, that allowed third-party applications to send authentication messages via SMS through our systems, which did not relate to activity on the Twitter platform," the company explained in its earnings report.

Really seems like something they should have caught earlier.

tshaddox · on Oct 26, 2017

Do you mean it’s not easy to define? It shouldn’t be difficult to calculate any particular metric going forward, but it’s important to define what it means to be “active.”

lotso · on Oct 26, 2017

Calculating these metrics at scale is not trivial.

Dylan16807 · on Oct 26, 2017

In real time, yes.

But the user database should already have backups, importing those backups into an analysis server should be easy, and running queries like that on an analysis server should be easy.

Counting messages, or users with X messages, etc. is also largely a function of whether your backup/restore system works. But this time you do it in chunks.
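As a rough illustration of counting "users with X messages" in chunks, here is a minimal Python sketch. It assumes the restored backup can be read as a stream of message author IDs; the `chunked` and `users_with_at_least` names are hypothetical, and nothing here reflects how Twitter actually does it.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def chunked(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` items."""
    chunk: List[int] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter chunk
        yield chunk


def users_with_at_least(author_ids: Iterable[int], min_messages: int,
                        chunk_size: int = 1000) -> int:
    """Count users who authored at least `min_messages` messages."""
    totals: Counter = Counter()
    for chunk in chunked(author_ids, chunk_size):
        totals.update(chunk)  # merge this chunk's per-user counts
    return sum(1 for n in totals.values() if n >= min_messages)
```

The per-user totals still have to fit in memory here; only the input is processed in pieces.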

squarecog · on Oct 26, 2017

I helped build Twitter's data platform, 2010-2016.

There isn't an "analysis server" and analyzing user activity is not done on a "user database backup" at Twitter's scale, though indeed that's a common way that would be done for smaller businesses.

By the way, if by user db you literally mean the db with user accounts, that's not the right data source -- you want the user _activity_ db to count active users, and for high-scale applications, those are different things. Presumably user activity updates are orders of magnitude more frequent than user object updates. You don't want to thrash your user db by constantly updating some "last seen at" field. Put that stuff somewhere else.

That said, it's true that counting is simple, it's just a Hadoop / Spark / distributed computing platform of choice job. Filter, distinct, count. It's not even hard in real time if you have enough RAM or are OK with approximate counts with bounded error, thanks to Storm, Heron, Flink, etc.
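To make "filter, distinct, count" concrete, here is a single-machine Python sketch of the exact (not approximate) version. The `Event` layout, its `source` field, and the 30-day window are assumptions for illustration; a real job would run the same three steps on a distributed platform.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    user_id: int
    timestamp: datetime
    source: str  # hypothetical label, e.g. "web" or "ios"


def count_active_users(events: Iterable[Event], end: datetime,
                       window_days: int = 30) -> int:
    """Count distinct users with at least one event in [end - window, end)."""
    start = end - timedelta(days=window_days)
    in_window = (e for e in events if start <= e.timestamp < end)  # filter
    distinct_users = {e.user_id for e in in_window}                # distinct
    return len(distinct_users)                                     # count
```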

Defining what exactly constitutes an active user and catching edge cases such as this Digits thing is where things get tricky; the number of weird scenarios that cause under/overcount for what seem like reasonable and straightforward definitions would surprise you.
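A toy Python example of how the definition changes the number, assuming each activity event carries a `source` label and that SDK-originated SMS authentication shows up as `"digits_sms"` (both the field and the label are made up for illustration):

```python
from typing import FrozenSet, Iterable, Set, Tuple

ActivityEvent = Tuple[int, str]  # (user_id, source); layout is an assumption


def active_users(events: Iterable[ActivityEvent],
                 excluded_sources: FrozenSet[str] = frozenset()) -> Set[int]:
    """Return IDs of users with at least one event from a non-excluded source."""
    return {user_id for user_id, source in events
            if source not in excluded_sources}


events = [(1, "web"), (2, "ios"), (2, "digits_sms"), (3, "digits_sms")]

naive = active_users(events)                               # every event counts
strict = active_users(events, frozenset({"digits_sms"}))   # SMS auth excluded

print(len(naive), len(strict))  # prints "3 2"
```

User 2 stays active under both definitions because of the iOS event; user 3 only appears through the SMS path, which is the kind of overcount described above.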

@baddox nailed it.

Dylan16807 · on Oct 26, 2017

Thanks. Note that I wasn't trying to guess at what twitter does, just to provide a workflow that should be viable almost anywhere, in the absence of easier options. It's good to hear that the underlying idea of "calculating the metric isn't the hard part" is true.

komali2 · on Oct 26, 2017

Oh, fair that that would be a more important metric, but when they said "user base" I incorrectly assumed they meant "all registered users."

colemannugent · on Oct 26, 2017

I bet an engineer noticed and told a manager that the numbers were technically lower; upper management then found out and decided to release that info once they already had massive mindshare and it wouldn't hurt them as much.

wutwutwutwut · on Oct 26, 2017

And users activated, users logged on in the last X months, users not deleted, no duplicates caused by some obscure event synchronization issue, etc. Bugs are easy.

beager · on Oct 26, 2017

I believe they attributed this to MAU/DAU of Digits[0] who are not otherwise MAU/DAU of Twitter. Ostensibly, users who use Digits to sign into 3rd-party apps ride in the same DB as bona fide Twitter users, and they just didn't discount them.

[0]: https://techcrunch.com/2014/10/22/twitter-launches-digits-a-...