If they’re incompetent enough to need a fake database, and to think faking one is a good idea, it’s little surprise that they’re also too incompetent to realize the task is only a couple of hours of coding.
"""
After the August 3, 2021 Zoom meeting, the Data Science Professor returned a
signed version of Frank’s NDA. The Data Science Professor’s usual hourly rate was $300.
Javice unilaterally doubled the Data Science Professor’s rate to $600.
[...]
Specifically, on August 5, 2021 at 11:05 a.m., the Data Science Professor
provided Javice an invoice for $13,300, documenting 22.17 hours of work over just three days.
The invoice entries show that the bulk of his time was spent on the main task that Javice retained
the Data Science Professor to perform – making up customer data. The Data Science Professor’s
invoice indicated that he performed “college major generation” and “generation of all features
except for the financials” while creating “first names, last names, emails, phone numbers” and
“looking into whitepages.”
In response to the initial invoice, Javice demanded that he remove all the details
admitting to how they had created fake customers – and added a $4,700 bonus. In an email to
the Data Science Professor at 12:39 p.m. on August 5, 2021, Javice wrote: “send the invoice
back at $18k and just one line item for data analysis.” In total, Javice paid the Data Science
Professor over $800 per hour for his work creating the Fake Customer List, which is 270% of his
usual hourly rate.
The Data Science Professor provided Javice the revised invoice via email seven
minutes later at 12:46 p.m., commenting “Wow. Thank you. Here is the new invoice.”
"""
It sounds like his initial invoice was quite clear about the work completed and was then revised at the client's request. So while you can argue on moral grounds that he shouldn't have done this work, I don't think there's illegality on his part, i.e. conspiracy.
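As a side check, the figures in the quoted passage are internally consistent. Here is a rough sanity check in Python; every number is taken from the excerpt above, and the variable names are just my own labels:

```python
# All figures come from the complaint excerpt quoted above.
# Variable names are illustrative labels, not terms from the complaint.
usual_rate = 300.00        # usual hourly rate, USD
doubled_rate = 600.00      # rate after it was doubled, USD
hours = 22.17              # hours listed on the initial invoice
initial_invoice = 13_300   # initial invoice total, USD
bonus = 4_700              # bonus added on top, USD

# Revised invoice total ("send the invoice back at $18k").
revised_invoice = initial_invoice + bonus

# Effective hourly rate actually paid, and its ratio to the usual rate.
effective_rate = revised_invoice / hours
ratio_to_usual = effective_rate / usual_rate

print(f"hours x doubled rate: ${hours * doubled_rate:,.2f}")
print(f"revised invoice:      ${revised_invoice:,}")
print(f"effective rate:       ${effective_rate:,.2f} per hour")
print(f"ratio to usual rate:  {ratio_to_usual:.2f}x")
```

The hours times the doubled rate land within a few dollars of the $13,300 invoice, and the revised $18k total works out to a bit over $800 per hour, roughly 2.7 times the usual rate, which matches the complaint's "over $800 per hour" and "270%".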
I mean, if you are a professor and you know how the startup uses the data, “oh crap, I didn’t know they were using it for illegal purposes” is hardly a justifiable excuse.
This is addressed [in the full complaint][1]. The data scientist was told that Frank really did have 4 million users, and that the scientist only needed to generate this "synthetic data" as a way to "anonymize" their "real" data. I.e. the scientist was duped:
JAVICE told Scientist-1 [...] that she had a database of approximately 4 million
people and wanted to create a database of anonymized data that mirrored the
statistical properties of the original database (the “Synthetic Data Set”).
[After JAVICE sends Scientist-1 the data], Scientist-1 understood that the data
available via the Access Link Email -
**a data set of approximately 142,000 people** (emphasis added) -
was a random sample of a larger database which contained data for approximately
4 million people. In fact, that data represented every Frank user who had at
least started a FAFSA.
I read in an earlier report that their own developers refused to do the task. [1] It's not clear whether the professor knew what the fake data was being used for.