wakatime's comments | Hacker News

He's using Django, so most likely Django Migrations, which is built into that framework. If you're using Flask, you're probably using Alembic with SQLAlchemy. Those are the two main ways to handle schema migrations in Python.
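
To give a rough idea, a bare-bones Alembic migration script looks something like this (the table, column, and revision id below are just placeholders, not from any real project; with Flask you'd usually generate this via Flask-Migrate's "flask db migrate"):

    # Sketch of an Alembic migration; names and revision ids are placeholders.
    from alembic import op
    import sqlalchemy as sa

    revision = "a1b2c3d4e5f6"   # placeholder revision id
    down_revision = None

    def upgrade():
        # Add a nullable column so the migration is safe on existing rows.
        op.add_column("users", sa.Column("last_login_at", sa.DateTime(), nullable=True))

    def downgrade():
        op.drop_column("users", "last_login_at")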


I really enjoyed your post too! I would be interested in more details around the "100s of hours". I want to try a k8s setup like yours, but after investing those 100s of hours into my Flask setup it's hard to justify spending that time again for something else when this already works.

Also interested in the costs for your setup. My costs are in my other comment [1].

[1] https://news.ycombinator.com/item?id=26740911


You should be able to run a Flask app pretty easily in Kubernetes. Basically you would build a Docker image containing the app, then deploy it with k8s, I believe.


This is a dramatic oversimplification of how complex it is for a Python developer to configure and deploy an application on Kubernetes.


If they have the drive to create an entire SaaS app, how is following a few tutorials on deploying it to a container in k8s too difficult? It only takes 20-30 minutes to set up, and there are hundreds of videos and step-by-step walkthroughs that hold their hand through it from start to finish. Maybe I am overestimating how difficult it is to build an app in Flask, then.


Building, deploying, and getting something that works isn't that complicated. But in my experience, without a strong background in the tech (the hundreds of hours required), you will lose a significant amount of time, compounded by a lot of stress and probably money and customer dissatisfaction, when a problem arises (even a trivial one), and that always happens.


My one-person SaaS architecture with over 250k users:

* Flask + Flask-Login + Flask-SQLAlchemy [1]

* uWSGI app servers [2]

* Nginx web servers [3]

* Dramatiq/Celery with RabbitMQ for background tasks (a minimal task sketch follows after this list)

* Combination of Postgres, S3, and DigitalOcean Spaces for storing customer data [4]

* SSDB (disk-based Redis) for caching, global locks, rate limiting, queues and counters used in application logic, etc [5]
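
To give a feel for the Dramatiq + RabbitMQ piece mentioned above, here's a rough sketch; the broker URL and task are placeholders, not from my actual codebase:

    import dramatiq
    from dramatiq.brokers.rabbitmq import RabbitmqBroker

    # Placeholder broker URL; in production this points at the RabbitMQ cluster.
    broker = RabbitmqBroker(url="amqp://guest:guest@localhost:5672")
    dramatiq.set_broker(broker)

    @dramatiq.actor(max_retries=3)
    def send_welcome_email(user_id):
        # Stub task body for illustration only.
        print(f"sending welcome email to user {user_id}")

    # From a Flask view, enqueue the task with: send_welcome_email.send(user.id)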

I like how OP shows the service providers he uses, and why he decides not to self-host those parts of his infra. Also, there's a large up front cost involved for any stack (Rails, Django, k8s). I'd be interested in a more detailed writeup with configs, to try out OP's auto-scaling setup. My configs are linked in the gist below [2] for my non-auto-scaling Flask setup.

I spend about $4,000/mo on infra costs. S3 is $400/mo, Mailgun $600/mo, and DigitalOcean is $3,000/mo. Our scale/server load might be different, but I'm still interested in what the costs would be with your setup.

[1] https://wakatime.com/blog/33-flask-part-2-building-a-restful...

[2] https://gist.github.com/alanhamlett/ac34e683efec731990a75ab6...

[3] https://wakatime.com/blog/23-how-to-scale-ssl-with-haproxy-a...

[4] https://wakatime.com/blog/46-latency-of-digitalocean-spaces-...

[5] https://wakatime.com/blog/45-using-a-diskbased-redis-clone-t...


HN is part of the reason I moved to SF back in the day and started my company. My company is the reason I met my wife, because she saw me wearing a t-shirt I made to promote it while walking down the street in SF. You could say HN is the reason I met my wife ;)


A related database using ideas from ClickHouse:

https://github.com/VictoriaMetrics/VictoriaMetrics


Are you familiar with VictoriaMetrics?

Can you elaborate on how it is similar and dissimilar to ClickHouse?

What specific techniques are the same?


The core storage engine borrows heavily from it. I'll attempt to summarize, and apologies for any errors; it's been a while since I worked with VictoriaMetrics or ClickHouse.

Basically data is stored in sorted "runs". Appending is cheap because you just create a new run. A background "merge" operation periodically coalesces runs into larger runs, amortizing write costs. Reads are very efficient as long as you're doing range queries (very likely on a time-series database), since you only need to linearly scan the portion of each run that contains your time range.
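
A toy sketch of that idea in Python (just the shape of it, not actual VictoriaMetrics or ClickHouse code):

    import bisect

    class RunStore:
        def __init__(self):
            self.runs = []  # each run is a sorted list of (timestamp, value) tuples

        def append_run(self, points):
            # Appending is cheap: just add a new sorted run.
            self.runs.append(sorted(points))

        def merge(self):
            # Periodic background merge: coalesce all runs into one larger run.
            merged = sorted(p for run in self.runs for p in run)
            self.runs = [merged]

        def range_query(self, start, end):
            # Scan only the slice of each run overlapping [start, end).
            out = []
            for run in self.runs:
                lo = bisect.bisect_left(run, (start,))
                hi = bisect.bisect_left(run, (end,))
                out.extend(run[lo:hi])
            return sorted(out)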


These estimates are time spent *thinking* about programming. The average dev doesn't get to code a full 8 hours per day. On average from WakaTime data, devs spend only 1-2 hours per day actually typing code. At that rate (call it 1.5 hours per day), with 261 working days per year, it would take roughly 26 years to reach 10,000 hours.

For example, the total combined hours spent programming the wakatime.com website over the last 7 years was 4,035 hours.

10,000 hours assumes an 8-hour workday. For coding time like WakaTime measures (actual hands-on-keyboard time), the goal should be 2,000 hours.
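
The back-of-the-envelope math, using the assumptions above (1-2 coding hours per day, 261 working days per year):

    coding_hours_per_day = 1.5        # midpoint of the 1-2 hours/day from WakaTime data
    working_days_per_year = 261
    years_to_10k = 10_000 / (coding_hours_per_day * working_days_per_year)
    print(round(years_to_10k, 1))     # ~25.5, i.e. roughly 26 years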


Am I to believe that you think time spent writing code is the hard or main part of software engineering rather than the thought that went behind that code?

Writing the code is the easy part.


Different people in different situations use different ways to refine 'the thought that went behind that code'. Some like to write on paper/whiteboard, some like to draw graphs, some like to research similar solutions on the web, some like to code directly. Time in IDE/Editor is easy to quantify, time thinking about solving the problem is not. That is strictly individual work, though. Time spent convincing the PM/stakeholder to reduce or change scope is very productive but rarely quantifiable or counted.


> Time in IDE/Editor is easy to quantify, time thinking about solving the problem is not.

That's the key. To find the thinking time, just fill in the gaps between the time spent coding. For example, see the GIF in this post: https://wakatime.com/blog/27-fill-the-gaps-in-your-coding-ac...
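
Roughly, the gap-filling looks like this (a simplified sketch, not the actual WakaTime implementation; the 15-minute threshold is just an example):

    def fill_gaps(intervals, max_gap_minutes=15):
        """Merge coding intervals when the pause between them is short,
        counting that pause as thinking time.
        intervals: sorted list of (start, end) timestamps in minutes."""
        merged = []
        for start, end in intervals:
            if merged and start - merged[-1][1] <= max_gap_minutes:
                merged[-1][1] = max(merged[-1][1], end)   # absorb the short gap
            else:
                merged.append([start, end])
        return [tuple(m) for m in merged]

    # fill_gaps([(0, 30), (40, 60), (120, 150)]) -> [(0, 60), (120, 150)]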


Yep, same experience with Spaces here. That's why we use Spaces for backups only, since it's very affordable and backups don't need millisecond latency.

https://wakatime.com/blog/46-latency-of-digitalocean-spaces-...


Maybe they have support tiers? Every time I created a support ticket I got a response within an hour from a technical person. Their support responds much faster than AWS in my experience.


The responses were fast. They were just also nonsensical.


Nothing is perfect. We use DigitalOcean Droplets because they're better bang for the buck than AWS EC2 instances, especially if you're doing a lot of disk IO. However, even though it's more expensive, we use AWS S3 instead of DigitalOcean Spaces because it's faster, more reliable, and replicated automatically. I wrote about these decisions recently here:

https://wakatime.com/blog/46-latency-of-digitalocean-spaces-...


I'm very selective with the external scripts allowed on my websites. Ad networks are notorious for running malicious JavaScript on popular sites like NYTimes[1] and Yahoo[2] home pages. Any plans for an API so sites can receive ad content as JSON and display it without ever executing your external JavaScript? I might consider it for future side projects if I could npm install your client library instead of including an external script tag.

[1] https://www.nytimes.com/2009/09/13/business/media/13note.htm...

[2] https://www.washingtonpost.com/news/the-switch/wp/2014/01/04...
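
To illustrate what I'm asking for (a hypothetical sketch only; the endpoint, response fields, and slot name are made up, not an existing API): the site fetches the ad as JSON server-side and renders it itself, so no third-party script ever runs on the page.

    import requests
    from flask import Flask, render_template_string

    app = Flask(__name__)

    @app.route("/article")
    def article():
        # Hypothetical JSON ad endpoint; short timeout so ads never block the page.
        ad = requests.get(
            "https://ads.example.com/api/v1/ad", params={"slot": "sidebar"}, timeout=0.2
        ).json()
        return render_template_string(
            "<aside><a href='{{ ad.click_url }}'>{{ ad.headline }}</a></aside>",
            ad=ad,
        )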


> Any plans for an API so sites can receive ad content as JSON and display it without ever executing your external JavaScript?

That will be a very easy target for faking impressions...


That's trivial to solve though. Just don't pay per impression for users who opt for that method of delivery. Pay for clicks instead.


That will be a very easy target for faking clicks...


It's the same JavaScript running, just hosted on your domain instead of externally. The JS shouldn't support eval, which is unfortunately a common way to display ads in networks with embedded external scripts. Version updates can go through your review too.


The current state of the art in ad fraud detection has basically become "here is a bunch of weird random stuff, let's see if you get the right answer." That stuff is delivered by dynamic JavaScript.

What you are proposing is AMP Ads, which is universally hated by advertisers and publishers.


Speaking as someone who knows approximately nothing about ad fraud, what additional protections exist? The scheme you described could be easily thwarted by appropriately sandboxing their script to modify a shadow DOM instead of the real one (and countermeasures like checking the page once in a while would apply just as well to a JSON approach).


Ad fraud takes a number of different forms, including:

* Buying a low-value ad (like a banner) and cramming a high-value ad (like a video) in there, and lying to the ad server about the visibility, sound, etc., using JavaScript. Sandboxing is typically stymied by a number of cross-domain limitations in real browsers that we can detect server-side.

* Buying installs for an older/hacked browser (or browser extensions) that has been scripted up to load the ads. People would embed these in screen savers, making real users' machines visit these ad pages when the PC owner was unlikely to be around. They won't have the protections real browsers have, and so can trivially modify the network profile.

* Making a headless browser call ads on pages. These pages "look" valid, and you can visit them to see the ads, but the headless browser has collected cookies from various shopping sites and uses a number of home/DSL proxy services to avoid detection. For any single impression, they look indistinguishable from legitimate traffic.

These are detected in different ways: JavaScript helps some for the first two, but in the second it's mostly that you're looking for bugs in the implementation (it's just that JavaScript gives you a wider search area). Usually these things are home-grown, so if you've got a wide view of the industry and can change your scripts frequently, you can "detect" them being built in real time.

However, that last one is tricky. Outside of bugs [1], you're left with timing attacks, which I won't enumerate because their obscurity is the strongest protection for their continued utility. In general, though, they work on the principle of leaking some identifying data in HTTP and DNS responses, and relying on the fact that the headless browser needs to call lots of ads to pay for the electricity and Internet it uses, so we get lots of opportunities for a collision.

[1]: https://geocar.sdf1.org/browser-verification.html


If I have control over the code that displays the ads, then I can fake the impressions. The past 20 years of development in the ad tech space didn't happen for no reason; it happened because there is a real problem that needs solving, a problem that constantly evolves.


I've built an ad network that supports exactly this: I give you (the publisher) a bag of JSON or XML that tags up the content, and you decide how to render it. I typically pay on click, but I have paid impressions in some cases where the publisher and I can reach a level of trust. I don't think wakatime.com would be an appropriate publisher for me, but maybe you have other sites that are more appropriate.

My original goal was avoiding ad blockers: by having the publisher render the ad themselves, it doesn't obviously look like an ad, and as long as the publisher doesn't make the page itself an ad farm, users don't tend to block it with custom CSS (which might otherwise end up in popular blocking tools). It seems to work okay: we've been operational for over five years at this point, and I've not seen one of the publisher domains or CSS selectors show up in uBlock.


I used to do ad scheduling 10+ years ago, and at least back then the ads had NOSCRIPT tags.

But for it to really work, you would need to store a cookie to correctly redirect the user.

I like your idea of rendering the ads on the server side, but I would hope they would have very low response times, or at least a low timeout on your side.


You can render client side too; the key is that no external JavaScript is trusted to run on the page.

That of course means the common practice of advertisers pasting in a JavaScript snippet, the network doing a review process, then rendering that snippet as an advertisement on some property would not be allowed on this ad network.


It seems like iframes may be a better solution. They provide quite strong isolation between inside and outside communication. I worked in the ad tech industry 5 years ago and everything was iframes.


Don't they take a toll on CPU and rendering times?

What do you think of images rendered on the backend?


Yep, they definitely have a performance impact. Plain old image tags are really lightweight and secure. Unfortunately, these simple methods are highly susceptible to ad impression fraud. Really, all online advertising is susceptible to fraud, but if you're paying for clicks or impressions, you can tame the fraud with mass amounts of JS, browser sniffing, data collection, and aggregate analysis. This is what the large ad networks like Google do, and it's a large industry with many actors.

Reducing this data collection and turning to simpler methods like images increases fraud, which decreases the amount that honest publishers would earn (likely hugely). So it's definitely doable, but it tends not to make much economic sense at large scale.

