Statistics and final summary report for my canceled hobby project

I made some forecasts based on the data in the statistics & logs of the closed project, and it seemed that the project was trending down on all KPI values anyway, so the timing of the closure couldn't have been better. If the trend lines had been very different, that could also have improved my motivation to tinker.

Final statistics before site closure:

  • About 75 000 images (user content).
  • About 40 000 active user-linked records (user-linked data, like messages).
  • About 29 000 user-linked records were pruned (inactive).
  • About 12 000 active users (active within the last 3 months).
  • About 100 daily active users.
  • About 28 000 inactive users were deleted.
  • About 2 300 active users with a detailed user profile. About 16 000 inactive user profiles were pruned.
  • About 1 000 user feedback entries (on other users) in the database at peak. The prune statistics collection code failed for this, so there's no prune data, but I've got total counts and the number of new entries.
  • About 800 user feedback entries were pruned. It's important to note that feedback was only pruned for inactive users, and it's highly likely that the users with the most feedback won't become inactive, because feedback is mostly linked to the small number of most active accounts.
  • About 300 megabytes of data in the database (excluding the images!)
  • About 8 gigabytes of image files (user content).

All the features I had designed and tested for the site worked. It turned out there actually was no code to save the feedback prune statistics at all; only the column was present in the statistics database.

The backend system was configured to harvest the whole source data network every 15 minutes for active data sources. Inactive data sources used quadratic backoff. Indirect hints were also used to re-activate a data source if fresh references to it were seen on the network.
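
Roughly, the backoff logic worked like this. A minimal Python sketch (the 15-minute base interval is from above, but the exact formula, class and names here are illustrative, not the actual project code):

    from datetime import datetime, timedelta

    BASE_INTERVAL = timedelta(minutes=15)

    class DataSource:
        def __init__(self, name):
            self.name = name
            self.empty_polls = 0          # consecutive harvests with no new data
            self.last_poll = datetime.min

        def next_poll_time(self):
            # Active sources (empty_polls == 0) are harvested every 15 minutes;
            # inactive ones back off quadratically: 15 min, 60 min, 135 min, ...
            factor = (self.empty_polls + 1) ** 2
            return self.last_poll + BASE_INTERVAL * factor

        def on_fresh_reference(self):
            # Indirect hint: a fresh reference to this source was seen on the
            # network, so reset the backoff and harvest at the base rate again.
            self.empty_polls = 0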

The site contained an internal scheduler, task queues and retry logic modules, and all the basic stuff: statistics collection, logging and so on. Telegram, Twitter and Mailgun (email) integration, as well as messaging integrations using a proprietary API. Users could order notifications over Twitter, Telegram or email. For integrators there were three options: asynchronous Pub/Sub (RESTful JSON push), a real-time feed over WebSocket, or fetching data using the API and a timestamp. Several off-site daily backups with history retention. Multi-process, multi-threaded Python code utilizing pools efficiently. PostgreSQL database with full-text search capability, using the Peewee ORM. A bit under 100 database tables and a lot of joining. Many more or less interesting optimizations to improve performance (which weren't actually necessary, but were fun) and so on.
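
As an illustration of the PostgreSQL full-text search with Peewee, the pattern looks roughly like this (a simplified sketch with a hypothetical Message model, not the actual site schema):

    from peewee import Model, TextField, fn
    from playhouse.postgres_ext import PostgresqlExtDatabase, TSVectorField

    db = PostgresqlExtDatabase('sitedb')  # hypothetical database name

    class Message(Model):
        content = TextField()
        search_content = TSVectorField()  # GIN-indexed tsvector column

        class Meta:
            database = db

    def store_message(text):
        # Populate the tsvector at write time.
        return Message.create(content=text,
                              search_content=fn.to_tsvector(text))

    def search_messages(query):
        # Full-text search via PostgreSQL's @@ operator.
        return Message.select().where(Message.search_content.match(query))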

Images were placed in directories with 4096 image files per directory, to keep each directory at a suitable number of files. This is also why the images weren't in the database: saving mostly immutable large blobs in a database doesn't make sense afaik, because the file system is designed exactly for that, and backing things up is much easier this way.
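
The path mapping worked roughly like this (an illustrative sketch assuming sequential numeric image IDs; the root path and naming are hypothetical):

    from pathlib import Path

    IMAGES_PER_DIR = 4096
    IMAGE_ROOT = Path('/srv/site/images')  # hypothetical root directory

    def image_path(image_id, ext='jpg'):
        # IDs 0-4095 go to 00000/, 4096-8191 to 00001/, and so on,
        # so no directory ever holds more than 4096 files.
        shard = image_id // IMAGES_PER_DIR
        return IMAGE_ROOT / '{:05d}'.format(shard) / '{}.{}'.format(image_id, ext)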

The information above is based strictly on the system's database. Then there are the web front-end statistics; I'm going to provide only a small snapshot of those here. These statistics are also aggregated at an hourly level for archiving, and the data below is based on the average over the last week the site was operating.
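
The hourly aggregation is basically a date_trunc roll-up in PostgreSQL. An illustrative Peewee sketch (the RequestLog model is hypothetical, not the actual schema):

    from peewee import Model, CharField, DateTimeField, fn
    from playhouse.postgres_ext import PostgresqlExtDatabase

    db = PostgresqlExtDatabase('sitedb')  # hypothetical database name

    class RequestLog(Model):
        path = CharField()
        timestamp = DateTimeField()

        class Meta:
            database = db

    # One row per hour: how many requests landed in that hour.
    hour = fn.date_trunc('hour', RequestLog.timestamp)
    hourly_requests = (RequestLog
                       .select(hour.alias('hour'),
                               fn.COUNT(RequestLog.id).alias('requests'))
                       .group_by(hour)
                       .order_by(hour))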

All the larger search engines actively indexed the site daily, producing several hundred hits per day.

Daily statistics:

  • Average daily (individual) visitors: 1 200
  • Average daily HTTPS requests: 12 000
  • Average daily HTTPS bandwidth: 200 megabytes

But as mentioned, this was just a hobby / learning experiment.

Other related statistics stuff:

  • Source rows: 7 700 rows of Python code (excluding non project related libraries)
  • Source bytes: 304 kilobytes (excluding libraries, including only my code)
  • Template rows: 2 636 rows (mostly HTML, mixed with a bit of Python)
  • Template bytes: 228 kilobytes
  • Some CSS and JS: not so much ...
  • Git commits: 1 304 commits total

Yeah, writing and getting all that done must have taken a while. ;) But it was fun!

Costs? A few euros per month for the cloud server, plus lots of time learning and testing all the stuff. But hey, otherwise weekends and evenings would be way too boring.

Do I have charts? Sure I do; I created several, some using Tableau and some using LibreOffice Calc (yay!), and I've got all the history at an hourly level in the database. But I'm not going to share those with a wider audience. The site also had, of course, a live statistics page updated hourly, with history charts and selectable intervals: hourly, daily, weekly, monthly, yearly, etc.

During the project I talked a lot with great tech-, startup- and future-technology-minded people. Awesome team!

Ref: Hobby projects page

2020-02-23