Django Everywhere

Apr 30, 2023

Our goal with Workflows & Automations includes standardizing on Python as our backend language of choice. We wanted to standardize on a common web framework as well. While Flask and FastAPI are the most popular choices for APIs, they take a no-batteries-included approach, leading to N+1 ways of building software. opsZero is building an opinionated stack, which requires standardization, so we have chosen Django as our web framework.

Our entire business already relies on Django as its primary framework, so we have extensive experience with it, and we absolutely love the built-in templating and the ORM with migrations. Lastly, with ASGI, features such as WebSockets and events are built into the framework via Channels. The built-in functionality, along with years of Django experience, means we can deliver Django expertise quickly.
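
As a small illustration of those batteries, here is a minimal sketch of the ORM and the built-in templating working together; the Customer model and render_greeting helper are made up for this example, not from a real opsZero project:

# A minimal sketch of the Django pieces we lean on: the ORM with
# migrations and the built-in template engine. Names are illustrative.
from django.db import models
from django.template import Template, Context

class Customer(models.Model):
    name = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)

# `python manage.py makemigrations && python manage.py migrate` generates
# and applies the schema for the model above; no hand-written DDL needed.

def render_greeting(customer):
    # The template engine ships with Django itself; no extra dependency.
    return Template("Hello, {{ name }}!").render(Context({"name": customer.name}))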

We want to meet our customers' needs regardless of the Cloud they are using, at the lowest possible cost, so we are making pre-built templates available for running Django in both Serverless and Kubernetes environments. We are releasing three templates:

These three templates allow us to deliver value to you faster.

Elon Musk's Engineering Principles

Apr 16, 2023

Think what you want of Elon Musk, but he has achieved quite a bit by engineering novel solutions to complex problems. We have applied the same process to great effect in our own work. The principles are:

  1. Fix dumb requirements. Each requirement has a specific owner.
  2. Remove unnecessary parts.
  3. Simplify/Optimize.
  4. Speed up cycle time.
  5. Automate.

Here is how we apply these principles at opsZero.

Fix dumb requirements

When solving a problem for a customer, the customer may not actually know what they need, so uncover the actual requirement behind the request. A reported problem such as "the production database has high CPU and clients can't connect" may have a root cause like the production database being used to replicate data to a data warehouse. This root cause can be uncovered through a five whys analysis.

Second, each requirement needs a clear owner responsible for it. If something has no owner, that itself is an issue. Ownership of each component means that someone exists to optimize each piece.

Remove unnecessary parts

Systems become complex over time. Pieces are added that don't need to exist, or were added and then forgotten about. Systems should get less complex over time, not more. As we build things, we build to get the task done, which can add complexity that doesn't need to exist in the final system; because we are pathfinding our way to the solution, that complexity is tolerable at first. Once the system works as needed, we go back and remove the pieces that are not needed.

Simplify/Optimize

After removing unnecessary parts, there may still be complexity within the remaining components. To simplify these components, we need to reduce variability and increase standardization. If-else blocks that account for variability increase complexity, but simplifying by reducing variability requires a subjective decision about the optimal approach. Therefore, it's best to build with some variability initially and refine toward the optimal solution through A/B testing over time.

Speed up cycle time

Once an optimal approach is found, remove the variability so there is a single standardized approach to delivery. Fewer if-else blocks means faster outcomes and creates flow.

Automate

Lastly, automate the processes such that things happen without intervention.

Using Cloudflare D1

Feb 26, 2023

Cloudflare D1 is a great way to quickly create and work with SQLite databases where a larger PostgreSQL or MySQL doesn't make sense. Here are some examples for working with D1 quickly.

Create the Database and Table

wrangler d1 create data-cloud-vendors
wrangler d1 execute data-cloud-vendors --command='CREATE TABLE Customers (CustomerID INT, CompanyName TEXT, ContactName TEXT, PRIMARY KEY (`CustomerID`));'

Query Data

wrangler d1 execute data-cloud-vendors --command='SELECT * FROM Customers' --json

Download, Edit Locally, Upload

wrangler d1 backup create data-cloud-vendors
wrangler d1 backup download data-cloud-vendors <backup-id>
sqlite3 file.sqlite3 .dump > schema.sql
# Edit schema.sql: add DROP TABLE IF EXISTS statements so the import replaces existing tables
wrangler d1 execute data-cloud-vendors --file=schema.sql
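
If you want to script the local edit step, Python's built-in sqlite3 module can modify the downloaded backup and regenerate the SQL dump. A minimal sketch, assuming the file and table names from the examples above:

import sqlite3

# Open the backup downloaded with `wrangler d1 backup download`.
conn = sqlite3.connect("file.sqlite3")
conn.execute(
    "INSERT INTO Customers (CustomerID, CompanyName, ContactName) VALUES (?, ?, ?)",
    (1, "Acme", "Jane Doe"),
)
conn.commit()

# Regenerate the SQL dump for `wrangler d1 execute --file=schema.sql`.
with open("schema.sql", "w") as f:
    for line in conn.iterdump():
        f.write(line + "\n")
conn.close()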

PostgreSQL Troubleshooting

Feb 22, 2023

Logical Replication

Logical replication is the primary way data is replicated out of PostgreSQL. Check that it is enabled:

SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');

Inactive Replication Slots

If a logical replication slot is inactive, it forces PostgreSQL to retain WAL and wastes space, so drop any slot that is not in use.

SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS replicationSlotLag, active FROM pg_replication_slots;

-- SELECT pg_drop_replication_slot('your_slot_name');

Disk Usage

Check which databases are taking up storage space and how heavily temporary files are being used.

SELECT datname, temp_files AS "Temporary files", pg_size_pretty(temp_bytes) AS "Size of temporary files" FROM pg_stat_database;

SELECT
    pg_database.datname,
    pg_size_pretty(pg_database_size(pg_database.datname)) AS size
    FROM pg_database;

Understand the Sizes of Indexes

SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Cloud Storage AWS S3 vs Azure Storage vs Cloudflare R2

Feb 20, 2023

What storage provider should I use?

It seems more and more that Cloudflare R2 is the best option for most users who don't have strict compliance requirements. With its S3-compatible API, it is easy to migrate existing data and integrate R2 into your existing applications.

What are other considerations when choosing between storage providers?

You should also consider your application's usage pattern. For example, if your app runs on AWS and is constantly reading and writing objects, you may get better performance with S3, since Cloudflare R2 requests have to travel outside the AWS network.

What are the pitfalls of using Cloudflare R2 vs AWS S3?

One of the major considerations in choosing between S3 and R2 is that R2 is a newish platform, while S3 has been around for 17 years (as of 2023). There is a lot of accumulated knowledge, and AWS provides a lot of different ways to optimize S3 for your workloads. R2 may require some rearchitecting of your platform to use well.

Using PostgreSQL

Feb 14, 2023

Connect to a database through a bastion

SSH_PRIVATE_KEY=~/.ssh/id_rsa

RDS_DATABASE_HOST=opszero-database.aasdasd.us-east-1.rds.amazonaws.com
RDS_DATABASE_PORT=5432
RDS_DATABASE_USERNAME=postgres
RDS_DATABASE_PASSWORD=postgres
RDS_DATABASE_DB=postgres_development

BASTION_USERNAME=ubuntu
BASTION_HOST=137.32.32.83

# Open the tunnel: the local port forwards to RDS through the bastion.
ssh -i ${SSH_PRIVATE_KEY} -f -N -L ${RDS_DATABASE_PORT}:${RDS_DATABASE_HOST}:${RDS_DATABASE_PORT} ${BASTION_USERNAME}@${BASTION_HOST} -v

# In another terminal
psql "postgresql://${RDS_DATABASE_USERNAME}:${RDS_DATABASE_PASSWORD}@127.0.0.1:5432/${RDS_DATABASE_DB}"


Dump and Restore

pg_dump "postgresql://${RDS_DATABASE_USERNAME}:${RDS_DATABASE_PASSWORD}@<source-host>:5432/db" > backup.sql
psql "postgresql://${RDS_DATABASE_USERNAME}:${RDS_DATABASE_PASSWORD}@34.29.235.84:5432/restored_db" -f backup.sql

Create a User

CREATE USER newuser123 WITH PASSWORD 'foobar123';

-- Read-only access
GRANT CONNECT ON DATABASE database_name TO newuser123;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO newuser123;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO newuser123;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO newuser123;

-- Grant all on database
GRANT ALL PRIVILEGES ON DATABASE database_name TO newuser123;

-- Turn user into superuser
ALTER USER newuser123 WITH SUPERUSER;

Useful Stats

-- show running queries (9.2+)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

-- kill running query
SELECT pg_cancel_backend(pid);

-- kill idle query
SELECT pg_terminate_backend(pid);

-- vacuum command
VACUUM (VERBOSE, ANALYZE);

-- show running queries (pre 9.2)
select * from pg_stat_activity where current_query not like '<%';

-- all database users
select * from pg_user;

-- all databases and their sizes
select datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc;

-- all tables and their size, with/without indexes
select relname,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size,
       pg_size_pretty(pg_relation_size(relid)) as table_size_without_indexes
from pg_stat_user_tables
order by pg_total_relation_size(relid) desc;

-- cache hit rates (should not be less than 0.99)
SELECT sum(heap_blks_read) as heap_read, sum(heap_blks_hit)  as heap_hit, (sum(heap_blks_hit) - sum(heap_blks_read)) / sum(heap_blks_hit) as ratio
FROM pg_statio_user_tables;

-- table index usage rates (should not be less than 99%)
SELECT relname, 100 * idx_scan / (seq_scan + idx_scan) percent_of_times_index_used, n_live_tup rows_in_table
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

-- how many indexes are in cache
SELECT sum(idx_blks_read) as idx_read, sum(idx_blks_hit)  as idx_hit, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
FROM pg_statio_user_indexes;

Deploying to Cloudflare Pages using Github Actions

Feb 14, 2023

Cloudflare provides a great CDN with no egress charges on bandwidth. The best way to use Cloudflare is through Cloudflare Pages, which is straightforward for most frameworks that generate a SPA. However, the example below shows how to publish from the asset pipelines of frameworks like Ruby on Rails and Django.

Here is an example of using Github Actions to publish Django static files:

  - name: Build Static Files
    run: |
      docker run --env STATIC_ROOT='/static-compiled/' \
            --env DATABASE_URL='sqlite:///db.sqlite' \
            -v $PWD/static:/app/static -v $PWD/static-compiled:/static-compiled \
            $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
            python manage.py collectstatic --noinput

  - name: Publish Static Files
    uses: cloudflare/wrangler-action@2.0.0
    with:
      apiToken: ${{ secrets.CF_API_TOKEN }}
      accountId: ${{ secrets.CF_ACCOUNT_ID }}
      command: pages publish ./static-compiled --project-name=opszero-static --commit-dirty=true

Setting Github Secrets

Feb 14, 2023

Run the following within your repo:

gh secret set nameofsecret --body "Secret"

This will update the secret and make it available to Github Runners. Alternatively, you can go to the web interface at https://github.com/<owner>/<repo>/settings/secrets/actions to update the secret.

And there is Azure coming from behind...

Nov 17, 2022

The Bay Area startup tech stack is MacBooks, Google Workspace, Slack, iPhones, and either AWS or Google Cloud. The rest of the world seems to be Microsoft Windows, Microsoft Office, Microsoft Teams, Android, and an on-site Sharepoint server. AWS has the most to lose as Azure catches up.

My wife recently needed Parallels with Windows installed on her MacBook to use ArcGIS. I thought, what the hell, and decided to install Parallels on my own machine because I’ve heard so much about how much better Excel for Windows is than the Mac version. (Yes, I got excited about Excel, so sue me...) So I did it. And having played with Windows for the first time in a decade and a half, I have to say I finally get Microsoft’s strategy after seeing this parallel universe.

Microsoft is playing a long game. But their game is to tie everything, and I mean everything, to Microsoft Azure: GitHub, Office, Excel, VSCode, Windows, the Power Platform. Everything at Microsoft seems to connect back to Azure. Excel pulls data from Azure, making it an alternative to tools like Tableau. GitHub Actions uses Azure for compute. VSCode seems to be connecting more and more to Azure for easy deployments. And Windows has easy corporate deployment options via Active Directory on Azure.

Those of us in the Bay Area bubble with the Apple, Google, and AWS tech stack may be missing out on one of the significant technological shifts. I am betting the winner, in the long run, will be Microsoft. Microsoft has a huge distribution advantage. Say what you will about Steve Ballmer, but he built a high-powered enterprise sales team at Microsoft. Buying a single unified package from Microsoft will, over time, be cheaper than buying piecemeal software from different vendors. This is why Slack lost. But everyone in the Bay was scratching their heads at why Slack lost, because we were looking at Google as the 800-pound gorilla, not Microsoft, which is now the 1200-pound gorilla.

So what is the long-term trajectory? I think from a technological standpoint, Azure will consistently be behind AWS. Microsoft is a close follower, not a leader. So if you want the newest, then AWS will still likely be the primary Cloud provider to use. However, if your company is conservative and doesn’t care about newness, then Microsoft will be just fine. There will be deals put in place that give companies Azure, Office, and Teams at a rate below what others are offering, and companies will pay for it.

This is all speculative, of course, and Amazon, being one of the most innovative companies of our generation, will hopefully give Microsoft a run for its money. But at this point, the two Clouds I am betting on for production, compliance-oriented workloads are Azure, then AWS.

Windows Based Crawler

Nov 15, 2022

I like Excel for Windows. The Mac version is a joke compared to what the full-blown Windows version can do with data analysis and data finagling right from the app itself. A lot of what I have been working on lately has been getting data crawled with Playwright into Excel workbooks stored on OneDrive. The reason is that some of the data is small enough that building a full database isn't necessary, but not normalized enough to just use Power Query.

To achieve this outcome, I use Github Actions to trigger the run. Github Actions triggers on a schedule, sending the task to a Github Runner, which starts a Python script. Since Github Actions has access to the root volume on the Mac Mini (don't worry, the machine is dedicated to just Github Actions), I can use xlwings to launch Excel and update the workbook. Once completed, the script copies the file into OneDrive or Dropbox for me to access elsewhere.
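
The heart of that script is just xlwings driving the locally installed Excel. A minimal sketch, where the workbook path, sheet name, crawled rows, and OneDrive path are all assumptions for illustration:

import shutil
import xlwings as xw

# Launch the locally installed Excel and open the workbook to update.
wb = xw.Book("crawl-results.xlsx")
sheet = wb.sheets["Data"]

# Write the crawled rows starting at A1; xlwings expands the 2D list.
rows = [["vendor", "price"], ["Acme", 42]]
sheet.range("A1").value = rows

wb.save()
wb.app.quit()

# Copy the finished workbook into the synced OneDrive folder.
shutil.copy("crawl-results.xlsx", "/Users/me/OneDrive/crawl-results.xlsx")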

For this example, there is no difference between a hosted runner and a self-hosted runner, other than that it runs on a self-hosted instance that happens to have Excel installed:

  name: Download and Upload
  on:
    schedule:
      - cron: "0 1 * * *"
    push:
      branches:
        - main

  jobs:
    build:
      runs-on: self-hosted
      steps:
        - uses: actions/checkout@v3
        - name: Install Dependencies
          run: |
            pyenv global 3.11
            pip3 install -r ./requirements.txt
        - name: Combine
          run: |
            python ./main.py
