Learnings from schema migrating 100+GB, 400 Million+ rows MySQL table using GitHub's open source gh-ost
Check Reference
A quick summary
As part of my work, I had to run an alter table migrations on a 1.3 TB+ size production MySQL cluster. The migration was done on a 400M+ rows, 100+GB size table. We had zero downtime and no replication lag for all such schema migration activities on our high-availability, self-hosted, primary-replica MySQL cluster. This talk covers how we did this using GitHub's open-source tool - gh-ost and the lessons learned along the way.
More details
Did you ever have to run an `alter table` on a production MySQL table containing a few million rows? If yes, then you know the problem. Alter table in MySQL is mostly a blocking operation, even with Online DDL. It may not be blocking on the primary, but it's blocking on the replicas. It is uninterruptible and causes replication lag on replicas. Your application will experience huge latencies, query failures, and connection terminations if you run a naive alter table on a huge MySQL table on production.
The popular solutions for this problem are Percona's pt-online-schema-change, facebook-osc and GitHub's gh-ost.
In this talk, I will demonstrate how we used GitHub's open-source tool [gh-ost](https://github.com/github/gh-ost) to perform no downtime, no replica lag, schema migration on large MySQL tables (400M+ rows) in production. I will discuss why we chose gh-ost, how it works, and its operational simplicity aspect.
We have been using gh-ost in production for over two years and performed 50+ schema migrations on production (i.e., at least two migrations per month). Some of these migrations run on our largest tables with 400M+ rows and 100GB+ single table sizes.
Here’s a [presentation](https://docs.google.com/presentation/d/1bLbTH5X3JQoNrAYwfotnOgDsHxpP2qtCiwBUADEQ87U/edit) I have prepared (still work in progress) but this should give you some idea what to expect in the talk. I gave this talk at a local meetup earlier (no recording), and it was well received. Although it’s a very specific topic, I will cover design lessons from gh-ost’s operational simplicity that can be applied to many open-source tools we use.