MySQL at Google by Mark Callaghan
Mark Callaghan is taking the stage to present his Key Note at the MySQL Conference and Expo, “This is not a web app: The continuing evolution of MySQL at Google.”
I am going to take notes as fast as I can. Excuse any typos etc.
Mark worked on DBMS internals at Informix and then at Oracle. He worked on embedding BerkeleyDB at a startup. He joined Google in August, 2005. At Google his team is working to enhance MySQL and to support a large production deployment. He blogs at mysqlha.blogspot.com and has helped publish Google patch for MySQL at code.google.com. In addition, he agitates for MySQL.
What is MySQL at Google?
He will give details but some numbers he won't give. It is a large MySQL deployment. The QPS rate is tremendous. The number of machines they use is reasonably large. MySQL is used in a large, important enterprise deployment. They run many commodity machines. Google depends on replication, InnoDB and stability. MySQL is sharded with many replicas per shard. At Google, database service must always be available. They have been successful with it and happy with the results.
The database itself is providing change management. If you just push changes, you are more than likely to have a debugging nightmare.
A number of replicas can be connected to MySQL without crashing the master. You'll be surprised at how many replicas can be deployed.
MySQL is solid and easy to improve. InnoDB from Heikki Tuuri and company is amazing. Inspiration provided by Yasufumi Kinoshita and Percona. InnoDB is the most beautiful database software Mark has worked on and he has worked for a few database companies.
Prehistory
- MySQL 4.0 and Innodb arrive
Consistency matters most. When chosing between consistency and availability, you want to be consistent. You shouldn't have two servers claiming to be masters. Generally, the full schema is understood by few people.
Audit is a big concern. Who is doing what change? Legacy is a another concern.
Control is an issue. You have to show you can control access to the database.
Finally, the focus is on transactions, they don't want to lose any data.
Data quality is important to Google.
How do we build this?
A bad build ruins everything. He inherited a dedicated build machine. They moved to hermetic builds and cross-compilation fun and eventually learned to love autoconf.
How de we test this?
MySQL has a suite of regression tests but they are easy to pass. They have queries running in production, how can they use those? They sample queries in production using a Python script and then replay them to simulate sample production workloads.
They built stress tests generally around replication. If you kill a slave, it can come back and start from where it left off.
Use valgrind
Eventually they realized that MySQL has valgrind and started using it.
They also discovered the value of compiler warnings.
How do we deploy this?
Simple approach is put it out there and hope for the best. Search of error log files is automated. On a daily basis, crashes are categorized. Machines are removed automatically removed from service. Finally, they have automated replacement of machines.
How do we monitor this?
He has a feature request: SHOW USER STATISTICS.
They archive SHOW PROCESSLIST and SHOW STATUS. Add SHOW USER_STATS and SHOW TABLE_STATS. It's amazing what you can do with awk and bash. They prefer to take a top-down approach for monitoring. They generate daily and weekly load reports, including QPS. QPS on critical servers was going 2x per year. After deploying a better monitoring tool, they determined it was queries that weren't really crucial for those servers.
How do we improve this?
Understand your problems and deploy what you build. If you are just building and not deploying, you are not going to learn the tradeoffs. Also, monitor to learn what the problems are.
Replication features added At a high level, they are slowing moving towards self healing.
At a low level, somewhat crash-safe slaves. They use mirror binlog which keeps a copy of master binlog on slave. Other fewatures include semi-sync replication, binlog event checksums and global transaction IDs. They are currently in the process of having fully crash-safe slaves.
They have monotonically increasing global transaction IDs.
Performance features added
high level
low level
high level
Row-change logging
Online data drift:
- how do you compare continuously updated tables?
- technique is similar to mk-table-checksum
- deployment is more complicated
More to life than software development
Engineers at Google
Production crises
new features
new hardware
out of time. :)
Thank you for sharing, Mark! This was an informative session (of course, it would be great to actually get some numbers but still ...)
I am going to take notes as fast as I can. Excuse any typos etc.
Mark worked on DBMS internals at Informix and then at Oracle. He worked on embedding BerkeleyDB at a startup. He joined Google in August, 2005. At Google his team is working to enhance MySQL and to support a large production deployment. He blogs at mysqlha.blogspot.com and has helped publish Google patch for MySQL at code.google.com. In addition, he agitates for MySQL.
What is MySQL at Google?
He will give details but some numbers he won't give. It is a large MySQL deployment. The QPS rate is tremendous. The number of machines they use is reasonably large. MySQL is used in a large, important enterprise deployment. They run many commodity machines. Google depends on replication, InnoDB and stability. MySQL is sharded with many replicas per shard. At Google, database service must always be available. They have been successful with it and happy with the results.
The database itself is providing change management. If you just push changes, you are more than likely to have a debugging nightmare.
A number of replicas can be connected to MySQL without crashing the master. You'll be surprised at how many replicas can be deployed.
MySQL is solid and easy to improve. InnoDB from Heikki Tuuri and company is amazing. Inspiration provided by Yasufumi Kinoshita and Percona. InnoDB is the most beautiful database software Mark has worked on and he has worked for a few database companies.
Prehistory
- MyISAM and another RDBMS were used.
- Not much is known about that time.
- Search for "xooglers mysql"
- MySQL 4.0 and Innodb arrive
Consistency matters most. When chosing between consistency and availability, you want to be consistent. You shouldn't have two servers claiming to be masters. Generally, the full schema is understood by few people.
Audit is a big concern. Who is doing what change? Legacy is a another concern.
Control is an issue. You have to show you can control access to the database.
Finally, the focus is on transactions, they don't want to lose any data.
Data quality is important to Google.
How do we build this?
A bad build ruins everything. He inherited a dedicated build machine. They moved to hermetic builds and cross-compilation fun and eventually learned to love autoconf.
How de we test this?
MySQL has a suite of regression tests but they are easy to pass. They have queries running in production, how can they use those? They sample queries in production using a Python script and then replay them to simulate sample production workloads.
They built stress tests generally around replication. If you kill a slave, it can come back and start from where it left off.
Use valgrind
Eventually they realized that MySQL has valgrind and started using it.
They also discovered the value of compiler warnings.
How do we deploy this?
Simple approach is put it out there and hope for the best. Search of error log files is automated. On a daily basis, crashes are categorized. Machines are removed automatically removed from service. Finally, they have automated replacement of machines.
How do we monitor this?
He has a feature request: SHOW USER STATISTICS.
They archive SHOW PROCESSLIST and SHOW STATUS. Add SHOW USER_STATS and SHOW TABLE_STATS. It's amazing what you can do with awk and bash. They prefer to take a top-down approach for monitoring. They generate daily and weekly load reports, including QPS. QPS on critical servers was going 2x per year. After deploying a better monitoring tool, they determined it was queries that weren't really crucial for those servers.
How do we improve this?
Understand your problems and deploy what you build. If you are just building and not deploying, you are not going to learn the tradeoffs. Also, monitor to learn what the problems are.
Replication features added At a high level, they are slowing moving towards self healing.
At a low level, somewhat crash-safe slaves. They use mirror binlog which keeps a copy of master binlog on slave. Other fewatures include semi-sync replication, binlog event checksums and global transaction IDs. They are currently in the process of having fully crash-safe slaves.
They have monotonically increasing global transaction IDs.
Performance features added
high level
- run (much) faster on multi-core and multi-disk servers
low level
- faster read-write mutex for InnODB
- less contention for InnoDB mutexes
- faster IO for InnoDB
- faster parser,
- integration with tcmalloc
- backport connection pool from MySQL 6 to 5.0.
high level
- make it easier to support MySQL
- a UI is nice, but never enough. they focus on command-line more.
- SHOW USER_STATS, SHOW TABLE_STATS
- archive and query SHOW PROCESSLIST samples
- roles for authentication and access control
- tools for monitoring servers.
- row-change logging
- online data drift checking (based on Maatkit but with more features)
- Flapulator (to look for crashes)
- Query tester (runs queries in parallel and compares the results to ensure new binary is getting same results as old binary).
Row-change logging
- keep MySQL and something else in sync
- similar to row-based replication
- fast parallel queries
- change noditifaction is possible
- full audit log.
- allows for real time replication
Online data drift:
- how do you compare continuously updated tables?
- technique is similar to mk-table-checksum
- deployment is more complicated
More to life than software development
Engineers at Google
- SRE: Site reliability engineer
- SWE: Software engineer
- SRE focus on running services in production.
Production crises
- OOM v1
- unstable kernel
- yes we have no core files
- we don't need no stack traces
- underpowered slaves
- database load growth
- query storms
- corrupt relay log events
- failover
- long lock waits
- OOM v2
new features
- online schema change
- failover automation
- frameworks to manage large deployments
- synchronous replication
- minimize replication delay on a loaded server
new hardware
- performance on multi-core and many-core servers
- performance with SSD
- performance with too much RAM.
- run a server in production
- work at a large systems company
- learn about a DBMS other than MySQL
out of time. :)
Thank you for sharing, Mark! This was an informative session (of course, it would be great to actually get some numbers but still ...)
Labels: google, markcallaghan, mysql, mysqlconf






