What do we do about primary keys

  • they’re the default. Just about any RDBMS has them, any ORM will happily use them.
  • they’re fast. Integers take 4 bytes to represent, bigints (as per mySQL type) take 8 bytes. They’re naturally fast for indexing.
  • they’re naturally unique for a table — and that’s usually good enough.
  • they disclose information about your entities (for example, when used in URLs, but as long as they get into some UI, it can be a problem) and helps with scraping data off your db as well as simply leaking metrics (image you have an online shop and someone makes today an order, gets an auto incremented id then makes another order at the end of the month — boom, they know how many orders you got)
  • to debug you need to pass around several pieces of information, which makes things difficult in certain settings (like multi-tenant environments, where a tenant might have their own prod + staging)
  • they’re *really* unique. Running at 16 bytes of randomly generated data, you have a better chance at winning the jackpot 100 times in a row that running into an UUID collision
  • they can be used as such to identify data across tables / databases / servers. This makes transitions and migrations easy (lets say you want to start over with a new db, but without collisions with IDs from your previous db — auto increments are an issue unless you arbitrarily set a different starting point)
  • no meaning can be inferred from them, they can be used publicly without disclosing anything about your system (aside from your use of uuid’s, of course)
  • At 16 bytes, they are usually represented as strings. This is a big performance hit on indexing, particularly if you use relationship. A search by an UUID — by default — is a text search, which is significantly slow.
  • Speaking of mySQL, you need version 8 to be able to partially mitigate the performance impact by changing their representation to binary, but on any generation you need to also call UUID_TO_BIN
  • it’s right there! I mean, you are using one.
  • modern versions (like mySQL 8) can do either autoincrements or UUIDs, with various encodings (binary for UUID, which helps the performance issue)
  • you’re doing it at the lowest layer so consistency is assured (even in clustered environments, as long as your primary key is clustered)
  • leaving it to the RDBMS couples you with that engine. Particularly if using db-generated UUIDs, which aren’t available everywhere just yet.
  • performance: image you want to save related entities. You need one trip to the db to save the first and get its primary key so that you can pass it to the second one to establish the relationship. Second trip is to save the second entity.
  • you’re free to manage relationships outside of RDBMS, for flexibility and performance (no foreign keys, you can save in a single transaction in the above example)
  • you are not tied to a db vendor
  • you’d better use UUIDs — autoincrements managed this way need an extra db call to see what value you can use.
  • it’s not scalable, two copies of your application generating UUIDs with the same algorithm are increasing the risk of collision by 4 and it rises exponentially the more copies you run.
  • unless you write your own implementation of UUID generation, you’re tying yourself to a library vendor.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store