Juan-Carlos Gandhi ([personal profile] juan_gandhi) wrote 2015-10-01 02:02 pm
long id

I think I get where all this bs about passing around numerical ids of entities, instead of entity references (maybe lazy ones), comes from. It's like an 'error code'. It comes from ancient C programming, where we just could not allocate a string for a readable piece of text, or for data that might take some effort to instantiate or allocate.

In short: it's stupid to pass around "ids" in a program.

[identity profile] exceeder.livejournal.com 2015-10-02 07:08 am (UTC)

1B business transactions. Sometimes they are tiny transactions in terms of money, but they still require full traceability. I did stuff for telecom, bidding in ad tech, an online screen-sharing cloud for X11 3D engineering apps, medical teleradiology, etc. Don't get me wrong, I'm not alone. It takes a lot of smart people to plan things right at this scale.


A typical setup is master-master, 64G and 32 CPU cores each, on SSD RAID. But then again, sometimes there are n of those pairs, one per vertical partition set, and some stuff goes to Mongo or Aerospike, etc. It depends, case by case. You end up with a couple dozen RDBMS servers in a usual production setup one way or another, no matter how much people love NoSQL.

[personal profile] dennisgorelik 2015-10-02 10:02 am (UTC)
Why did you create a single 1B-record table when you could have created 20 tables of 50M records each?

At that scale, smaller tables are typically faster and easier to maintain.

[identity profile] exceeder.livejournal.com 2015-10-02 05:18 pm (UTC)
Ok, say you partition it vertically (in the real world, good luck finding a good way to partition stuff in the first place). Say it is just by id from a sequence. And then what? tableId+rowId everywhere? Wouldn't it be much easier to have, e.g., tableId = rowId % 20? But then you need long again, no?
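
A minimal sketch of the modulo routing raised above, assuming 20 tables fed by one global sequence. The names `NUM_TABLES`, `table_for`, and `fits_in_int32` are made up for illustration and do not come from the thread. It only shows the arithmetic: one billion ids still fit in a signed 32-bit int, but a global sequence that keeps growing does not, and the routing key is that global id.

```python
NUM_TABLES = 20          # assumption: the 20 tables from the question above
INT32_MAX = 2**31 - 1    # largest signed 32-bit value (2147483647)


def table_for(row_id: int) -> int:
    """Pick a table by taking the global row id modulo the table count."""
    return row_id % NUM_TABLES


def fits_in_int32(value: int) -> bool:
    """True if the value can be stored in a signed 32-bit column."""
    return -2**31 <= value <= INT32_MAX


# Route two ids and check whether each would still fit in a 32-bit int.
for row_id in (1_000_000_007, 3_000_000_000):
    print(row_id, "-> table", table_for(row_id), "| int32:", fits_in_int32(row_id))
```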

In reality, both vertical partitioning (as you proposed) and sharding (mostly managed by the db engine) are employed, but these should be used sparingly and avoided whenever reasonably possible.

This is the wrong place to discuss this stuff anyway. In the world out there, things always turn out way more complex than the books and blogs suggest. Partitioning breaks things, A LOT. But sometimes it cannot be avoided past a certain size.

I don't mean to insult you in any way. I am sure you are an awesome professional with an inquiring mind. I was naive too, but my first real 4 TB database with over 1k transactions per second brought a lot of revelations.

E.g. awesomely stable db engines, vetted for years, suddenly turn into pumpkins at 12am. Just put some real load on them, not the kind you can generate from a single test server :(

Cheers!

[personal profile] dennisgorelik 2015-10-02 08:33 pm (UTC)
> tableId+rowId everywhere?

It could be a long in the application layer (C#, Java), and then be converted by an algorithm to {TableName + int RowId} when it is time to retrieve the data.

Or even keep {TableId, RowId} in the application layer, like you suggested.

I mean the key idea is to use int in database indexes instead of long.
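
One way the conversion described above could look, as a rough sketch only. It assumes 20 tables, non-negative ids, and a split by `divmod`; `split_id`, `join_id`, `table_name`, and the `records_NN` naming scheme are hypothetical and not taken from the thread. The application keeps a single 64-bit id, while each table only ever indexes a row id that fits in a signed 32-bit int.

```python
from typing import Tuple

NUM_TABLES = 20          # assumption: table count taken from the thread
INT32_MAX = 2**31 - 1    # largest value a signed 32-bit index column can hold


def split_id(global_id: int) -> Tuple[int, int]:
    """Convert an application-level long id into (table_id, row_id)."""
    if global_id < 0:
        raise ValueError("ids are assumed to be non-negative")
    row_id, table_id = divmod(global_id, NUM_TABLES)
    if row_id > INT32_MAX:
        raise OverflowError("row_id no longer fits in a 32-bit int")
    return table_id, row_id


def join_id(table_id: int, row_id: int) -> int:
    """Rebuild the application-level id from its two parts."""
    return row_id * NUM_TABLES + table_id


def table_name(table_id: int) -> str:
    """Hypothetical naming scheme: records_00 .. records_19."""
    return f"records_{table_id:02d}"


# Round trip: an id above the 32-bit limit maps to a small per-table row id.
table_id, row_id = split_id(3_000_000_000)
print(table_name(table_id), row_id, join_id(table_id, row_id))
```

With this split, the per-table row id stays within int range until the global id reaches 20 * 2^31, at which point the sketch raises instead of silently overflowing.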


I know that RDBMSes start to have serious issues at scale.
That's why I prefer to use int.
Edited 2015-10-02 20:41 (UTC)