Mongdb for DBAs: Week6

Scalability

Posted by Borg on February 21, 2017

Sharding & Data Distribution

shard key 用于决定文档存于哪个 shard, 分为 range basedhash based。以 range based 为例,一个[min, max]范围形成一个chunk,取一个中间的数字将原来的区块分为两个(splite),再把新分出来的区块发送(migration)给其它shard。

Running

需要三种进程:

  • –shardsvr 实际存储数据的进程
  • –configsvr 一般三个足够
  • mongos –configdb host:port,host1:port1,host3:port3 最好在 27017 默认端口上运行,任意数量,但一般大于1,configdb参数后为configsvr配置服务器地址。

各个进程创建好,初始化replicate set后,连接mongos添加shard,再设置shard key。 如果未设置shard key,则数据默认存储在primary shard,不会分配到其它shard。

sh.addShard("host:port")  # 只要添加每个shardreplicate set里的primary
sh.enableSharding(dbname)
sh.shardCollection("dbname.collection",key,unique)  # unique bool,可选参数
sh.shardCollection("collectionname",{shardkey:1},true)

Shard Key Selection

  • sufficient cardinality: enough values to spread data out
  • consider compound shard keys
  • Is the key monotonically increasing? # 如果sharkey单调增长,最后一个shard将会存储过多数据。注: BSON ObjectId 即 _id 字段会单调增长

Process and Machine Layout

shard以及底下的replicate set应该分散在不同的机器上,而conifg server可以与任意replicate set运行在同一机器上,mongos也可以与replicate set运行在同一机器上,也可以运行在用户的客户端。举例:3 shards, 3 replicate sets each shard, 3 config server and 6 mongos process might use 9 machines.
注意replicate set数量最好是奇数。

Bulk Inserts and Pre-splitting

bulk insert时,会先写入到其中一个shard,再执行split与migrate,如果插入数据的速度大于migrate的速度,这时就应该考虑使用pre-splitting:
sh.splitAt(collection, {key:val})
预先添加分割点

Tips and Best Practices

  • only shard big collections
  • pick shard key carefully (can not be changed once set)
  • consider presplitting on a bulk load
  • be aware of monotonically increasing shard key values on inserts
  • adding shards is fairly easy but isn’t instaneous
  • always connet to mongos except for some DBA work: put mongos on default port
  • use logical config server names