Skip to main content

Simple Script to test Sharding on MongoDB

Sharding, eh? There's so many questions on a daily basis about sharding -
  • What is sharding?
  • How do I do shard?
  • When do I do shard?
    • How do I know I need to shard?
  • How many shards do I need?
  • What shard key should I use?
  • Can I change my shard key?
  • What's a hotspot?
  • How many shards do I need?
  • Do I have a replica set within a shard?
  • etc
 and everyone is unique with a different use-case so the answer isn't always the same.

Here's the official documents page (on sharding) and Kristina's blog, which is simply excellent on so many levels - I recommend reading both links (btw, it'll take a while :). Kristina uses some awesome analogies to explain sharding.

This blog post isn't about the technicalities of sharding, there are much more intelligent people than me who can explain that. I wrote a simple script to learn a bit more sharding and for reproducing issues and I thought I'd share it. It's written in bash because I didn't want to worry about dependencies :)

After you run the script, you should be able to run 
 $ mongo twitter --eval 'sh.status()'  
from your local shell and you should see something like the following indicating that you have now created a sharded cluster with a sharded database "twitter", a sharded collection "tweets"
 MongoDB shell version: 2.0.6  
 connecting to: twitter  
 --- Sharding Status ---  
  sharding version: { "_id" : 1, "version" : 3 }  
  shards:  
   { "_id" : "shard0000", "host" : "localhost:10000" }  
   { "_id" : "shard0001", "host" : "localhost:10001" }  
   { "_id" : "shard0002", "host" : "localhost:10002" }  
  databases:  
   { "_id" : "admin", "partitioned" : false, "primary" : "config" }  
   { "_id" : "twitter", "partitioned" : true, "primary" : "shard0000" }  
     twitter.tweets chunks:  
         shard0000  1  
       { "query" : { $minKey : 1 }, "max_id" : { $minKey : 1 } } -->> { "query" : { $maxKey : 1 }, "max_id" : { $maxKey : 1 } } on : shard0000 { "t" : 1000, "i" : 0 }  

Hopefully it's of interest or help to someone :)

Comments

Popular posts from this blog

MongoDB Authori(s|z)ation

Introduction Having answered numerous questions on the new and old authori(s|z)ation within MongoDB, I thought I'd write a short blog post explaining how things work as there seems to be some confusion. What's New Prior to version 2.4 , there was a very basic sense of "Role Based Access Controls" (RBAC) within MongoDB as there were only two roles - read readWrite which is quite limited. For example, if the user has "readWrite", that user is essentially "root" and the user can add/remove users as well as inserting data into the database, i.e. there is no role segregation. Version 2.4 added in the following 3 core roles - userAdmin dbAdmin clusterAdmin with a notable extension such that there are now 4 roles that apply across all databases - readAnyDatabase readWriteAnyDatabase userAdminAnyDatabase dbAdminAnyDatabase This increased RBAC is a significant improvement from a security perspective in MongoDB. It is imp

Separate MongoDB Syslog by Facility

In my last post , I showed how you can set up MongoDB v2.2 to syslog its logs off to a remote syslog server. As my `tcpdump` snippets show, the syslog messages hit the syslog server tagged as "user.info", which means that they're assigned to the "user" facility with a severity level of "info". I've received a few questions regarding the possiblity of splitting out syslog messages by facility, however, as everything is currently sent to a "user.info" bucket, so-to-speak, this is not possibility. There is a current feature request for this capability and work will be done on this but if this is important for you, I'd strongly encourage you to vote for this feature. In the meantime, however, (whilst not ideal) you can still do some host filtering with rsyslog as outlined here .

What's the point of (InfoSec) Certifications?

Quite recently, my GSE was up for renewal. I'm currently in the middle of transporting my family to another continent and I've slightly more responsibilities work-wise in 2016 versus 2012. However, given the effort and study that it took to get the cert the first time (and to a lesser degree the expense), I figured it was a no-brainer to renew. For me, I've always been a huge fan of the GSE and considered it the epitome of InfoSec certifications, much like the CCIE for (Cisco) networking. Personally, I learn better by "doing" and consider it as the evidence that someone knows their stuff so the "2-day lab" element in the GSE was a both a huge goal and challenge that I was excited about. I talked about the value of "doing" when trying to learn about yourself previously here with the infamous Security Ninja and here on my own blog so there's no point in repeating myself. When I did the GSE, I absolutely loved the hands-on lab mo