Skip to main content

Simple Script to test Sharding on MongoDB

Sharding, eh? There's so many questions on a daily basis about sharding -
  • What is sharding?
  • How do I do shard?
  • When do I do shard?
    • How do I know I need to shard?
  • How many shards do I need?
  • What shard key should I use?
  • Can I change my shard key?
  • What's a hotspot?
  • How many shards do I need?
  • Do I have a replica set within a shard?
  • etc
 and everyone is unique with a different use-case so the answer isn't always the same.

Here's the official documents page (on sharding) and Kristina's blog, which is simply excellent on so many levels - I recommend reading both links (btw, it'll take a while :). Kristina uses some awesome analogies to explain sharding.

This blog post isn't about the technicalities of sharding, there are much more intelligent people than me who can explain that. I wrote a simple script to learn a bit more sharding and for reproducing issues and I thought I'd share it. It's written in bash because I didn't want to worry about dependencies :)

After you run the script, you should be able to run 
 $ mongo twitter --eval 'sh.status()'  
from your local shell and you should see something like the following indicating that you have now created a sharded cluster with a sharded database "twitter", a sharded collection "tweets"
 MongoDB shell version: 2.0.6  
 connecting to: twitter  
 --- Sharding Status ---  
  sharding version: { "_id" : 1, "version" : 3 }  
  shards:  
   { "_id" : "shard0000", "host" : "localhost:10000" }  
   { "_id" : "shard0001", "host" : "localhost:10001" }  
   { "_id" : "shard0002", "host" : "localhost:10002" }  
  databases:  
   { "_id" : "admin", "partitioned" : false, "primary" : "config" }  
   { "_id" : "twitter", "partitioned" : true, "primary" : "shard0000" }  
     twitter.tweets chunks:  
         shard0000  1  
       { "query" : { $minKey : 1 }, "max_id" : { $minKey : 1 } } -->> { "query" : { $maxKey : 1 }, "max_id" : { $maxKey : 1 } } on : shard0000 { "t" : 1000, "i" : 0 }  

Hopefully it's of interest or help to someone :)

Comments

Popular posts from this blog

Being a Support Engineer @ 10gen - Part 1

There's a mis-conception around the role of a "Support Engineer".  As a clue, it's not what Urban Dictionary   says   - A person whose job is to answer calls from customers of a small- to large-sized company...... They are teathered to a their desk all day via phone headset........ phone jockeys usually hate their jobs.......they are are paid well enough..........until they completely burn out, and hate everyone.   and doesn't always involve this - Image Source: http://half-bakedbaker.blogspot.ie/2009/11/cannoli-and-broken-computer.html As you can see  here , there's lots of open roles in  10gen  and more specifically with 10gen, in  Dublin . I thought I'd write this quick blog to explain what Support Engineers actually do and why I joined 10gen as a "Support Engineer". I could be wrong but didn't Google come up with term " Site Reliability Engineer " to do away with the stigma associated with being a

MongoDB Authori(s|z)ation

Introduction Having answered numerous questions on the new and old authori(s|z)ation within MongoDB, I thought I'd write a short blog post explaining how things work as there seems to be some confusion. What's New Prior to version 2.4 , there was a very basic sense of "Role Based Access Controls" (RBAC) within MongoDB as there were only two roles - read readWrite which is quite limited. For example, if the user has "readWrite", that user is essentially "root" and the user can add/remove users as well as inserting data into the database, i.e. there is no role segregation. Version 2.4 added in the following 3 core roles - userAdmin dbAdmin clusterAdmin with a notable extension such that there are now 4 roles that apply across all databases - readAnyDatabase readWriteAnyDatabase userAdminAnyDatabase dbAdminAnyDatabase This increased RBAC is a significant improvement from a security perspective in MongoDB. It is imp

Separate MongoDB Syslog by Facility

In my last post , I showed how you can set up MongoDB v2.2 to syslog its logs off to a remote syslog server. As my `tcpdump` snippets show, the syslog messages hit the syslog server tagged as "user.info", which means that they're assigned to the "user" facility with a severity level of "info". I've received a few questions regarding the possiblity of splitting out syslog messages by facility, however, as everything is currently sent to a "user.info" bucket, so-to-speak, this is not possibility. There is a current feature request for this capability and work will be done on this but if this is important for you, I'd strongly encourage you to vote for this feature. In the meantime, however, (whilst not ideal) you can still do some host filtering with rsyslog as outlined here .