Intro to Variety - a schema analyzer for MongoDB

I would like to introduce you to a tool I just started using called Variety. It analyzes the schema of a MongoDB database. While it is true that MongoDB is a schemaless database, it is still important for database administrators to keep a record of the layout. Doing so helps the administrator keep track of what sort of data is kept in each database and collection, and it helps development teams plan future expansion and scalability. Variety looks through a collection and reports simple statistics on its keys: which keys exist, what types of data they hold, and what percentage of the documents contain each one.
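To make the idea concrete, here is a minimal sketch in plain JavaScript of that kind of per-key statistic. It is not Variety's actual code: the names analyzeKeys and typeName and the sample documents are made up for illustration, and it works on an in-memory array and top-level keys only rather than on a live collection.

```javascript
// Hypothetical helpers, not part of Variety.

// Return a readable type label for a value.
function typeName(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'Array';
  if (value instanceof Date) return 'Date';
  const t = typeof value;
  return t.charAt(0).toUpperCase() + t.slice(1); // 'string' -> 'String'
}

// Tally how often each top-level key occurs and which types it holds.
function analyzeKeys(docs) {
  const stats = {}; // key -> { occurrences, types }
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      if (!stats[key]) stats[key] = { occurrences: 0, types: new Set() };
      stats[key].occurrences += 1;
      stats[key].types.add(typeName(doc[key]));
    }
  }
  return Object.keys(stats)
    .map((key) => ({
      key,
      types: Array.from(stats[key].types).sort(),
      occurrences: stats[key].occurrences,
      percent: docs.length ? (100 * stats[key].occurrences) / docs.length : 0,
    }))
    .sort((a, b) => b.occurrences - a.occurrences); // most common keys first
}

// Made-up sample documents.
const sampleUsers = [
  { name: 'Tom', bio: 'A nice guy', pets: ['monkey', 'fish'] },
  { name: 'Dick', bio: 'I swordfight.' },
  { name: 'Harry', pets: 'egret' },
  { name: 'Genevieve', bio: 'Hello there' },
];

console.log(analyzeKeys(sampleUsers));
```

In this sample, name appears in every document, bio in three of four, and pets in two of four with two different types, which is exactly the sort of drift a schema report is meant to surface.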

Variety is maintained on GitHub, and I have recently started contributing to the project to help make it better. If you like this project, then feel free to contribute in any way that you can, even if it is just suggesting features or enhancements. Before I show and tell, I would like to mention that the core developers state on their GitHub page that they do NOT recommend using this tool in a production environment. I still do, however, and I plan on using it on a regular basis to track my constantly changing databases.

So let's dive into how it works. There are many usage examples on the GitHub page, but I will cover some basics here to get you started. First, let's highlight some of the great features:

  • Great ASCII-formatted output of collection information
  • See what types of data are present in each key
  • See the percentage of data for each key
  • Include mongo queries to limit which documents we include in the results
  • Specify maxDepth to use when searching through documents in a collection
  • Analyze only subsets of documents
  • Sort documents with mongo queries
  • Save results to another database for future reference
  • Output information in JSON format
  • No dependencies
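As an illustration of what a depth limit such as maxDepth means when walking nested documents, here is a small sketch. It is not Variety's implementation: the function name keyPaths and the sample document are hypothetical, and counting top-level keys as depth 1 is an assumption made here that may not match Variety's own convention.

```javascript
// Hypothetical sketch, not Variety's code: list the dotted key paths in a
// document, descending at most maxDepth levels (top-level keys are depth 1).
function keyPaths(doc, maxDepth, prefix = '', depth = 1) {
  const paths = [];
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? prefix + '.' + key : key;
    paths.push(path);
    const isPlainObject =
      value !== null &&
      typeof value === 'object' &&
      !Array.isArray(value) &&
      !(value instanceof Date);
    // Only descend into sub-documents while we are above the depth limit.
    if (isPlainObject && depth < maxDepth) {
      paths.push(...keyPaths(value, maxDepth, path, depth + 1));
    }
  }
  return paths;
}

// Made-up nested document.
const nestedDoc = {
  name: 'Walter',
  someDoc: { level2: { level3: { level4: true } } },
};

console.log(keyPaths(nestedDoc, 2));        // stops at someDoc.level2
console.log(keyPaths(nestedDoc, Infinity)); // walks all the way down
```

A depth limit like this keeps the report readable when documents contain deeply nested sub-documents.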

Basic Usage

Using Variety is easy. First we need to download it from the GitHub page, which is at
https://github.com/variety/variety

Next, navigate into the variety folder (the one containing variety.js) and we can start using the tool. Before running it, make sure your database is up and running.

You can get basic output using the command


mongo DATABASE_NAME --eval "var collection = 'COLL_NAME' " variety.js

Simply replace DATABASE_NAME with the real name of your database and COLL_NAME with the real name of the collection to show information for.
NOTE: In my latest contribution I added the ability to show information on all collections. It reports on every collection that is not empty, except system.indexes, which is an internal collection MongoDB uses to store index information. I also added the ability to pass an array as var collection, which lets us analyze several collections at once when we don't want all of them. Soon you should be able to use these features with the following syntax:

All collections


mongo DATABASE_NAME --eval "var mode = 'recursive' " variety.js

Array of collections


mongo DATABASE_NAME --eval "var collection = ['coll1', 'coll2', 'coll3'] " variety.js

Database using a non-default port?

No problem. The usual mongo shell connection options work in the command, just as they do when you start the shell yourself.


mongo DATABASE_NAME --port 27111 --eval "var collection = 'COLL_NAME' " variety.js

Maybe a non-default location?


mongo DATABASE_NAME --dbpath /path/to/database/folder --eval "var collection = 'COLL_NAME' " variety.js

Want to sort the data?

Of course, in large collections we will want to sort the data in certain ways. For this, simply add a sort variable holding a sort specification of the kind you would pass to sort() in the mongo shell. Here we sort by the date field in descending order. Sorting is very powerful; if you don't understand its usage, I suggest you read up on it in the MongoDB documentation.


mongo DATABASE_NAME --eval "var collection = 'COLL_NAME', sort = { date : -1 }" variety.js

Analyzing subsets of documents
There comes a time when our databases are so big that analyzing only a subset of the documents is needed. Well, the maintainers of Variety have already thought of that. Again, the ability to add queries is a very powerful feature of Variety: we can specify which documents to examine. The following will analyze only documents where caredAbout is equal to true, quite an awesome feature.


mongo DATABASE_NAME --eval "var collection = 'COLL_NAME', query = {'caredAbout':true}" variety.js
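To show what restricting the analysis to matching documents amounts to, here is a deliberately simplified sketch. The matchesQuery function and the sample data are hypothetical; it handles only top-level equality conditions, whereas real MongoDB queries also support operators, nested paths, and much more.

```javascript
// Hypothetical, simplified matcher: top-level equality conditions only.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([key, expected]) => doc[key] === expected);
}

// Made-up sample documents.
const allDocs = [
  { title: 'a', caredAbout: true },
  { title: 'b', caredAbout: false },
  { title: 'c' },
  { title: 'd', caredAbout: true },
];

// Only the documents that match would be passed on to the analysis step.
const subset = allDocs.filter((doc) => matchesQuery(doc, { caredAbout: true }));
console.log(subset.map((doc) => doc.title)); // logs the titles 'a' and 'd'
```

Documents that lack the caredAbout key entirely, like the third one here, are excluded just like those where it is false.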

JSON output
By default the statistics are printed to the screen in a nice little ASCII-formatted table that looks like the picture at the beginning of this article. That is nice, but what if we need JSON formatting? Easy fix: set outputFormat = 'json'.


mongo DATABASE_NAME --eval "var collection = 'users', outputFormat = 'json' " variety.js

Printing
Printing is easy. On Linux we can simply redirect the output to a file and print it from there.


mongo DATABASE_NAME --eval "var collection = 'users' " variety.js > FILENAME
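Every setting used so far is just a JavaScript variable declared in the --eval string, so a run can also be scripted. The following is a rough sketch, assuming Node.js is available: buildVarietyArgs is a hypothetical helper (not part of Variety) that turns an options object into the argument list for the mongo shell. Values are serialized with JSON.stringify, which yields valid JavaScript literals for strings, numbers, booleans, arrays, and plain objects.

```javascript
// Hypothetical helper, not part of Variety: build the mongo shell arguments
// from an options object. Expects at least one option (e.g. collection).
function buildVarietyArgs(database, options, scriptPath = 'variety.js') {
  const assignments = Object.entries(options)
    .map(([name, value]) => `${name} = ${JSON.stringify(value)}`)
    .join(', ');
  return [database, '--eval', `var ${assignments}`, scriptPath];
}

// Placeholder names, as in the commands above.
const varietyArgs = buildVarietyArgs('DATABASE_NAME', {
  collection: 'COLL_NAME',
  query: { caredAbout: true },
  sort: { date: -1 },
  outputFormat: 'json',
});
console.log(varietyArgs);

// To actually run it (assumes the mongo shell is on your PATH and variety.js
// is in the current directory):
// const { spawnSync } = require('child_process');
// const result = spawnSync('mongo', varietyArgs, { encoding: 'utf8' });
// console.log(result.stdout);
```

Passing the arguments as an array to spawnSync avoids the shell quoting you have to juggle when typing the --eval string by hand.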

You can also mix the options to get more specific about the data you are analyzing. I encourage you to check out the GitHub site for more information and other examples. Please take a look at this wonderful tool if you are using MongoDB, and let us know what you think. Many thanks to the core contributors of the Variety tool:

Tomáš Dvořák
Wes Freeman
James Cropcho (original creator of Variety)
