This repository has been archived by the owner on Dec 22, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 224
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit df1dcf5
Showing
20 changed files
with
1,122 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
collections.yml | ||
/.bundle/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
source 'https://rubygems.org' | ||
|
||
gemspec | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
GIT | ||
remote: [email protected]:stripe-internal/mongoriver | ||
revision: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c | ||
ref: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c | ||
specs: | ||
mongoriver (0.0.1) | ||
bson_ext | ||
log4r | ||
mongo (>= 1.7) | ||
|
||
PATH | ||
remote: . | ||
specs: | ||
mosql (0.0.1) | ||
bson_ext | ||
json | ||
log4r | ||
mongo | ||
pg | ||
rake | ||
sequel | ||
|
||
GEM | ||
remote: https://intgems.stripe.com:446/ | ||
specs: | ||
bson (1.7.1) | ||
bson_ext (1.7.1) | ||
bson (~> 1.7.1) | ||
json (1.7.5) | ||
log4r (1.1.10) | ||
metaclass (0.0.1) | ||
minitest (3.0.0) | ||
mocha (0.10.5) | ||
metaclass (~> 0.0.1) | ||
mongo (1.7.1) | ||
bson (~> 1.7.1) | ||
pg (0.14.1) | ||
rake (10.0.2) | ||
sequel (3.41.0) | ||
|
||
PLATFORMS | ||
ruby | ||
|
||
DEPENDENCIES | ||
minitest | ||
mocha | ||
mongoriver! | ||
mosql! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
# MoSQL: a MongoDB → SQL streaming translator | ||
|
||
At Stripe, we love MongoDB. We love the flexibility it gives us in | ||
changing data schemas as we grow and learn, and we love its | ||
operational properties. We love replsets. We love the uniform query | ||
language that doesn't require generating and parsing strings, tracking | ||
placeholder parameters, or any of that nonsense. | ||
|
||
The thing is, we also love SQL. We love the ease of doing ad-hoc data | ||
analysis over small-to-mid-size datasets in SQL. We love doing JOINs | ||
to pull together reports summarizing properties across multiple | ||
datasets. We love the fact that virtually every employee we hire | ||
already knows SQL and is comfortable using it to ask and answer | ||
questions about data. | ||
|
||
So, we thought, why can't we have the best of both worlds? Thus: | ||
MoSQL. | ||
|
||
# MoSQL: Put Mo' SQL in your NoSQL | ||
|
||
![MoSQL](https://stripe.com/img/blog/posts/mosql/mosql.png) | ||
|
||
MoSQL imports the contents of your MongoDB database cluster into a | ||
PostgreSQL instance, using an oplog tailer to keep the SQL mirror live | ||
up-to-date. This lets you run production services against a MongoDB | ||
database, and then run offline analytics or reporting using the full | ||
power of SQL. | ||
|
||
## Installation | ||
|
||
Install from Rubygems as: | ||
|
||
$ gem install mosql | ||
|
||
Or build from source by: | ||
|
||
$ gem build mosql.gemspec | ||
|
||
And then install the built gem. | ||
|
||
## The Collection Map file | ||
|
||
In order to define a SQL schema and import your data, MoSQL needs a | ||
collection map file describing the schema of your MongoDB data. (Don't | ||
worry -- MoSQL can handle it if your mongo data doesn't always exactly | ||
fit the stated schema. More on that later). | ||
|
||
The collection map is a YAML file describing the databases and | ||
collections in Mongo that you want to import, in terms of their SQL | ||
types. An example collection map might be: | ||
|
||
|
||
mongodb: | ||
blog_posts: | ||
:columns: | ||
- _id: TEXT | ||
- author: TEXT | ||
- title: TEXT | ||
- created: DOUBLE PRECISION | ||
:meta: | ||
:table: blog_posts | ||
:extra_props: true | ||
|
||
Said another way, the collection map is a YAML file containing a hash | ||
mapping | ||
|
||
<Mongo DB name> -> { <Mongo Collection Name> -> <Collection Definition> } | ||
|
||
Where a `<Collection Definition>` is a hash with `:columns` and | ||
`:meta` fields. `:columns` is a list of one-element hashes, mapping | ||
field-name to SQL type. It is required to include at least an `_id` | ||
mapping. `:meta` contains metadata about this collection/table. It is | ||
required to include at least `:table`, naming the SQL table this | ||
collection will be mapped to. `extra_props` determines the handling of | ||
unknown fields in MongoDB objects -- more about that later. | ||
|
||
By default, `mosql` looks for a collection map in a file named | ||
`collections.yml` in your current working directory, but you can | ||
specify a different one with `-c` or `--collections`. | ||
|
||
## Usage | ||
|
||
Once you have a collection map. MoSQL usage is easy. The basic form | ||
is: | ||
|
||
mosql [-c collections.yml] [--sql postgres://sql-server/sql-db] [--mongo mongodb://mongo-uri] | ||
|
||
By default, `mosql` connects to both PostgreSQL and MongoDB instances | ||
running on default ports on localhost without authentication. You can | ||
point it at different targets using the `--sql` and `--mongo` | ||
command-line parameters. | ||
|
||
`mosql` will: | ||
|
||
1. Create the appropriate SQL tables | ||
2. Import data from the Mongo database | ||
3. Start tailing the mongo oplog, propogating changes from MongoDB to SQL. | ||
|
||
|
||
After the first run, `mosql` will store the status of the optailer in | ||
the `mongo_sql` table in your SQL database, and automatically resume | ||
where it left off. `mosql` uses the replset name to keep track of | ||
which mongo database it's tailing, so that you can tail multiple | ||
databases into the same SQL database. If you want to tail the same | ||
replSet, or multiple replSets with the same name, for some reason, you | ||
can use the `--service` flag to change the name `mosql` uses to track | ||
state. | ||
|
||
You likely want to run `mosql` against a secondary node, at least for | ||
the initial import, which will cause large amounts of disk activity on | ||
the target node. One option is to use read preferences in your | ||
connection URI: | ||
|
||
mosql --mongo mongodb://node1,node2,node3?readPreference=secondary | ||
|
||
## Advanced usage | ||
|
||
For advanced scenarios, you can pass options to control mosql's | ||
behavior. If you pass `--skip-tail`, mosql will do the initial import, | ||
but not tail the oplog. This could be used, for example, to do an | ||
import off of a backup snapshot, and then start the tailer on the live | ||
cluster. | ||
|
||
If you need to force a fresh reimport, run `--reimport`, which will | ||
cause `mosql` to drop tables, create them anew, and do another import. | ||
|
||
## Schema mismatches and _extra_props | ||
|
||
If MoSQL encounters values in the MongoDB database that don't fit | ||
within the stated schema (e.g. a floating-point value in a INTEGER | ||
field), it will log a warning, ignore the entire object, and continue. | ||
|
||
If it encounters a MongoDB object with fields not listed in the | ||
collection map, it will discard the extra fields, unless | ||
`:extra_props` is set in the `:meta` hash. If it is, it will collect | ||
any missing fields, JSON-encode them in a hash, and store the | ||
resulting text in `_extra_props` in SQL. It's up to you to do | ||
something useful with the JSON. One option is to use [plv8][plv8] to | ||
parse them inside PostgreSQL, or you can just pull the JSON out whole | ||
and parse it in application code. | ||
|
||
This is also currently the only way to handle array or object values | ||
inside records -- specify `:extra_props`, and they'll get JSON-encoded | ||
into `_extra_props`. There's no reason we couldn't support | ||
JSON-encoded values for individual columns/fields, but we haven't | ||
written that code yet. | ||
|
||
[plv8]: http://code.google.com/p/plv8js/ | ||
|
||
# Development | ||
|
||
Patches and contributions are welcome! Please fork the project and | ||
open a pull request on [github][github], or just report issues. | ||
|
||
MoSQL includes a small but hopefully-growing test suite. It assumes a | ||
running PostgreSQL and MongoDB instance on the local host; You can | ||
point it at a different target via environment variables; See | ||
`test/functional/_lib.rb` for more information. | ||
|
||
[github]: https://github.com/stripe/mosql |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
require 'rake/testtask' | ||
|
||
task :default | ||
task :build | ||
|
||
Rake::TestTask.new do |t| | ||
t.libs = ["lib"] | ||
t.verbose = true | ||
t.test_files = FileList['test/**/*.rb'].reject do |file| | ||
file.end_with?('_lib.rb') | ||
end | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/usr/bin/env ruby | ||
|
||
require 'rubygems' | ||
require 'bundler/setup' | ||
require 'mosql/cli' | ||
|
||
MoSQL::CLI.run(ARGV) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
require 'log4r' | ||
require 'mongo' | ||
require 'sequel' | ||
require 'mongoriver' | ||
require 'json' | ||
|
||
require 'mosql/version' | ||
require 'mosql/log' | ||
require 'mosql/sql' | ||
require 'mosql/schema' | ||
require 'mosql/tailer' |
Oops, something went wrong.