MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

Overview

mongo-connector

The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Plc.

For complete documentation, check out the Mongo Connector Wiki.

System Overview

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target then tails the MongoDB oplog, keeping up with operations in MongoDB in real-time. Detailed documentation is available on the wiki.

Getting Started

mongo-connector supports Python 3.4+ and MongoDB versions 3.4 and 3.6.

Installation

To install mongo-connector with the MongoDB doc manager suitable for replicating data to MongoDB, use pip:

pip install mongo-connector

The install command can be customized to include the Doc Managers and any extra dependencies for the target system.

Target System                      Install Command
MongoDB                            pip install mongo-connector
Elasticsearch 1.x                  pip install 'mongo-connector[elastic]'
Amazon Elasticsearch 1.x Service   pip install 'mongo-connector[elastic-aws]'
Elasticsearch 2.x                  pip install 'mongo-connector[elastic2]'
Amazon Elasticsearch 2.x Service   pip install 'mongo-connector[elastic2-aws]'
Elasticsearch 5.x                  pip install 'mongo-connector[elastic5]'
Solr                               pip install 'mongo-connector[solr]'

You may have to run pip with sudo, depending on where you're installing mongo-connector and what privileges you have.

System V Service

Mongo Connector can install and uninstall itself as a service daemon under System V Init on Linux. After installing the package, install or uninstall the service with the following command:

$ python -m mongo_connector.service.system-v [un]install

Development

You can also install the development version of mongo-connector manually:

git clone https://github.com/yougov/mongo-connector.git
pip install ./mongo-connector

Using mongo-connector

mongo-connector replicates operations from the MongoDB oplog, so a replica set must be running before startup. For development purposes, you may find it convenient to run a one-node replica set (note that this is not recommended for production):

mongod --replSet myDevReplSet

To initialize your server as a replica set, run the following command in the mongo shell:

rs.initiate()

Once the replica set is running, you may start mongo-connector. The simplest invocation resembles the following:

mongo-connector -m <mongodb server hostname>:<replica set port> \
                -t <replication endpoint URL, e.g. http://localhost:8983/solr> \
                -d <name of doc manager, e.g., solr_doc_manager>
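
As a rough sketch, the same invocation can be assembled from Python, for example to launch mongo-connector from a supervisor script. The helper name `build_command` and the sample host and URL values are hypothetical; only the `-m`, `-t`, and `-d` flags come from the command above.

```python
import shlex


def build_command(mongo_address, target_url, doc_manager):
    """Return the mongo-connector argument list for one source and one target.

    build_command is a hypothetical helper; -m, -t and -d are the flags
    shown in the invocation above.
    """
    return [
        "mongo-connector",
        "-m", mongo_address,  # <hostname>:<port> of a replica set member
        "-t", target_url,     # replication endpoint URL
        "-d", doc_manager,    # name of the doc manager
    ]


if __name__ == "__main__":
    # Example values are placeholders, not defaults of the tool.
    args = build_command(
        "localhost:27017", "http://localhost:8983/solr", "solr_doc_manager"
    )
    # Print a shell-quoted form; the list itself can be handed to subprocess.run.
    print(shlex.join(args))
```

Building the command as a list keeps each value a single argument, so URLs or names containing shell metacharacters need no extra quoting.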

mongo-connector has many other options besides those demonstrated above. To get a full listing with descriptions, try mongo-connector --help. You can also use mongo-connector with a configuration file.
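
A configuration file is a JSON document. The following sketch generates one from Python, using only keys that appear in the configuration excerpt quoted in the comments further down this page (`mainAddress`, `oplogFile`, `noDump`, `docManagers`, `docManager`, `targetURL`, `bulkSize`). The helper name `build_config` and the sample addresses are assumptions; consult the wiki for the authoritative list of options.

```python
import json


def build_config(main_address, target_url, doc_manager, bulk_size=1000):
    """Return a configuration dict for one source and one doc manager.

    build_config is a hypothetical helper; the key names are taken from
    the configuration excerpt quoted elsewhere on this page.
    """
    return {
        "mainAddress": main_address,
        "oplogFile": "oplog.timestamp",
        "noDump": False,
        "docManagers": [
            {
                "docManager": doc_manager,
                "targetURL": target_url,
                "bulkSize": bulk_size,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder addresses; redirect the output to a file to keep it.
    config = build_config("localhost:27017", "localhost:9200", "elastic2_doc_manager")
    print(json.dumps(config, indent=4))
```

One of the commands quoted in the comments below passes such a file to mongo-connector with `-c`.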

If you want a jump-start on using mongo-connector with another target system, check out:

Doc Managers

Elasticsearch 1.x: https://github.com/yougov/elastic-doc-manager

Elasticsearch 2.x and 5.x: https://github.com/yougov/elastic2-doc-manager

Solr: https://github.com/yougov/solr-doc-manager

The MongoDB doc manager comes packaged with the mongo-connector project.

Troubleshooting/Questions

Having trouble with installation? Have a question about Mongo Connector? Your question or problem may be answered in the FAQ or in the wiki. If you can't find the answer there, feel free to open an issue on Mongo Connector's GitHub page.

Comments
  • Mongo connector is lag behind mongodb

    Hi guys, we are using mongo-connector to feed data from MongoDB to Elasticsearch. Everything went well until today, when we had a big insert/update to MongoDB and mongo-connector started falling behind, taking hours to catch up. Here is the mongo-connector config:

    {
        "mainAddress": "10.a.b.c:27017",
        "oplogFile": "/var/log/mongo-connector/oplog.timestamp",
        "noDump": false,
        "batchSize": -1,
        "verbosity": 2,
        "continueOnError": true,
        "docManagers": [
            {
                "docManager": "elastic2_doc_manager",
                "targetURL": "10.x.y.z:9200",
                "bulkSize": 1000,
                "__uniqueKey": "_id",
                "args": {
                    "clientOptions": {"timeout": 60}
                }
            }
        ]
    }
    

    Elasticsearch is set up as a cluster of 3 servers. I'm looking for any way to make mongo-connector tail the oplog and update data faster.

    Thanks everyone.

    waiting for input 
    opened by hungvotrung 42
  • mongo-connector with SOLR keeps exiting.

    Hi,

    I'm trying to use mongo-connector to push the contents of my MongoDB (v2.6.1) into Solr (4.8). I've configured my schema and performed a few test runs using small collections, and it works perfectly. However, whenever I try to use it with a large collection (80M items), it crashes out after a few seconds.

    This is the command I'm using:

    mongo-connector -n scans-io.certificates -m localhost:27017 -t http://localhost:8983/solr/certificates -d mongo_connector/doc_managers/solr_doc_manager.py --no-dump -v
    

    And here is what happens:

    2014-05-14 20:12:48,451 - DEBUG - "POST /solr/certificates/update/?commit=false HTTP/1.1" 200 None
    2014-05-14 20:12:48,452 - INFO - Finished 'http://localhost:8983/solr/certificates/update/?commit=false' (post) with body 'u'<add><do' in 0.003 seconds.
    2014-05-14 20:12:48,452 - DEBUG - OplogThread: Doc is processed.
    2014-05-14 20:12:48,452 - DEBUG - OplogThread: updating checkpoint afterprocessing new oplog entries
    2014-05-14 20:12:48,453 - DEBUG - OplogThread: oplog checkpoint updated to Timestamp(1400094752, 438)
    2014-05-14 20:12:48,453 - DEBUG - OplogThread: updating checkpoint after an Exception, cursor closing, or join() on thisthread.
    2014-05-14 20:12:48,453 - DEBUG - OplogThread: oplog checkpoint updated to Timestamp(1400094752, 438)
    2014-05-14 20:12:48,453 - DEBUG - OplogThread: Sleeping.  This batch I removed 0  documents and I upserted 2843 documents.
    2014-05-14 20:12:50,454 - DEBUG - OplogThread: Getting cursor
    2014-05-14 20:12:50,454 - DEBUG - OplogThread: Initializing the oplog cursor.
    2014-05-14 20:12:50,454 - DEBUG - OplogThread: reading last checkpoint as Timestamp(1400094752, 438) 
    2014-05-14 20:12:50,454 - DEBUG - OplogThread: Getting the oplog cursor and moving it to the proper place in the oplog.
    2014-05-14 20:12:50,454 - DEBUG - OplogThread: Getting the oplog cursor in the while true loop for get_oplog_cursor
    2014-05-14 20:12:50,455 - DEBUG - OplogThread: Cursor created, getting a count.
    2014-05-14 20:12:50,898 - DEBUG - OplogThread: Count is 313375
    2014-05-14 20:12:51,135 - DEBUG - OplogThread: Got the cursor, go go go!
    2014-05-14 20:12:51,135 - ERROR - OplogThread: Last entry no longer in oplog cannot recover! Collection(Database(MongoClient('localhost', 27017), u'local'), u'oplog.rs')
    2014-05-14 20:12:51,858 - ERROR - MongoConnector: OplogThread <OplogThread(Thread-2, stopped 4387258368)> unexpectedly stopped! Shutting down
    2014-05-14 20:12:51,858 - INFO - MongoConnector: Stopping all OplogThreads
    2014-05-14 20:12:51,858 - DEBUG - OplogThread: exiting due to join call.
    
    opened by carlskii 30
  • $addToSet and $pull objects being inserted into Elasticsearch

    I'm having an issue where mongo-connector is sending entries from the oplog that include $addToSet and $pull attributes. These documents are destroying my mapping (I have custom date formats declared), and eventually lead to errors and mongo-connector crashing.

    Before I upgraded my cluster to v2.2.0 and mongo-connector to 2.3, I did not have these issues and mongo-connector was working like a champ. I'm not sure if these kinds of oplog entries were handled before, but it'd be great to get a rundown on that and if this issue is caused by the mongo-connector 2.3/elastic2_doc_manager upgrade or an issue with mongo.

    I'm currently running:

    • mongo-connector 2.3
    • elastic2-doc-manager 0.1.0
    • mongodb 2.2.7
    • elasticsearch 2.2.0

    Example of a doc containing $addToSet

    {
        "ip_data": {
            "3": {
                "web_published": false
            }
        },
        "$addToSet": {
            "ip_data": {
                "web_published": true,
                "seen_first": "2016-02-18T15:43:56.114000",
                "active_last_seen": "2016-02-18T15:43:56.114000",
                "seen_last": "2016-02-18T15:43:56.114000",
                "version": "IPv4",
                "_id": "[redacted]"
            }
        }
    }
    

    Example of a doc containing $pull

    {
        "$pull": {
            "ip_data": {
                "_id": "[redacted]"
            }
        }
    }
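
One way to spot affected documents before they reach Elasticsearch is to check for top-level keys that start with `$`. This is only a diagnostic sketch: `find_operator_keys` is a hypothetical helper, not part of mongo-connector or elastic2-doc-manager, and it does not fix the underlying update handling.

```python
def find_operator_keys(doc):
    """Return the sorted top-level keys of `doc` that start with '$'.

    Hypothetical helper: a non-empty result means the document still
    carries update operators such as $addToSet or $pull.
    """
    return sorted(key for key in doc if key.startswith("$"))


if __name__ == "__main__":
    # Shapes copied from the two examples above, with values shortened.
    add_to_set_doc = {
        "ip_data": {"3": {"web_published": False}},
        "$addToSet": {"ip_data": {"version": "IPv4"}},
    }
    pull_doc = {"$pull": {"ip_data": {"_id": "[redacted]"}}}
    print(find_operator_keys(add_to_set_doc))
    print(find_operator_keys(pull_doc))
```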
    
    opened by jporter-dev 26
  • `ServerSelectionTimeoutError` on Remote Replica Set

    I'm trying to connect from a remote MongoDB instance to a local Elasticsearch instance. The MongoDB instance is a replica set behind a proxy, and I'm connecting with SSL. I've tried tweaking things with varying degrees of success; the closest I've gotten is this error:

    ServerSelectionTimeoutError: <ip_address_1>:27017: timed out,<ip_address_2>:27017: timed out,<ip_address_3>:27017: timed out

    The 3 IP addresses it lists are correct for the replica set members, which indicates to me that it's actually connecting/authenticating properly, but some issue occurs afterward.

    The command I'm using looks like this:

    mongo-connector -c mongo-connector-config.json -m mongodb://<ip_address>:<port>/?ssl=true -t 192.168.99.100:9200 -d elastic_doc_manager --ssl-ca-certs my-cert.pem --ssl-certificate-policy optional

    The only things I'm specifying in my config are noDump: true and a few namespaces.

    waiting for input 
    opened by letsgolesco 25
  • mongo-connector stuck on fatal error

    I have 2 replica sets in 2 different environments (dev and prod) with no issues. They run mongo-connector 2.0.3 with pyMongo 2.8. Prod is running Python 2.6, dev Python 2.7.

    I recently tried to set up a third environment with dev data and ran into a fatal issue. This occurred with pyMongo 2.0.3, 2.10 and pyMongo 2.8 and 2.8.1.

    When the mongo-connector service starts, a constant number of 6000 docs is inserted into Elasticsearch. The 6001st raises:

    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 85, in wrapped
        func(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 185, in run
        for n, entry in enumerate(cursor):
      File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1090, in next
        if len(self.__data) or self._refresh():
      File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1039, in _refresh
        limit, self.__id))
      File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 958, in __send_message
        self.__compile_re)
      File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 121, in _unpack_response
        compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 537, in decode_all
        tz_aware, uuid_subtype, compile_re))
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 331, in _elements_to_dict
        data, position, as_class, tz_aware, uuid_subtype, compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 320, in _element_to_dict
        data, position, as_class, tz_aware, uuid_subtype, compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 159, in _get_object
        encoded, as_class, tz_aware, uuid_subtype, compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 331, in _elements_to_dict
        data, position, as_class, tz_aware, uuid_subtype, compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 320, in _element_to_dict
        data, position, as_class, tz_aware, uuid_subtype, compile_re)
      File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 232, in _get_date
        dt = EPOCH_AWARE + datetime.timedelta(seconds=seconds)
    InvalidBSON: date value out of range
    

    If I set timeZoneAware to false, the result is exactly the same, except that EPOCH_AWARE is replaced by EPOCH_NAIVE.
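
The failing frame in the traceback adds a `timedelta` to an epoch `datetime`. Python's `datetime` only covers years 1 through 9999, so that addition raises `OverflowError` for values outside the range. The sketch below reproduces that behaviour with the standard library only. `millis_to_datetime` is a hypothetical helper, and whether an out-of-range date is really what the reporter's data contains is not established in the report.

```python
from datetime import datetime, timedelta, timezone

# Same idea as the EPOCH_AWARE constant named in the traceback.
EPOCH_AWARE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime(millis):
    """Convert milliseconds since the epoch to a datetime.

    Returns None when the result falls outside the range Python's
    datetime can represent (years 1 to 9999).
    """
    try:
        return EPOCH_AWARE + timedelta(milliseconds=millis)
    except OverflowError:
        return None


if __name__ == "__main__":
    print(millis_to_datetime(0))
    # A far-future value, chosen only to trigger the overflow.
    print(millis_to_datetime(10**15))
```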

    So you think the 6001st record I imported is bad? Nope. The first time I launched mongo-connector, I had already imported 180,000+ records, but the second time I just dropped the collection and re-imported all of them, so that mongo-connector started inserting docs into ES at the time I started the import. Same issue. Setting noDump=true had no effect. The proof that the 6001st doc is not bad? By splitting the import file into 2 parts (1: 6000 records, 2: the remainder), I got 6000 records and then 100 imported before the first error. 6000 and 100 were constant. By splitting one more time (6000+6000+remainder), I was able to import 6000+283 docs. Strange. So pyMongo was not detecting bad BSON; there is some kind of race condition underneath.

    continueOnError and batchSize have no effect. Each time I restart mongo-connector without removing oplog-timestamp, even with a dropped collection, it tries to send the same docs to ES again and again. I expected batchSize=1 to force mongo-connector to keep an up-to-date timestamp, but this is not working. The last few records before the fatal error are always "repeated".

    The bad part is that I'm not able to launch mongo-connector just to refresh the timestamp and exit; it crashes very fast and does not update anything. I'm still looking for a workaround: my goal is to start from an empty collection, start mongo-connector (which will remain quiet), then start inserting docs at a slow rate to see if it matters.

    Please upgrade the pyMongo dependency if that solves this issue (pyMongo 3 is supposed to have better date support), or forward the issue to pyMongo if the latest version has the same problem. In any case, you should fix mongo-connector so that oplog-timestamp is updated, to avoid replaying doc manager insertions.

    opened by joe-mojo 25
  • ConnectionError when doing collection dump to Elasticsearch

    Hi! I think the bulk insertion size needs to be adjusted. When Elasticsearch falls too far behind, the bulk insertion fails. Here is the traceback:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 490, in do_dump
        upsert_all(dm)
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 474, in upsert_all
        dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 38, in wrapped
        reraise(new_type, exc_value, exc_tb)
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
        return f(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic_doc_manager.py", line 189, in bulk_upsert
        for ok, resp in responses:
      File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 118, in streaming_bulk
        raise e
    ConnectionFailed: ConnectionError(('Connection aborted.', error(111, 'Connection refused'))) caused by: ProtocolError(('Connection aborted.', error(111, 'Connection refused')))
    

    Thanks
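
The report suggests adjusting the bulk insertion size. Independently of how mongo-connector implements its bulk upserts, the general idea of capping the number of documents per request can be sketched as follows. `batched` is a hypothetical helper, and the sketch does not talk to Elasticsearch.

```python
from itertools import islice


def batched(docs, size):
    """Yield lists of at most `size` documents from any iterable.

    Hypothetical helper: each yielded list would be sent as one bulk
    request, so a smaller `size` means smaller requests.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in documents; a real dump would stream them from MongoDB.
    documents = ({"_id": n} for n in range(5))
    for batch in batched(documents, 2):
        print(len(batch), batch)
```

Because the helper consumes its input lazily, it never holds more than one batch in memory, which matters for a collection dump.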

    opened by Garito 22
  • Mongo Update - solr Error

    On updating a document in MongoDB, I get the following error in Solr and the document is not updated. I do have the _id field as the unique key.

    108554676 [http-bio-8080-exec-13] INFO org.apache.solr.update.processor.LogUpdateProcessor – [bookfalcons] webapp=/solr path=/update/ params={commit=true} {} 0 0
    108554677 [http-bio-8080-exec-13] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: _id
        at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:92)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:716)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:556)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:724)

    waiting for input can't reproduce 
    opened by rohsan 22
  • Nested Documents

    How can I use mongo-connector to index fields of embedded documents in Solr? For example, the field StreetName of:

    {
      "_id" : ObjectId("52fa3674395d4602b8be4a1b"),
      "AddressDirectory" : {
        "Owner" : "Mayank",
        "Age" : "24",
        "Company" : "BIPL",
        "Address" : {
          "HouseNo" : "4",
          "StreetName" : "Rohini",
          "City" : "Delhi"
        }
      }
    } 
    

    The manual says the connector flattens nested documents, but in Solr I can't see the flattened field StreetName.
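
To make the question concrete, here is a minimal sketch of what flattening a nested document could look like. It assumes dot-separated key names and ignores arrays; the separator and the `flatten` helper are assumptions for illustration, not the connector's confirmed behaviour. Under that assumption, the Solr schema would need a field matching the flattened name rather than the bare `StreetName`.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level with dotted key names.

    Hypothetical helper; the '.' separator is an assumption and lists
    are left untouched.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


if __name__ == "__main__":
    # The document from the question, with the ObjectId shown as a string.
    document = {
        "_id": "52fa3674395d4602b8be4a1b",
        "AddressDirectory": {
            "Owner": "Mayank",
            "Age": "24",
            "Company": "BIPL",
            "Address": {"HouseNo": "4", "StreetName": "Rohini", "City": "Delhi"},
        },
    }
    for name, value in flatten(document).items():
        print(name, "=", value)
```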

    bug 
    opened by Crossener 22
  • Mongo connector with Solr is not working, ValueError: No JSON object could be decoded

    I am new to mongo-connector with Solr and MongoDB. Can you please help me? After setting up my MongoDB replica set, I am running this command:

    C:\Python27\Scripts>mongo-connector -m ppispcw28:27017 -n admin.servicenow -t http://estpcsw173:8983/solr/#/ -d solr_doc_manager

    Output:

    No handlers could be found for logger "mongo_connector.util"
    Traceback (most recent call last):
      File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "C:\Python27\lib\runpy.py", line 72, in _run_code
        exec code in run_globals
      File "C:\Python27\Scripts\mongo-connector.exe\__main__.py", line 9, in <module>
      File "C:\Python27\lib\site-packages\mongo_connector\util.py", line 85, in wrapped
        func(*args, **kwargs)
      File "C:\Python27\lib\site-packages\mongo_connector\connector.py", line 998, in main
        conf.parse_args()
      File "C:\Python27\lib\site-packages\mongo_connector\config.py", line 114, in parse_args
        option, dict((k, values.get(k)) for k in option.cli_names))
      File "C:\Python27\lib\site-packages\mongo_connector\connector.py", line 789, in apply_doc_managers
        dm_instances.append(module.DocManager(target_url, **kwargs))
      File "C:\Python27\lib\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 76, in __init__
        self._build_fields()
      File "C:\Python27\lib\site-packages\mongo_connector\util.py", line 32, in wrapped
        return f(*args, **kwargs)
      File "C:\Python27\lib\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 95, in _build_fields
        result = decoder.decode(declared_fields)
      File "C:\Python27\lib\json\decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

    My primary suspect is the Solr config (as I am new to this too). Please help me here.

    question 
    opened by coolraaj15 19
  • Keep getting this error after initial dump to elasticsearch

    2014-08-01 14:37:54,640 - ERROR - Call to <function <lambda> at 0x1fb9d70> failed too many times in retry_until_ok
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "build/bdist.linux-x86_64/egg/mongo_connector/oplog_manager.py", line 141, in run
        cursor, cursor_len = self.init_cursor()
      File "build/bdist.linux-x86_64/egg/mongo_connector/oplog_manager.py", line 570, in init_cursor
        first_oplog_entry = retry_until_ok(lambda: cursor[0])
      File "build/bdist.linux-x86_64/egg/mongo_connector/util.py", line 53, in retry_until_ok
        return func(*args, **kwargs)
      File "build/bdist.linux-x86_64/egg/mongo_connector/oplog_manager.py", line 570, in <lambda>
        first_oplog_entry = retry_until_ok(lambda: cursor[0])
      File "build/bdist.linux-x86_64/egg/pymongo/cursor.py", line 597, in __getitem__
        raise IndexError("no such item for Cursor instance")
    IndexError: no such item for Cursor instance

    Once this error pops up it looks like the connector is no longer processing any commands. I'll update a field in the database and I do not see the changes in elasticsearch.

    opened by SVPD-HenryKwan 17
  • not all fields are copied from mongodb to solr after integration using mongo-connector

    I was able to successfully integrate MongoDB and Solr using mongo-connector. However, whenever I update or add anything in the sample collection I created, only two or three fields of a document are copied; the data in the rest of the fields is not copied into Solr. I have not been able to get this working.

    This is my collection and its document details. Name of collection: testdb

    document inserted as follows:

    db.testdb.insert( {
    ... _id: "101",
    ... name: "test",
    ... description: "descr",
    ... mydesc: "mydescr",
    ... nmdsc: "nmdsc1",
    ... coords: "coords1"
    ... })

    The mongo-connector log reports that the sync between MongoDB and Solr succeeded:

    2014-01-17 19:35:38,462 - INFO - Finished 'http://:/solr/update/?commit=true' (post) with body '' in 0.210 seconds.

    But when I execute a query to see the document data, it returns only these fields:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 0,
        "params": { "q": "*:*", "wt": "json" }
      },
      "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
          {
            "id": "101",
            "description": "descr",
            "name": "test",
            "_version_": 1457486601392226300
          }
        ]
      }
    }

    Clearly, the following fields and their data are not copied into Solr:

    mydesc: "mydescr"
    nmdsc: "nmdsc1"
    coords: "coords1"

    For some reason my schema.xml is not getting pasted here. It can be found at the following link, a question I raised on Stack Overflow a week ago:

    http://stackoverflow.com/questions/21188334/not-all-fields-are-copied-from-mongodb-to-solr-after-integration-using-mongo-con

    question waiting for input 
    opened by sridharrao 17
  • Windows ModuleNotFoundError: No module named 'mongo_connector.doc_managers.elastic2_doc_manager'

    > mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager
    Traceback (most recent call last):
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\connector.py", line 1098, in import_dm_by_path
        module = __import__(package, fromlist=(package,))
    ModuleNotFoundError: No module named 'mongo_connector.doc_managers.elastic2_doc_manager'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\programdata\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "c:\programdata\anaconda3\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\ProgramData\Anaconda3\Scripts\mongo-connector.exe\__main__.py", line 7, in <module>
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\util.py", line 107, in wrapped
        func(*args, **kwargs)
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\connector.py", line 1409, in main
        conf.parse_args()
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\config.py", line 124, in parse_args
        option.apply_function(
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\connector.py", line 1122, in apply_doc_managers
        DocManager = import_dm_by_name(dm["docManager"])
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\connector.py", line 1089, in import_dm_by_name
        return import_dm_by_path(full_name)
      File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\connector.py", line 1104, in import_dm_by_path
        raise errors.InvalidConfiguration(
    mongo_connector.errors.InvalidConfiguration: Could not import mongo_connector.doc_managers.elastic2_doc_manager. It could be that this doc manager has been moved out of this project and is maintained elsewhere. Make sure that you have the doc manager installed alongside mongo-connector. Check the README for a list of available doc managers. ImportError: No module named 'mongo_connector.doc_managers.elastic2_doc_manager'

    > pip install 'mongo-connector[elastic5]'
    WARNING: Ignoring invalid distribution -acktrader (c:\programdata\anaconda3\lib\site-packages)
    ERROR: Invalid requirement: "'mongo-connector[elastic5]'"
    WARNING: Ignoring invalid distribution -acktrader (c:\programdata\anaconda3\lib\site-packages)
    WARNING: Ignoring invalid distribution -acktrader (c:\programdata\anaconda3\lib\site-packages)
    WARNING: Ignoring invalid distribution -acktrader (c:\programdata\anaconda3\lib\site-packages)

    My versions:

    • MongoDB: 5.0
    • ElasticSearch: 8.4.0
    • Python: 3.8.11
    • OS: Windows 10
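
The `Invalid requirement` error above shows the single quotes arriving at pip as part of the requirement string, which is what happens in shells that do not treat single quotes as quoting characters (the Windows command prompt is one). A sketch that sidesteps shell quoting by building the pip arguments in Python follows. `pip_install_args` is a hypothetical helper, and the extras name comes from the install table in the README.

```python
import sys


def pip_install_args(extra=None):
    """Return the argument list for installing mongo-connector with pip.

    Hypothetical helper: the requirement is a single list element, so no
    shell quoting is involved on any platform.
    """
    if extra is None:
        requirement = "mongo-connector"
    else:
        requirement = f"mongo-connector[{extra}]"
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Only prints the command; hand the list to subprocess.run to execute it.
    print(pip_install_args("elastic5"))
```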

    opened by mh-github 0
  • MongoDB 5.0.6 Compatibility Issues

    Hey,

    First up, please tell me what additional information is needed to debug this error.

    I ran the following command: mongo-connector --unique-key=id -n news-articles.articles -m localhost:27017 -t http://localhost:8983/solr/mongo_solr_collection -d solr_doc_manager

    (Yes, I am calling the collection with -t because the core does not work; I get an Error 404.)

    The connector starts and logs the following:

    2022-02-15 18:23:34,584 [ALWAYS] mongo_connector.connector:50 - Python version: 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110]
    2022-02-15 18:23:34,586 [ALWAYS] mongo_connector.connector:50 - Platform: Linux-5.10.0-11-amd64-x86_64-with-glibc2.31
    2022-02-15 18:23:34,587 [ALWAYS] mongo_connector.connector:50 - pymongo version: 4.0.1
    2022-02-15 18:23:34,587 [WARNING] mongo_connector.connector:170 - MongoConnector: Can't find /srv/news_crawler/oplog.timestamp, attempting to create an empty progress log
    2022-02-15 18:23:34,597 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 5.0.6
    2022-02-15 18:23:34,597 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.solr_doc_manager version: 0.1.0
    2022-02-15 18:25:33,715 [ERROR] mongo_connector.util:97 - Call to Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='rs0'), 'news-articles'), 'collection_names') failed too many times in retry_until_ok
    Traceback (most recent call last):
      File "/srv/news_crawler/venv/lib/python3.9/site-packages/mongo_connector/util.py", line 79, in retry_until_ok
        return func(*args, **kwargs)
      File "/srv/news_crawler/venv/lib/python3.9/site-packages/pymongo/collection.py", line 2579, in __call__
        raise TypeError("'Collection' object is not callable. If you "
    TypeError: 'Collection' object is not callable. If you meant to call the 'collection_names' method on a 'Database' object it is failing because no such method exists.
    

    I don't know what to do with this. MongoDB is up and running:

    Current Mongosh Log ID:	620be3f843ef12a095c537e4
    Connecting to:		mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.1.9
    Using MongoDB:		5.0.6
    Using Mongosh:		1.1.9
    For mongosh info see: https://docs.mongodb.com/mongodb-shell/
       The server generated these startup warnings when booting:
       2022-02-15T17:19:31.349+01:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
       2022-02-15T17:19:32.383+01:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
       2022-02-15T17:19:32.384+01:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
    Warning: Found ~/.mongorc.js, but not ~/.mongoshrc.js. ~/.mongorc.js will not be loaded.
      You may want to copy or rename ~/.mongorc.js to ~/.mongoshrc.js.
    rs0 [direct: primary] test>
    

    and the designated collection has several thousand items in it:

    rs0 [direct: primary] news-articles> db.articles.find().count()
    147203
    

    Is rs0 setup wrong?

    And solr is also up and running:

    $ curl http://localhost:8983/solr/admin/cores?action=STATUS
    {
      "responseHeader":{
        "status":0,
        "QTime":0},
      "initFailures":{},
      "status":{
        "mongo_solr_collection":{
          "name":"mongo_solr_collection",
          "instanceDir":"/srv/solr-8.11.1/server/solr/mongo_solr_collection",
          "dataDir":"/srv/solr-8.11.1/server/solr/mongo_solr_collection/data/",
          "config":"solrconfig.xml",
          "schema":"managed-schema",
          "startTime":"2022-02-15T15:09:55.153Z",
          "uptime":9084767,
          "index":{
            "numDocs":0,
            "maxDoc":0,
            "deletedDocs":0,
            "indexHeapUsageBytes":0,
            "version":2,
            "segmentCount":0,
            "current":true,
            "hasDeletions":false,
            "directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/srv/solr-8.11.1/server/solr/mongo_solr_collection/data/index [email protected]; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
            "segmentsFile":"segments_1",
            "segmentsFileSizeInBytes":69,
            "userData":{},
            "sizeInBytes":69,
            "size":"69 bytes"}}}}
    

    I really don't know what I am doing wrong. Would be great if someone could point me in the right direction. 🙏
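
    The `TypeError` quoted above is the message PyMongo raises when code calls `Database.collection_names()`, a method that was removed in PyMongo 4 in favour of `list_collection_names()`. A minimal sketch of a compatibility helper follows, assuming the calling code can be patched; `collection_names_compat` is a hypothetical name and is not part of mongo-connector or PyMongo.

```python
def collection_names_compat(db):
    """Return the collection names of a PyMongo-style Database object."""
    try:
        # Method available in current PyMongo releases.
        return list(db.list_collection_names())
    except TypeError:
        # On a release without that method, attribute access yields a
        # Collection and calling it raises TypeError; use the legacy name.
        return list(db.collection_names())
```

    Another option, under the same assumption about the cause, is to run the connector with a PyMongo release that still provides the legacy method.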

    opened by bablf 7
  • Don't explicitly create a collection

    Problem

    We have various mongo servers that we sync with a central server using mongo-connector with the mongo_doc_manager. They all sync into the same collection on the central server. When a new system is initialized, it has nothing in the DB and mongo-connector is running. When the first entry is made into the local DB, there is an oplog entry to create the collection. When mongo-connector tries to replay that command on the target DB, pymongo throws an exception because the collection already exists on the central server.

    Note: this problem does not present itself if mongo-connector is started (without an existing oplog.timestamp) after the first entry in the local collection has already been made.

    Solution

    This problem is easily avoided by not trying to explicitly create the collection, effectively ignoring the create entry in the oplog. This does not cause any problems because MongoDB creates collections automatically whenever a document is inserted into a collection that does not yet exist. The only reason to explicitly create a collection is if special options are specified as per the documentation:

    Normally collection creation is automatic. This method should only be used to specify options on creation. CollectionInvalid will be raised if the collection already exists.

    Since mongo_doc_manager does not specify any options in the create_collection() call, that call should not be made.
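
    The proposed behaviour can be sketched as a small filter, shown here as an illustration rather than the actual patch. It assumes the command document of a create entry has the shape `{"create": "<name>", ...options}`; `should_replay_command` is a hypothetical helper, and treating `idIndex` as a non-option is also an assumption.

```python
def should_replay_command(command, ignored_keys=("idIndex",)):
    """Return False for a plain `create` command that carries no options.

    `command` is assumed to be the command document of an oplog entry,
    for example {"create": "events"}.
    """
    if "create" not in command:
        # Not a create command: leave the decision to the normal handling.
        return True
    options = [
        key for key in command
        if key != "create" and key not in ignored_keys
    ]
    # Replay only when explicit options (capped, size, ...) are present;
    # otherwise the target creates the collection on first insert.
    return bool(options)
```

    With this filter, `{"create": "events"}` is skipped, while `{"create": "events", "capped": True, "size": 1024}` would still be replayed.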

    opened by ndepal 0
  • exitCode 48 when running a replica set

    I have Solr 8.9.0 installed on macOS Catalina, and I’m trying to integrate it with MongoDB 5.0 Community Edition. I have installed mongo-connector using pip3 install mongo-connector.

    When I try to run a replica set using

    mongod --replSet myDevReplSet

    I get the following:

    {"t":{"$date":"2021-10-12T21:30:25.414+01:00"},"s":"I", "c":"NETWORK", "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true}}}
    {"t":{"$date":"2021-10-12T21:30:25.416+01:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"-","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
    {"t":{"$date":"2021-10-12T21:30:25.420+01:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
    {"t":{"$date":"2021-10-12T21:30:25.420+01:00"},"s":"I", "c":"NETWORK", "id":4648602, "ctx":"main","msg":"Implicit TCP FastOpen in use."}
    {"t":{"$date":"2021-10-12T21:30:25.421+01:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
    {"t":{"$date":"2021-10-12T21:30:25.421+01:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
    {"t":{"$date":"2021-10-12T21:30:25.421+01:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","ns":"config.tenantMigrationDonors"}}
    {"t":{"$date":"2021-10-12T21:30:25.421+01:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","ns":"config.tenantMigrationRecipients"}}
    {"t":{"$date":"2021-10-12T21:30:25.421+01:00"},"s":"I", "c":"CONTROL", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":7591,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"TTL003164"}}
    {"t":{"$date":"2021-10-12T21:30:25.422+01:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.3","gitVersion":"657fea5a61a74d7a79df7aff8e4bcf0bc742b748","modules":[],"allocator":"system","environment":{"distarch":"x86_64","target_arch":"x86_64"}}}}
    {"t":{"$date":"2021-10-12T21:30:25.422+01:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Mac OS X","version":"19.6.0"}}}
    {"t":{"$date":"2021-10-12T21:30:25.422+01:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"replication":{"replSet":"myDevReplSet"}}}}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"E", "c":"CONTROL", "id":20568, "ctx":"initandlisten","msg":"Error setting up listener","attr":{"error":{"code":9001,"codeName":"SocketException","errmsg":"Address already in use"}}}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"REPL", "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":15000}}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"COMMAND", "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"NETWORK", "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"REPL", "id":4784907, "ctx":"initandlisten","msg":"Shutting down the replica set node executor"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"NETWORK", "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"ASIO", "id":22582, "ctx":"MigrationUtil-TaskExecutor","msg":"Killing all outstanding egress activity."}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"COMMAND", "id":4784923, "ctx":"initandlisten","msg":"Shutting down the ServiceEntryPoint"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":4784928, "ctx":"initandlisten","msg":"Shutting down the TTL monitor"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"-", "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"FTDC", "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":20565, "ctx":"initandlisten","msg":"Now exiting"}
    {"t":{"$date":"2021-10-12T21:30:25.423+01:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":48}}

    If I then login to MongoDB and run rs.initiate(), I get:

    MongoServerError: This node was not started with the replSet option

    Can anyone point me in the right direction?

    Cheers!
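
    The log above reports `Address already in use` for port 27017 just before exit code 48, which suggests that another process (possibly a `mongod` started without `--replSet`) is already listening on that port. A minimal sketch for checking this from Python, assuming a TCP listener on the local loopback address; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

    If `port_in_use(27017)` returns True before `mongod --replSet myDevReplSet` is started, the existing listener would need to be stopped first, or the new instance given a different port.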

    opened by harrycrosby 8
  • -t command not found

    I'm using the default mongo_connector command:

    mongo-connector -m :
    -t <replication endpoint URL, e.g. http://localhost:8983/solr>
    -d <name of doc manager, e.g., solr_doc_manager>

    I get the error on both glitch.com and on my local machine:

    "-t: command not found"

    opened by DubiousTunic 0
Releases(2.5.1)
  • 2.5.1(Feb 27, 2017)

    We're pleased to announce the release of mongo-connector version 2.5.1!

    To install the latest version please look at the installation instructions.

    Version 2.5.1 improves testing and documentation, and fixes the following bugs:

    • Only use listDatabases when necessary.
    • Do not use the listShards command.
    • Fix PyMongo 3.0 compatibility.
    • Fix support for MongoDB 2.4's invalid $unset operations.
    • Set an array element to null on $unset; do not remove the element completely.
    • Command line SSL options should override the config file.
    • Properly send "ssl.sslCertificatePolicy" to MongoClients.
    • Properly output log messages while configuration is parsed.
    • All source clients should inherit MongoDB URI options from the main address.
    • Do not retry operations that result in authorization failure.
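
    The `$unset` item in the list above follows MongoDB's own semantics: unsetting an array element leaves a null in its slot so that the positions of the remaining elements do not shift. A minimal sketch of that rule on plain Python dictionaries and lists, not the connector's actual code; `apply_unset` is a hypothetical helper and assumes the dotted path exists in the document.

```python
def apply_unset(doc, dotted_path):
    """Apply a MongoDB-style $unset for one dotted path, in place."""
    *parents, last = dotted_path.split(".")
    target = doc
    for key in parents:
        # Numeric path components index into lists; others are dict keys.
        target = target[int(key)] if isinstance(target, list) else target[key]
    if isinstance(target, list):
        # Keep the array length stable: null the slot instead of deleting it.
        target[int(last)] = None
    else:
        # Ordinary fields are removed outright.
        target.pop(last, None)
    return doc
```

    For example, applying the path `tags.1` to `{"tags": ["a", "b", "c"]}` leaves three elements in `tags`, with the middle one set to `None`.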

    Thanks to everyone who contributed to this release:

    @shaneharvey @llovett @behackett @makhdumi @mikael-lindstrom

  • 2.5.0(Jan 9, 2017)

    We're pleased to announce the release of mongo-connector version 2.5.0!

    To install the latest version please look at the new installation instructions.

    2.5.0 adds some new features, bug fixes, and minor breaking changes.

    New Features

    • Support for MongoDB 3.4.
    • Support including or excluding fields per namespace.
    • Support wildcards (*) in namespaces.
    • Support for including and excluding different namespaces at the same time.
    • Adds a new config file format for the 'namespaces' option.
    • Logs environment information on startup.
    • The doc managers can now be installed through extras_require with pip. See the new installation instructions.
    • mongo-connector now tests against MongoDB versions 2.4, 2.6, 3.0, 3.2, and 3.4.

    Bug Fixes

    • mongo-connector now gracefully exits on SIGTERM.
    • Improved handling of rollbacks.
    • Now handles mongos connection failure while looking for shards.
    • mongo-connector can now be canceled during the initial collection dump.
    • Improved handling of connection failure while tailing the oplog.
    • Command line doc manager specific options now override the config file.
    • Improved filtering of nested fields.

    Breaking Changes

    • The solr-doc-manager has been extracted into a separate package. See the new installation instructions, https://github.com/mongodb-labs/solr-doc-manager, and https://pypi.python.org/pypi/solr-doc-manager.
    • Asterisks (*) in namespaces configuration are now interpreted as wildcards.
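
    To illustrate what treating an asterisk as a wildcard means for a namespace such as `db.*`, here is a minimal sketch. It assumes `*` matches any run of characters and every other character matches literally; mongo-connector's actual matching rules may differ, and `namespace_matches` is a hypothetical helper.

```python
import re


def namespace_matches(pattern, namespace):
    """Return True if `namespace` matches `pattern`, where * is a wildcard."""
    # Escape the literal pieces so that "." is not treated as a regex dot,
    # then join them with ".*" wherever the pattern contained an asterisk.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, namespace) is not None
```

    Under these assumptions a namespace that previously contained a literal asterisk now matches more than one collection, which is why the change is listed as breaking.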

    Thanks to everyone who contributed to this release:

    @shaneharvey @llovett @behackett @sha1sum @robertaistleitner @weixili

  • 2.4(Jun 13, 2016)

    We're pleased to announce the release of mongo-connector version 2.4! Version 2.4 of the connector introduces the --exclude-fields option and fixes a few major bugs, including:

    • Do not call count() on oplog cursors
    • Change the oplog timestamp file format to be resilient to PyMongo version and replica set failover. Warning: the format change means that a downgrade from this version is not possible! However, upgrading to the new format is as easy as upgrading mongo-connector and restarting it.

    To see all the changes that went into mongo-connector 2.4, please see the file "CHANGELOG.rst" at the root of the project.

    To upgrade to the new version, you can use pip like this:

    pip install --upgrade mongo-connector
    

    To install for the first time, run:

    pip install mongo-connector
    

    Please find comprehensive documentation for the tool in the Wiki at the project's GitHub page here: https://github.com/mongodb-labs/mongo-connector/wiki

    If you think you've found a bug, or if you'd like to request a new feature/improvement, please file a new issue on mongo-connector's GitHub page here: https://github.com/mongodb-labs/mongo-connector

    Happy connecting!

    The Python + Connectors Team

  • 2.3.0(Mar 8, 2016)

    This release includes better error handling, test suite fixes, and support for the Elastic 2.x doc manager.

    In order to support Elastic 1.x and 2.x, the Elastic document manager has been pulled out of mongo-connector and is now located in a separate project. For more information on how to install and run the elastic doc managers, please see the Elastic doc manager documentation for the version of Elastic you prefer. These doc managers will only work with mongo-connector 2.3.0+.

    • Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager
    • Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

    The Solr doc manager and the MongoDB doc manager are still packaged with the mongo-connector project.

  • 2.2.1(Mar 7, 2016)

Owner
YouGov
YouGov, Plc.