minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

Overview

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly

rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) implementation, geared towards the assembly of long and accurate reads such as PacBio HiFi.

Rationale

rust-mdbg performs mdBG construction of 52x-coverage human genome HiFi data in around 10 minutes on 8 threads, with a maximum RAM usage of 10GB.

rust-mdbg is fast because it operates in minimizer-space: the reads, the assembly graph, and the final assembly are all represented as ordered lists of minimizers instead of strings of nucleotides. A final conversion step then yields a classical base-space representation.
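As a rough illustration of what operating in minimizer-space means, the Python sketch below turns reads into ordered minimizer lists and then links consecutive k-min-mers. The helper names are hypothetical and this is not rust-mdbg's code. It assumes a universe-minimizer-style selection (an l-mer is kept if its hash is below d times the maximum hash value), uses BLAKE2b from hashlib as a stand-in hash, and ignores reverse complements and sequencing errors.

```python
import hashlib

MAX_HASH = 2**64 - 1  # largest value of the 64-bit stand-in hash


def lmer_hash(lmer: str) -> int:
    """Stand-in 64-bit hash of an l-mer (rust-mdbg uses its own hash function)."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(read: str, l: int, density: float) -> list[str]:
    """Ordered list of the read's l-mers whose hash falls below density * MAX_HASH."""
    threshold = density * MAX_HASH
    return [
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if lmer_hash(read[i:i + l]) < threshold
    ]


def kminmers(minims: list[str], k: int) -> list[tuple[str, ...]]:
    """Every window of k consecutive minimizers; these are the graph nodes."""
    return [tuple(minims[i:i + k]) for i in range(len(minims) - k + 1)]


def mdbg_edges(reads: list[str], k: int, l: int, density: float) -> set:
    """Link k-min-mers that follow each other in a read (they share k-1 minimizers)."""
    edges = set()
    for read in reads:
        nodes = kminmers(minimizers(read, l, density), k)
        edges.update(zip(nodes, nodes[1:]))
    return edges
```

Under this kind of selection only about a fraction d of l-mer positions is kept on average, so a read of length L shrinks to roughly L·d minimizers before any graph is built.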

Limitations

However, this high speed comes at a cost! :)

  • rust-mdbg gives good-quality results, but they are still less contiguous and less complete than those of state-of-the-art assemblers such as HiCanu and hifiasm.
  • rust-mdbg performs best with at least 40x to 50x of coverage.
  • No polishing step is implemented, so assemblies have around the same accuracy as the reads.

Installation

Clone the repository (make sure you have a working Rust environment), and run

cargo build --release

For performing graph simplifications, gfatools is required.

Quick start

cargo build --release
target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example
utils/magic_simplify example

Multi-k assembly

For better contiguity, try the provided multi-k assembly script. It performs assembly iteratively, starting with k = 10 and going up to an automatically determined largest k. This comes at the expense of a ~7x longer running time.

utils/multik <reads.fq.gz> <some_output_prefix> <nb_threads>

Overview

rust-mdbg is a modular assembler. It consists of three components:

  1. rust-mdbg, to perform assembly in minimizer-space
  2. gfatools (external component), to perform graph simplifications
  3. to_basespace, to convert a minimizer-space assembly to base-space

For convenience, components 2 and 3 are wrapped into a script called magic_simplify.

Input

rust-mdbg takes a single FASTA/FASTQ input (gzip-compressed or not). Multi-line sequences, and sequences with lowercase characters, are not supported.

If you have seqtk installed, you can use

seqtk seq -A reads.unformatted.fq > reads.fa

to format reads accordingly.
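If seqtk is not at hand, the FASTA part of this formatting can be sketched in Python. This is a hypothetical helper, not shipped with rust-mdbg: it joins multi-line FASTA records onto a single line and upper-cases the bases. FASTQ input is not handled here and should still go through seqtk.

```python
import gzip
from typing import Iterable, Iterator


def normalize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield header/sequence lines with each sequence on one upper-case line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks).upper()
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:  # flush the last record
        yield header
        yield "".join(chunks).upper()


def normalize_file(in_path: str, out_path: str) -> None:
    """Read a (possibly gzip-compressed) FASTA file and write the normalized copy."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for line in normalize_fasta(fin):
            fout.write(line + "\n")
```

For example, normalize_file("reads.multiline.fa.gz", "reads.fa") (hypothetical file names) writes an uncompressed, single-line, upper-case FASTA file.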

Output data

The output of rust-mdbg consists of:

  • A .gfa file containing the minimizer-space de Bruijn graph, without sequences,
  • Several .sequences files containing the sequences of the nodes of the graph.

The to_basespace executable combines both outputs to produce a .gfa file with sequences.

Running an example

A sample set of reads is provided in the example/ folder. Run

target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example

which will create an example.gfa file.

In order to populate the .gfa file with base-space sequences and perform graph simplification, run

utils/magic_simplify example

which will create example.msimpl.gfa and example.msimpl.fa files.

Parameters

The main parameters of rust-mdbg are the k-min-mer value k, the minimizer length l, and the minimizer density d (delta in the paper). Another parameter, --presimp (default 0.01), controls a graph simplification: a neighbor of a node is deleted if its abundance is below 1% (the --presimp value) of min(max(abundance of the other neighbors), abundance of the current node). For better results, without having to set any parameter, try the multi-k strategy (see the Multi-k assembly section). This section explains how parameters are set in single-k assembly.
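One reading of the --presimp rule, written as a small Python sketch. The graph is a plain adjacency dict and abundance a dict of k-min-mer counts; both are hypothetical structures, not rust-mdbg's internals. A node's only neighbor is never removed, because there are no other neighbors to compare it with. Deletions are only collected, not applied, so the result does not depend on visiting order.

```python
def presimp_victims(
    abundance: dict[str, int],
    neighbors: dict[str, list[str]],
    presimp: float = 0.01,
) -> set[str]:
    """Nodes that the --presimp rule, as described above, would delete."""
    victims = set()
    for node, nbrs in neighbors.items():
        for v in nbrs:
            others = [abundance[w] for w in nbrs if w != v]
            if not others:
                continue  # a lone neighbor has nothing to be compared against
            cutoff = presimp * min(max(others), abundance[node])
            if abundance[v] < cutoff:
                victims.add(v)
    return victims
```

For a node u with abundance 1000 and neighbors a (500) and b (2), b is deleted because 2 < 0.01 × min(500, 1000) = 5, while a is kept.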

All three parameters k, l, and d significantly impact the quality of results. One can think of them as a generalization of the k parameter in classical de Bruijn graphs. When you run rust-mdbg without specifying parameters, it sets them to:

d = 0.003

l = 12

k = 0.75 * average_readlen * d

These parameters will give reasonable, but far from optimal, draft assemblies. We experimentally found that the best results are often obtained with k values within 20-40, l within 10-14, and d within 0.001-0.005. Setting k and d such that the ratio k/d is slightly below the read length appears to be an effective strategy.
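The default formula and the k/d rule of thumb can be checked quickly before a run. A minimal Python sketch; the function names are hypothetical, and rounding k to the nearest integer is an assumption (rust-mdbg may truncate instead).

```python
def default_params(
    read_lengths: list[int], d: float = 0.003, l: int = 12
) -> tuple[int, int, float]:
    """Default (k, l, d) following k = 0.75 * average_readlen * d."""
    if not read_lengths:
        raise ValueError("need at least one read length")
    avg = sum(read_lengths) / len(read_lengths)
    k = round(0.75 * avg * d)  # rounding to the nearest integer is an assumption
    return k, l, d


def ratio_below_readlen(k: int, d: float, avg_readlen: float) -> bool:
    """Rule of thumb: k/d should sit (slightly) below the average read length."""
    return k / d < avg_readlen
```

With an average read length of 20 kbp, the defaults give k = 45, i.e. k/d = 15,000, which is 75% of the read length. By contrast, k = 80 at d = 0.003 gives k/d ≈ 26,667, which is above the read length.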

For further information on usage and parameters, run

target/release/rust-mdbg -h

for a one-line summary of each flag, or run

target/release/rust-mdbg --help

for a lengthy explanation of each flag.

Performance

Dataset                  Genome size (HPC)  Coverage  Parameters         N50      Runtime  Memory
D. melanogaster HiFi     98Mbp              100x      auto               2.5Mbp   2m15s    2.5GB
                                                      multi-k            2.5Mbp   15m      1.8GB
                                                      k=35,l=12,d=0.002  6.0Mbp   1m9s     1.5GB
Strawberry HiFi          0.7Gbp             36x       auto               0.5Mbp   6m12s    12GB
                                                      multi-k            1Mbp     40m      11GB
                                                      k=38,l=14,d=0.003  0.7Mbp   5m31s    10GB
H. sapiens (HG002) HiFi  2.2Gbp             52x       auto               1.0Mbp   27m30s   16.9GB
                                                      multi-k            16.9Mbp  3h15m    20GB
                                                      k=21,l=14,d=0.003  13.9Mbp  10m23s   10.1GB

Runtime breakdown:
H. sapiens: 10m23s = 6m51s rust-mdbg + 1m48s gfatools + 1m44s to_basespace

The runs with custom parameters (from the paper) were made with commit b99d938. Unlike in the paper, we did not use robust minimizers, which require additional l-mer counting beforehand. For historical reasons, reads and assemblies were homopolymer-compressed in these experiments, and the homopolymer-compressed (HPC) genome size is reported, so beware that these numbers are not directly comparable to the output of other assemblers. In addition to the parameters shown in the table, the rust-mdbg command line also contained --bf --no-error-correct --threads 8.

Running rust-mdbg without graph simplifications

To convert an assembly to base-space without performing any graph simplifications, there are two ways:

  • with gfatools:

    gfatools asm -u example.gfa > example.unitigs.gfa
    target/release/to_basespace --gfa example.unitigs.gfa --sequences example

  • without gfatools (slower, but the code is more straightforward to understand):

    utils/complete_gfa.py example.sequences example.gfa

In both cases, this will create an example.complete.gfa file that you can convert to FASTA with

bash utils/gfa2fasta.sh example.complete
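The FASTA conversion essentially keeps the segment (S) lines of the GFA. Below is a hypothetical Python equivalent, assuming that gfa2fasta.sh writes one record per segment named after the segment ID; the actual script may format headers differently.

```python
from typing import Iterable, Iterator


def gfa_to_fasta(gfa_lines: Iterable[str]) -> Iterator[str]:
    """Yield a FASTA header and sequence for each GFA segment line that has a sequence."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # GFA1 segment lines look like: S <tab> name <tab> sequence [<tab> tags...]
        if fields[0] == "S" and len(fields) >= 3 and fields[2] != "*":
            yield ">" + fields[1]
            yield fields[2]
```

GFA uses `*` for a segment without a stored sequence; such segments are skipped, as are header (H) and link (L) lines.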

License

rust-mdbg is freely available under the MIT License.

Developers

  • Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
  • Rayan Chikhi at the Department of Computational Biology at Institut Pasteur

Citation

Minimizer-space de Bruijn graphs (2021) bioRxiv

@article {mdbg,
	author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
	title = {Minimizer-space de Bruijn graphs},
	year = {2021},
	doi = {10.1101/2021.06.09.447586},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

Contact

Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.

Comments
  • m1 arm support

    Hello rust-mdbg team,

    It seems that ARM is not supported yet; I get the following error when compiling on ARM64:

    The following warnings were emitted during compilation:

    warning: cc: error: unrecognized command-line option '-msse4.2' warning: cc: error: unrecognized command-line option '-maes' warning: cc: error: unrecognized command-line option '-mavx' warning: cc: error: unrecognized command-line option '-mavx2'

    error: failed to run custom build command for fasthash-sys v0.3.2

    Caused by: process didn't exit successfully: /Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-17495dcf061597dc/build-script-build (signal: 6, SIGABRT: process abort signal) --- stdout TARGET = Some("aarch64-apple-darwin") OPT_LEVEL = Some("3") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CC_aarch64-apple-darwin = None CC_aarch64_apple_darwin = None HOST_CC = None CC = None HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CFLAGS_aarch64-apple-darwin = None CFLAGS_aarch64_apple_darwin = None HOST_CFLAGS = None CFLAGS = None DEBUG = Some("false") running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" cargo:warning=cc: error: unrecognized command-line option '-msse4.2' cargo:warning=cc: error: unrecognized command-line option '-maes' cargo:warning=cc: error: unrecognized command-line option '-mavx' cargo:warning=cc: error: unrecognized command-line option '-mavx2' exit status: 1

    --- stderr thread 'main' panicked at '

    Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit status: 1).

    ', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace fatal runtime error: failed to initiate panic, error 5 warning: build failed, waiting for other jobs to finish... error: build failed

    Any possibilities to provide support?

    Thanks,

    Jianshu

    opened by jianshu93 6
  • Recommended parameters for metagenome assembly and a related question

    Hi,

    I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

    For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

    Another question is: could mdBG output contig coverage estimates?

    Thank you!

    question 
    opened by xfengnefx 5
  • Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { kind: BufferLimit }', src/main.rs:187:33

    Reads taken from: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-17/SRR1023860/SRR10238607.1

    Command: utils/multik <reads> <output prefix> 56

    Happens both with and without homopolymer compression.

    bug 
    opened by sebschmi 5
  • multik executes run with k < l

    When assembling E.coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!"

    The multik script then continues silently.

    output
    thread '<unnamed>' panicked at 'Non-ACGTN nucleotide encountered!', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/nthash-0.5.0/src/lib.rs:43:9
    stack backtrace:
       0: std::panicking::begin_panic
       1: <nthash::NtHashIterator as core::iter::traits::iterator::Iterator>::next
       2: rust_mdbg::read::Read::extract
       3: rust_mdbg::main::{{closure}}
       4: rust_mdbg::main::{{closure}}
       5: <F as scoped_threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
       1: core::panicking::panic_fmt
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
       2: core::result::unwrap_failed
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
       3: scoped_threadpool::Pool::scoped
       4: core::ops::function::FnOnce::call_once{{vtable.shim}}
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x55e87265b2ec - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x55e87265b2ec - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x55e87265b2ec - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x55e87265b2ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x55e87267d4fc - core::fmt::write::h513b07ca38f4fb1b
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
       5:     0x55e872657995 - std::io::Write::write_fmt::haf8c932b52111354
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
       6:     0x55e87265cec0 - std::sys_common::backtrace::_print::h195c38364780a303
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x55e87265cec0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x55e87265cec0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
       9:     0x55e87265ca75 - std::panicking::default_hook::h60284635b0ad54a8
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
      10:     0x55e87265d574 - std::panicking::rust_panic_with_hook::ha677a669fb275654
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
      11:     0x55e87265d050 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
      12:     0x55e87265b794 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
      13:     0x55e87265cfb9 - rust_begin_unwind
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
      14:     0x55e872545651 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
      15:     0x55e872545743 - core::result::unwrap_failed::hb53671404b9e33c2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
      16:     0x55e8725e8f9f - scoped_threadpool::Scope::join_all::hcb532061605ab1b0
      17:     0x55e87255ee33 - scoped_threadpool::Pool::scoped::hb64980f16173dad1
      18:     0x55e87255b128 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf2fa39940289df70
      19:     0x55e87255ee5e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6bd664fd6d7bb829
      20:     0x55e87259a883 - core::ops::function::FnOnce::call_once{{vtable.shim}}::ha5de8d6fee3bff3e
      21:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hcbc6d2d80772be64
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      22:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h9bffa2ca65a1d6e6
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      23:     0x55e872660893 - std::sys::unix::thread::Thread::new::thread_start::ha678a8b0caec8f55
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17
      24:     0x7f16121a96db - start_thread
                                   at /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c:463
      25:     0x7f161193071f - __GI___clone
                                   at /build/glibc-S9d2JN/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      26:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    625.22user 29.59system 1:35.45elapsed 685%CPU (0avgtext+0avgdata 579660maxresident)k
    
    bug 
    opened by sebschmi 5
  • Missing assembly-final.msimpl.fa in multik mode

    Hello,

    Thank you for this tool.

    I ran mdbg with the following command line: multik reads.fastq.gz assembly 56 10 1000

    I get the files assembly-k*.gfa, assembly-k*.msimpl.gfa and assembly-k*.msimpl.fa with k from 10 to 1000, but I do not get the final output assembly-final.msimpl.fa.

    bug 
    opened by nadegeguiglielmoni 5
  • example.sequences file

    Sorry if I'm being slow, but when I create the gfa file, multiple .sequences files are created, whereas in the readme to_basespace takes only a single example.sequences file. Where does this come from? Do you combine the .sequences files in some way or..?

    Thanks!

    question 
    opened by samlipworth 4
  • KSizeOutOfRange errors during rust-mdbg run

    Hi there,

    Trying out multik with some Nanopore metagenomics reads (seqtk-formatted) and I'm currently getting the errors below as it iteratively goes through the different -k values. Any ideas on what might be going wrong and how I might fix it?

    So far, the run hasn't fully aborted and I'm letting it run until I get some output - will let y

    $ ../rust-mdbg/utils/multik sup.fastq.gz std_sipp 10
    avg readlen: 6147875, max k: 17521
    assembly with k=10
        Finished release [optimized] target(s) in 0.09s
         Running `sup.fastq.gz -k 10 -l 12 --density 0.003 --minabund 2 --threads 10 --prefix std_sipp-k10 --bf`
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 11 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 5 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 3 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x5591d2438050 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
       1:     0x5591d2438050 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x5591d2438050 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x5591d2438050 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x5591d245670f - core::fmt::write::h3b84512577ca38a8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
       5:     0x5591d24352b2 - std::io::Write::write_fmt::h465f8feea02e2aa1
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
       6:     0x5591d243a185 - std::sys_common::backtrace::_print::h525280ee0d29bdde
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x5591d243a185 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x5591d243a185 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
       9:     0x5591d2439c33 - std::panicking::default_hook::hfb9fe98acb0dcb3b
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
      10:     0x5591d243a78d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
      11:     0x5591d243a327 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
      12:     0x5591d243850c - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
      13:     0x5591d243a289 - rust_begin_unwind
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
      14:     0x5591d2323341 - core::panicking::panic_fmt::h2242888e8769cd33
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
      15:     0x5591d2323233 - core::option::expect_none_failed::hb1edf11f73e63728
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
      16:     0x5591d23c108f - scoped_threadpool::Scope::join_all::hd6132fc8a04c2f8d
      17:     0x5591d233fcbb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h198262ef865dc7ad
      18:     0x5591d2391412 - std::sys_common::backtrace::__rust_begin_short_backtrace::he7799c2fe1d42088
      19:     0x5591d234f443 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc12e7712db099355
      20:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      21:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      22:     0x5591d243d28a - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
      23:     0x7f00c05ae6ba - start_thread
      24:     0x7f00bfdd751d - clone
      25:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    537.76user 35.81system 3:28.70elapsed 274%CPU (0avgtext+0avgdata 855256maxresident)k
    0inputs+1781280outputs (0major+10608356minor)pagefaults 0swaps
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.gfa -t 10,50000 -t 10,50000 -b 100000 -b 100000 -t 10,50000 -b 100000 -b 100000 -b 100000 -t 10,50000 -b 100000 -t 10,50000 -b 1000000 -t 10,150000 -b 1000000 -u
    ERROR: failed to read the graph
    Command exited with non-zero status 2
    0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1572maxresident)k
    0inputs+0outputs (0major+68minor)pagefaults 0swaps
    + python /home/andre/rust-mdbg/utils/gfa_break_loops.py std_sipp-k10.tmp1.gfa
    + [[ ! std_sipp-k10 == *--old-behavior* ]]
    + cargo run --manifest-path /home/andre/rust-mdbg/utils/../Cargo.toml --release --bin to_basespace -- --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10
        Finished release [optimized] target(s) in 0.07s
         Running `/home/andre/rust-mdbg/target/release/to_basespace --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10`
    + mv std_sipp-k10.tmp2.gfa.complete.gfa std_sipp-k10.tmp2.gfa
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.tmp2.gfa -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u
    [M::main] Version: 0.4-r214-dirty
    [M::main] CMD: /home/andre/gfatools/gfatools asm -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u std_sipp-k10.tmp2.gfa
    [M::main] Real time: 0.000 sec; CPU: 0.000 sec
    0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1752maxresident)k
    0inputs+0outputs (0major+74minor)pagefaults 0swaps
    ++ stat -c%s std_sipp-k10.tmp2.gfa
    + filesize=9
    + ((  filesize > 100000000 ))
    + mv std_sipp-k10.tmp3.gfa std_sipp-k10.msimpl.gfa
    + [[ std_sipp-k10 != *\-\-\k\e\e\p* ]]
    + rm -rf std_sipp-k10.tmp1.gfa std_sipp-k10.tmp2.gfa
    + bash /home/andre/rust-mdbg/utils/gfa2fasta.sh std_sipp-k10.msimpl
    2.19user 0.28system 0:02.50elapsed 99%CPU (0avgtext+0avgdata 24008maxresident)k
    0inputs+8outputs (0major+9763minor)pagefaults 0swaps
    
    bug 
    opened by GeoMicroSoares 4
  • Nanopore metagenome assembly parameters

    Hi there,

    Congratulations, this tool seems amazing and I can't wait to use it with my data! Are there specific parameters that I can use/optimize with rust-mdbg to assemble Nanopore metagenomes?

    Thanks.

    enhancement 
    opened by GeoMicroSoares 4
  • Problems in running `rust-mdbg` without graph simplifications

    Hi,

    I've installed rust-mdbg

    git clone --recursive https://github.com/ekimb/rust-mdbg.git
    cd rust-mdbg
    cargo build --release
    

    I've run it

    ~/git/rust-mdbg/target/release/rust-mdbg ~/git/rust-mdbg/example/reads-0.00.fa.gz -k 7 --threads 1 --density 0.0008 -l 10 --minabund 2 --prefix example
    ls example*
    
    example.140646999713344.sequences  example.gfa
    

    and finally tried both approaches to go in base-space

    gfatools asm -u  example.gfa > example.unitigs.gfa
    ~/git/rust-mdbg/target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
    
    [M::main] Version: 0.5-r250-dirty
    [M::main] CMD: gfatools asm -u example.gfa
    [M::main] Real time: 0.001 sec; CPU: 0.003 sec
    Done parsing unitigs GFA, got 1 unitigs.
    Done parsing original GFA, with 0 k-min-mers.
    Done parsing .sequences file, recorded 0 sequences.
    thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/to_basespace.rs:258:55
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    and

    python3 ~/git/rust-mdbg/utils/complete_gfa.py example.*.sequences example.gfa
    
    Traceback (most recent call last):
      File "/home/guarracino/git/rust-mdbg/utils/complete_gfa.py", line 32, in <module>
        source_minims = node_minims[spl[1]]
    KeyError: '7'
    

    Am I doing silly errors somewhere?

    opened by AndreaGuarracino 3
  • Differences between uncompressed & compressed fastq files

    This seems like a bug, but maybe I'm just misunderstanding something with how mdbg works.

    I discovered this after trying to run several human assemblies of varying input coverage (20x,30x,40x,50x) starting from hifi_reads.fq.gz files.

    The contiguity (n50) of all of the assemblies was in the same ballpark as the read n50 and there appeared to be no benefit to increased coverage. This coupled with the poor results in general had me scratching my head so I tried a different test dataset that was an uncompressed hifi_reads.fq and I got a great assembly.

    Curiosity piqued, I went back and unzipped the 20x coverage point I had tried earlier and got a much better assembly.

    See attached logs for logs from both the 20x assemblies starting from both hifi_reads.fq and hifi_reads.fq.gz

    Is this an actual bug, or is it just user error?

    hifi_reads_gzipped.log hifi_reads.log

    bug 
    opened by gconcepcion 3
  • magic_simplify crashes while running in Docker container on HPC cluster

    Hey! I'm currently trying to run rust-mdbg as part of a fungi genome assembly pipeline using nextflow and docker containers on an HPC cluster, and I'm running into issues where the magic_simplify script crashes with os error 30: read-only file system. I already checked the docker container and made sure the rust-mdbg dir is not read-only, so I'm not sure what exactly is happening here. Maybe someone knows what's up? I'm using singularity to run the docker containers on the HPC cluster; I just hope this is not some compatibility issue with singularity/rust.

    command.log

    bug 
    opened by fischer-hub
Releases (v1.0.1)
Owner
Barış Ekim
PhD student in Berger Group at @mit.