rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly

rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) implementation, geared towards the assembly of long and accurate reads such as PacBio HiFi.

Rationale

rust-mdbg performs mdBG construction of a 52x-coverage human HiFi dataset in around 10 minutes on 8 threads, with 10GB of maximum RAM usage.

rust-mdbg is fast because it operates in minimizer-space, meaning that the reads, the assembly graph, and the final assembly are all represented as ordered lists of minimizers instead of strings of nucleotides. A conversion step then yields a classical base-space representation.
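The idea of working in minimizer-space can be illustrated with a small sketch: a read is reduced to the ordered list of its minimizers, and windows of k consecutive minimizers (k-min-mers) play the role that k-mers play in a classical de Bruijn graph, with consecutive k-min-mers overlapping by k-1 minimizers. The assumptions here are ours: minimizers are selected with a density rule (an l-mer is kept when its hash is below d times the largest hash value), the hash is the splitmix64 finalizer applied to a 2-bit encoding, and strands are not canonicalized. rust-mdbg's own extraction differs in these details, and `read_minimizers` and `kminmers` are hypothetical names.

```rust
// Sketch only: density-based minimizer selection with a stand-in hash.
// This is NOT rust-mdbg's exact scheme (hash function, strand handling, robust minimizers).

/// 2-bit encoding of a nucleotide; None for anything other than uppercase ACGT.
fn encode(base: u8) -> Option<u64> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// splitmix64 finalizer, used here as a simple 64-bit hash of a packed l-mer.
fn mix64(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

/// Ordered list of minimizer hashes of `seq`, in read order.
/// Assumption: an l-mer is a minimizer when hash < density * u64::MAX.
fn read_minimizers(seq: &[u8], l: usize, density: f64) -> Vec<u64> {
    assert!((1..=32).contains(&l), "l must be between 1 and 32");
    let threshold = (density * u64::MAX as f64) as u64;
    let mask = if l == 32 { u64::MAX } else { (1u64 << (2 * l)) - 1 };
    let mut minimizers = Vec::new();
    let mut lmer = 0u64; // 2-bit packed window of the last l bases
    let mut valid = 0usize; // consecutive ACGT bases seen so far
    for &b in seq {
        match encode(b) {
            Some(code) => {
                lmer = ((lmer << 2) | code) & mask;
                valid += 1;
            }
            None => {
                valid = 0; // a non-ACGT character breaks every l-mer covering it
                continue;
            }
        }
        if valid >= l {
            let h = mix64(lmer);
            if h < threshold {
                minimizers.push(h);
            }
        }
    }
    minimizers
}

/// k-min-mers: every window of k consecutive minimizers (the graph nodes).
fn kminmers(minimizers: &[u64], k: usize) -> Vec<Vec<u64>> {
    if k == 0 {
        return Vec::new();
    }
    minimizers.windows(k).map(|w| w.to_vec()).collect()
}

fn main() {
    let read = b"ACGTTGCATGCATGCCAGTACGATCGATCGTAGCTAGCTAGGCT";
    let minimizers = read_minimizers(read, 5, 0.3);
    let nodes = kminmers(&minimizers, 3);
    println!("{} minimizers -> {} k-min-mers", minimizers.len(), nodes.len());
}
```

Because only a fraction d of positions is kept, a read of n bases becomes a list of roughly n*d minimizers, which is where the speed and memory savings come from.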

Limitations

However, this high speed comes at a cost! :)

  • rust-mdbg gives good-quality results, but with lower contiguity and completeness than state-of-the-art assemblers such as HiCanu and hifiasm.
  • rust-mdbg performs best with at least 40x to 50x of coverage.
  • No polishing step is implemented, so assemblies will have roughly the same accuracy as the reads.

Installation

Clone the repository (make sure you have a working Rust environment), and run

cargo build --release

For performing graph simplifications, gfatools is required.

Quick start

cargo build --release
target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example
utils/magic_simplify example

Multi-k assembly

For better contiguity, try the provided multi-k assembly script. It performs assembly iteratively, starting with k=10, up to an automatically determined largest k. This comes at the expense of a roughly 7x longer running time.

utils/multik <reads.fq.gz> <some_output_prefix> <nb_threads>

Overview

rust-mdbg is a modular assembler. It consists of three components:

  1. rust-mdbg, to perform assembly in minimizer-space
  2. gfatools (external component), to perform graph simplifications
  3. to_basespace, to convert a minimizer-space assembly to base-space

For convenience, components 2 and 3 are wrapped into a script called magic_simplify.

Input

rust-mdbg takes a single FASTA/FASTQ input (gzip-compressed or not). Multi-line sequences and sequences with lowercase characters are not supported.

If you have seqtk installed, you can use

seqtk seq -A reads.unformatted.fq > reads.fa

to format reads accordingly.
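If seqtk is not available, the same normalization for FASTA input can be sketched in a few lines: join multi-line sequences into one line and uppercase the bases. This is a hypothetical helper (`normalize_fasta`), not part of rust-mdbg; it handles plain FASTA only (no FASTQ, no gzip) and reads the whole input into memory for simplicity.

```rust
use std::io::{self, Read, Write};

/// Joins multi-line FASTA records into single-line records and uppercases
/// the sequence lines. Header lines (starting with '>') are kept unchanged.
fn normalize_fasta(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_record = false;
    for line in input.lines() {
        let line = line.trim_end(); // drops trailing spaces or a stray '\r'
        if line.starts_with('>') {
            if in_record {
                out.push('\n'); // terminate the previous record's sequence line
            }
            out.push_str(line);
            out.push('\n');
            in_record = true;
        } else if !line.is_empty() {
            out.push_str(&line.to_ascii_uppercase());
        }
    }
    if in_record {
        out.push('\n');
    }
    out
}

fn main() -> io::Result<()> {
    // Usage: normalize_fasta < reads.multiline.fa > reads.fa
    let mut input = String::new();
    io::stdin().read_to_string(&mut input)?;
    io::stdout().write_all(normalize_fasta(&input).as_bytes())?;
    Ok(())
}
```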

Output data

The output of rust-mdbg consists of:

  • A .gfa file containing the minimizer-space de Bruijn graph, without sequences,
  • Several .sequences files containing the sequences of the nodes of the graph.

The to_basespace executable combines both outputs to produce a .gfa file with sequences.
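Conceptually, this combination step fills the sequence field of each GFA segment (S) line, which holds `*` when no sequence is stored, with the base-space sequence of that node. The sketch below assumes the node sequences have already been loaded into a map keyed by segment name; the layout of the .sequences files is not described here, so parsing them is left out, and `fill_segments` is a hypothetical name rather than to_basespace's actual code.

```rust
use std::collections::HashMap;

/// Replaces the '*' placeholder in GFA1 segment lines ("S\tname\t*\t...")
/// with the sequence found in `seqs`. Other lines, and segments whose name
/// is missing from `seqs`, are copied unchanged.
fn fill_segments(gfa: &str, seqs: &HashMap<String, String>) -> String {
    let mut out = String::new();
    for line in gfa.lines() {
        let mut fields: Vec<&str> = line.split('\t').collect();
        if fields.len() >= 3 && fields[0] == "S" && fields[2] == "*" {
            if let Some(seq) = seqs.get(fields[1]) {
                fields[2] = seq.as_str();
            }
        }
        out.push_str(&fields.join("\t"));
        out.push('\n');
    }
    out
}

fn main() {
    let mut seqs = HashMap::new();
    seqs.insert("1".to_string(), "ACGT".to_string());
    let gfa = "H\tVN:Z:1.0\nS\t1\t*\nS\t2\t*\nL\t1\t+\t2\t+\t0M\n";
    print!("{}", fill_segments(gfa, &seqs));
}
```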

Running an example

A sample set of reads is provided in the example/ folder. Run

target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example

which will create an example.gfa file.

In order to populate the .gfa file with base-space sequences and perform graph simplification, run

utils/magic_simplify example

which will create example.msimpl.gfa and example.msimpl.fa files.

Parameters

The main parameters of rust-mdbg are the k-min-mer value k, the minimizer length l, and the minimizer density d (delta in the paper). Another parameter is --presimp (default 0.01), which performs a graph simplification: a neighbor node is deleted if its abundance is below 1% of min(max(abundance of other neighbors), abundance of current node). For better results, without having to set any parameter, try the multi-k strategy (see the Multi-k assembly section). This section explains how parameters are set in single-k assembly.
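The --presimp rule can be written down directly. The sketch below applies it to the neighbors of a single node, given their abundances; it illustrates the rule as stated above and is not rust-mdbg's implementation (`presimp_keep` is a hypothetical name). One assumption to flag: when a node has a single neighbor, the set of "other neighbors" is empty, and this sketch treats its maximum as 0, so a lone neighbor is never removed.

```rust
/// Decides which neighbors of a node survive the presimp filter.
/// A neighbor is removed if its abundance is below
/// ratio * min(max(abundance of the other neighbors), abundance of current node).
/// Returns one boolean per neighbor: true = kept.
fn presimp_keep(current: u32, neighbors: &[u32], ratio: f64) -> Vec<bool> {
    neighbors
        .iter()
        .enumerate()
        .map(|(i, &abund)| {
            // Largest abundance among the *other* neighbors (assumed 0 if there are none).
            let max_others = neighbors
                .iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(_, &a)| a)
                .max()
                .unwrap_or(0);
            let reference = max_others.min(current) as f64;
            (abund as f64) >= ratio * reference
        })
        .collect()
}

fn main() {
    // Node with abundance 1000 and three neighbors, using the default ratio 0.01.
    let kept = presimp_keep(1000, &[500, 3, 800], 0.01);
    println!("{:?}", kept); // [true, false, true]
}
```

In the example, the neighbor with abundance 3 is compared against 1% of min(800, 1000) = 8 and is dropped, while the other two easily clear their thresholds.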

All three parameters k, l, and d significantly impact the quality of results. One can think of them as a generalization of the k parameter in classical de Bruijn graphs. When you run rust-mdbg without specifying parameters, it sets them to:

d = 0.003

l = 12

k = 0.75 * average_readlen * d

These parameters will give reasonable, but far from optimal, draft assemblies. We experimentally found that the best results are often obtained with k values within 20-40, l within 10-14, and d within 0.001-0.005. Setting k and d such that the ratio k/d is slightly below the read length appears to be an effective strategy.
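Where the default for k comes from can be seen with a little arithmetic: with density d, minimizers occur on average about once every 1/d bases, so a k-min-mer spans roughly k/d bases. The default k = 0.75 * average_readlen * d therefore makes a k-min-mer span about three quarters of an average read, in line with the advice to keep k/d slightly below the read length. The sketch computes both quantities; rounding k down to an integer is our assumption, and rust-mdbg may round differently.

```rust
/// Default k as given above: k = 0.75 * average_readlen * d.
/// Truncation to an integer is an assumption of this sketch.
fn default_k(avg_readlen: f64, density: f64) -> usize {
    (0.75 * avg_readlen * density) as usize
}

/// Approximate number of bases spanned by a k-min-mer: with density d,
/// consecutive minimizers are on average about 1/d bases apart.
fn approx_span(k: usize, density: f64) -> f64 {
    k as f64 / density
}

fn main() {
    let (avg_readlen, d) = (10_000.0, 0.003);
    let k = default_k(avg_readlen, d);
    println!("k = {}, k/d is about {:.0} bases", k, approx_span(k, d));
}
```

For 10 kbp reads and the default d = 0.003, this gives k = 22, i.e. k-min-mers spanning roughly 7.3 kbp.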

For further information on usage and parameters, run

target/release/rust-mdbg -h

for a one-line summary of each flag, or run

target/release/rust-mdbg --help

for a lengthy explanation of each flag.

Performance

Dataset                  Genome size (HPC)  Coverage  Parameters         N50      Runtime  Memory
D. melanogaster HiFi     98Mbp              100x      auto               2.5Mbp   2m15s    2.5GB
D. melanogaster HiFi     98Mbp              100x      multi-k            2.5Mbp   15m      1.8GB
D. melanogaster HiFi     98Mbp              100x      k=35,l=12,d=0.002  6.0Mbp   1m9s     1.5GB
Strawberry HiFi          0.7Gbp             36x       auto               0.5Mbp   6m12s    12GB
Strawberry HiFi          0.7Gbp             36x       multi-k            1Mbp     40m      11GB
Strawberry HiFi          0.7Gbp             36x       k=38,l=14,d=0.003  0.7Mbp   5m31s    10GB
H. sapiens (HG002) HiFi  2.2Gbp             52x       auto               1.0Mbp   27m30s   16.9GB
H. sapiens (HG002) HiFi  2.2Gbp             52x       multi-k            16.9Mbp  3h15m    20GB
H. sapiens (HG002) HiFi  2.2Gbp             52x       k=21,l=14,d=0.003  13.9Mbp  10m23s   10.1GB

Runtime breakdown:
H. sapiens: 10m23s = 6m51s rust-mdbg + 1m48s gfatools + 1m44s to_basespace

The runs with custom parameters (from the paper) were made with commit b99d938; unlike in the paper, we did not use robust minimizers, which require additional l-mer counting beforehand. For historical reasons, reads and assemblies were homopolymer-compressed in those experiments, and the homopolymer-compressed genome size is reported. Beware that these numbers are therefore not directly comparable to the output of other assemblers. In addition to the parameters shown in the table, the rust-mdbg command line also contained --bf --no-error-correct --threads 8.
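Homopolymer compression (HPC) collapses every run of identical bases into a single base, which is why the HPC genome sizes in the table are smaller than the usual base-space sizes. A minimal sketch of the transformation (the function name `homopolymer_compress` is hypothetical, not rust-mdbg's API):

```rust
/// Collapses each run of identical characters into one, e.g. "AAACCG" -> "ACG".
fn homopolymer_compress(seq: &str) -> String {
    let mut out = String::with_capacity(seq.len());
    let mut prev: Option<char> = None;
    for c in seq.chars() {
        // Emit a character only when it differs from the previous one.
        if prev != Some(c) {
            out.push(c);
            prev = Some(c);
        }
    }
    out
}

fn main() {
    println!("{}", homopolymer_compress("AAACCGTTTA")); // ACGTA
}
```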

Running rust-mdbg without graph simplifications

To convert an assembly to base-space without performing any graph simplifications, there are two ways:

  • with gfatools:

gfatools asm -u example.gfa > example.unitigs.gfa
target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences

  • without gfatools (slower, but the code is more straightforward to understand):

utils/complete_gfa.py example.sequences example.gfa

In both cases, this will create an example.complete.gfa file that you can convert to FASTA with

bash utils/gfa2fasta.sh example.complete
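Converting a GFA with sequences to FASTA essentially amounts to emitting one record per segment (S) line. The sketch below does just that; it is not the utils/gfa2fasta.sh script itself, whose exact behavior (output file naming, line wrapping) may differ, and `gfa_to_fasta` is a hypothetical name.

```rust
/// Emits one FASTA record per GFA1 segment line ("S\tname\tsequence\t..."),
/// skipping segments without a stored sequence ('*').
fn gfa_to_fasta(gfa: &str) -> String {
    let mut out = String::new();
    for line in gfa.lines() {
        let mut fields = line.split('\t');
        if fields.next() != Some("S") {
            continue; // header (H), link (L) and other line types are ignored
        }
        if let (Some(name), Some(seq)) = (fields.next(), fields.next()) {
            if seq != "*" {
                out.push_str(&format!(">{}\n{}\n", name, seq));
            }
        }
    }
    out
}

fn main() {
    let gfa = "H\tVN:Z:1.0\nS\tu1\tACGT\tLN:i:4\nL\tu1\t+\tu2\t+\t0M\nS\tu2\tGG\n";
    print!("{}", gfa_to_fasta(gfa));
}
```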

License

rust-mdbg is freely available under the MIT License.

Developers

  • Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
  • Rayan Chikhi at the Department of Computational Biology at Institut Pasteur

Citation

Minimizer-space de Bruijn graphs (2021). bioRxiv

@article {mdbg,
	author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
	title = {Minimizer-space de Bruijn graphs},
	year = {2021},
	doi = {10.1101/2021.06.09.447586},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

Contact

Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.

Comments
  • m1 arm support

    Hello rust-mdbg team,

    It seems that there is no support for the ARM architecture yet; I get the following error when compiling on ARM64:

    The following warnings were emitted during compilation:

    warning: cc: error: unrecognized command-line option '-msse4.2'
    warning: cc: error: unrecognized command-line option '-maes'
    warning: cc: error: unrecognized command-line option '-mavx'
    warning: cc: error: unrecognized command-line option '-mavx2'

    error: failed to run custom build command for fasthash-sys v0.3.2

    Caused by: process didn't exit successfully: /Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-17495dcf061597dc/build-script-build (signal: 6, SIGABRT: process abort signal)
    --- stdout
    TARGET = Some("aarch64-apple-darwin")
    OPT_LEVEL = Some("3")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    CC_aarch64-apple-darwin = None
    CC_aarch64_apple_darwin = None
    HOST_CC = None
    CC = None
    HOST = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    CFLAGS_aarch64-apple-darwin = None
    CFLAGS_aarch64_apple_darwin = None
    HOST_CFLAGS = None
    CFLAGS = None
    DEBUG = Some("false")
    running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp"
    cargo:warning=cc: error: unrecognized command-line option '-msse4.2'
    cargo:warning=cc: error: unrecognized command-line option '-maes'
    cargo:warning=cc: error: unrecognized command-line option '-mavx'
    cargo:warning=cc: error: unrecognized command-line option '-mavx2'
    exit status: 1

    --- stderr
    thread 'main' panicked at '

    Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit status: 1).

    ', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
    fatal runtime error: failed to initiate panic, error 5
    warning: build failed, waiting for other jobs to finish...
    error: build failed

    Any possibilities to provide support?

    Thanks,

    Jianshu

    opened by jianshu93 6
  • Recommended parameters for metagenome assembly and a related question

    Hi,

    I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

    For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

    Another question is: could mdBG output contig coverage estimates?

    Thank you!

    question 
    opened by xfengnefx 5
  • Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { kind: BufferLimit }', src/main.rs:187:33

    Reads taken from: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-17/SRR1023860/SRR10238607.1

    Command: utils/multik <reads> <output prefix> 56

    Happens both with and without homopolymer compression.

    bug 
    opened by sebschmi 5
  • multik executes run with k < l

    When assembling E.coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!"

    The multik script then continues silently.

    output
    thread '<unnamed>' panicked at 'Non-ACGTN nucleotide encountered!', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/nthash-0.5.0/src/lib.rs:43:9
    stack backtrace:
       0: std::panicking::begin_panic
       1: <nthash::NtHashIterator as core::iter::traits::iterator::Iterator>::next
       2: rust_mdbg::read::Read::extract
       3: rust_mdbg::main::{{closure}}
       4: rust_mdbg::main::{{closure}}
       5: <F as scoped_threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
       1: core::panicking::panic_fmt
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
       2: core::result::unwrap_failed
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
       3: scoped_threadpool::Pool::scoped
       4: core::ops::function::FnOnce::call_once{{vtable.shim}}
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x55e87265b2ec - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x55e87265b2ec - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x55e87265b2ec - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x55e87265b2ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x55e87267d4fc - core::fmt::write::h513b07ca38f4fb1b
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
       5:     0x55e872657995 - std::io::Write::write_fmt::haf8c932b52111354
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
       6:     0x55e87265cec0 - std::sys_common::backtrace::_print::h195c38364780a303
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x55e87265cec0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x55e87265cec0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
       9:     0x55e87265ca75 - std::panicking::default_hook::h60284635b0ad54a8
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
      10:     0x55e87265d574 - std::panicking::rust_panic_with_hook::ha677a669fb275654
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
      11:     0x55e87265d050 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
      12:     0x55e87265b794 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
      13:     0x55e87265cfb9 - rust_begin_unwind
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
      14:     0x55e872545651 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
      15:     0x55e872545743 - core::result::unwrap_failed::hb53671404b9e33c2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
      16:     0x55e8725e8f9f - scoped_threadpool::Scope::join_all::hcb532061605ab1b0
      17:     0x55e87255ee33 - scoped_threadpool::Pool::scoped::hb64980f16173dad1
      18:     0x55e87255b128 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf2fa39940289df70
      19:     0x55e87255ee5e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6bd664fd6d7bb829
      20:     0x55e87259a883 - core::ops::function::FnOnce::call_once{{vtable.shim}}::ha5de8d6fee3bff3e
      21:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hcbc6d2d80772be64
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      22:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h9bffa2ca65a1d6e6
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      23:     0x55e872660893 - std::sys::unix::thread::Thread::new::thread_start::ha678a8b0caec8f55
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17
      24:     0x7f16121a96db - start_thread
                                   at /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c:463
      25:     0x7f161193071f - __GI___clone
                                   at /build/glibc-S9d2JN/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      26:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    625.22user 29.59system 1:35.45elapsed 685%CPU (0avgtext+0avgdata 579660maxresident)k
    
    bug 
    opened by sebschmi 5
  • Missing assembly-final.msimpl.fa in multik mode

    Hello,

    Thank you for this tool.

    I ran mdbg with the following command line: multik reads.fastq.gz assembly 56 10 1000

    I get the files assembly-k*.gfa, assembly-k*.msimpl.gfa and assembly-k*.msimpl.fa with k from 10 to 1000, but I do not get the final output assembly-final.msimpl.fa.

    bug 
    opened by nadegeguiglielmoni 5
  • example.sequences file

    Sorry if I'm being slow, but when I create the gfa file, multiple .sequences files are created, while the readme shows to_basespace taking only a single example.sequences file. Where does this come from? Do you combine the .sequences files in some way or..?

    Thanks!

    question 
    opened by samlipworth 4
  • KSizeOutOfRange errors during rust-mdbg run

    Hi there,

    Trying out multik with some Nanopore metagenomics reads (seqtk-formatted) and I'm currently getting the errors below as it iteratively goes through the different -k values. Any ideas on what might be going wrong and how I might fix it?

    So far, the run hasn't fully aborted and I'm letting it run until I get some output - will let y

    $ ../rust-mdbg/utils/multik sup.fastq.gz std_sipp 10
    avg readlen: 6147875, max k: 17521
    assembly with k=10
        Finished release [optimized] target(s) in 0.09s
         Running `sup.fastq.gz -k 10 -l 12 --density 0.003 --minabund 2 --threads 10 --prefix std_sipp-k10 --bf`
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 11 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 5 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 3 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x5591d2438050 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
       1:     0x5591d2438050 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x5591d2438050 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x5591d2438050 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x5591d245670f - core::fmt::write::h3b84512577ca38a8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
       5:     0x5591d24352b2 - std::io::Write::write_fmt::h465f8feea02e2aa1
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
       6:     0x5591d243a185 - std::sys_common::backtrace::_print::h525280ee0d29bdde
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x5591d243a185 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x5591d243a185 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
       9:     0x5591d2439c33 - std::panicking::default_hook::hfb9fe98acb0dcb3b
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
      10:     0x5591d243a78d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
      11:     0x5591d243a327 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
      12:     0x5591d243850c - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
      13:     0x5591d243a289 - rust_begin_unwind
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
      14:     0x5591d2323341 - core::panicking::panic_fmt::h2242888e8769cd33
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
      15:     0x5591d2323233 - core::option::expect_none_failed::hb1edf11f73e63728
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
      16:     0x5591d23c108f - scoped_threadpool::Scope::join_all::hd6132fc8a04c2f8d
      17:     0x5591d233fcbb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h198262ef865dc7ad
      18:     0x5591d2391412 - std::sys_common::backtrace::__rust_begin_short_backtrace::he7799c2fe1d42088
      19:     0x5591d234f443 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc12e7712db099355
      20:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      21:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      22:     0x5591d243d28a - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
      23:     0x7f00c05ae6ba - start_thread
      24:     0x7f00bfdd751d - clone
      25:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    537.76user 35.81system 3:28.70elapsed 274%CPU (0avgtext+0avgdata 855256maxresident)k
    0inputs+1781280outputs (0major+10608356minor)pagefaults 0swaps
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.gfa -t 10,50000 -t 10,50000 -b 100000 -b 100000 -t 10,50000 -b 100000 -b 100000 -b 100000 -t 10,50000 -b 100000 -t 10,50000 -b 1000000 -t 10,150000 -b 1000000 -u
    ERROR: failed to read the graph
    Command exited with non-zero status 2
    0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1572maxresident)k
    0inputs+0outputs (0major+68minor)pagefaults 0swaps
    + python /home/andre/rust-mdbg/utils/gfa_break_loops.py std_sipp-k10.tmp1.gfa
    + [[ ! std_sipp-k10 == *--old-behavior* ]]
    + cargo run --manifest-path /home/andre/rust-mdbg/utils/../Cargo.toml --release --bin to_basespace -- --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10
        Finished release [optimized] target(s) in 0.07s
         Running `/home/andre/rust-mdbg/target/release/to_basespace --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10`
    + mv std_sipp-k10.tmp2.gfa.complete.gfa std_sipp-k10.tmp2.gfa
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.tmp2.gfa -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u
    [M::main] Version: 0.4-r214-dirty
    [M::main] CMD: /home/andre/gfatools/gfatools asm -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u std_sipp-k10.tmp2.gfa
    [M::main] Real time: 0.000 sec; CPU: 0.000 sec
    0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1752maxresident)k
    0inputs+0outputs (0major+74minor)pagefaults 0swaps
    ++ stat -c%s std_sipp-k10.tmp2.gfa
    + filesize=9
    + ((  filesize > 100000000 ))
    + mv std_sipp-k10.tmp3.gfa std_sipp-k10.msimpl.gfa
    + [[ std_sipp-k10 != *\-\-\k\e\e\p* ]]
    + rm -rf std_sipp-k10.tmp1.gfa std_sipp-k10.tmp2.gfa
    + bash /home/andre/rust-mdbg/utils/gfa2fasta.sh std_sipp-k10.msimpl
    2.19user 0.28system 0:02.50elapsed 99%CPU (0avgtext+0avgdata 24008maxresident)k
    0inputs+8outputs (0major+9763minor)pagefaults 0swaps
    
    bug 
    opened by GeoMicroSoares 4
  • Nanopore metagenome assembly parameters

    Hi there,

    Congratulations, this tool seems amazing and I can't wait to use it with my data! Are there specific parameters that I can use/optimize with rust-mdbg to assemble Nanopore metagenomes?

    Thanks.

    enhancement 
    opened by GeoMicroSoares 4
  • Problems in running `rust-mdbg` without graph simplifications

    Hi,

    I've installed rust-mdbg

    git clone --recursive https://github.com/ekimb/rust-mdbg.git
    cd rust-mdbg
    cargo build --release
    

    I've run it

    ~/git/rust-mdbg/target/release/rust-mdbg ~/git/rust-mdbg/example/reads-0.00.fa.gz -k 7 --threads 1 --density 0.0008 -l 10 --minabund 2 --prefix example
    ls example*
    
    example.140646999713344.sequences  example.gfa
    

    and finally tried both approaches to go in base-space

    gfatools asm -u  example.gfa > example.unitigs.gfa
    ~/git/rust-mdbg/target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
    
    [M::main] Version: 0.5-r250-dirty
    [M::main] CMD: gfatools asm -u example.gfa
    [M::main] Real time: 0.001 sec; CPU: 0.003 sec
    Done parsing unitigs GFA, got 1 unitigs.
    Done parsing original GFA, with 0 k-min-mers.
    Done parsing .sequences file, recorded 0 sequences.
    thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/to_basespace.rs:258:55
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    and

    python3 ~/git/rust-mdbg/utils/complete_gfa.py example.*.sequences example.gfa
    
    Traceback (most recent call last):
      File "/home/guarracino/git/rust-mdbg/utils/complete_gfa.py", line 32, in <module>
        source_minims = node_minims[spl[1]]
    KeyError: '7'
    

    Am I doing silly errors somewhere?

    opened by AndreaGuarracino 3
  • Differences between uncompressed & compressed fastq files

    This seems like a bug, but maybe I'm just misunderstanding something with how mdbg works.

    I discovered this after trying to run several human assemblies of varying input coverage (20x,30x,40x,50x) starting from hifi_reads.fq.gz files.

    The contiguity (n50) of all of the assemblies was in the same ballpark as the read n50 and there appeared to be no benefit to increased coverage. This coupled with the poor results in general had me scratching my head so I tried a different test dataset that was an uncompressed hifi_reads.fq and I got a great assembly.

    Curiosity piqued, I went back and unzipped the 20x coverage point I had tried earlier and got a much better assembly.

    See the attached logs for both 20x assemblies, starting from hifi_reads.fq and hifi_reads.fq.gz respectively.

    Is this an actual bug, or is it just user error?

    hifi_reads_gzipped.log hifi_reads.log

    bug 
    opened by gconcepcion 3
  • magic_simplify crashes while running in Docker container on HPC cluster

    Hey! I'm currently trying to run rust-mdbg as part of a fungi genome assembly pipeline using nextflow and docker containers on an HPC cluster, and I'm running into issues where the magic_simplify script crashes with os error 30: read-only file system. I already checked the docker container and made sure the rust-mdbg dir is not read-only, so I'm not sure what exactly is happening here. Maybe someone knows what's up? I'm using singularity to run the docker containers on the HPC cluster; I just hope this is not some compatibility issue with singularity/rust.

    command.log

    bug 
    opened by fischer-hub 3
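The `KeyError: '7'` in the first report above means that `complete_gfa.py` looked up node ID `'7'` in its `node_minims` table and did not find it. The traceback alone does not show how that table is filled, so the following is only a minimal diagnostic sketch, assuming the failing IDs come from link (`L`) lines of the input GFA: it lists node IDs that link lines reference but that no segment (`S`) line defines. The function name `undefined_link_endpoints` is hypothetical and not part of rust-mdbg.

```python
def undefined_link_endpoints(gfa_lines):
    """Return node IDs used by L (link) lines that no S (segment) line defines.

    Follows the GFA1 column layout:
      S <name> <sequence> ...
      L <from> <from_orient> <to> <to_orient> <overlap>
    """
    segments = set()
    referenced = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            segments.add(fields[1])  # segment name
        elif fields[0] == "L" and len(fields) > 3:
            referenced.add(fields[1])  # source node of the link
            referenced.add(fields[3])  # target node of the link
    return sorted(referenced - segments)
```

Passing an open GFA file handle (e.g. `open("graph.gfa")`, where `graph.gfa` is a placeholder for the real input) and getting a non-empty list back would suggest the graph file itself is inconsistent; an empty list would point instead at how the lookup table is built.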
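One way to narrow down the compressed-versus-uncompressed discrepancy reported above is to confirm that both files actually hold the same reads. The sketch below counts reads and total bases in a FASTQ file, decompressing on the fly when the file starts with the gzip magic bytes. It assumes plain four-line FASTQ records (no wrapped sequence lines); `open_fastq` and `fastq_stats` are hypothetical helpers, not part of rust-mdbg.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_fastq(path):
    """Open a FASTQ file in text mode, decompressing it if it is gzipped."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def fastq_stats(path):
    """Return (read count, total bases) for a four-line-per-record FASTQ file."""
    n_reads = 0
    n_bases = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                n_reads += 1
                n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases
```

If `fastq_stats("hifi_reads.fq")` and `fastq_stats("hifi_reads.fq.gz")` return the same pair, the input data are identical, and the different assemblies would have to come from how the two formats are read rather than from the reads themselves.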
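In the magic_simplify report above, "os error 30" is the raw OS error code surfaced by Rust's standard I/O errors; on Linux, errno 30 is `EROFS` (read-only file system). Singularity images are mounted read-only by default, so a directory that was writable inside Docker may not be writable once the same image runs under Singularity; this is one plausible explanation, not a confirmed diagnosis. A minimal sketch for checking it from inside the container, assuming Python is available there, is to try creating a temporary file in each directory the pipeline writes to. `check_writable` is a hypothetical helper.

```python
import errno
import os
import tempfile


def check_writable(directory):
    """Try to create and delete a temporary file in `directory`.

    Returns None on success, otherwise a short description of the failure.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        if exc.errno == errno.EROFS:  # errno 30 on Linux, "Read-only file system"
            return "read-only file system: " + directory
        return "%s: %s" % (directory, exc.strerror)
    os.close(fd)
    os.remove(path)
    return None
```

Running `check_writable(os.getcwd())` and the same check on the output directory would show which location the container cannot write to.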
Releases (v1.0.1)
Owner
Barış Ekim
PhD student in Berger Group at @mit.