Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python programming language

Last update: Dec 28, 2022

Overview

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette focuses on a particular core concept and shows how a reader has applied that concept to their research project.

If you are already familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links to open an interactive version of each section in Google Colab (to change the code, you will need to 'Save as' your own copy). You can also view a static version of each section using the nbviewer links. The direct GitHub links sometimes produce a GitHub error message; reloading the page or using the nbviewer link usually avoids this issue.
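As a rough illustration of how the Colab and nbviewer links relate to a notebook's location on GitHub, the sketch below builds both URLs from a notebook path. The function name `notebook_links` is hypothetical, the branch name `master` is an assumption (check the repository for the actual default branch), and the URL patterns are the commonly used public forms for Colab and nbviewer rather than anything defined by this project.

```python
def notebook_links(path, user="zaneveld", repo="full_spectrum_bioinformatics",
                   branch="master"):
    """Build Colab and nbviewer URLs for a notebook stored on GitHub.

    The branch default is an assumption; pass the real branch name if it differs.
    """
    # Both services accept the same "user/repo/blob/branch/path" suffix.
    github_path = f"{user}/{repo}/blob/{branch}/{path}"
    return {
        "colab": f"https://colab.research.google.com/github/{github_path}",
        "nbviewer": f"https://nbviewer.org/github/{github_path}",
    }


if __name__ == "__main__":
    # Example notebook path taken from this repository's issue tracker.
    links = notebook_links(
        "content/04_exploring_python/error_messages_in_python.ipynb"
    )
    for service, url in links.items():
        print(service, url)
```

If a direct GitHub link fails to render, pasting the notebook path into a helper like this gives an nbviewer address to try instead.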

Lead Author: Jesse Zaneveld¹
Vignette Authors: Nia Prabhu*¹, Aziz Bajouri*¹,², Ayomikun Akinrinade*¹,³

* Vignette authors contributed equally and are listed in chronological order of first contribution.
¹ Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
² Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
³ Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA

The text is currently in prototype status. Chapters with content you can preview are linked below:

Chapter 1. Foreword
Chapter 2. Introduction
- The Many Paths to Bioinformatics
- Speaking Each Other's Language
  - An Absurdly Brief Introduction to Biology
  - An Absurdly Brief Introduction to Computer Science
  - An Absurdly Brief Introduction to Statistics
Chapter 3. The Command Line
- Using the Command Line
- Exercise: Little Brother is Missing
Chapter 4. Exploring Python
- Warm-up Exercise: Spot the Difference
- Exploring Python
- A Tour of Python Data Types
- A Tour of Python Syntax (functions, conditions, iteration, classes)
Chapter 5. Project Design
- Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
Chapter 6. Biological Sequences
- An introduction to Biological Sequences
- Representing and Manipulating Biological Sequences as Python Strings
- Analyzing Biological Sequences with For Loops and If Statements
- Reading and writing FASTA files using Python
- Bioinformatics Vignette (Aziz Bajouri): Using set objects to find circular RNAs involved in multiple diseases
- Exercise: Error Bingo
- Error Messages in Python
- Bioinformatics Vignette (Nia Prabhu): Using For Loops and Dictionaries to Compare Nucleotide Composition in Pandemic and Non-Pandemic Causing Influenza Strains
- Capstone: testing for depletion of CG dinucleotides in the human genome
Chapter 7. 'Omics
- An Introduction to 'Omics
- Working with Tabular 'Omic data in Python using Pandas
- Analyzing Microbiome Alpha Diversity in Python
- Analyzing Microbiome Beta Diversity in Python
- Simulating the Effect of Sequencing Depth on Diversity Estimates
Chapter 8. Visualization
- Graphs as a Visual Language
- Exercise: Anger Tufte
- Representing Correlation
- Representing Distribution
Chapter 9. Alignment and Phylogenetics
- 9a. Alignment
- Homology and Alignment
- Global Alignment with the Needleman-Wunsch algorithm
- Local Alignment with the Smith-Waterman algorithm
- BLAST and the k-mer trick
- Exercise: Duck vs. Yeast
- 9b. Phylogenetics
- Tree thinking
- Representing Phylogenetic Trees with Python Classes
- Generating Trees Using Birth-Death Models
- Working with Traits on Trees
- Maximum Parsimony Ancestral State Reconstruction
- Hidden State Prediction
- Phylogenetic Comparative Methods
Chapter 10. Simulation
- Simulating Biological Networks
- Simulating the Population Genetics of Natural Selection and Genetic Drift
- Simulating the Evolution of Social Behavior
Chapter 11. Statistics
- Linear Models - a Statistical Swiss Army Knife
- Monte Carlo simulation and the Fundamental Unity of Statistical Hypothesis Tests
- Statistical Distributions and Parametric Tests
- Rank Transformations
- Monte Carlo simulation of Effect Size, Sample Size, and Significance
- Dealing with Multiple Comparisons
- Exercise: Revising your writing about statistical results
- An Introduction to Maximum Likelihood optimization
- The Best Model of A Cat is a Cat - model complexity, overfitting, and the AIC
- An Introduction to Bayesian Approaches
Chapter 12. Multivariate Statistics and Machine Learning
- Unsupervised Classification: of ordination, clustering and fishtanks
- Supervised Classification: from lines to trees to forests.
- Bioinformatics Vignette (Ayomikun Akinrinade): Using K-Nearest Neighbors and Binary Decision Tree Algorithms to Predict Enzyme Function from Protein Sequences
Chapter 13. Presenting Research
- Presentations as Verbal Chess
Chapter 14. Polishing and Publishing
- Presenting Research
- From Data to Conclusion: building a research manuscript brick by brick
- Resistance is Futile: becoming a language Borg
- Exercise: generating a targeted title using templating
- The Inverted Pyramid: optimizing your text from a reader's perspective
Chapter 15. Careers that draw on Bioinformatics
- Fighting for an Inclusive Workplace
  - Examining Privilege and Identity
  - Making Your Science and Teaching Accessible and Inclusive
  - Campus and Local Activism
  - Improving University Policy
- Happiness Matters
- Radical Collaboration
- Cognitive Bias and Networking
- Open-source Science as Shield and Sword
- Applying for Grants
Appendices:
- Appendix A - Data Sources for Bioinformatics Projects
- Appendix B - Timesaving Starter Code
  - Template Script with Interface and Test Code
  - IUPAC codes in python
  - Standard Translation Tables in Python
- Appendix C - Contributing a Community Example
- Appendix D - Paper Formatting Kit
- Appendix E - Project Specifications

This project is being developed with support from NSF Integrative and Organismal Systems award .

Feedback

You can submit feedback about completed chapters at the following link

Comments

Bump nokogiri from 1.10.9 to 1.11.1
Bumps nokogiri from 1.10.9 to 1.11.1.

Release notes

Sourced from nokogiri's releases.

v1.11.1 / 2021-01-06

Fixed

[CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

SHA-256 Checksums of published gems

a41091292992cb99be1b53927e1de4abe5912742ded956b0ba3383ce4f29711c nokogiri-1.11.1-arm64-darwin.gem
d44fccb8475394eb71f29dfa7bb3ac32ee50795972c4557ffe54122ce486479d nokogiri-1.11.1-java.gem
f760285e3db732ee0d6e06370f89407f656d5181a55329271760e82658b4c3fc nokogiri-1.11.1-x64-mingw32.gem
dd48343bc4628936d371ba7256c4f74513b6fa642e553ad7401ce0d9b8d26e1f nokogiri-1.11.1-x86-linux.gem
7f49138821d714fe2c5d040dda4af24199ae207960bf6aad4a61483f896bb046 nokogiri-1.11.1-x86-mingw32.gem
5c26111f7f26831508cc5234e273afd93f43fbbfd0dcae5394490038b88d28e7 nokogiri-1.11.1-x86_64-darwin.gem
c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2 nokogiri-1.11.1-x86_64-linux.gem
42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa nokogiri-1.11.1.gem
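For readers who want to check a downloaded gem against one of the published checksums, a minimal sketch in Python (the language used throughout this text) might look like the following. The helper names `sha256_of_file` and `matches_published_checksum` are hypothetical, and the file path in the usage line is an assumption about where the gem was saved; only the standard-library `hashlib` module is used.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare a file's SHA-256 digest to a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `matches_published_checksum("nokogiri-1.11.1.gem", "42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa")` would return `True` only if the local file's digest equals the published value.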

v1.11.0 / 2021-01-03

Notes

Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.

We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

Linux: x86-linux and x86_64-linux -- including musl platforms like alpine

OSX/Darwin: x86_64-darwin and arm64-darwin

We'd appreciate your thoughts and feedback on this work at #2075.

Dependencies

Ruby

This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

This release ends support for:

Ruby 2.3, for which official support ended on 2019-03-31 [#1886] (Thanks @ashmaroli!)

Ruby 2.4, for which official support ended on 2020-04-05

JRuby 9.1, which is the Ruby 2.3-compatible release.

Gems


Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)

[MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

Security

See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".

Added

Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]

... (truncated)

Commits

7be6f04 version bump to v1.11.1

aa0c399 dev: overhaul .gitignore

3d90c6d Merge pull request #2169 from sparklemotion/2168-active-support-test-failure

bbf850c changelog: update for #2168

ee69772 ci: another valgrind suppression

f9a2c4e fix: restore proper error handling in the SAX push parser

35aa88b fix(cruby): reset libxml2's error handler in sax and push parsers

07459fd fix(test): clobber libxml2's global error handler before every test

b682ac5 ci: ensure all tests are running setup

007662f github: update "installation difficulty" issue template

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 2
Missing reading response link on "Error Messages in Python"

There is no reading response link at the bottom of content/04_exploring_python/error_messages_in_python.ipynb. Additionally, on the reading response form, there is no entry for this reading.

opened by LucaOnline 1
Add discussion of HISAT2 & transcriptomics

HISAT2: https://anaconda.org/bioconda/hisat2

Salmon intro (another alternative that interoperates well with DESeq2) https://combine-lab.github.io/salmon/getting_started/

opened by zaneveld 0
Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

In addition we found a more recent study that showed that [research finding] (cite1;cite2). --> [research finding]

In a 2016 study it was shown that [finding] (cite1) --> [finding]

opened by zaneveld 0
More database links

Open resources on CUREs shared in the 2022 AACU talks (shared by Robin Angotti). One talk was "CUREing Cancer: How a Virtual Cancer Genomics CURE Made Research Accessible to Students During COVID" and another was "Expanding Access to Undergraduate Research Through BCEENET Cures Using Digitized Collections Data":

- https://www.cbioportal.org/ (Cancer research database)
- https://www.idigbio.org/ (Integrated digitized biocollections)
- https://www.gbif.org/ (biodiversity data)
- https://bceenetwork.org/cure-summaries/
- https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

opened by zaneveld 0

Releases(release-2022.3.1)

release-2022.3.1(Mar 2, 2022)

What's Changed

The 2022.3.1 release of Full Spectrum Bioinformatics greatly expands the scope and maturity of the text, including contributions from 3 undergraduate co-authors. This text has now been used to support multiple classes, and has 35 sections that are linked from the table of contents and ready for classroom use.

Here are some of the major changes:

The text has several new sections:
- An overview of Python syntax now shows how to recognize Python syntax before we dive into studying the details
- A first chapter on sequence alignment now covers Needleman-Wunsch alignment, both as worked by hand using a simple example and as an implementation in numpy
- The text now discusses linear models, with accompanying illustrations as well as figures
- An Error Bingo exercise now encourages students to intentionally trigger and learn from errors
- An extensive section has been added discussing common errors in Python, why they most commonly occur, and how to fix them
- 3 undergraduate contributors have added Bioinformatics Vignettes showing how to apply the principles in the text to biological problems:
  - Nia Prabhu (nucleotide composition)
  - Aziz Bajouri (set analysis)
  - Ayomikun Akinrinade (machine learning)
- A section has been added on revising writing about statistical results
- An initial draft section on visualizing correlation has been added, showing how a scatterplot can be revised to add linear regression results and 95% confidence intervals, and to better meet recommendations for data visualization
- The Data Sources page has been greatly updated, and now includes logos for linked resources

New Draft Sections:
- A draft section on student activism and fighting for an inclusive workplace has been added
- A draft section on network analysis has several in-progress code commits (not yet linked from main table of contents)

Other changes:
- Full Spectrum Bioinformatics has now adopted a code of conduct
- Many minor fixes
- Exercises have been added to many sections that previously lacked them
- The exercise on calculating CG content in the human genome has been updated
- Several chapters have been updated to include Feedback links that were previously missing
- Unused Jupyter Book files have been removed

Full Changelog: https://github.com/zaneveld/full_spectrum_bioinformatics/compare/release-2020.12.1...release-2022.3.1
release-2020.12.1(Dec 8, 2020)

This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

Some current features include:
- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics
- Links to Google Colab to allow students to run notebooks in a browser without installing software
- An outline table of contents shows planned sections, with sections that are in beta status available as live links
- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:
  - Foreword
  - The Command Line
    - Using the Command Line
    - Exercise: Little Brother is Missing
  - Exploring Python
    - Exploring Python
    - A Tour of Python Data Types
  - Project Design
    - Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
  - Biological Sequences
    - An introduction to Biological Sequences
    - Representing and Manipulating Biological Sequences as Python Strings
    - Analyzing Biological Sequences with For Loops and If Statements
    - Reading and writing FASTA files using Python
  - 'Omics
    - An Introduction to 'Omics
    - Working with Tabular 'Omic data in Python using Pandas
  - Phylogenetic Trees
    - Representing Phylogenetic Trees with Python Classes
    - Generating Trees Using Birth-Death Models
  - Simulation
    - Simulating the Population Genetics of Natural Selection and Genetic Drift
  - Statistics
    - Rank Transformations
    - Monte Carlo simulation of Effect Size, Sample Size, and Significance
    - Dealing with Multiple Comparisons
    - Exercise: Revising your writing about statistical results
  - Polishing and Publishing
    - Presenting Research
  - Careers that draw on Bioinformatics
    - Applying for Grants

NOTE: this release is very similar to release-2020.12.0, apart from minor edits to the readme; it was re-released to trigger Zenodo to generate a DOI.
release-2020.12.0(Dec 7, 2020)


Owner

Jesse Zaneveld

