Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest - this podcast is for you. The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), Dr. Nabil-Fareed Alikhan from the University of Oxford (UK), and Prof. Andrew Page from Theiagen Genomics (UK), who bring together years of experience in microbial bioinformatics. The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention, the University of Oxford or Theiagen Genomics. Intro music : Werq - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ Outro music : Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ Questions and comments? microbinfie@gmail.com
Nabil and Lee bring a guest host Clint to talk with some of the people behind Pathoplexus http://pathoplexus.org/ * Dr. Emma Hodcroft - https://pathoplexus.org/about/eb * Dr. Theo Sanderson - https://pathoplexus.org/about/development-team * Mr. Arthur Shem Kasambula - https://pathoplexus.org/about/development-team…
The crew talks about the genomeqc ecosystem. Check it out at https://happykhan.github.io/qualibact/ Errata: The project name was changed to qualibact and is no longer genomeqc as mentioned in the episode.
In this episode of the Podcast, hosts Lee Katz and Andrew Page delve into the intricacies of clinical metagenomics. Joined by experts Torsten Seemann and Finlay Maguire, they explore the challenges and methodologies involved in identifying pathogens in clinical samples when traditional diagnostic methods fail. They discuss the types of samples analyzed, the impact of low DNA concentrations, and the complexities of sequencing data interpretation. The conversation highlights the importance of interdisciplinary collaboration between bioinformaticians and clinicians in managing difficult infections. Despite the hurdles, clinical metagenomics is presented as a complementary tool aiding in diagnosis, emphasizing the need for genomic literacy among healthcare professionals.…
On this episode we are joined by Sara Zufan to discuss her PhD journey and recent projects. The episode covers various research experiences and challenges in the field, including the rapid detection of AMR using nanopore sequencing, COVID-19 projects, hepatitis A outbreak investigations, and Lassa virus surveillance in Liberia. The guests share insights into their professional journeys, their experiences working across different continents, and the future of microbial bioinformatics research.…
In this episode of the Micro Binfie Podcast, host Andrew Page takes listeners to the heart of the microbial genomics hackathon in Bethesda, Maryland, for an engaging conversation with special guest Megan Phillips, a PhD student from Emory University. Megan delves into her research on Staphylococcus aureus (MRSA), highlighting its dual nature as both a harmless colonizer and a potentially serious pathogen. Megan discusses the complexities of tetracycline resistance, particularly focusing on plasmid-mediated mechanisms involving the pT181 plasmid. She explains how this plasmid’s efflux pump, encoded by the gene tetK, contributes to variable resistance levels and the factors influencing MIC (Minimum Inhibitory Concentration) variability. Listeners will learn about the intricacies of plasmid copy numbers, their global spread across clonal complexes, and the occurrence of horizontal and vertical gene transfer. Throughout the episode, Megan shares insights on working with short-read sequencing data and the strategies she employs to detect plasmid presence using tools like BLAST. She also touches on the challenges and discoveries involved in tracking historical sample data and integrating findings from older research papers, showcasing her appreciation for the poetic style of scientific writing from the 1940s. For those interested in antimicrobial resistance, evolutionary microbiology, and the subtleties of bacterial genome analysis, this episode offers a blend of technical detail and engaging storytelling. Tune in to hear more about Megan’s upcoming publications, her experiences navigating complex genomic data, and her thoughts on antimicrobial stewardship and historical perspectives on drug resistance.…
In a two-part discussion, the hosts analyze the movie Contagion from their expert perspectives, focusing on the film's portrayal of epidemiology and genomics. They note that the movie compresses timelines for dramatic effect, speeding up the virus's spread and the response to it, and that decisions about managing a crisis are based on societal values, not just science. In part one, the hosts discuss the movie's explanation of the R0 value. They note that while this explanation is useful for the audience, it is unlikely that an epidemiologist would need to explain the concept to other epidemiologists. The group notes that the MEV1 virus is modeled on the real-life Nipah virus and comments on a scene where the genome of the virus is described as being 15 to 19 kilobases in length with 6 to 10 genes. They also discuss the movie's depiction of virus isolation and the unrealistic speed with which the initial assessment of the virus occurs. The podcasters touch upon the BSL4 lab and how the film depicts the scientists' behavior. They also discuss Matt Damon's character, who is exposed to the virus but does not get sick, and is an example of an asymptomatic carrier. In part two, the bioinformaticians examine the "genome dashboard" scene, noting the software's informative interface, which includes an alignment panel, protein structure, and a recombination map. They also discuss a scene where a phylogenetic tree is used to determine a change in the R0 value. They find this unrealistic because the tree is just a picture that doesn't accurately represent how a virus spreads. The group discusses how the bioinformatician is depicted in the film as moving in and out of the lab, which may have been realistic in 2011 but is less so in the present day. They discuss how the CDC is portrayed in the film, noting scenes that were filmed at the actual CDC, as well as their own experiences at the CDC. 
The podcasters note the movie is very US-centric and that many international partners would be involved in solving a global pandemic.…
In this episode of the Micro Binfie podcast, host Andrew Page is joined by Nikhita Puthuveetil, Senior Bioinformatician at the American Type Culture Collection (ATCC). They delve into ATCC's ambitious project of sequencing a vast array of organisms from their renowned collection, tackling the challenges of assembling complex genomes from bacteria, viruses, fungi, and more. Discover how Nikhita and her team navigate through genomic roadblocks, leverage cutting-edge sequencing technologies, and work to ensure accurate data provenance. Whether it's large viral genomes or evolving taxonomy, this episode offers a deep dive into the fascinating world of microbial bioinformatics and genomic curation.…
In this episode of the Micro Binfie podcast, host Andrew Page talks with Dr. Erin Young, a bioinformatician at the Utah Public Health Laboratory, recorded during the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Erin shares her journey from researching hereditary cancer predisposition to her current role in public health bioinformatics, which she entered through a prestigious CDC and APHL fellowship. The conversation delves into her work with bacterial pathogens, particularly in tracking antimicrobial resistance in organisms like Klebsiella. Erin discusses the tools she uses for genome typing, such as MASH, FastANI, and SKA, and her innovative research on the accuracy of long-read sequencing technologies like Nanopore for detecting antimicrobial resistance genes. She also provides a preview of her upcoming poster for ASM, where she examines how Nanopore reads can be used effectively in public health microbiology. This episode offers a fascinating look at how bioinformatics and genomics are advancing the fight against infectious diseases.…
In this episode of the Micro Binfie Podcast, host Andrew Page speaks with Dr. Brooke Talbot, a recent PhD graduate from Emory University, about her research on Staphylococcus aureus, with a focus on MRSA and antibiotic resistance. Brooke shares insights into her molecular epidemiology work, discussing the complexities of tracking resistant bacterial strains in clinical settings and the significance of genomic epidemiology in public health. From honeybees to foodborne outbreaks, Brooke’s diverse research background offers listeners a fascinating journey through science, microbiology, and epidemiology.…
In this episode of the Micro Binfie podcast, host Andrew Page is live from the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. He sits down with David Mahoney, a PhD student from Dalhousie University in Halifax, Nova Scotia. David shares his research on characterizing antimicrobial resistance (AMR) genes and their transfer within metagenomes, focusing on metagenomic assembly graphs. They delve into David’s background in food safety microbiology and his interest in the public health implications of genomics. He explains his exciting work on analyzing how AMR genes transfer across different environments, such as food production plants and clinical settings, using both new and existing data from Canada’s Genomics Research and Development Initiative. David also highlights his use of innovative methods like assembly graphs and graph-based approaches to uncover AMR gene flow and lateral gene transfers, including the potential of machine learning techniques such as graph convolutional neural networks.…
In this episode of the Micro Binfie Podcast, host Andrew Page catches up with Torsten Seemann at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. They discuss the rapid evolution of bioinformatics, the challenges faced by labs worldwide, and the explosion of tools post-COVID. Torsten shares insights into his work at Melbourne’s Microbiological Diagnostic Unit (MDU), the development of platforms like AusTrakka for bacterial genomics, and how his lab plays a national and international role in data sharing. The conversation dives into the future of the widely-used variant calling tool Snippy, as Torsten reveals updates funded by the Chan Zuckerberg Initiative, including nanopore read support and the ability to process pre-assembled genomes. They also explore the importance of maintaining open-source bioinformatics tools to prevent them from becoming obsolete. Tune in for an in-depth discussion on the state of genomics, software development, and the challenges and rewards of open-source collaboration.…
In this episode of the Micro Binfie Podcast, host Andrew Page sits down with Tim Dallman at the 10th Microbial Bioinformatics Hackathon in Bethesda, Maryland. Tim shares insights from his work at Utrecht University in the Netherlands, where he focuses on genomic surveillance and machine learning models to predict disease risk and severity. They discuss the challenges of integrating genomic variation into predictive models, the importance of high-quality metadata, and the complexities of working with pathogens like Shiga toxin-producing E. coli. Tim also talks about his role at the WHO Pandemic and Epidemic Intelligence Hub and how global collaboration can drive innovation in public health genomics. Tune in to hear about cutting-edge research, the importance of interdisciplinary teamwork, and how genomic data can be harnessed for future pandemic preparedness.…
Host Andrew Page is joined by Robert Petit from the Wyoming Public Health Laboratory. Robert, a key developer of the Bactopia pipeline, shares insights into how this end-to-end tool is transforming bacterial genomic surveillance. They dive into the origins of Bactopia, its applications in public health, and Robert's experience leading genomic projects in a rural setting. Discover how Bactopia streamlines pathogen detection, improves documentation, and integrates with other tools to deliver fast and accurate results. Listen in as they discuss new innovations in bioinformatics, including visualizations and human-read filtering, and explore future projects like CamelHUMP, designed to simplify sequence-based typing. Recorded live at the Microbial Bioinformatics Hackathon in Bethesda, Maryland, this episode brings you the latest in pathogen genomics and the challenges and rewards of working on the frontier of public health.…
Andrew and Lee talk with Christine and Cynney about the Haiti cholera outbreak.
* Cynney Walters: https://www.linkedin.com/in/cynney-walters-763111190
* Walters et al, "Genome sequences from a reemergence of Vibrio cholerae in Haiti, 2022 reveal relatedness to previously circulating strains" https://journals.asm.org/doi/abs/10.1128/jcm.00142-23…
Nabil and Lee have a quick chat about minimum spanning trees (MST). Eburst paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-152
In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into the fascinating world of hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics. The discussion begins with the challenges faced by bioinformaticians due to siloed MLST databases across the globe, which hinder synchronization and effective genomic surveillance. To address these issues, the concept of using hash databases for allele identification is introduced. Hashing allows for the creation of unique identifiers for genetic sequences, enabling easier database synchronization without the need for extensive system support or resources. Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) can result in a different hash, making it a perfect solution for distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and potential risks of hash collisions. Despite these risks, the use of more complex hashes has been shown to significantly reduce the probability of such collisions. The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact matching algorithms due to the nature of hashing. Existing tools like eToKi and upcoming software are mentioned as examples of applications that can utilize hash databases. Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types. 
Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. This episode provides an overview of how hash databases could address long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.…
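The allele hashing idea described in this episode can be illustrated with a minimal Python sketch. It assumes alleles are plain nucleotide strings and uses only the standard-library `hashlib` module. The function names `allele_hash` and `lookup`, the example sequences, and the locus label are hypothetical and are not taken from any tool mentioned in the episode.

```python
import hashlib


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return a hex digest that serves as a database-independent allele ID."""
    # Normalise case and whitespace so the same allele always hashes the same way.
    normalized = sequence.strip().upper()
    return hashlib.new(algorithm, normalized.encode("ascii")).hexdigest()


# Two made-up alleles that differ by a single SNP at the last base.
allele_a = "ATGACCGGTAAATTTGCA"
allele_b = "ATGACCGGTAAATTTGCG"

hash_a = allele_hash(allele_a)
hash_b = allele_hash(allele_b)

# A toy "hash database": exact-match lookup keyed by digest.
known_alleles = {hash_a: "locus_0001 (hypothetical label)"}


def lookup(sequence: str):
    """Return the stored label for an allele, or None if it is not yet known."""
    return known_alleles.get(allele_hash(sequence))


print(hash_a != hash_b)   # the single SNP produces a different digest
print(lookup(allele_a))   # known allele: label is returned
print(lookup(allele_b))   # unseen allele: None
```

Because the identifier is computed from the sequence itself, two sites hashing the same allele should arrive at the same ID without consulting a central database. The lookup is exact-match only, which is why the episode stresses exact matching algorithms. Passing `"md5"` as the algorithm gives a shorter digest, at the cost of a higher theoretical collision risk.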
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach. GAMBIT software: https://github.com/gambit-suite/gambit GAMBIT suite: https://github.com/gambit-suite GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. https://doi.org/10.1371/journal.pone.0277575 TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full…
In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development.
Topics discussed:
* K-mers for analyzing sequencing data, and how Sourmash builds on MinHash
* How Sourmash handles k-mers for metagenomic comparisons vs. MASH
* The modhash and bottom sketch approaches used in Sourmash
* Dealing with sequencing errors and noise in k-mer data
* Sourmash as a reference-based method, and applications for metagenomics
* Titus' focus on building reusable libraries and APIs vs one-off tools
* Recruiting collaborators through "nerd sniping" with interesting problems
* The open source philosophy that motivates Titus' software work
Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/…
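The two sketching approaches named in the episode above can be illustrated with a small standard-library Python sketch. This is not Sourmash's API or implementation: SHA-256 stands in for whatever hash function a real tool uses, reverse-complement (canonical) k-mers are ignored, "modhash" is read here as "keep hashes divisible by a modulus", and all function names and sequences are hypothetical.

```python
import hashlib


def kmer_hashes(sequence: str, k: int) -> set:
    """Hash every k-mer in a sequence to a 64-bit integer."""
    hashes = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def bottom_sketch(sequence: str, k: int, num: int) -> set:
    """Bottom sketch: keep the `num` smallest hashes (fixed-size sketch)."""
    return set(sorted(kmer_hashes(sequence, k))[:num])


def mod_sketch(sequence: str, k: int, modulus: int) -> set:
    """Modhash-style sketch: keep hashes divisible by `modulus` (size scales with data)."""
    return {h for h in kmer_hashes(sequence, k) if h % modulus == 0}


def bottom_jaccard(sketch_a: set, sketch_b: set, num: int) -> float:
    """Estimate Jaccard similarity from two bottom sketches of size `num`."""
    # Take the smallest `num` hashes of the union, then count those present in both.
    merged = set(sorted(sketch_a | sketch_b)[:num])
    if not merged:
        return 0.0
    return len(merged & sketch_a & sketch_b) / len(merged)


seq_a = "ATGACCGGTAAATTTGCAGGCTTAACCGGATCGATCGTTAGC"
seq_b = "ATGACCGGTAAATTTGCAGGCTTAACCGGTTCGATCGTTAGC"  # one substitution

a = bottom_sketch(seq_a, k=5, num=10)
b = bottom_sketch(seq_b, k=5, num=10)

print(bottom_jaccard(a, a, num=10))   # identical sketches
print(bottom_jaccard(a, b, num=10))   # similar but not identical sequences
print(len(mod_sketch(seq_a, k=5, modulus=4)))
```

The bottom sketch has a fixed size regardless of genome length, while the modulus-based sketch keeps a roughly constant fraction of all k-mers, so its size grows with the input.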
In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash:
* An overview of what Sourmash does - sketching and comparing large k-mer datasets
* How the sampling approach enables analyses like containment estimation
* Exciting capabilities of the Branchwater tool for multi-threaded real-time SRA search
* Scaling to search across millions of metagenomes in seconds with WebAssembly
* Potential public health applications for tracking and sourcing pathogens
* Important caveats around resolution limits and need for follow-up analyses
* Ongoing work to characterize the technique's specificity and sensitivity
Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/…
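Containment estimation, mentioned in the topics above, asks what fraction of a query's k-mers also occur in a reference such as a metagenome. The following is a minimal sketch under stated assumptions, not Sourmash's or Branchwater's actual code: SHA-256 is a stand-in hash, sampling keeps hashes divisible by a modulus, reverse complements are ignored, and the function names and sequences are hypothetical.

```python
import hashlib


def sampled_hashes(sequence: str, k: int, modulus: int) -> set:
    """Hash all k-mers and keep those whose hash is divisible by `modulus`."""
    kept = set()
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode("ascii")).digest()
        value = int.from_bytes(digest[:8], "big")
        if value % modulus == 0:
            kept.add(value)
    return kept


def containment(query: set, reference: set):
    """Fraction of the query's sampled k-mers found in the reference.

    Returns None when the query sketch is empty, since no estimate is possible.
    """
    if not query:
        return None
    return len(query & reference) / len(query)


reference = "ATGACCGGTAAATTTGCAGGCTTAACCGGATCGATCGTTAGC"
fragment = reference[10:30]   # fully contained in the reference
unrelated = "AAAAAAAAAAAA"    # homopolymer whose 5-mer is absent from the reference

# modulus=1 keeps every k-mer, so the result here is exact rather than estimated.
ref_sketch = sampled_hashes(reference, k=5, modulus=1)
print(containment(sampled_hashes(fragment, 5, 1), ref_sketch))
print(containment(sampled_hashes(unrelated, 5, 1), ref_sketch))
```

Because the same filter is applied to every dataset, a sampled k-mer from the query is kept in the reference sketch whenever it occurs in the reference, which is what makes containment estimable from the sketches alone. With a larger modulus the sketches shrink and the value becomes an estimate, and very short queries may sample no k-mers at all.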
In this episode, Andrew Page and Lee Katz continue their chat with Titus Brown, focusing on taxonomy assignment in metagenomics.
Topics discussed:
* Dealing with contamination and low quality genomes in reference databases
* Sourmash as a versatile search tool, not a curated database
* The need for high confidence in taxonomic assignment in public health
* Most microbial assignment tools have low specificity or sensitivity
* Possible ways to achieve perfect species classification (in theory)
* The challenges around defining species based on small genomic differences
* Interesting cryptography concept of 'unicity' distance for classification
* Conveying the nuances and uncertainties in taxonomic assignment
The conversation highlights the difficulties around taxonomic classification, especially at the species level, but explores ideas for improving accuracy. Overall it emphasizes the complexities of biology and the need to convey uncertainties transparently.
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/…
Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips: * https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC * https://youtu.be/RLz5Api676Y?si=D6nps33O42Fmk4Ac * https://youtu.be/0Nz8YrCC9X8?si=KNwkeFoS6Bu4LOpv * https://youtu.be/dxIPcbmo1_U?si=TOBw5AONYVCW0JzV * https://youtu.be/m1lc8GwBKFE?si=rj7Oiq51l2_dB6ro More information on the secret tattoo here: https://underunderstood.com/podcast/episode/jeff-goldblums-secret-tattoo-jurassic-park-ian-malcolm/…
In this episode of the Micro Binfie Podcast, Andrew Page and Lee Katz interview Titus Brown about his journey from studying math and physics as an undergrad to becoming a bioinformatician focusing on metagenomics and software development.
Topics discussed:
* Titus' background in math, physics, digital evolution research, and developmental biology
* His transition into bioinformatics to analyze the influx of genomic data in the 1990s
* Developing early tools for comparative genomics and sequence analysis
* The philosophy of creating usable software with good documentation
* Work on transcriptomics, metagenomics, and k-mers at Michigan State
* Digital normalization and dealing with large sequencing datasets
* Moving to UC Davis and continuing work on metagenomics and software like khmer and sourmash
* Thoughts on challenges around data reuse and accessibility in science
Papers:
* Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
* Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
* IBD exploration - https://dib-lab.github.io/2021-paper-ibd/…
The podcast discusses an article co-authored by Andrew Page, examining the use of GPT-4 for research publication. The conversation focuses on the authorship of articles generated by GPT-4 and the implications for academic publishing. Authorship and Ethics: Andrew discusses the question of authorship when AI-generated content is involved in research articles. He explores the ethical implications and potential biases associated with AI-assisted writing, such as the omission of minority figures and novel discoveries. He emphasizes the importance of transparency when using AI and its potential to democratize research, as long as ethical guidelines are maintained. AI & Scientific Journals: The podcast delves into the current landscape of AI in academic publishing. It addresses the commercial use of AI in crafting manuscripts for research articles and the necessity of distinguishing between manual and AI-generated contributions. The possible misalignment of GPT-4's commercial objectives with academic goals is highlighted. Risks and Benefits: Andrew outlines the risks of using AI in publishing, such as unintentional plagiarism, biases, and outdated methods. He provides an example of bioinformatics software recommending deprecated methods, illustrating the need for caution. The conversation also touches upon the AI's potential to introduce bias unintentionally, citing past incidents where AI models quickly adopted extremist views. Andrew's co-authors, Niamh Tumelty and Sam Sheppard, bring different perspectives on ethics and the impact of AI on publishing. Niamh, associated with the London School of Economics, emphasizes ethical considerations, while Sam, editor-in-chief of Microbial Genomics, underscores the need to adapt to the reality of AI contributions in journal submissions. In conclusion, the podcast underscores the importance of recognizing and navigating the ethical challenges posed by AI in academic publishing. 
It suggests that the technology may evolve faster than policies can adapt, necessitating an ongoing conversation among researchers, publishers, and AI developers. Links: https://microbiologysociety.org/blog/microbe-talk-ai-a-useful-tool-or-dangerous-unstoppable-force.html https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.001049…
We continue our conversation with Wytamma Wirth about write-the and all things AI. It starts with discussing the usage of language models, specifically ChatGPT, in writing boilerplate code, and how it can assist in generating code snippets, unit tests, and even documentation strings. The participants also explore the potential of incorporating it into code editors to make coding more efficient and less error-prone. The conversation then shifts to discuss the generation of research papers, specifically software announcements, by leveraging code documentation. The participants believe ChatGPT could be useful in generating introductions and backgrounds for such publications. They also touch upon the utility of language models in translating documentation into different human languages to assist non-native English speakers. The discussion returns to code documentation, focusing on the tool "write the docs" which auto-generates well-structured and searchable documentation websites. The participants appreciate the tool's ease of use and the potential it has in maintaining proper documentation for projects. The conversation ends with an acknowledgment of the importance of human oversight in automating tasks using language models. Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/…
In this episode, we dive deep into the world of automated code documentation and conversion using ChatGPT through the write-the software developed by Dr Wytamma Wirth from The University of Melbourne. Our guest, an experienced software engineer, takes us on a journey through the challenges and nuances of writing code documentation and the role AI can play in easing this process. We explore the intersection of ChatGPT's capabilities with Write the Docs, a documentation system widely used by developers. From highlighting ChatGPT's ability to understand and generate code snippets, to demonstrating real-time code conversion across multiple programming languages, this episode is a treasure trove for developers looking to enhance their workflow. Whether you're a seasoned developer or just getting started, tune in to discover how the synergy of AI and coding can elevate your documentation game to the next level! Links: Write-the software: https://github.com/Wytamma/write-the Wytamma Wirth: https://www.wytamma.com/…
Andrew and Lee are at the Global Microbial Identifier conference (GMI13) in Vancouver Canada. We talk to Dr William Hsiao, one of the organisers of the conference.
Andrew and Lee are at the Global Microbial Identifier conference 13 in Vancouver Canada. On the first day they talked to Dr Finlay Maguire and Dr Emma Griffiths about microbial genomics and Tim Hortons.
In this episode there is a comprehensive discussion on the influence of AI, especially GPT-4, in the sphere of microbial bioinformatics. They reflect on a study testing GPT-4's problem-solving capabilities, which raises concerns about its potential impact on employment practices and academic integrity. There's speculation that AI's proficiency in tackling standard technical problems could interfere with genuinely evaluating a candidate's knowledge during interviews. Drawing parallels with calculators, the hosts deliberate on whether AI tools should be permitted during assessments. They stress the necessity for individuals to possess a deep understanding of their domain to accurately interpret and validate AI solutions. Discussing the AI's limitations, the hosts highlight its struggles with regular expressions and handling larger scripts. They observe the AI tends to loop and repeat itself, performing better with shorter scripts but faltering on more complex tasks often seen in bioinformatics. This prompts a discussion on how educators should address these developments in their teaching strategies. Moreover, the hosts explore the potential of large language models to improve base calling and read correction in sequencing, drawing on the structured and predictable nature of language and genetic code. They also discuss the idea of introducing randomness in these models to generate creative and varied solutions, potentially predicting future alleles or gene configurations. Ultimately, they express a blend of enthusiasm and apprehension towards the swift advances in this field and the ensuing implications for bioinformatics. They end on a note of anticipation for future developments, with a humorous nod towards AI's potential for automating mundane tasks like auto-correcting sample sheets. References: What Is ChatGPT Doing … and Why Does It Work? 
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/ Many bioinformatics programming tasks can be automated with ChatGPT https://arxiv.org/ftp/arxiv/papers/2303/2303.13528.pdf ChatGPT for bioinformatics https://medium.com/@91mattmoore/chatgpt-for-bioinformatics-404c6d0817a1 Empowering Beginners in Bioinformatics with ChatGPT https://www.biorxiv.org/content/10.1101/2023.03.07.531414v1 Lawyer uses GPT and gets ethics violation https://simonwillison.net/2023/May/27/lawyer-chatgpt/ Can ChatGPT solve bioinformatic problems with Python? https://dmnfarrell.github.io/bioinformatics/chatGPT-python…
In this episode of the Micro Binfie Podcast, titled "AI Unleashed: Navigating the Opportunities and Challenges of AI in Microbial Bioinformatics", Lee, Nabil, and Andrew unpack the implications of generative predictive text AI tools, notably GPT, on microbial bioinformatics. They kick off the conversation by outlining the various applications of AI tools in their work, which range from generating boilerplate programs, drafting documents, to summarizing vast tracts of data. Andrew talks about his experience with GPT in coding, specifically via VS Code and GitHub Copilot, highlighting how GPT can generate nearly 90% of the necessary code based on a brief description of the task, thereby accelerating his work. He goes on to discuss the use of GPT in clarifying lines of code and notes that they used AI to generate a paper on the ethical considerations of employing AI in microbial genomics research during a recent hackathon. The conversation then switches gears as Nabil shares his experience of using GPT to standardize date formats in tables and summarize paper abstracts. While GPT is generally accurate in performing simple tasks, he warns that the tool can sometimes provide erroneous answers. Nabil also highlights GPT's ability to generate plausible but inaccurate responses for complex prompts, as illustrated by his experience when he used it to find a route in a video game. Andrew then talks about a script they created during a hackathon, which produces podcast episodes reviewing math tools. He points out the issues encountered, such as GPT providing wrong factual information. Looking ahead, Andrew envisions a future awash with GPT-generated content that may or may not be correct, raising the challenge of discerning real and false information. However, they also acknowledge the potential benefits of AI technologies for those with visual impairments, though it's far from a perfect solution at present. 
The conversation veers to the use of AI tech in handling boilerplate code and generating code snippets based on predictive text. The hosts further discuss the potential for this tool in rapid language learning. A live experiment ensues where Nabil and Andrew use a Perl script and utilize GPT-4 to convert this script into Python and back again to assess its capabilities in language translation. The AI tool proves proficient, considering comments, usage, and authorship and employing popular libraries like BioPython intelligently, though it does leave a disclaimer about potential inaccuracies. They consider the possibility of using AI to optimize coding, similar to minifying JavaScript, and even the idea of iterating through multiple languages and assessing the output. Nabil initiates a simpler task for the AI, asking it to write a Python script translating DNA into protein, which then gets translated into Rust. Andrew shares his experience of using AI to generate a Python class that compares two spreadsheets using pandas, demonstrating AI's comprehension and execution of complex tasks. In summary, this episode underscores the power and potential of AI in coding and the need for human oversight to ensure the quality and effectiveness of AI-generated content. It offers a glimpse into a future where AI tools, despite their limitations, can revolutionize many aspects of programming, bringing in new efficiencies and methods of working.…
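The simpler task mentioned in this episode, a Python script that translates DNA into protein, can be sketched roughly as follows. This is not the script generated during the recording; it is a minimal illustration that assumes the standard genetic code, reads only the first forward frame, and uses hypothetical names (`CODON_TABLE`, `translate`). Libraries such as BioPython offer more complete implementations.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)
```

Calling `translate("ATGGCCTAA")` gives `"MA"`. As the episode stresses, a script like this still needs a human to check it against known sequences before it is trusted.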
The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and difficulties in keeping track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have established libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases and their primary keys and unique fields. SQL is useful, particularly in dealing with large datasets. It is consistent across flavors and unlikely to go away soon. It takes a lot of skill to optimize queries to work in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look to who is in your space and what they are using and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages, and it is just a question of understanding the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and it is something that needs to be appreciated. They highlight the difference between just learning the basics of a language and really getting into the depths of it and the frameworks and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. 
However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R: while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it offers several separate paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages. One should choose based on one's goals, environment, and personal preferences. Python is a useful language to learn, even if one is not interested in bioinformatics. Additionally, they note that the fundamentals of databases and how they work are crucial to understand and are used across fields.…
We are back talking about systematics and the SeqCode, a nomenclatural code for prokaryotes described from sequence data. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center (DiSC). Link to paper: https://www.nature.com/articles/s41564-022-01214-9 History paper: https://www.sciencedirect.com/science/article/pii/S0723202022000121 Palmer explained that the impetus for the SeqCode was the need to accommodate previously uncultivated organisms under a specific nomenclature code. She emphasized that the SeqCode was written to allow any peer-reviewed publication, but noted that the authors have designed three paths of validation in the SeqCode. They hope that anyone proposing a name will work with the curation team to ensure the best quality descriptions, names, etymology, and solidification. Rodriguez discussed the SeqCode's governance, which is already in place and has been made public so that anyone interested can join the SeqCode community. The governance structure comprises an executive board, committees, and working groups. Some committee positions are held by co-opted members, while others are filled by ballot. The hosts sought to clarify the relationship between ISME (the International Society for Microbial Ecology), which is backing the SeqCode, and the wider field in general. Rodriguez explained that ISME is simply providing support as an umbrella organization for the SeqCode. Palmer and Rodriguez clarified that the SeqCode is not a competing code but rather a parallel one that aims to accommodate previously uncultivated organisms. 
Palmer noted that most scientists culture prokaryotes not for naming but to advance their knowledge of these organisms through physiology experiments. They emphasized that the new system is the result of a long collaborative effort that involved many different viewpoints and philosophies. The episode also discussed the practical requirements for naming under the new system, which include standards for the completeness and contamination levels required in the genome sequence data. Palmer noted that while the 16S rRNA gene sequence was not required for naming, it was recommended for improved accuracy in cross-talk between different taxonomies. The conversation highlighted the importance and challenges of naming microorganisms and the ongoing efforts to create a system that is inclusive of all microorganisms, both cultivated and uncultivated. Rodriguez and Palmer agreed that high-quality genomes should be the main nomenclatural types to ensure the system builds up rather than breaks down. They noted the challenge of obtaining full genomes of some organisms, such as obligate intracellular parasites, but suggested obtaining housekeeping genes as a potential solution. They further explained that estimating completeness or contamination is technically difficult for many taxa, but Palmer confirmed that registering a name on the SeqCode registry requires adding such estimates. The episode emphasized the importance of collaboration within the scientific community and the need to create a system that is inclusive of all microorganisms. It also highlighted the challenges inherent in naming microorganisms, while showing that this is an ongoing process and that scientists are working to create a system that is accurate, practical, and beneficial for all.…
Today we are talking about systematics, and specifically SeqCode; a nomenclatural code for prokaryotes described from sequence data. Joining us to talk about it are co-authors on the recent publication. Marike Palmer and Miguel Rodriguez. Marike Palmer is a Postdoctoral researcher in the School of Life Sciences at the University of Nevada Las Vegas and Miguel Rodriguez is an Assistant Professor of Bioinformatics at the University of Innsbruck in the departments of Microbiology and the Digital Science Center(DiSC).…
An honest discussion about the upsides and downsides of doing a postdoc in front of an audience of first year PhD students. Guests Dr Emma Waters, Dr Heather Felgate and Dr Muhammad Yasir are joined by Dr Andrew Page. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich, UK. Emma starts the conversation by sharing that she enjoys research and solving problems with different tools. The thrill of discovery and exploration that comes with the postdoc position is something she loves. Heather echoes Emma's thoughts and says she is happy where she is, rather than chasing after a higher paying job in industry. She appreciates the flexibility that academia offers, which has enabled her to balance her family and personal life. The conversation takes a turn when PhD students ask if any of the postdocs regret choosing academia despite the evident pay gap between industry and academia. Emma points out that although she might have earned more in industry, she is happy where she is, and finds satisfaction in helping people through her work. Chasing profits in industry would not offer her that kind of gratification. Yasir shares his success story of sequencing 600 samples of the SARS-CoV-2 virus in Pakistan, and how it contributed towards the fight against the pandemic. He credits the freedom and flexibility of academia that allows him to collaborate with colleagues from all over the world. In conclusion, Andrew advises students to explore their options and to keep their careers open-ended. He suggests that if they are after a higher paycheck, they should consider the bioinformatics data science path that offers more earning opportunities in industry. The postdocs stress the importance of following what makes one happy in life, rather than chasing big salaries.…
This is a panel discussion on mobile genetic elements, guest chaired by Dr Muhammad Yasir with guests Dr Emma Waters, Dr Heather Felgate and Dr Andrew Page. We cover AMR, Salmonella Typhi and Staphylococci and outbreaks and the role of MGEs. It was recorded in front of a live audience of PhD students at the Microbes, Microbiomes and Bioinformatics doctoral training program in the Quadram Institute in Norwich UK.…
We talk about KRAKEN, the taxonomic classification software, and the software suite around it, and are joined by Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology. Dr. Jennifer Lu and Natalia Rincon from the Kraken software development team were interviewed on the MicroBinfie podcast. They discussed the various versions of Kraken and the tools developed around it. They began by explaining the original Kraken, which uses an exact k-mer matching process and a k-mer size of 31, based on Jellyfish. KrakenUniq is an additional version of Kraken that adds a column of unique k-mer counts, which reports how many unique k-mers are covered by the reads assigned to each taxon, providing an additional way to verify microbial identification. Kraken 2 was developed to accommodate larger databases by using a probabilistic data structure and minimizers to map k-mers to a shorter sequence size. They then talked about how Kraken is useful for microbiome analysis, including detecting pathogens. However, the accuracy of the results depends heavily on the availability of genomic data in the database, which emphasizes bacterial and viral data. For infectious pathogen detection, KrakenUniq is combined with Bracken to approximate the abundance of species present. The developers emphasized the importance of users being aware of the genomic data available in the database because the results can only be as accurate as the data. They also talked about how Kraken is used widely in bioinformatics and can be used for various scenarios beyond metagenomics. For example, they use Kraken to treat a single genome as a metagenome as part of quality control analysis. Where there are conflicting taxa in the reads, the Kraken results show it, making it useful in determining the presence of contamination in samples. The Kraken team also talked about how they use Kraken to detect contamination in pathogen genomes. 
They compare all eukaryotic pathogen genomes against bacteria, human genomes, and databases of vertebrates and plants to filter out any contaminants. In some instances they have found contaminating sequences from hosts such as chicken or cow in eukaryotic pathogen genomes. Moving forward, the Kraken team intends to maintain all Kraken repositories, enhance its accuracy, speed, and usefulness, and develop new scripts and downstream analysis for the KrakenTools suite. They acknowledge the need to make the database smaller as more genomes become available and are exploring ways of indexing and sketching to achieve this. In conclusion, Kraken has been an essential piece of software for metagenomic analysis, and it remains a continually improving tool for pathogen detection and classification. The Kraken team advises users to keep in mind the importance of accurate data for effective pathogen detection and classification.…
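The exact k-mer matching and unique k-mer counting described in this episode can be illustrated with a toy sketch. This is not Kraken's implementation: real Kraken maps each k-mer to the lowest common ancestor of the genomes containing it, scores paths through the taxonomy tree, and uses k = 31 with compact database structures. The function names, the small k, and the simple majority vote below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to its taxon; k-mers shared between taxa map to None."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            if kmer in index and index[kmer] != taxon:
                index[kmer] = None  # ambiguous: seen in more than one taxon
            else:
                index[kmer] = taxon
    return index


def classify_reads(reads, index, k):
    """Assign each read by majority vote of exact k-mer hits.

    Also returns, per taxon, the number of distinct k-mers supporting it,
    so that many copies of one read do not look like broad genome coverage.
    """
    calls = []
    unique_kmers = defaultdict(set)
    for read in reads:
        hits = [(km, index.get(km)) for km in kmers(read, k)]
        votes = Counter(taxon for _, taxon in hits if taxon is not None)
        call = votes.most_common(1)[0][0] if votes else None  # None = unclassified
        calls.append(call)
        for km, taxon in hits:
            if taxon is not None and taxon == call:
                unique_kmers[call].add(km)
    return calls, {taxon: len(kms) for taxon, kms in unique_kmers.items()}
```

In this sketch, a taxon supported by many reads but very few distinct k-mers is a candidate false positive, which is the kind of check the unique k-mer column is described as providing.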
We are talking about KRAKEN - the taxonomic classification software and in the hot seat are Dr Jennifer Lu and Natalia Rincon from Johns Hopkins University Center for Computational Biology. The MicroBinfie podcast welcomed Dr. Jennifer Lu and Natalia Rincon to discuss Kraken, a taxonomic classification software. Developed in 2013-2014, Kraken assigns sequencing reads to a specific species, to a genus, or more generally to bacteria. Its efficiency in classifying millions or billions of reads puts it ahead of other classification methods such as Melan, MegaBLAST, and Chime. The tool is known for its ease of use and accuracy. Following the success of Kraken's metagenomic analysis, Florian Breitwieser developed KrakenUniq, which provides more information than the standard Kraken. Another addition to the Kraken family is Bracken, developed by Jennifer Lu, which estimates abundance, and Nat Rincon contributes to the newest additions, which analyze diversity metrics. Kraken's exact k-mer matching identifies reads and assigns them taxonomy IDs, with two outputs: a long text file with an entry for every read and a Kraken report that provides a breakdown of reads for each taxonomy ID. The interpretation of the Kraken report depends on the sample and the taxon in question. Even if only a few reads are assigned to them, taxa can still be meaningful. For beginners, Kraken simplifies the classification process by providing pre-built databases. There was an interesting discussion about the origin of the Kraken name. It comes from the mythological sea creature, a nod to Kraken's reliance on Jellyfish, a k-mer counting tool used to build the Kraken databases. Derrick Wood developed the original concept of Kraken. The hosts found a true pathogen in a sample, which was significant for downstream analysis. The number of reads in some samples was very low, and some unclassified reads could also be uninformative or indicate contamination. 
Having been developed for Illumina reads, Kraken is likely to be less accurate when classifying Nanopore reads because of their higher error rate. The Kraken database achieves exact matching of k-mers and fits all genome information into a small space. Tools spawned out of the Kraken world are widely used due to their high accuracy, speed, and simplicity in taxonomic classification. KrakenUniq provides an additional column in the report that counts the number of unique k-mers to validate the results. The developers worked closely with others to test new Nanopore chemistries, because frequent changes in the chemistry affected the accuracy of the reads. Kraken databases contain vector sequence information, and vectors are given their taxonomy ID as "synthetic sequences." The software mixes Perl and C++, with Perl processing inputs and C++ managing the memory-heavy work of building and compacting sequences and writing bytes. Dr. Jennifer Lu appreciates the simplicity and accuracy of the classification algorithm, and Nat Rincon takes pride in being part of the Kraken community.…
Our guests are Ed Feil, a professor of bacterial evolution at the University of Bath, and Natacha Couto, a data scientist at the Centre for Genomic Pathogen Surveillance at the University of Oxford. We delve into the concept of multi-locus sequence typing (MLST) in bacterial population genetics. They highlight how the MLST method allows for defining strains based on partial gene sequences of up to about 500 base pairs. For each locus, every distinct sequence is given an allele number, with identical sequences receiving the same number. The combination of allele numbers across loci gives each strain a unique identifier, referred to as the sequence type (ST). MLST has revolutionized the field by facilitating digital storage and comparison of epidemiological databases, proving particularly useful in investigating transmission events and dissemination of certain strains. Although there are other methods such as pulsed-field gel electrophoresis (PFGE) that offer higher resolution when looking for similarities between different strains, MLST remains a versatile and widely used method. They also talk about the shortcomings of MLST and the need for continued improvements in population genetics research. They mention the development of the eBURST program, which uses a circular model, rather than the traditional dendrogram tree structure, to better visualize MLST data and understand the clonal expansion of populations. They also discuss how the original MLST schemes may not have included the best genes for all bacterial species, as the genes were chosen before genome sequencing became widely available. Ed and Natacha further elaborate on the concept of clonality among bacterial species. Ed suggests that bacterial population structures have no consistent pattern, with some organisms being well-behaved, while others have a lot of allele shuffling. However, clones have existed since day one, and their presence is still seen today. 
Natacha adds that although MLST has flaws, it leaves behind the nomenclature for the lineages or clones, which is a lasting legacy. Nabil-Fareed notes that while most reference labs have moved on to genomics, some people still use MLST. He adds that the pipeline is the same for any organism, and the process is efficient in the end. The discussion concludes with the hosts thanking the guests and promising more exciting topics in their next episode. Overall, the hosts highlight the significance of understanding the limitations of MLST and the scope for further research in bacterial population genetics.…
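The allele numbering and sequence type assignment discussed in this episode can be sketched as below. This is a simplified illustration with hypothetical names and toy sequences. In practice, allele and ST numbers are assigned by curated MLST databases, and schemes typically use around seven housekeeping loci rather than the two shown here.

```python
def assign_alleles_and_sts(isolates, loci):
    """Number alleles per locus and sequence types in order of first appearance.

    isolates: {isolate_name: {locus: sequence}}
    Returns {isolate_name: (allele_profile_tuple, st_number)}.
    """
    allele_ids = {locus: {} for locus in loci}  # per locus: sequence -> allele number
    st_ids = {}                                 # allele profile -> sequence type
    results = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            # Identical sequences reuse the same number; new ones get the next number.
            profile.append(known.setdefault(seqs[locus], len(known) + 1))
        profile = tuple(profile)
        st = st_ids.setdefault(profile, len(st_ids) + 1)
        results[name] = (profile, st)
    return results
```

Two isolates with the same allele at every locus share an ST, while a single nucleotide difference at one locus creates a new allele number and therefore a new ST. This is part of why MLST data are easy to store and compare digitally, and also why it has lower resolution than whole-genome approaches.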
The hosts of the MicroBinfie podcast invite Dr Natacha Couto (University of Oxford) and Professor Ed Feil (University of Bath) as special guests to discuss the concept of "One Health". One Health is a comprehensive approach that seeks to manage the problem of antimicrobial resistance (AMR) by addressing the use of antibiotics in healthcare, agriculture, and the environment. It aims to improve health outcomes across all sectors to create a better planet. However, the diagrams often used to represent One Health are misleading as they do not take into account the complexity of the transmission of AMR. Therefore, there is a need for a quantitative study to understand and identify the ecological and biological barriers to AMR transmission. Visual aids such as these diagrams are not always accurate and should be approached with caution; scientists should be mindful of the implicit confirmation bias in visually appealing graphics. AMR determinants are found in various settings, including animals, the environment, and humans, because most antibiotics are derived from natural compounds. Studies have shown that AMR determinants are not limited to hospitals; they can also be found in the environment surrounding hospitals. However, the guests caution that sampling methods can skew results, and it is essential to use a quantitative approach to understand the transmission of AMR across different sectors. The One Health approach requires understanding the drivers of resistance and virulence and looking beyond human pathogens. Plants, insects, and animals form part of the broader virome and represent systems that are harder to study. There is no clear answer on where to focus resources as both resistant and commensal strains can be important to study. Context is essential when it comes to virulence, as commensal bacteria can become dangerous pathogens in certain situations. 
They note that environmental factors play a significant role in disease outbreaks, and understanding the habits of hosts like deer or pheasants, on which ticks feed, is crucial. Approaches like outbreak analysis that work in hospitals cannot be used in environmental settings. Disease cannot be studied as if it occurs in a vacuum. Covid-19 has shown how host switches can have severe consequences, but spillover events usually fizzle out before causing any harm. Understanding environmental factors like habitat changes may help tackle disease outbreaks better in the future. While tools like sequencing and analysis may be equivalent, the questions investigated in different settings are vastly different. It is essential to comprehensively understand social science factors such as people's compliance level and risk perception when studying transmission in human communities. In conclusion, the issue of antimicrobial resistance is complex and requires a multidimensional approach involving different perspectives and fields of study.…
We celebrate having 100 episodes! We look back at the history of our podcast and then talk about what the future might hold. Then: Lee gets his revenge by having Andrew and Nabil pronounce words local to him. We very briefly mentioned this paper: https://www.nature.com/articles/s41586-022-05543-x Nabil was trying to remember this particular site and remembered it after recording: https://phagesdb.org/phages/…
At the 8th Microbial Bioinformatics Hackathon in Bath we talked to a live panel with Kristy Horan, Torsten Seemann, Finlay Maguire and Andrew Page about bioinformatics from the frontlines. We apologise for the poor audio quality, it was recorded in a room with 20 people in the background so at points it got a bit loud, however we felt you might enjoy the discussion regardless.…
We interview Frank Ambrosio. He is embarking on a lifestyle of nomadic bioinformatics, living his best life. * https://www.linkedin.com/in/francis-ambrosio/ In this episode of the MicroBinfie podcast, Frank Ambrosio, a traveling bioinformatician working for Theiagen, joins co-hosts Andrew, Nabil, and Lee to talk about his journey into bioinformatics. Frank shares how he transitioned from being a lab technician and microbiologist to analyzing his own data and pursuing a master's program in bioinformatics at Georgia Tech. He also discusses his experience working at the CDC, where he gained exposure to different laboratories working on tuberculosis, biodefense research and development, surveillance-oriented production laboratories for strep genomes, and the division of HIV/AIDS prevention. Frank gives tips for aspiring bioinformaticians, recommending that early career scientists focus on applying to contracting agencies at the CDC to gain valuable experience and eventually become full-time employees. He also suggests starting with a virtual machine and a cloud-based IDE like Google Cloud and VS Code for ease of use and reliability. The conversation then moves on to Frank's nomadic lifestyle as a traveling bioinformatician, and his desire to connect with the public health community worldwide. Frank shares his recent experience meeting collaborators in Mozambique and the importance of building personal connections with colleagues in public health for collaboration and support. Frank concludes by discussing his approach to routines while traveling and how he uses his Google calendar to plan out his days and weeks. He emphasizes the importance of flexibility and adaptability as a traveling bioinformatician, and his eagerness to continue meeting new people and building connections in the public health community. 
Moving on to Frank's lifestyle as a digital nomad bioinformatician, he explains how he enjoys enhanced flexibility, better quality of life, and the ability to work anywhere in the world. However, he also highlights that this lifestyle model could be challenging, particularly for those who prefer greater stability and predictability. Nabil wonders how possible it would be for bioinformaticians to engage in mentoring and education while working as digital nomads. Frank acknowledges the concerns but highlights that he has been fortunate enough to maintain his mentor relationships remotely. He talks about how working with someone on a project can facilitate a stronger and more rewarding mentor-mentee relationship. The hosts note that flexibility is not new to bioinformatics and that technological advancement is making it easier to find intelligent people worldwide to join in the missions of organizations like the CDC. Frank reflects on his future, reserving the potential to remain with his current institution, Theiagen. He remains optimistic about the potential of these digital collaborations and is open to new opportunities to help the global bioinformatics community.…
We discuss recent advancements in genome sequencing technologies, based on what we've been hearing at conferences and within the community. The Microbial Bioinformatics podcast brought together three experts, Andrew, Lee, and Nabil, to discuss the latest advances in sequencing technologies. The team explored the new developments in the market, including a cutting-edge instrument from Element Biosciences that captured Nabil's attention. Andrew analyzed the adaptive sequencing feature in Illumina that enables the rejection of unwanted reads. The discussion highlighted how the computing power of sequencing labs has grown with advancements in computers, with gaming computers being repurposed to aid in data analysis. Illumina's complete long-read solution and NextSeq's kits were also topics of discussion. The team also discussed the increasing popularity of PacBio, whose HiFi sequencing produces higher-fidelity reads. The experts then discussed how longer reads pave the way for 4th generation sequencing while also acknowledging the challenges posed by software tools catering to the new technology. While the developments in sequencing technology seem exciting, Nabil cautioned the panel not to forget the importance of quality over quantity. In the second part of the episode, the team moved on to analyze the limitations of sequencing software, particularly regarding its long-read handling capabilities. Andrew explained how some software is hard-coded to handle paired-end reads of up to 300 bases, and exceeding this limit often leads to crashes. Lee asked whether a constant in the source code of SPAdes or SKESA limits the software's ability to handle larger datasets. Andrew answered by explaining that developers may have set limits on the memory or stack size of the software, leading to issues when processing larger datasets. 
The team concluded by noting that the hard-coding and data processing limitations shouldn't be considered permanent obstacles as software development is a continuous process. As sequencing technologies advance, software solutions must also advance to handle increasingly complex genetic datasets better.…
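As a rough illustration of the read-length issue discussed in this episode, the sketch below counts reads in a FASTQ stream that exceed a length limit before the data is handed to an assembler. The 300-base value and the function names are assumptions made for this example; they are not taken from the SPAdes or SKESA source code, and the parser assumes simple four-line FASTQ records.

```python
import io
from typing import Iterator, TextIO

# Hypothetical limit for illustration only; not read from any real tool.
ASSUMED_MAX_READ_LENGTH = 300


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            yield len(line.strip())


def count_over_limit(handle: TextIO, limit: int = ASSUMED_MAX_READ_LENGTH) -> int:
    """Count the reads whose sequence is longer than the limit."""
    return sum(1 for length in read_lengths(handle) if length > limit)


# Two made-up reads: one of 150 bases and one of 450 bases.
records = [
    "@read1", "A" * 150, "+", "I" * 150,
    "@read2", "C" * 450, "+", "I" * 450,
]
example = io.StringIO("\n".join(records) + "\n")
print(count_over_limit(example))  # prints 1
```

A check like this only flags the problem early; it does not say whether a given tool will actually fail on longer reads.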
We've been busy attending in-person conferences such as IMMEM XIII and ASM NGS so we thought we'd give you some of our reflections. We discuss wastewater surveillance, hybrid conferences and metadata amongst other things.
For the first time ever all 3 MicroBinfies are together in person to record an episode. We are joined by Torsten Seemann for a conversation about how what we do in research can get lost in translation when applied to public health. We discuss what we did with SARS-CoV-2 genomics and somehow end up chatting about geography and language. Hope you enjoy.…
Over the past few weeks scientists have been swapping Twitter for Mastodon. Our very own Nabil-Fareed Alikhan talks about his experience with setting up and running a Mastodon server called https://mstdn.science, which is one of the places scientists have moved to. We are joined by Emma Hodcroft to get an independent scientist's view on the whole thing. In the MicroBinfie podcast, Andrew and Nabil discuss the migration of academics from Twitter to Mastodon, with Nabil playing a significant role in this shift. According to Nabil, Mastodon is a free and open web application designed for micro-blogging. Its servers communicate with one another, allowing users to follow, reply to, or read content from other servers. The migration happened after Elon Musk bought Twitter and made significant changes that raised concerns about freedom of speech and democracy. In response, Nabil and Duncan set up their own Mastodon instance, https://mstdn.science, initially planning to create a social network for bioinformaticians, microbial genomics people, and tech-savvy microbiologists. They expected only 50-100 users, but many more scientists, including Nobel Laureates, journals, and scientists from other disciplines, joined, and Nabil's instance now has almost 2000 users. Meanwhile, other science instances, like genomic.social or ecoevo.social, also saw a surge in sign-ups. In terms of resources, Nabil and Duncan's virtual server hosts almost 2000 users and costs around £100 per 1000 users, depending on how much interaction and following goes on. The Mastodon network replicates content from other instances, spawning many jobs, even if a user's own account doesn't change much. Nabil does not limit which Mastodon instances communicate with his site but does block domains serving unwanted or unsafe content. Even though the Mastodon network could crash and burn, Nabil thinks it could still work in the long run. 
The podcast contributors suggest that Twitter's recent changes have left some users feeling dissatisfied, leading them to Mastodon, which is a decentralized social media platform. Some dodgy servers have been blocked by Mastodon for moderation, and people have moved from Twitter to Mastodon as a total replacement for Twitter. Mastodon has become a "sign" for fed-up users. According to Emma, who recently moved from Twitter to Mastodon, Mastodon is a hedge against Twitter's unknown future. Mastodon's decentralized platform allows for a shift of power towards content and interaction, not available in a centrally controlled platform. Mastodon may not replace Twitter as a one-for-one replacement, but it fits certain use cases, such as a place for academics to complain about papers. Mastodon's success is not dependent on Twitter's fate but rather on what "crazy ideas" Twitter comes up with in the future, Emma argues. While Mastodon may never be quite the same as Twitter, it could be even better.…
We go through bug reports and issues and give insights into how bioinformaticians dig into them. We suggest the underlying problems and possible solutions and also provide tips on how to file better bug reports.
Often the hard part of bioinformatics isn't the analysis, it's getting all of the software you need set up and installed. Come with us on this journey and avoid dependency hell. In the MicroBinfie podcast, the hosts discuss the struggles of installing bioinformatics software and managing its dependencies. In the past, software installations were a nightmare, and it was common to edit lines of code and manage dependencies manually, causing conflicts such as the diamond dependency problem. To ease this process, the hosts suggest using containers, virtual machines, and local environments. They stress the importance of adhering to semantic versioning guidelines and understanding the end-users' perspective for proper documentation, testing, and clarity regarding dependencies. Additionally, software maintenance is critical for its longevity and usability. The hosts also discuss software dependency management with different chip architectures and operating systems. The M1 Apple architecture's differences from traditional computer processors cause compatibility issues and slow emulation, leading to difficulties in bioinformatics. Using separate Conda environments for each project, or Mamba as a package manager, can solve many dependency-related problems. However, Mamba may take shortcuts and create conflicts with specific programs. Other package managers like Homebrew and APT are also discussed. The episode also covers the benefits of using Docker and Singularity to manage software packages on a local machine. Docker is useful for databases, web servers, and complicated pipelines, while Singularity is perfect for more complex software and plays better with HPC. The hosts provide tips on using containers or virtual machines in a team environment, passing containers instead of binary files, and using Docker and Singularity as tools to ease the process. 
Overall, the episode offers practical advice to streamline the workflow of researchers who manage software packages.…
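To make the diamond dependency problem and semantic versioning from this episode more concrete, here is a minimal sketch. It assumes a simplified rule (a version is acceptable if it has the same major number as the requested minimum and is at least that minimum), which is one common reading of semantic versioning and not how Conda or Mamba actually resolve packages. The tool names and version numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string such as '1.4.1'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(candidate: Version, minimum: Version) -> bool:
    """Simplified rule: same major version, and at least the minimum."""
    return candidate[0] == minimum[0] and candidate >= minimum


def shared_versions(minimums: Dict[str, str], available: List[str]) -> List[str]:
    """Return the available library versions that satisfy every tool."""
    parsed = [parse_version(minimum) for minimum in minimums.values()]
    return [
        version
        for version in available
        if all(satisfies(parse_version(version), minimum) for minimum in parsed)
    ]


# Hypothetical releases of one shared library that two tools depend on.
available = ["1.2.0", "1.4.1", "2.0.0", "2.1.0"]

# Both tools accept a 1.x release, so one installation can serve both.
print(shared_versions({"tool_a": "1.2.0", "tool_b": "1.4.0"}, available))  # ['1.4.1']

# The tools need different major versions: a diamond dependency conflict.
print(shared_versions({"tool_a": "1.2.0", "tool_b": "2.0.0"}, available))  # []
```

An empty result is the situation where separate environments or containers help, because each tool can then get its own copy of the shared library.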
The MicroBinfie podcast discusses the top programming languages for bioinformatics. Andrew, Lee, and Nabil agree that Python is a great starting point for its consistency and rigor. Its strict syntax is ideal for teaching programming fundamentals that are essential in any language. In contrast, Perl encourages multiple ways of doing the same thing, creating confusion and difficulties in keeping track of things. The hosts caution against starting with trendy languages that are constantly changing. Instead, stick with more established languages like Python, which have established libraries and concepts that will help you advance more easily. Trendy languages come and go like changing tides, making them riskier choices. Additionally, they highlight the importance of understanding databases and their primary keys and unique fields. SQL is useful, particularly in dealing with large datasets. It is consistent across flavors and unlikely to go away soon. It takes a lot of skill to optimize queries to work in milliseconds. The hosts emphasize that the language you choose to learn depends on your individual goals and environment. For instance, Lee suggests that you should look to who is in your space and what they are using and who is willing to help you. Once you understand the programming concepts, it is easier to transfer them to other languages, and it is just a question of understanding the syntax. Andrew, Lee, and Nabil also discuss their own trajectories of learning programming languages, revealing that it takes a long time to become an expert in a language, and it is something that needs to be appreciated. They highlight the difference between just learning the basics of a language and really getting into the depths of it and the frameworks and libraries. The hosts also mention languages that are important to pick up, like SQL and bash scripting, and languages that are popular for web development, like JavaScript. 
However, they caution that JavaScript and Java are not the same thing and that JavaScript has a reputation for being a weird language. When asked what language they would choose for a task, Nabil says he would use Perl, Lee mentions R for stats, while Andrew admits that he has to relearn R every time he comes back to it and therefore prefers Perl for quick scripts. They also discuss their love-hate relationship with R, mentioning that while it has useful libraries like ggplot2 and ggtree, its syntax is difficult to work with and it offers several separate paradigms for approaching the same problem. The hosts conclude by acknowledging that there is no one-size-fits-all approach to learning programming languages. One should choose based on one's goals, environment, and personal preferences. Python is a useful language to learn, even if one is not interested in bioinformatics. Additionally, they note that the fundamentals of databases and how they work are crucial to understand and are used across fields.…
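The point about primary keys and unique fields can be sketched with Python's built-in sqlite3 module. The table, column names, and accession values below are hypothetical and chosen only for illustration; they do not come from the episode.

```python
import sqlite3


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with a primary key and a unique field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE isolates (
            isolate_id INTEGER PRIMARY KEY,   -- unique row identifier
            accession  TEXT NOT NULL UNIQUE,  -- no two rows may share this
            species    TEXT NOT NULL
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, accession: str, species: str) -> bool:
    """Insert a row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO isolates (accession, species) VALUES (?, ?)",
                (accession, species),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = create_database()
print(try_insert(conn, "SAMPLE_0001", "Escherichia coli"))     # True
print(try_insert(conn, "SAMPLE_0001", "Salmonella enterica"))  # False, duplicate
print(conn.execute("SELECT COUNT(*) FROM isolates").fetchone()[0])  # 1
```

Letting the database enforce uniqueness means duplicate records are rejected at insert time, so every script that writes to the table gets the same protection without repeating the check.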
Today on the @microbinfie podcast, we talk about WDL with @sevinsky and @DannyJPark. We learn what widdle means to Andrew and his kids. Joel takes a shot at Lyve-SET and you'll never guess what happens next. In the MicroBinfie podcast, we discuss the Workflow Description Language (WDL), commonly used to describe bioinformatics pipelines in a portable and cross-environmental way. The starting point is the presumption that tools are already containerized, and WDL helps to bind them together. The guests highlighted that this standardizes bioinformatics in the field, making it more reproducible and scalable. It also helps remove the need for excessive sysadmin work, enabling researchers to spend more time on scientific questions. Although many workflow languages are available, WDL is unique in having a formal specification that is independent of the common implementations used to execute workflows. In the second part of the podcast discussion, guests Joel and Danny talked about workflow languages and public health bioinformatics. They highlighted the challenge of version control for quality management and its effects on the field of bioinformatics. They spoke about the origins of the StaPH-B community, which comprises state-level public health bioinformaticians. The community discusses various challenges and contributes to creating links between academia and state public health departments. Danny shared his story of how they came to use WDL and how they realized it was the perfect language for portability of pipelines. Joel, for his part, talked about how they chose WDL for its applicability to public health and the support it received from its creators, particularly the Broad Institute. They both agreed that the choice of workflow language was driven by the environment they could work in and which language was best suited to their needs. 
In conclusion, the discussion focused on the vital role of workflow languages such as WDL in bridging the gap between bioinformatics and public health. The choice of workflow language was critical and would depend heavily on the environment in which the language was used. Finally, they expressed their support for WDL and how it had helped them streamline their bioinformatics workflows.…
This is an extended director's cut of our chat with Dr Henk den Bakker about Sepia. It's a summer holiday bonus. Some URLs Get Sepia here: https://github.com/hcdenbakker/sepia Some information on the food safety informatics group at UGA: https://www.denglab.site/ Rust: https://www.rust-lang.org/ Kalamari: https://github.com/lskatz/kalamari CAMI: https://www.nature.com/articles/nmeth.4458…
We are joined again by Prof Mark Pallen who takes us through his early experiences in high-throughput microbial genomics. Mark was pleased that he persuaded Nick Loman to join him in Birmingham. Mark tells us how they worked with George Weinstock to perform the first genome sequence analyses of Gram-negatives for genomic epidemiology, in this case of multi-drug resistant Acinetobacter baumannii. After winning an Ion Torrent sequencer in a competition, Mark and Nick then contributed some pioneering genomic analyses of the German STEC outbreak. One of their studies involved crowd-sourced approaches primed by Twitter and was published in the New England Journal of Medicine; the other provided a performance comparison of newly launched benchtop sequencing platforms. Mark, Lee and Nabil discuss how this outbreak overturned dogmas concerning the archetypal status of pathotypes of E. coli. The conversation then moves on to the need for evidence trails and challenging assumptions, whether annotating proteins or quoting Darwin (see https://colinpurrington.com/2012/02/darwin-on-the-floor-lhao/). Nabil recalls the excitement of real-time analysis of an epidemic and acknowledges the legacy of Mark and Nick's work in 2011 for current approaches to the Covid pandemic. Mark describes his exciting experiences exploiting metagenomics in clinical and ancient DNA contexts, including analysis of disease-associated stool samples and of 200-year-old TB genomes in Hungarian mummies. Yet again this led to the overturning of assumptions, in this case that people only get infected with a single strain of M. tuberculosis. It turns out that multiple infections were the norm 200 years ago. Shortly afterwards, Pallen helped assemble a team that analysed undersea sediments, shedding light on the Neolithic transition in England and culminating in a Science paper. 
Mark then takes us through his recent metagenomics analyses of critically ill patients and of the chicken gut, emphasising the excitement of finding hundreds of new species in such a commonplace setting. Mark finishes off by sharing his excitement that there is still so much of the microbial world left for us to discover using sequencing and bioinformatics analyses. We are just 2% of the way there!…
Mark Pallen explains how exciting it was to be in microbial bioinformatics around the turn of the millennium, as we gained genomes for the first time from model organisms and fearsome pathogens. He recounts working with his hero David Relman on the genome sequencing of the strange slow-growing organism called Tropheryma whipplei in competition with a French team. Mark moved to Belfast in late 1999, collaborating with another Englishman working on the island of Ireland, Tim Foster in Dublin. Pallen describes the addictive exhilaration of using PSI-BLAST to find new sortases and sortase substrates across a range of new genomes; for him this was the bioinformatics equivalent of crack cocaine. He quotes the philosopher Alfred North Whitehead in saying that the goal of every scientist is to seek simplicity but distrust it. What Mark found was that in most organisms sortases were behaving quite differently from the rather simple scenario seen in Staphylococcus aureus. He made similar observations on the WXG100 proteins and type VII secretion, which he found in many new contexts quite different from the original context of ESAT-6 as an antigen in Mycobacterium tuberculosis. Mark makes clear that we still don't fully understand the role of ESAT-6 twenty years on. The focus of Pallen's work then shifted to E. coli, where he described vestigial gene clusters for non-functional type III secretion systems in this model organism. He came to realize that E. coli K-12 was not handed to microbiologists by God as a model organism but was just another strain of E. coli and nothing special. Many of the earliest genomes to be sequenced came from worn-out lab strains. To counter this problem, Gordon Dougan at the Wellcome Trust Sanger Institute moved the focus to genome sequencing of freshly isolated, minimally passaged isolates. 
With Brendan Wren, Pallen wrote a review article for Nature, emphasizing the importance of adopting an eco-evo perspective when trying to interpret bacterial genomes. Around that time, Scott Beatson joined Pallen's group. Mark managed to persuade Scott to work on type III secretion in E. coli rather than Pseudomonas aeruginosa. The result was the discovery of dozens of new type III secretion effectors, tying together bioinformatics and lab work to culminate in a PNAS paper. References - https://microbinfie.github.io/2022/06/09/bioinformatics-in-the-noughties.html…
We discuss mobile genetic elements in bacteria and find it's really hard. It's just a short chat as Lee lost power, but we will be back with a part 2 sometime soon.
In this episode we talk to Professor Mark Pallen, who discusses the highlights from his long career as a medical microbiologist turned bioinformatician. His bioinformatics journey began in 1977, the year Fred Sanger invented DNA sequencing as we know it, when Mark was tasked with assembling some amino acid sequences under exam conditions. Mark explains how little was known about sequences at the time. Luckily he managed to gain a grasp of molecular biology and joined a group in the late 1980s at Barts Hospital in London, where he met Brendan Wren. Mark's first eureka moment followed shortly afterwards, when he analysed sequences encoding the key enzyme urease from Helicobacter pylori. He also got very excited when he analysed genes from a clostridial butanol fermentation pathway, which, he explains, played a central role in the formation of the state of Israel. His next big break came when he got the chance to do a PhD under Gordon Dougan. During this time, Mark not only improved his lab and bioinformatics skills, but captained a winning team in University Challenge and introduced the medical profession to the Internet. He recalls with excitement the moment when he first heard the news that a bacterial genome had been sequenced. Shortly afterwards he recruited an 18-year-old gap year student, Nick Loman, to come and work with him analysing the very first Campylobacter jejuni genome. We close this episode just as the new millennium begins, with much more excitement to follow in the next episode. Relevant links: Butanol - https://academic.oup.com/femsle/article/124/1/61/486499 Tree-like thinking for genes, languages and gospel manuscripts - https://www.youtube.com/watch?v=8Ykj5wQs7vU Further references - https://microbinfie.github.io/2022/05/12/bioinformatics-moments-before-the-millennium.html…
We bring on Lingzi Xiaoli and Jill Hagey to talk about their benchmark datasets for SARS-CoV-2. Find out more at https://github.com/CDCgov/datasets-sars-cov-2. See our previous episode for part 1 of the conversation. * Previous paper for bacterial datasets can be found at https://peerj.com/articles/3893/ * Jill can be found on Twitter at @JillHagey and jvhagey.github.io * Lingzi can be found on LinkedIn at https://www.linkedin.com/in/lingzi-xiaoli-27b87174/…
We bring on Lingzi Xiaoli and Jill Hagey to talk about their benchmark datasets for SARS-CoV-2. Find out more at https://github.com/CDCgov/datasets-sars-cov-2 * Previous paper for bacterial datasets can be found at https://peerj.com/articles/3893/ * Jill can be found on Twitter at @JillHagey and jvhagey.github.io * Lingzi can be found on LinkedIn at https://www.linkedin.com/in/lingzi-xiaoli-27b87174/…
Dr Erin Young from the Utah Department of Health and Dr Kelsey Florek from the Wisconsin State Laboratory of Hygiene join us to talk about StaPH-B containers for public health bioinformatics. It's basically how to make biology easier for everyone! Github: https://github.com/StaPH-B
Dr. Erin Young and Dr Kelsey Florek join us to talk about StaPH-B, a US state public health bioinformatics group. They also give some insights into the popular SARS-CoV-2 pipeline Cecret. Website: staphb.org/ Cecret Pipeline: github.com/CDCgov/SC2CLIA Kelsey explains that StaPH-B was created to facilitate collaborations between bioinformaticians in state public health laboratories, especially those just getting started with sequencing and understanding the data generated. The organization provides a conduit of communication and expertise among different laboratories, feeding into projects funded by the NIH, CDC, and other grant agencies. Erin highlights StaPH-B's diverse membership with different levels of expertise, which provides excellent learning opportunities. The organization uses a Slack workspace, with almost 400 members and over 50 channels dedicated to different activities related to bioinformatics, providing a valuable resource for bioinformaticians seeking answers and ideas. The hosts ask Kelsey about who can join StaPH-B, and Kelsey clarifies that while it was initially founded for state public health bioinformaticians, it's open to everyone, and the content is focused on state public health activities. They discuss some of the achievements of StaPH-B, with Kelsey hailing the Slack workspace, collaborations on GitHub, Docker, and collaborative workflows as hugely successful. Additionally, Erin thinks that StaPH-B's training activities, including the StaPH-B Toolkit, training sessions, and videos, ensure that knowledge and expertise are shared. The conversation moves towards the Cecret pipeline, one of Erin's bioinformatics pipelines for SARS-CoV-2. She explains that the pipeline was developed during the pandemic, with the idea of using the ARTIC group's protocol for sequencing SARS-CoV-2 on the Nanopore sequencing platform. 
However, Erin needed a bioinformatic pipeline that was Illumina-based, as it would have been easier to sequence SARS-CoV-2 on the MiSeq, rather than the Nanopore sequencing platform. The Cecret pipeline was developed using BWA as the default aligner and is for viral sequencing with a known, reliable reference. Erin points to the Cecret pipeline tutorials and the monthly videos produced by StaPH-B that outline various state laboratory projects as tips for people entering the field. Lastly, Kelsey emphasizes the importance of finding a use case to start building a centralized source of expertise in bioinformatics and making knowledge accessible by having a common resource that's easy to access. In a previous episode, the guest speakers discussed the evolution of COVID genome analysis workflows and how they have changed over time due to the increasing amount of data being analyzed. They mentioned the use of different workflows such as Cecret, nf-core, Monroe, and the Nextflow ARTIC pipeline, each with their own unique features and popularity. Erin, the creator of Cecret, talked about how paranoid she was when sharing her workflow publicly and how she would track every fork of her repository to ensure that the changes made to her code were scientifically sound. The workflow has changed gradually and has had fewer bugs over time, with no dramatic turning point. Its name, “Cecret,” was inspired by a hiking landmark in Northern Utah that Erin found meaningful. The speakers emphasized the importance of managing and working with the increasing amounts of COVID data being analyzed, as well as connecting it to public health. In conclusion, StaPH-B and workflows such as Cecret are playing a significant role in the field of bioinformatics and COVID genome analysis. 
Collaborations and resources like StaPH-B are essential in sharing knowledge and expertise among different laboratories, allowing for the successful completion of projects funded by the NIH, CDC, and other grant agencies.…
Today we’re talking about some exciting new developments in the area of comparative genomics. We are joined by Dr. Zamin Iqbal who is a Research Group Leader at the European Bioinformatics Institute and Dr. Grace Blackwell who is jointly at the European Bioinformatics Institute, in Zam’s group and Nick Thomson’s team at Wellcome Sanger Institute…
What are the major challenges for getting AMR genomics into the clinic? This was the question posed to a panel of experts at the 7th Microbial Bioinformatics hackathon run in conjunction with JPIAMR, PHA4GE and CLIMB. The panel were: Mark Pallen from the Quadram Institute Bioscience, UK, Finlay Maguire from Dalhousie University, Canada, Anthony Underwood from the Centre for Genomic Pathogen Surveillance, UK and Clement Tsui from Weill Cornell Medicine, Qatar. Andrew Page was the Chair, supported by Lee Katz.…
We are again joined by Dr Robert Petit from the Wyoming Public Health Laboratory who is talking to us about BACTOPIA, a bioinformatics workflow specifically for bacterial genomes. Docs: https://bactopia.github.io/ Repo: https://github.com/bactopia/bactopia/ Pub: https://doi.org/10.1128/mSystems.00190-20…
We are joined by Dr Robert Petit from the Wyoming Public Health Laboratory who is talking to us about BACTOPIA, a bioinformatics workflow specifically for bacterial genomes. Docs: https://bactopia.github.io/ Repo: https://github.com/bactopia/bactopia/ Pub: https://doi.org/10.1128/mSystems.00190-20
We finish our discussion on bacterial taxonomy, this time looking at new approaches of naming the multitudes of unnamed uncultured organisms and the controversial renaming of phyla. With guests Professor Phil Hugenholtz, Professor Iain Sutcliffe and Professor Mark Pallen. Selective bibliography: https://github.com/MicroBinfie/MicroBinfie.github.io/blob/45db8eb57d732176449073065dbdacc88a288fe9/assets/Taxonomy_Selective_bibliography.pdf…
We continue our discussion on bacterial taxonomy, this time looking at how genomics has changed taxonomy with: Professor Phil Hugenholtz, Professor Iain Sutcliffe and Professor Mark Pallen. Selective bibliography: https://github.com/MicroBinfie/MicroBinfie.github.io/blob/45db8eb57d732176449073065dbdacc88a288fe9/assets/Taxonomy_Selective_bibliography.pdf…
There has been a lot of discussion about recently announced changes to bacterial taxonomy regarding phyla, and this revealed a lot of misconceptions around taxonomy in general. Today we discuss the background to bacterial taxonomy with: Professor Phil Hugenholtz, Professor Iain Sutcliffe and Professor Mark Pallen. Selective bibliography: https://github.com/MicroBinfie/MicroBinfie.github.io/raw/45db8eb57d732176449073065dbdacc88a288fe9/assets/Taxonomy_Selective_bibliography.pdf…
We’re navigating the twisted world of bacterial taxonomy. We have some excellent guides to help us! Our guests today are Dr. Leighton Pritchard, a Strathclyde Chancellor's Fellow at the Strathclyde Institute of Pharmacy and Biomedical Sciences in the University of Strathclyde, and Dr. Conor Meehan, an assistant professor in molecular microbiology at the University of Bradford.…
Have you ever read a paper and wondered why the author buried their key result on page 39 of a 50 page paper? Bioinformaticians aren't great at communicating themselves or their science to the wider world, so we have a chat about it, specifically with bioinformaticians in mind. Topics include: online presence and scholarly communications for bioinformatics; Google Scholar/Scopus; marketing to a set of peers; organisation websites; managing your publication record; librarians; citation metrics; peer review; Twitter; blogs; personal webpages; GitHub for personal webpages; signing up to GitHub; --help citation and name; making your code public as you develop it; links in your signature; self-promotion at the end of slides; just starting out. Links: How to pronounce ORCID - https://info.orcid.org/how-should-orcid-be-pronounced/#:~:text=But%2C%20for%20those%20who%20want,pronounced%20%E2%80%9Coar%2Dkid%E2%80%9D. Phil Ashton's blog: https://bitsandbugs.org/ Publons: https://publons.com/about/home/…
We discuss how to effectively review bioinformatics papers, what to look out for, and tips for researchers writing bioinformatics or microbial genomics papers to ease their way through review.
We’re continuing our series where we examine a particular microbial species in some depth. We’re continuing our look at Mycobacterium tuberculosis, focusing more on bioinformatics, genomics and typing. Our guests are Dr. Suzie Hingley-Wilson Lecturer in Bacteriology at the University of Surrey, Dr. Dany Beste Senior Lecturer in Microbial Metabolism at the University of Surrey and Dr. Conor Meehan assistant professor in molecular microbiology at the University of Bradford.…
We’re continuing our series where we examine a particular microbial species in some depth. We’re talking about Mycobacterium tuberculosis, the forgotten pandemic. Our guests are Dr. Suzie Hingley-Wilson Lecturer in Bacteriology at the University of Surrey, Dr. Dany Beste Senior Lecturer in Microbial Metabolism at the University of Surrey and Dr. Conor Meehan assistant professor in molecular microbiology at the University of Bradford.…
Our guest today is Dr Ozan Gundogdu for a deeper dive into the foodborne pathogen Campylobacter and how genomics has informed the field over the past 20 years since the publication of the first reference genome in 1999. Ozan leads the foodborne enteric pathogen group at the London School of Hygiene and Tropical Medicine, where they study the physiology and pathogenesis of Campylobacter and other related enteric microorganisms like Listeria and Vibrio. His background is in Molecular Biology and Computer Science and he completed his PhD at LSHTM (London School of Hygiene & Tropical Medicine) in 2011. www.lshtm.ac.uk/aboutus/people/gundogdu.ozan…
Our guest today is Dr Ozan Gundogdu and he gives us a crash course in the foodborne pathogen Campylobacter. If you've ever had a dodgy tummy after eating undercooked chicken, Campy is probably the cause. Ozan leads the foodborne enteric pathogen group at the London School of Hygiene and Tropical Medicine, where they study the physiology and pathogenesis of Campylobacter and other related enteric microorganisms like Listeria and Vibrio. His background is in Molecular Biology and Computer Science and he completed his PhD at LSHTM (London School of Hygiene & Tropical Medicine) in 2011. https://www.lshtm.ac.uk/aboutus/people/gundogdu.ozan…
We look at the crazy world of NFTs (non-fungible tokens) and blockchain and explore in a light-hearted way how they could be used in genomics and bioinformatics. We propose a way of replacing all the central genome databases and our own cryptocurrency (BioBucks or maybe GenomeCoin). Myriad Genetics SCOTUS decision - https://www.supremecourt.gov/opinions/12pdf/12-398_1b7d.pdf Planet Money episode on patenting a gene - https://www.npr.org/transcripts/937167323 Beeple - https://www.theverge.com/2021/3/11/22325054/beeple-christies-nft-sale-cost-everydays-69-million Make NFT - https://www.coindesk.com/how-to-create-buy-sell-nfts…
Torsten Seemann joins us to discuss how to write good bioinformatics software. Torsten is the author of many popular bioinformatics tools such as Prokka, Snippy, Barrnap, Abricate, Shovill, and Nullarbor. Links: https://github.com/tseemann
Today we’re talking about getting your head around our favourite enteric microbes: E. coli and Salmonella. Why do they have some of the names they have? Primer on ANI: https://www.pnas.org/content/102/7/2567.short Wikipedia on Salmonella serovars: https://en.wikipedia.org/wiki/Kauffman%E2%80%93White_classification Nabil's viral tweet: https://twitter.com/happy_khan/status/1387804862830809091?s=20…
We dive deeper into Curtis's and Kevin's career! Relevant links: https://www.skypeascientist.com/ https://www.aphl.org/fellowships/Pages/Bioinformatics.aspx https://www.jmu.edu/genomics/index.shtml jmu.edu/biology
The crew talks to Curtis Kapsak and Kevin Libuit about the StaPH-B containers. What a valuable resource! Some URLS: * StaPH-B docker-builds code repository: https://github.com/StaPH-B/docker-builds * StaPH-B DockerHub container repositories: https://hub.docker.com/u/staphb * Guide for contributing: https://staph-b.github.io/docker-builds/contribute/…
Part 2 of our walk through a new method for identifying variants of concern using Sanger sequencing. We’re joined by two of the authors of this method, Kai Blin and Tue Jorgensen, both from the Technical University of Denmark. The preprint: www.medrxiv.org/content/10.1101/2….03.27.21252266v1 The protocol: www.protocols.io/view/sanger-sequ…-2-spik-bsbdnai6 The software: github.com/kblin/covid-spike-classification The web app: ssi.biolib.com/app/covid-spike-classification/run…
Today we’re going through a new method for identifying variants of concern using Sanger sequencing, and we’re joined by two of the authors of this method, Kai Blin and Tue Jorgensen, both from the Technical University of Denmark. The preprint: https://www.medrxiv.org/content/10.1101/2021.03.27.21252266v1 The protocol: https://www.protocols.io/view/sanger-sequencing-of-a-part-of-the-sars-cov-2-spik-bsbdnai6 The software: https://github.com/kblin/covid-spike-classification The web app: https://ssi.biolib.com/app/covid-spike-classification/run…
The Canadians have taken over the podcast - AGAIN! Join guest host Dr. Emma Griffiths as she talks with Dr. Finlay Maguire and Dr. William Hsiao about the SARS-CoV-2 genomic epidemiology efforts in Canada. CanCOGeN website: https://www.genomecanada.ca/en/cancogen
The Canadians have taken over the podcast! Join guest host Dr. Emma Griffiths as she talks with Dr. Finlay Maguire and Dr. William Hsiao about the SARS-CoV-2 genomic epidemiology efforts in Canada. CanCOGeN website: https://www.genomecanada.ca/en/cancogen
Denmark is one of the leading countries in the world for SARS-CoV-2 genomic surveillance. Prof Mads Albertsen chats to us about SARS-CoV-2 sequencing in Denmark: how it got started, the logistics, tweaked protocols and the things he's learnt along the way. Edited by Niamh Page
We discuss the last 2 weeks of developments in SARS-CoV-2 genomics with Mads Albertsen, including the latest from Denmark. Denmark COVID stats: www.covid19genomics.dk/statistics Tools and resources mentioned: https://outbreak.info/ https://virological.org/t/outbreak-info-sars-cov-2-mutation-situation-reports/629 https://gitlab.com/johan.bernal.morales/sarscov2 Publications mentioned: A Comparison of Performance for Different SARS-Cov-2 Sequencing Protocols https://www.biorxiv.org/content/10.1101/2021.03.01.433428v1 Before the Surge: Molecular Evidence of SARS-CoV-2 in New York City Prior to the First Report https://www.medrxiv.org/content/10.1101/2021.02.08.21251303v1 SARS-CoV-2 within-host diversity and transmission https://science.sciencemag.org/content/early/2021/03/09/science.abg0821 Info on masking SARS-CoV-2 sites: https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/subset_vcf/problematic_sites_sarsCov2.mask.vcf…
Invasive non-typhoidal Salmonella is a significant public health challenge in Africa. We talk to Abdoulie Kanteh and Grant Mackenzie about their work in The Gambia, and their use of genomics to understand the serovars in circulation. Manuscript: https://www.biorxiv.org/content/10.1101/2021.02.18.431831v2…
We are joined by Peter van Heusden to discuss all the latest developments in SARS-CoV-2 genomics, particularly around tools and resources. We also discuss the challenges of building and sharing data in large scale sequencing endeavours. Tools and resources: Nextalign github.com/nextstrain/nextclade/releases COG mutation explorer http://sars2.cvr.gla.ac.uk/cog-uk/ Grinch https://cov-lineages.org New US tool for variants: https://www.cdc.gov/coronavirus/2019-ncov/transmission/variant-cases.html https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/global-variant-map.html Online lineage assignment: https://pangolin.cog-uk.io/ CoronaHiT paper published: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-021-00839-5…
We discuss the latest developments in SARS-CoV-2 genomics over the last 2 weeks with Peter van Heusden, covering the growing list of Variants of Concern and the latest developments in Africa. Papers & resources mentioned: American birds (677) - https://www.medrxiv.org/content/10.1101/2021.02.12.21251658v2 https://github.com/cov-lineages/pango-designation PHE thresholds for different variants: https://www.gov.uk/government/publications/covid-19-variants-genomically-confirmed-case-numbers/variants-distribution-of-cases-data A.23.1 in Uganda - https://www.medrxiv.org/content/10.1101/2021.02.08.21251393v1 B.1.525 - https://github.com/cov-lineages/pango-designation/issues/4 Zambia sequences - https://www.cdc.gov/mmwr/volumes/70/wr/mm7008e2.htm?s_cid=mm7008e2_w…
We present a rapid round up of SARS-CoV-2 questions and issues, hopefully with some answers, so that you can stay on top of the latest in SARS-CoV-2 genomics. Recorded 5 February 2021. Topics covered: Why can missing 1 SNP cause lineage assignment to break, and how does it work? How do we describe lineages with a chain of mutation events? Are we seeing convergent evolution? Co-infections of different lineages discovered? For nanopore basecalling, do you use HAC, and should you get a GPU? Basic logistics are difficult for sequencing in many parts of the world. Can I look at recombination with ARTIC on Illumina? How do you annotate a SARS-CoV-2 sequence? http://cov-glue.cvr.gla.ac.uk/#/home FASTA > nextclade VCF > snpeff https://github.com/cov-ert/type_variants Spotting community spread from NextStrain? Scientists call for fully open sharing of coronavirus genome data: https://www.nature.com/articles/d41586-021-00305-7…
We discuss recent updates to the best SARS-CoV-2 resources, so that you can stay on top of the latest bioinformatics and genomics tools. Recorded 5 February 2021. CoVariants website: http://covariants.org/ Microreact: https://microreact.org/project/cogconsortium/ Lineage reports: https://cov-lineages.org/ CLIMB ARTIC workshop online resources: https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/ Multiplex PCR for B.1.1.7, B.1.351 and P.1: https://www.protocols.io/view/multiplexed-rt-qpcr-to-screen-for-sars-cov-2-b-1-1-brrhm536 CDC videos: https://www.cdc.gov/amd/training/covid-19-gen-epi-toolkit.html…
We chat with Nabil about EnteroBase, and learn about the background to the project, the general benefits of the platforms and some of the strange quirks users might encounter. EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Papers: Mentioned PLOS genetics paper: Alikhan et al. (2018) A genomic overview of the population structure of Salmonella. PLoS Genet 14 (4): e1007261 https://doi.org/10.1371/journal.pgen.1007261 Paper describing Enterobase and the genome fishing expeditions: Zhou et al. (2020) The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny and Escherichia core genomic diversity. Genome Res. 30:138-152. https://doi.org/10.1101/gr.251678.119 rMLST is described in: Jolley et al. 2012 Microbiology 158:1005-15. https://doi.org/10.1099/mic.0.055459-0 Resources Enterobase: http://enterobase.warwick.ac.uk/ PubMLST https://pubmlst.org/ About EnteroBase schemes: https://enterobase.readthedocs.io/en/latest/enterobase-tutorials/deeper-lineages.html Software: Enterobase toolkit and background software: https://github.com/zheminzhou/EToKi Errata. Jay Hinton’s Salmonella is a ST313 (D23580), not ST131 (I always mix the numbers up -- Nabil)…
Joshua Quick from the University of Birmingham talks about "How to sequence SARS-CoV-2 using the ARTIC protocol". This was part of a joint ARTICnetwork & CLIMB-BIG-DATA workshop on COVID-19 data analysis and chaired by Nick Loman. Links: https://twitter.com/Scalene/status/1349402397249056779 https://primalscheme.com/ https://www.protocols.io/view/ncov-2019-sequencing-protocol-v3-locost-bh42j8ye https://github.com/artic-network/rampart…
Sam Sheppard from the University of Bath presents at the ARTICnetwork & CLIMB-BIG-DATA workshop on COVID-19 data analysis, motivating why we should use genomics in an epidemic. He gives background on typing schemes, different ways of sequencing and challenges such as how you can analyse large amounts of genomic data. Resources: https://sheppardlab.com/ https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/…
ARTICnetwork & CLIMB-BIG-DATA present a panel discussion on overcoming barriers to SARS-CoV-2 data analysis with Nick Loman and Will Rowe from the University of Birmingham, Áine O'Toole from the University of Edinburgh, Andrew Page from the Quadram Institute and Anna Price from MRC CLIMB and Cardiff University. This was part of a workshop on COVID-19 data analysis. Topics covered: Collecting sample metadata; Intrapatient variability; Building bridges with policy makers to start sequencing; Data sharing; Improving bioinformatics skills; Pipeline and software validation; Bioinformatics reproducibility and quality. Papers: The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology https://www.preprints.org/manuscript/202008.0220/v1 MAJORA: Continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance https://www.biorxiv.org/content/10.1101/2020.10.06.328328v1 Genomic sequencing of SARS-CoV-2: a guide to implementation for maximum impact on public health https://www.who.int/publications/i/item/9789240018440 Resources: https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/ https://github.com/SamStudio8/majora https://soundcloud.com/microbinfie/majora https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification https://pha4ge.org/ Software: https://github.com/cov-lineages/pangolin https://github.com/artic-network/civet https://github.com/COG-UK/grapevine https://github.com/cov-lineages/pangoLEARN…
We talk to Dr Emma Griffiths (UBC), Dr Ruth Timme (FDA) and Dr Duncan MacCannell (CDC) about the PHA4GE SARS-CoV-2 contextual data specification for open genomic epidemiology. Paper: https://www.preprints.org/manuscript/202008.0220/v1 Specification: https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification Protocols: https://www.protocols.io/workspaces/pha4ge The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatic tools and resources, and advocate for greater openness, interoperability, accessibility and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a clear and present need for a fit-for-purpose, open source SARS-CoV-2 contextual data standard. As such, we have developed an extension to the INSDC pathogen package, providing a SARS-CoV-2 contextual data specification based on harmonisable, publicly available, community standards. The specification is implementable via a collection template, as well as an array of protocols and tools to support the harmonisation and submission of sequence data and contextual information to public repositories. Well-structured, rich contextual data adds value, promotes reuse, and enables aggregation and integration of disparate data sets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19.…
How do you make bioinformatics software sustainable so that we can move our field from academic research into hospitals and doctors' offices? We discuss the nuts and bolts of making sustainable bioinformatics software and changes you can make in your own practices: Documentation, Coding styles, Versioning, SOPs and capturing institutional knowledge, Software licencing, Automated testing, Measuring your impact, and going the commercial route.…
Experimental projects lead to experimental software. We discuss the issues around the sustainability of academic bioinformatics software. How it currently works, why software dies, quality issues, and what we can do to keep it going.
We chat with the authors of CoronaHiT, which lets you sequence up to 94 SARS-CoV-2 samples on a single MinION flowcell. This reduces the cost of sequencing 3-fold, with a simpler, faster protocol. Justin O'Grady and David Baker join Andrew Page and Nabil-Fareed Alikhan to chat about how it all works, how it came into being and why it's awesome. Preprint: https://doi.org/10.1101/2020.06.24.162156…
When you get an assembly the fun doesn't stop there. You then have to fix it up and see how good it is. In this episode we discuss scaffolding, gapfilling, polishing, assembly metrics, quality control, genome structure, and visualisation tools. Tools and papers mentioned: https://github.com/quadram-institute-bioscience/socru https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0004446…
We chat to Justin O'Grady and Andrew Page on how to get a SARS-CoV-2 sequencing effort off the ground in the middle of a pandemic and go on to sequence 1,500 genomes in 2 months. The Quadram Institute is one of 16 sequencing centres in the UK which are part of the COVID-19 genome sequencing consortium. Things we touch on include COG, contamination issues, the people, and bioinformatics. Further information from: https://www.cogconsortium.uk/ If you want to access the sequencing data produced by Quadram, please check out the ENA and GISAID.…
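One of the most commonly quoted assembly metrics is N50. As a minimal sketch (not taken from the episode or from any particular tool), the function below computes N50 from a list of contig lengths; the function name and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A higher N50 usually means a more contiguous assembly, but it says nothing about correctness, which is why it is normally read alongside other quality control checks.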
Preprocessing of sequence data in advance of de novo assembly is a critical step to improving the final quality of your assembly. We chat about read trimming, correction and filtering, collectively called 'Read Healing'. Software mentioned: https://github.com/sanger-pathogens/plasmidtron
Short read de novo assembly is discussed in this podcast. We cover the history of assembly and how short read assemblers have evolved into what we use today. The main focus is on bacterial assembly.
A panel discussion held in MRC Gambia at The London School of Hygiene and Tropical Medicine, recorded in front of a live audience of scientists in January 2020, when SARS-CoV-2 was just beginning to be reported as an emerging infectious disease. The panel consisted of Andrew Page and David Baker from the Quadram Institute, Ozan Gundogdu from LSHTM (UK), Abdul Sesay from LSHTM (Gambia) and Nick Loman from the University of Birmingham and chaired by Suzie Hingley-Wilson from the University of Surrey. Questions, discussion and comments from: Mark Pallen - Quadram Institute Muna Anjum - APHA Arnoud van Vliet - University of Surrey Martin Antonio - LSHTM Archie Worwui - LSHTM Unfortunately we weren't able to capture the audio from Nick Loman who joined via Skype, but he did make really great contributions.…
We share short war stories about bioinformatics from the trenches of research. We share tales about a mystery massive Listeria outbreak, bubbles on flowcells, GAII woes, contaminated databases, the mass murderer, water pouring into a sequencing centre, changing protocols without validation, Excel issues and the ultimate complexity in science - Humans.…
Someone asks a simple question like "What programming language should I learn?", without realising just how difficult it is to answer. We discuss different programming languages and their use in Bioinformatics.
ARTICnetwork & CLIMB-BIG-DATA present a panel discussion on SARS-CoV-2 phylogenomics with Nick Loman from the University of Birmingham, Verity Hill from the University of Edinburgh, Andrew Page from the Quadram Institute and Anna Price from MRC CLIMB and Cardiff University. This was part of a workshop on COVID-19 data analysis. The topics covered are: More about Polecat. What's the difference between COG-UK Phylotypes and Pangolin lineages? What is the difference between Civet and Llama? Can you use BLAST to find similar SARS-CoV-2 genomes? How do you find similar sequences in the public repositories to give your samples context? How do you manually curate the dataset for PangoLEARN? How does Civet choose what constitutes a subtree? Can Civet be adapted to other viruses? Are the vaccine sequences available? Databases that report variation. Quality of variant calls? Groups: https://www.climb.ac.uk/artic-and-climb-big-data-joint-workshop/ https://www.climb.ac.uk/ https://artic.network/ https://cogconsortium.uk/ Software: https://github.com/COG-UK/polecat https://github.com/cov-lineages/pangolin https://github.com/cov-lineages/llama https://github.com/artic-network/civet https://github.com/cov-lineages/pangoLEARN http://tree.bio.ed.ac.uk/software/figtree/ Analysis websites: https://cov-lineages.org/ https://clades.nextstrain.org/ https://pangolin.cog-uk.io/ http://cov-glue.cvr.gla.ac.uk/…
Over the last year we've learnt a lot about SARS-CoV-2 genomics. Lee extracts all the insider knowledge from our brains and we give him the honest truth to his probing questions. We cover: Pipelines for SARS-CoV-2; Archives & metadata; Read filtering; Assembly vs consensus; Amplicon data analysis; Controls; If things look too good; Coverage .... Some URLs: https://github.com/connor-lab/ncov2019-artic-nf https://github.com/jts/ncov-tools https://github.com/lskatz/SARS-CoV-2-trueTree…
Andrew talks to Niamh Tumelty from the University of Cambridge about SARS-CoV-2 'new variants' and tries to clear up some of the confusion around all the names flying around. Hopefully this helps to give some insights into the various names you hear, but probably by the time you listen the whole thing will have changed again since this field moves so rapidly. Andrew apologises in advance for all the errors that will be found in this podcast! If you want to read a bit more, there's an interesting news item on Nature: https://www.nature.com/articles/d41586-021-00097-w…
Dr Emma Griffiths and Dr João Carriço join us to give us a crash course on ontologies, the bioinformatics secret sauce that makes all things work well. Microbial genomics data is like the Tower of Babel with public health talking to food regulators talking to agriculture talking to veterinarians talking to healthcare, all with different languages to describe similar things. So how do we get any work done at all? Listen in to why you need ontologies in your life!…
We chat with Andrew about Roary, software for generating a pangenome, and learn about the background to the project, where the name comes from, hidden features and the light hearted FAQ. Paper: https://academic.oup.com/bioinformatics/article/31/22/3691/240757 Software: https://github.com/sanger-pathogens/Roary Documentation: https://sanger-pathogens.github.io/Roary/…
Have you ever wanted to pack in your job and move to the other side of the world? We chat to Phil Ashton on his travels as a bioinformatician, going from the UK to Vietnam to Malawi. We also wander into microbial bioinformatics and his passion for Salmonella and ETEC. Papers mentioned: SNP-sites: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320690/ Genotyphi: https://www.nature.com/articles/ncomms12827/ ETEC lineages: https://www.biorxiv.org/content/10.1101/2020.07.16.203430v1.abstract…
We chat with Phil Ashton about his move from the wet lab into the dry lab to become a bioinformatician, and his experiences with working in public health and in low and middle income countries.
Someone shows up at your door wanting to get a Nature paper in bioinformatics and they only have a week. Where do you start? We talk bioinformatics training with Finlay Maguire.
Nick Loman, Sam Nicholls and Radosław Popławski join us to discuss building systems to support the analysis of hundreds of thousands of SARS-CoV-2 genomes in the middle of a pandemic, without using Excel. Preprint about MAJORA: Continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance: https://doi.org/10.1101/2020.10.06.328328 COGUK website: https://www.cogconsortium.uk/…
We chat to Justin O'Grady, Andrew Page and Alison Mather about how they went about sequencing 1500 SARS-CoV-2 genomes from one small region in the UK (Norfolk), and how they went about using the data for genomic epidemiology to help get a detailed, near real-time view of the pandemic as it unfolded. Some of the key points are that in Norfolk and surrounding regions: 100 distinct UK lineages were identified. 16 UK lineages found in key workers were not observed in patients or in community care. 172 genomes from SARS-CoV-2 positive samples sequenced per 100,000 population representing 42.6% of all positive cases. SARS-CoV-2 genomes from 1035 cases sequenced to a high quality. Only 5 countries, out of 103, have sequenced more SARS-CoV-2 genomes than have been sequenced in Norfolk for this paper. Samples covered the entire first wave, March to August 2020. Stable evolutionary rate of 2 SNPs per month. D614G mutation is the dominant genotype and associated with increased transmission. No evidence of reinfection in 42 cases with longitudinal samples. WGS identified a sublineage associated with care facilities. WGS ruled out nosocomial outbreaks. Rapid WGS confirmed the relatedness of cases from an outbreak at a food processing facility. The manuscript is available from: https://www.medrxiv.org/content/10.1101/2020.09.28.20201475v1…
To help us celebrate a full year of the podcast after our first episode on September 19, 2019, we go into how we started the podcast. We talk about all aspects including how we met, why we started it, and then how we actually started it.
Paper: https://joss.theoj.org/papers/10.21105/joss.01762 Repository: https://github.com/lskatz/mashtree We chat to the author of Mashtree, bioinformatics software for creating a very fast tree from genomes. Citation: Katz et al., (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762, https://doi.org/10.21105/joss.01762…
We chat to Nabil-Fareed Alikhan about the bioinformatics software he authored called BRIG, the BLAST Ring Image Generator. Software: http://brig.sourceforge.net/ Paper: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-12-402
We chat to the author of SNP-sites, bioinformatics software for extracting SNPs from a multi-FASTA alignment. Sounds simple but behind all of our software are quirky details that never make it into the final paper. Software: https://github.com/sanger-pathogens/snp-sites Paper: https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000056 "SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments", Andrew J. Page, Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, Simon R. Harris, Microbial Genomics 2(4), (2016)…
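To make the idea of extracting SNPs from an alignment concrete, here is a minimal sketch in Python. It is not the SNP-sites implementation: the function name, the toy alignment, and the choice to ignore gaps and N when deciding whether a column varies are all assumptions made for illustration.

```python
def variable_sites(alignment):
    """Keep only the alignment columns that contain a SNP.

    alignment: dict mapping sample name -> aligned sequence (all equal length).
    A column counts as variable when more than one of A, C, G, T is seen;
    gaps and ambiguous bases are ignored (an assumption of this sketch).
    Returns (0-based column positions, reduced alignment).
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to the same length")

    positions = []
    for i in range(length):
        bases = {seq[i] for seq in seqs} & set("ACGT")
        if len(bases) > 1:
            positions.append(i)

    reduced = {
        name: "".join(seq[i] for i in positions)
        for name, seq in zip(names, seqs)
    }
    return positions, reduced


# Toy alignment, invented for illustration.
toy = {
    "sample1": "ACGTAC",
    "sample2": "ACGTTC",
    "sample3": "AC-TAC",
    "sample4": "ATGTAN",
}
positions, snps = variable_sites(toy)
print(positions)  # [1, 4]
print(snps)
```

Dropping the invariant columns is what makes downstream steps such as tree building so much cheaper: here six columns shrink to two, and in a real bacterial alignment the reduction is typically far larger.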
Have you made a Nobel Prize-winning discovery, or are you just looking at contamination? We discuss common sources of contamination in genome sequencing, covering both short and long reads.
So it's the end of 2019 and we thought we'd like to pause and look back at what we were working on, what resonated with us and where we think the micro binfie field will go in the new year.
Torsten Seemann joins us to discuss how to write good bioinformatics software. Torsten is the author of many popular bioinformatics tools such as Prokka, Snippy, Barrnap, Abricate, Shovill, and Nullarbor. Links: https://github.com/tseemann
FASTQ files are the foundation of modern bioinformatics. Tune into an informal chat between Lee, Nabil and Andrew as they tell the story of how FASTQs evolved out of nowhere, with all the backstories and quirks. You might even learn something.
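For readers who have not looked inside a FASTQ file, the sketch below reads records and decodes the quality string. It assumes the simple four-line-per-record layout and the Phred+33 encoding used by Sanger and current Illumina output; older Illumina pipelines used an offset of 64, which is one of the format's quirks. The function name and the example read are invented for illustration.

```python
import io


def read_fastq(handle, offset=33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ.

    Qualities are decoded as Phred scores: ord(character) - offset.
    Wrapped (multi-line) sequences are not handled in this sketch.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], seq, [ord(c) - offset for c in qual]


# A single made-up read; 'I' encodes Q40, '5' encodes Q20 and '!' encodes Q0.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for name, seq, quals in read_fastq(example):
    print(name, seq, quals)  # read1 ACGT [40, 40, 20, 0]
```

For real data an established parser is the safer choice, since it will cope with compressed input and the older quality encodings.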
In this episode we identify areas of “Peak-bioinformatics”. There is a lot of existing bioinformatics software out there - more often than not the new tool you want to write already exists, or a new tool cannot effectively improve on it. We discuss this in terms of metagenomics and antimicrobial resistance. Questions and comments? microbinfie@gmail.com SHOW NOTES Generally, novel software is not needed if: there is a plethora of existing tools; the problem is more or less solved, or it's been shown to be unsolvable; the underlying technology or problem is now obsolete and/or superseded by other methods. Metagenomics Taxonomic classification: Megan, Kraken, SIGMA, MIDAS, metaphlan2, mOTUs. Assemblers: MetaSpades, metaflye, MEGAHIT, MetaVelvet; a lot of single isolate assemblers have been tweaked to run on metagenomes. https://github.com/lskatz/Kalamari AMR https://github.com/arpcard/amr_curation https://food-safety-bioinformatics-hackathon.github.io/AMR-protocols/ ABRICATE https://github.com/tseemann/abricate ARIBA https://www.sanger.ac.uk/science/tools/ariba Too many detection tools: https://docs.google.com/spreadsheets/d/18XGWpDiaE249qQKDAL7gdBCka0Z1drpA_s3FElfMJe0/edit#gid=0 Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we've learnt over the years. If you're a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest - this podcast is for you. The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK), who bring together years of experience in microbial bioinformatics. The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience. 
Intro music : Werq - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ Outro music : Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/…
In this episode we identify areas of “Peak-bioinformatics”. There is a lot of existing bioinformatics software out there - more often than not the new tool you want to write already exists, or a new tool cannot effectively improve on it. We discuss this in terms of genome assembly, read mapping and phylogenetics. Questions and comments? microbinfie@gmail.com SHOW NOTES Generally, novel software is not needed if: there is a plethora of existing tools; the problem is more or less solved, or it's been shown to be unsolvable; the underlying technology or problem is now obsolete and/or superseded by other methods. Multiple sequence aligners: MAFFT https://mafft.cbrc.jp/alignment/software/ MUSCLE https://www.drive5.com/muscle/ Whole genome aligners: Mauve http://darlinglab.org/mauve/mauve.html Mugsy http://mugsy.sourceforge.net/ Sibelia http://bioinf.spbau.ru/sibelia Parsnp https://github.com/marbl/parsnp Assemblers: SPADES https://github.com/ablab/spades Skesa https://github.com/ncbi/SKESA Velvet https://www.ebi.ac.uk/~zerbino/velvet/ Abyss https://github.com/bcgsc/abyss Edena http://www.genomic.ch/edena.php Ray http://denovoassembler.sourceforge.net/ Long read assemblies: HGAP https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-2.0 Flye https://github.com/fenderglass/Flye Canu https://github.com/marbl/canu Ra https://www.biorxiv.org/content/10.1101/656306v1 Unicycler https://github.com/rrwick/Unicycler Read mapping: BWA http://bio-bwa.sourceforge.net/ Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Minimap2 https://github.com/lh3/minimap2 (BWA better for short reads: https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa) BBtools https://jgi.doe.gov/data-and-tools/bbtools/ BLAST/BLAT: https://genome.ucsc.edu/FAQ/FAQblat.html SMALT https://www.sanger.ac.uk/science/tools/smalt-0 Snippy https://github.com/tseemann/snippy Phenix: https://github.com/phe-bioinformatics/PHEnix Variant callers: GATK: https://software.broadinstitute.org/gatk/ VIPR: https://www.viprbrc.org/brc/home.spg?decorator=vipr Varscan2 http://varscan.sourceforge.net/ Workflow managers: Bespoke example https://github.com/VertebrateResequencing/vr-codebase Snakemake https://snakemake.readthedocs.io/en/stable/ Nextflow https://www.nextflow.io/ Galaxy https://usegalaxy.org/ Bpipe https://github.com/ssadedin/bpipe Phylogenetics: Raxml - Raxml-NG https://cme.h-its.org/exelixis/software.html IQTREE http://www.iqtree.org/ FastTree http://www.microbesonline.org/fasttree/ BEAST 1&2 https://www.beast2.org/ RevBayes https://revbayes.github.io/ Metagenomics Taxonomic classification: Megan, Kraken, SIGMA, MIDAS, metaphlan2, mOTUs. Assemblers: MetaSpades, metaflye, MEGAHIT, MetaVelvet; a lot of single isolate assemblers have been tweaked to run on metagenomes. https://github.com/lskatz/Kalamari AMR https://github.com/arpcard/amr_curation https://food-safety-bioinformatics-hackathon.github.io/AMR-protocols/ ABRICATE https://github.com/tseemann/abricate ARIBA https://www.sanger.ac.uk/science/tools/ariba Too many detection tools: https://docs.google.com/spreadsheets/d/18XGWpDiaE249qQKDAL7gdBCka0Z1drpA_s3FElfMJe0/edit#gid=0 Other mentioned resources: Recent review: Zhang W, Chen J, Yang Y, Tang Y, Shang J, Shen B (2011) A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. PLoS ONE 6(3): e17915. https://doi.org/10.1371/journal.pone.0017915 The Assemblathon: https://assemblathon.org/ Blog post describing that BWA is better for short reads: https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa The science web: https://thescienceweb.wordpress.com/2015/03/23/each-bioinformatician-to-have-their-own-personal-short-read-aligner-by-2016/…
The Micro Binfie podcast is available at SoundCloud (https://soundcloud.com/microbinfie) or you can subscribe via iTunes: https://podcasts.apple.com/au/podcast/microbinfie-podcast/id1479852809 or Spotify: https://podcasters.spotify.com/podcast/2zuzT8EVxbU0yOGFDVareK or your favourite podcast software.…