Approaches towards insights on pathogen evolution

Key Points

  • Microbial genomics leverages high-throughput sequencing techniques to study genomic sequences of microorganisms.
  • The ability to provide comprehensive insights into pathogen identity, diversity, and evolution is critical in accelerating vaccine development for human and animal health.
  • Bioinformatics workflows to implement approaches such as phylogenetic analysis and taxonomic classification are key to generating valuable insights.


Background: Importance of microbial genomics

Microbial genomics is a high-throughput OMICS-based technique that entails the study of the genomic sequences of microorganisms. Research in microbial genomics has provided many insights into microbiome functioning, as well as refinements to perturbation methods, ultimately improving both human and animal health.1

The ability to provide comprehensive insights into pathogen identity, diversity, and evolution is invaluable in modern infectious disease research and has profound implications for the development of vaccines.

Challenges in analyzing microbial genomics data for vaccine development

Given the many benefits of OMICS-based technologies, the analysis and interpretation of omics data for understanding the spread and evolution of pathogens is increasingly important for vaccine development. As ever more varieties of omics data are generated, numerous tools have been developed to process the data, from quality checking to downstream analysis. This brings both experimental and computational challenges in data acquisition and analysis2. Thus, bioinformatics workflows that leverage the existing tools and are tailored to business requirements are urgently needed.

Approaches to generate insights on pathogen evolution:

Here, we summarize a few approaches to analyzing pathogen evolution from microbial genomics data and highlight our experience and expertise in implementing them in workflows.

  • Phylogenetic analysis:
    • This technique is used for studying evolutionary relatedness among various groups of organisms. A traditional workflow would involve a multiple sequence alignment tool such as MAFFT, MUSCLE, or ClustalW, followed by a phylogenetic tree-building tool such as PHYLIP or FastTree.
    • Recently, we leveraged Nextstrain3, an open-source tool for tracking pathogen evolution, to develop a custom solution for a large pharma customer. The solution helped our customer explore and visualize publicly available data alongside company-internal data. It was deployed in the company's environment so that business users can use it freely and securely. Figure 1 shows a visualization of publicly available SARS-CoV-2 data on Nextstrain's dashboard.

Figure 1: Nextstrain's dashboard based on publicly available SARS-CoV-2 data (Adapted from https://nextstrain.org/ncov/global)
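For illustration, the tree-building idea behind such a workflow can be sketched with a toy distance-based method (UPGMA) in pure Python. This is only a didactic stand-in for the dedicated tools named above (MAFFT/MUSCLE/ClustalW for alignment, PHYLIP/FastTree for inference); the aligned sequences below are hypothetical.

```python
def hamming(a, b):
    """Proportion of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(labels, dmat):
    """UPGMA clustering over dmat[(a, b)] distances; returns a Newick string."""
    sizes = {label: 1 for label in labels}
    d = {frozenset(pair): dist for pair, dist in dmat.items()}
    while len(sizes) > 1:
        pair = min(d, key=d.get)                    # merge the closest pair
        a, b = sorted(pair)
        merged, na, nb = f"({a},{b})", sizes.pop(a), sizes.pop(b)
        for c in sizes:                             # size-weighted average distances
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        d = {p: v for p, v in d.items() if not p & {a, b}}
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"

# Hypothetical pre-aligned sequences
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TGCTACGA"}
dmat = {(x, y): hamming(seqs[x], seqs[y])
        for x in seqs for y in seqs if x < y}
tree = upgma(list(seqs), dmat)  # "((A,B),C);"
```

Real phylogenetic inference uses maximum likelihood or Bayesian methods rather than this simple clustering, but the input/output shape (aligned sequences in, tree out) is the same.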

  • Taxonomic classification:
    • One approach to analyzing novel pathogens is to assign sequences to taxonomic groups based on sequence similarity. Tools such as Kraken2 work by leveraging exact k-mer matches between the input sequence and a database containing reference sequences with taxonomic information.
    • We implemented a previously published workflow4 to automate the analysis of multiple runs containing genomic sequences of animal samples, from raw reads to an interactive visualization for the classification of novel pathogens. The workflow filters for high-quality reads and efficiently reports confidence scores for the classification results. With this workflow, our customer could generate insights on novel pathogens at a much higher pace than before and accelerate vaccine development by focusing on pathogens of interest in the downstream validation processes. An exemplary report is provided in Figure 2, which depicts the classification of sequences found in a human gut sample.

Figure 2: Exemplary interactive visualization performed on microbial genomics data (Source: Interactive visualization of taxonomic classification in Krona)
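The exact k-mer matching idea behind tools like Kraken2 can be illustrated with a toy classifier. This is not Kraken2's actual implementation, and the reference sequences and taxon names below are hypothetical; it only shows the principle of counting exact k-mer hits against an indexed reference database.

```python
def kmers(seq, k=5):
    """All overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references, k=5):
    """Map every k-mer in each reference sequence to its taxon label."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index

def classify(read, index, k=5):
    """Assign the taxon with the most exact k-mer hits, plus a confidence score."""
    read_kmers = kmers(read, k)
    hits = {}
    for km in read_kmers:
        for taxon in index.get(km, ()):
            hits[taxon] = hits.get(taxon, 0) + 1
    if not hits:
        return "unclassified", 0.0
    best = max(hits, key=hits.get)
    return best, hits[best] / len(read_kmers)

# Hypothetical reference database
references = {"taxonA": "ACGTACGTACGT", "taxonB": "TTTTTGGGGG"}
index = build_index(references)
```

The confidence score here (fraction of a read's k-mers that hit the winning taxon) mirrors the idea behind the confidence reporting mentioned above, in a much-simplified form.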

Our expertise and learnings to boost vaccine development initiatives

Figure 3: Our expertise and learnings to boost vaccine development initiatives

How can OSTHUS support you in your vaccine development initiatives?

  1. We have bioinformatics domain knowledge to interpret existing datasets and to advise on their relevance and impact based on the literature.
  2. We provide technology-agnostic scientific advisory and data management solutions, from vision and strategy to market analysis to implementation.
  3. We develop and/or automate customized workflows that not only fit into the existing tool ecosystem but also scale to meet dynamic data processing demands for compute, storage, and performance.
  4. We can spearhead the development of standardized workflows and best practices for reuse in order to avoid siloed solutions within the company.

Contact us to learn how we are helping our customers accelerate their vaccine development programs.

References:


Disclaimer

The contents of this blog are solely the opinion of the author and do not represent the opinions of PharmaLex GmbH or its parent Cencora Inc. PharmaLex and Cencora strongly encourage readers to review the references provided with this blog and all available information related to the topics mentioned herein and to rely on their own experience and expertise in making decisions related thereto.

OMICs technologies have uncovered several unknown aspects of the central dogma

Several scientific and technological advances have uncovered new knowledge about each step of the central dogma, the flow of information from DNA to RNA to proteins. Simultaneously, these advances revealed the epigenetic regulation of the central dogma and the importance of probing other biomolecules such as lipids and metabolites.

Due to rapidly evolving technologies and the falling cost of generating OMICs data (quantitative, high-throughput data on biomolecules), specialized tools are required to analyze it rapidly. Data analysis is a key factor in R&D processes due to the increasing spatio-temporal resolution of these measurements, from whole organisms to tissues and even individual cells. This is also evident from the large investments OMICs data attracts across industry R&D and academia.1 In January 2023, the EU launched a joint programme worth 16.5 million euros for the large-scale analysis of OMICs data for drug-target discovery in neurodegenerative diseases alone.

Fig 1: OMICs technologies centered around the central dogma in Biology

Reproducibility and Ease of Operations are major challenges for OMICs Data Analysis

Scientific workflows or pipelines (Figure 2), i.e. series of software tools working stepwise one after the other, are important for rapidly analyzing and interpreting the vast amounts of data generated by various OMICs techniques. For example, RNA-seq (transcriptomics) data analysis involves trimming, alignment, quantification, normalization, and differential gene expression analysis, where the output of one tool serves as input for the next tool in the workflow. Permutations of these tools can lead to over 150 possible sequential workflows or pipelines, so reproducing and comparing their results can be challenging2.
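As a toy illustration of how one step's output feeds the next, here is a minimal sketch of quantification output flowing into normalization (counts-per-million) and then a naive fold-change computation. The gene names and counts are hypothetical, and production pipelines would use dedicated tools for each of these steps.

```python
import math

def cpm(counts):
    """Counts-per-million normalization of raw gene counts."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(norm_a, norm_b, pseudocount=1.0):
    """Naive per-gene log2 fold change between two normalized samples."""
    return {g: math.log2((norm_b[g] + pseudocount) / (norm_a[g] + pseudocount))
            for g in norm_a}

# Hypothetical raw counts from a quantification step
raw_control = {"geneX": 100, "geneY": 900}
raw_treated = {"geneX": 400, "geneY": 600}
lfc = log2_fold_change(cpm(raw_control), cpm(raw_treated))
```

Each function consumes exactly what the previous one produced, which is the chaining property that workflow frameworks such as Snakemake and Nextflow formalize at the level of files and tools.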

Below are some examples of pipelines and software tools to perform different analysis steps in different OMICs experiments.

Integrative frameworks can facilitate the execution of these pipelines. Typical no-/low-code frameworks for non-programmers are Galaxy, Unipro UGENE NGS, and MIGNON. For developers, suitable analysis frameworks include Snakemake, Nextflow, and Bpipe. Community-driven platforms like nf-core provide peer-reviewed, best-practice analysis pipelines written in Nextflow.

One way to ensure efficient, streamlined R&D is to establish and follow standard practices for OMICs data analysis. Figure 2 shows gold-standard software tools for the various intermediate steps in different OMICs data analysis pipelines. Standardized OMICs practices (tools and frameworks) will facilitate accessibility (the A in FAIR) and the reproducibility of high-quality results. They will also enhance business value (please also refer to the following blog post: Multi-Omics Data Integration in Drug Discovery). We would also like to point to the analysis pipelines for major OMICs assay types developed by the ENCODE Data Coordinating Center (DCC).

Although uniformity is valuable in OMICs data analyses, customization is equally valuable for specific scientific contexts. Different OMICs experiments require different handling of the data and analyses. For example, the high variation in signal-to-noise ratio in ChIP-seq experiments to identify transcription factor (TF) binding sites necessitates a wide range of quality thresholds, while RNA-seq data analyses are driven by factors such as read size, polyadenylation status, and strandedness and require different parameters or settings. Hence, many "generic" factors can be standardized, while individual parameters and settings can be customized to suit specific scientific questions.
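This split between standardized generic factors and customizable assay-specific settings can be expressed, for example, as a frozen configuration object that each assay overrides only where needed. The field names and default values below are hypothetical, not any particular pipeline's real parameters.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PipelineConfig:
    # "Generic" factors, standardized across the organization
    reference_genome: str = "GRCh38"
    aligner: str = "STAR"
    # Assay-specific knobs, customized per scientific question
    quality_threshold: float = 0.05
    stranded: bool = False

STANDARD = PipelineConfig()

# TF ChIP-seq may need a stricter threshold; stranded RNA-seq flips a flag.
# The generic factors stay untouched in both cases.
chipseq_tf = replace(STANDARD, quality_threshold=0.01)
rnaseq_stranded = replace(STANDARD, stranded=True)
```

Freezing the dataclass makes the standard configuration immutable, so every customization is an explicit, reviewable override rather than a silent in-place change.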

Factors to consider while standardizing OMICs data analysis workflows:

  • Knowledge and understanding of central dogma and design of various high throughput experiments
  • Standard data files and formats (e.g. using a common reference genome across labs/departments wherever possible)
  • Domain-specific languages such as the Workflow Description Language (WDL) and/or the Common Workflow Language (CWL) to enhance interoperability (the I in FAIR)
  • Flexible frameworks that can run locally (e.g. on HPC) and in the cloud (e.g. AWS, Azure, Google Cloud)
  • Common framework for testing the workflows
  • Wrappers for automated file handling (import and export of data, parameters)
  • User friendly interfaces for interactive usage
  • Platform-agnostic installation of frameworks, packages, libraries
  • Common package managers like Conda
  • Intuitive and interactive visualization of workflows, their progress and their results
  • Metadata tracking
  • Common and portable sharing mechanism for pipelines (e.g. Docker images), data and results

Uniform OMICs Data Analysis Workflows to Empower your Future:

Relatively well-established OMICs data generation techniques demand standardized ways of data analysis while allowing the customization necessary for a specific scientific context. To accelerate discovery and actionable data-driven decisions within your organization, take a step closer to FAIR OMICs data by establishing gold-standard workflows and frameworks for OMICs data analyses.

How Can OSTHUS Help?

  • Scientific advice on experimental design
  • Tool selection for your specific scientific question
  • Developing an automated workflow using different workflow management frameworks
  • Customizing an existing workflow by developing scripts for individual steps
  • Visualizing analysis results using available business intelligence tools and/or developing bespoke interfaces suitable to answer specific questions

Contact us to learn more about how we help our customers derive scientific insights from their OMICs data.


References:

  1. https://biotechfinance.org/q2/
  2. Corchete LA et al., Scientific Reports, 2020


Unlocking Precision Medicine: Omics Data Challenges & Solutions

Table of Contents:

  • Key points
  • Keywords
  • Background
  • Business value of integrated omics data
  • We at OSTHUS have done it before: Offerings

Key points:

  • Omics technologies are a fast-paced field that produces large amounts of data; effective data management brings both opportunities and challenges1.
  • Deep omics data analysis capabilities and bioinformatics expertise are becoming central to drug development.
  • Multi-omics data strategies and a long-term vision of enterprise-wide needs are key to realizing the full business value of omics data.

Background:

The first complete, gapless human reference genome was published in 2022 by the Telomere-to-Telomere consortium (the draft genome published in 2003 was incomplete), discovering 200 million additional base pairs and 1,956 new gene predictions in the process2. It unlocked further potential for functional studies to find new therapeutic targets.

Multi-omics (also called panomics or integrative omics) is the integration of omics data sets arising from subfields such as genomics, transcriptomics, proteomics, and metabolomics, aimed at increasing our understanding of biological systems3. As the pharmaceutical industry increasingly embraces the era of precision medicine, fast-paced omics technologies are becoming a significant driver of this transformation. However, in our experience, gaps remain with respect to data integration, data harmonization, design considerations, and data management strategies for realizing the full potential of omics data.

Genomic databases like GenBank and the Sequence Read Archive (SRA) collectively hold over 100 petabytes of data and are predicted to exceed 2.5 exabytes by 2025. Collecting, integrating, and systematically analyzing such heterogeneous big data with distinct characteristics is a challenging task that may lead to data mismanagement. For instance, DNA sequencing data often comes from various platforms such as Illumina, Pacific Biosciences, and Oxford Nanopore, each producing data with unique quality thresholds and file types. One specific issue involves the use of multiple identifiers: a protein can have several identifiers depending on the database used, such as UniProt, PDB, or internal source systems. Discrepancies in mapping these identifiers may lead to confusion or misinterpretation of results arising from multiple systems, hindering downstream data analysis.
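The multiple-identifier problem can be made concrete with a toy cross-reference table that reconciles a protein's IDs across systems. All identifiers below are made up for illustration; real integration work would map against the actual UniProt, PDB, and internal registries.

```python
# Hypothetical cross-reference table: one row per protein, one column per system.
XREF = [
    {"uniprot": "P99901", "pdb": "1XYZ", "internal": "PROT-0042"},
    {"uniprot": "P99902", "pdb": "2XYZ", "internal": "PROT-0043"},
]

def resolve(identifier):
    """Return the full cross-reference record for any known identifier,
    raising on unknown or ambiguous lookups instead of guessing."""
    matches = [rec for rec in XREF if identifier in rec.values()]
    if len(matches) != 1:
        raise LookupError(f"ambiguous or unknown identifier: {identifier!r}")
    return matches[0]
```

Failing loudly on ambiguous mappings, rather than silently picking one, is exactly the behavior that prevents the downstream misinterpretation described above.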

Our understanding of the business value of multi-omics data (a holistic view):

  • Streamline R&D processes: Omics data can streamline R&D processes, for example by identifying high-confidence biomarkers that support predictive models of disease progression.
  • Accelerate drug discovery and development: Integrated omics data can provide in-depth molecular insights that help businesses save time and resources in drug discovery research and predict drug efficacy and safety more quickly.
  • Gain deeper insights: Integrated omics data allows for a more detailed understanding of individual genetic and molecular profiles, driving personalized healthcare solutions.
  • Cost reduction: Efficient data strategies, enabled by streamlined storage, processing, and analysis of omics data, reduce the costs associated with these processes.

Figure 1: Illustration of our understanding and approaches for leveraging multi-omics data

Our exemplary approaches and considerations to certain challenges in omics data management:

Figure 2: OSTHUSโ€™ exemplary approaches and considerations to certain challenges in omics data management

Leveraging Bioinformatics Expertise for Optimizing Omics Data Management

How can OSTHUS help?

Figure 3: OSTHUS consulting approach from vision to implementation

In a recent project, a pharmaceutical company was struggling to manage its genomic and protein sequence data efficiently. We are implementing a bespoke, cloud-based, centralized data lake solution that not only consolidates different data and metadata but also offers an intuitive user interface for quickly extracting insights from similar sequences in both in-house and publicly available resources.

OSTHUS offers end-to-end services, from vision and strategy to market analysis to implementation. Recognizing that one size doesn't fit all, we offer technology-agnostic consulting. Our bioinformatics experts understand the available technologies and their strengths and weaknesses, from open-source solutions to commercial offerings, which allows us to recommend and implement the best-fit solution for specific objectives.

Conclusion:

To realize the full potential of these approaches and transform raw omics data into meaningful insights, a strategic and robust data strategy is critical.

With strategic planning and expert guidance, these challenges can be effectively managed, unlocking the immense potential of integrated omics data for accelerated drug development.

Contact us today to revolutionize your bioinformatics journey and empower data-driven decision-making in your drug development efforts.

References:

  1. Omics data science โ€“ an interdisciplinary solution
  2. The complete sequence of a human genome
  3. Integrated Omics: Tools, Advances, and Future Approaches
  4. Undisclosed, unmet and neglected challenges in multi-omics studies

Disclaimer:

OSTHUS GmbH is a subsidiary of AmerisourceBergen Corporation. OSTHUS GmbH and AmerisourceBergen strongly encourage readers to review all available information about the topics contained in this blog and to rely on their own experience and expertise in making decisions related thereto.

If you work within the pharmaceutical industry, chances are you are familiar with IDMP (Identification of Medicinal Products) - a set of ISO standards for the unique identification of medicinal products. With IDMP becoming a global regulatory requirement, having IDMP-compliant data assets, systems, and business processes is becoming essential for leading pharma companies.

However, diverging implementations of IDMP have led the industry to form a collaborative initiative under the wing of the Pistoia Alliance to define the IDMP Ontology, a common, semantically concise, and industry-oriented implementation of the IDMP ISO standards.

The IDMP Ontology is a great opportunity for big pharma to gain a holistic, cross-functional view of product data assets while still achieving regulatory compliance. But where do you start with adopting the IDMP Ontology in your organization? Here are some straightforward steps to get started.

Assessment & Concept

Step 1: Determine the scope and key stakeholders

Identify the scope of your implementation: determine which business functions of your company are affected by the IDMP implementation. This could include regulatory affairs, pharmacovigilance, supply chain management, IT, and other departments. In addition, it is important to have key stakeholders on board right from the start.

Step 2: Prioritize Use Cases

IDMP domains cover the product lifecycle throughout the complete pharma value chain. Prioritized use cases help keep the implementation focused and ensure that demonstrable results are available early on. Ideally, they should be broken down into concrete competency questions such as "In which manufacturing steps is substance <S> used?".
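For illustration, such a competency question maps directly onto a graph query. In a real IDMP Knowledge Graph this would be a SPARQL query over the ontology; here is a toy pure-Python equivalent where all triples, step names, and the `uses_substance` predicate are hypothetical.

```python
# Hypothetical subject-predicate-object triples linking manufacturing
# steps to substances, in the spirit of a knowledge graph.
TRIPLES = [
    ("step:Granulation", "uses_substance", "substance:S1"),
    ("step:Coating",     "uses_substance", "substance:S2"),
    ("step:Blending",    "uses_substance", "substance:S1"),
]

def steps_using(substance):
    """All manufacturing steps linked to a substance via uses_substance."""
    return sorted(subj for subj, pred, obj in TRIPLES
                  if pred == "uses_substance" and obj == substance)
```

The value of phrasing use cases as competency questions is exactly this: each question becomes a concrete, testable query whose answer demonstrates that the aligned data delivers.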

Step 3: Capability Assessment

Assess your current data assets, IT systems, business processes, and organization to identify gaps and areas that need improvement. Evaluate the IDMP maturity level at your company and take a holistic view of all existing digitalization programs. Frameworks such as DCAM can be very helpful for conducting the capability assessment in a structured and complete way.

Step 4: Implementation Concept and Plan

Based on the capability assessment results, outline a TO-BE state that addresses the identified gaps. Depending on the maturity level of your organization, this step may include refinement of the technical and business architecture and even vendor selection. Lastly, outline a plan for your IDMP implementation journey, including important milestones, responsibilities, and budget.

Implementation & Governance

Step 5: Proof of Concept: IDMP Ontology Alignment

Once the relevant data sources for the prioritized competency questions are identified (e.g. RIM, MDM, substance registry, etc.), internal terminology can be aligned to the IDMP Ontology. By integrating data into the IDMP Knowledge Graph, results from IDMP-aligned data quickly become visible.

Step 6: Implement the solution

While the PoC implementation (Step 5) is important to demonstrate value relatively quickly and convince stakeholders, a toolset for managing IDMP data assets that is scalable, easy to use, cost-effective, and compatible with your existing systems still needs to be established. Work with your vendor or implementation partner to configure and implement a fit-for-purpose IDMP solution.

Step 7: Establish Data Governance around IDMP

To maintain semantically correct and linked data assets, it is crucial to ensure tools and processes for proper metadata, reference, and master data management. Establish roles and responsibilities, define data domain ownership, and provide training for your staff on the new systems and processes.

Step 8: Expand IDMP Knowledge Graph

Established IT infrastructure and governance processes for IDMP data assets allow the IDMP Knowledge Graph to be expanded and enriched with further data sources and functional business areas. Continuously onboard further use cases, stakeholders, and systems to leverage the value of linked IDMP data beyond mere compliance.

It is important to keep in mind that IDMP implementation can be a long-term project that requires ongoing commitment and investment, as well as alignment with ongoing digitalization initiatives. Therefore, it is important to involve key stakeholders and subject matter experts in the planning and implementation process, but also to work with a trusted and experienced implementation partner to ensure a successful outcome.

Contact our experts and check out how OSTHUS can support your IDMP Ontology adoption journey!

Register for our upcoming webinar.

Our merger with PharmaLex brings exciting opportunities to expand our global capabilities and build on our commitment to innovation through enhanced tech-enablement and operational excellence and efficiency.

Together weโ€™re making a big difference to life sciences companies by connecting business know-how with IT expertise and integration architecture best practices to drive digital transformation.

Through our trusted scientific advisors, OSTHUS Services strengthens the PharmaLex Data and Information Technology Strategy, Data Governance and Advanced Analytics portfolio.

With our focus on streamlining the overall development process, as well as our shared cultural values and extensive global subject matter expertise, we will deliver even greater benefits to our customers.

To provide our customers with the best service possible, OSTHUS is expanding its partner network with Alation, an industry leader in metadata management solutions. By combining our domain knowledge with Alation's expertise, we will enable our customers to manage their data as an asset and maximize its value.

Alation offers customers a platform for a broad range of data intelligence solutions including data search & discovery, data governance, data stewardship, analytics, and digital transformation.

OSTHUS Subject Matter Experts (SMEs) with their deep domain knowledge will deliver technical training empowering our customers to better access and understand their data as well as improve their cataloging and metadata capabilities.
