Microbial genomics is a high-throughput, OMICS-based discipline that studies the genomic sequences of microorganisms. Research in microbial genomics has provided many insights into microbiome functioning and has refined perturbation methods, ultimately improving both human and animal health.1
The ability to provide comprehensive insights into pathogen identity, diversity, and evolution is invaluable in modern infectious disease research and has profound implications for vaccine development.
Given the many benefits of OMICS-based technologies, the analysis and interpretation of omics data to understand the spread and evolution of pathogens is increasingly important for vaccine development. As ever more varied omics data are generated, numerous tools have been developed to process them, from quality checking to downstream analysis. This brings both experimental and computational challenges in data acquisition and analysis2. Thus, bioinformatics workflows that leverage existing tools and are tailored to business requirements are urgently needed.
Here, we summarize a few approaches to analyzing pathogen evolution from microbial genomics data and highlight our experience and expertise in implementing them as workflows.
Figure 1: Nextstrain's dashboard based on publicly available SARS-CoV-2 data (Adapted from https://nextstrain.org/ncov/global)
Figure 2: Exemplary interactive visualization performed on microbial genomics data (Source: Interactive visualization of taxonomic classification in Krona)
Figure 3: Our expertise and learnings to boost vaccine development initiatives
How can OSTHUS support you in your vaccine development initiatives?
Contact us for details on how we are helping our customers accelerate their vaccine development programs.
Disclaimer
The contents of this blog are solely the opinion of the author and do not represent the opinions of PharmaLex GmbH or its parent Cencora Inc. PharmaLex and Cencora strongly encourage readers to review the references provided with this blog and all available information related to the topics mentioned herein and to rely on their own experience and expertise in making decisions related thereto.
Several scientific and technological advances have uncovered new knowledge about each step of the central dogma: the unidirectional flow of information from DNA to RNA to proteins and, in many cases, from RNA to proteins. Simultaneously, these advances revealed epigenetic regulation of the central dogma and the importance of probing other biomolecules such as lipids and metabolites.
Rapidly evolving technologies and the falling cost of generating OMICs data (quantitative, high-throughput data on biomolecules) demand specialized tools for rapid analysis. Data analysis is a key factor in R&D processes due to the increasing spatio-temporal resolution of these measurements, from whole organisms to tissues and even individual cells. This is also evident from the large investments OMICs data attracts across industry R&D and academia.1 In January 2023, the EU launched a joint programme worth 16.5 million euros for large-scale analysis of OMICs data for drug-target discovery in neurodegenerative diseases alone.
Scientific workflows or pipelines (Figure 2), in which a series of software tools runs stepwise, one after the other, are essential for rapidly analyzing and interpreting the vast amounts of data generated by various OMICs techniques. For example, RNA-seq (transcriptomics) data analysis involves trimming, alignment, quantification, normalization and differential gene expression analysis, where the output of one tool serves as the input for the next tool in the workflow. Permutations of these tools can yield over 150 distinct sequential workflows or pipelines, so reproducing and comparing their results can be challenging2.
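The stepwise chaining described above, where each tool's output feeds the next tool, can be sketched in miniature. The step names mirror a typical RNA-seq workflow, but the functions below are illustrative stubs, not bindings to any real tool; sequences and adapter lengths are invented for the example.

```python
# Minimal sketch of a stepwise pipeline: each step's output is the next step's input.
# The functions are toy stand-ins for real tools (e.g. a trimmer, an aligner, a counter).

def trim(reads):
    # Stand-in for adapter trimming: drop a fixed-length 3-base adapter suffix.
    return [r[:-3] for r in reads]

def align(reads, reference):
    # Stand-in for alignment: naive substring search for each read in the reference.
    # Returns -1 for reads that do not map.
    return {r: reference.find(r) for r in reads}

def quantify(alignments):
    # Stand-in for quantification: count reads that aligned successfully.
    return sum(1 for pos in alignments.values() if pos >= 0)

reads = ["ACGTACGTTTT", "TTGCAACGTTT"]
reference = "ACGTACGTTGCAACG"

# The pipeline is simply function composition: trim -> align -> quantify.
counts = quantify(align(trim(reads), reference))
print(counts)  # -> 2
```

Real workflow engines such as Snakemake or Nextflow formalize exactly this chaining, adding dependency tracking, parallelism and resumability on top.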
Below are some examples of pipelines and software tools to perform different analysis steps in different OMICs experiments.
Integrative frameworks can facilitate the execution of these pipelines. Typical no/low-code frameworks for non-programmers are Galaxy, Unipro UGENE NGS and MIGNON. For developers, suitable analysis frameworks include Snakemake, Nextflow and Bpipe. Community-driven platforms such as nf-core provide peer-reviewed, best-practice analysis pipelines written in Nextflow.
One way to ensure efficient, streamlined R&D is to establish and follow standard practices for OMICs data analysis. Figure 2 shows gold-standard software tools for performing the intermediate steps of different OMICs data analysis pipelines. Standardized OMICs practices, both tools and frameworks, will facilitate accessibility (the A in FAIR) and the reproducibility of high-quality results. They will also enhance business value (please also refer to the following blog post: Multi-Omics Data Integration in Drug Discovery). We would also like to point to the analysis pipelines for major OMICs assay types developed by the ENCODE Data Coordinating Center (DCC).
Although uniformity is valuable in OMICs data analysis, customization is equally valuable for specific scientific contexts. Different OMICs experiments require different data handling and analyses. For example, the high variation in signal-to-noise ratio in ChIP-seq experiments for identifying transcription factor (TF) binding sites necessitates a wide range of quality thresholds, while RNA-seq analyses are driven by factors such as read length, polyadenylation status and strandedness, each requiring different parameters or settings. Hence, many "generic" factors can be standardized, while individual parameters and settings can be customized to suit specific scientific questions.
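This standardize-then-customize idea can be expressed very simply in configuration: shared defaults are fixed once, and each experiment overrides only the parameters its scientific context requires. All parameter names and values below are illustrative assumptions, not recommendations for any particular tool.

```python
# Generic, standardizable defaults shared by all assays (illustrative values).
DEFAULTS = {
    "min_read_quality": 20,          # common QC threshold
    "adapter": "AGATCGGAAGAGC",      # common Illumina adapter sequence
}

# TF ChIP-seq: add a stricter, experiment-specific peak-calling threshold.
chipseq_tf = {**DEFAULTS, "peak_qvalue": 0.01}

# Stranded RNA-seq: add library-specific settings instead.
rnaseq_stranded = {**DEFAULTS, "strandedness": "reverse", "min_read_length": 50}

print(chipseq_tf)
print(rnaseq_stranded)
```

Keeping the overrides small and explicit makes it easy to see which settings are "gold standard" and which were tuned for a specific scientific question.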
Relatively well-established OMICs data generation techniques demand standardized data analysis while allowing the customization necessary for a specific scientific context. To accelerate discovery and actionable, data-driven decisions within your organization, take a step closer to FAIR OMICs data by establishing gold-standard workflows and frameworks for OMICs data analysis.
Contact us to learn more about how we help our customers derive scientific insights from their OMICs data.
The first complete, gapless human reference genome was published in 2022 by the Telomere-to-Telomere (T2T) consortium (the draft genome published in 2003 was incomplete), adding 200 million base pairs and 1,956 new gene predictions in the process2. It unlocked further potential for functional studies to find new therapeutic targets.
Multi-omics (also called panomics or integrative omics) is the integration of omics data sets from subfields such as genomics, transcriptomics, proteomics and metabolomics, aimed at increasing our understanding of biological systems3. As the pharmaceutical industry embraces the era of precision medicine, fast-paced omics technologies are becoming a significant driver of this transformation. However, in our experience, gaps remain in data integration, data harmonization, design considerations, and data management strategies for realizing the full potential of omics data.
Genomic databases such as GenBank and the Sequence Read Archive (SRA) collectively hold more than 100 petabytes of data and are predicted to exceed 2.5 exabytes by 2025. Collecting, integrating, and systematically analyzing heterogeneous big data with distinct characteristics is a challenging task that can lead to data mismanagement. For instance, DNA sequencing data often come from various platforms such as Illumina, Pacific Biosciences, and Oxford Nanopore, each producing data with unique quality thresholds and file types. One specific issue involves the use of multiple identifiers: a protein can have several identifiers depending on the database used, such as UniProt, PDB, or internal source systems. Discrepancies in mapping these identifiers can lead to confusion or misinterpretation of results drawn from multiple systems, hindering downstream data analysis.
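The multiple-identifier problem is essentially one of reconciliation: every known identifier for an entity should resolve to a single canonical record. A minimal sketch of that idea follows; the identifiers and field names are invented for illustration and are not real UniProt or PDB entries.

```python
# Hypothetical mapping records linking identifiers from several source systems
# to one canonical ID per protein (all values are illustrative).
id_mappings = [
    {"canonical": "PROT-001", "uniprot": "P12345", "pdb": "1ABC", "internal": "LAB-7"},
    {"canonical": "PROT-002", "uniprot": "Q67890", "pdb": "2XYZ", "internal": "LAB-9"},
]

# Build one lookup table keyed by every known identifier.
lookup = {}
for record in id_mappings:
    for system, identifier in record.items():
        lookup[identifier] = record["canonical"]

def resolve(identifier):
    """Map any known identifier to its canonical ID, or flag it as unmapped."""
    return lookup.get(identifier, "UNMAPPED")

print(resolve("P12345"))  # -> PROT-001
print(resolve("LAB-9"))   # -> PROT-002
print(resolve("XXXXX"))   # -> UNMAPPED
```

A centralized mapping like this, maintained as reference data, is what prevents the same protein from appearing as three unrelated entities in downstream analysis.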
Figure 1: Illustration of our understanding and approaches for leveraging multi-omics data
Figure 2: OSTHUS' exemplary approaches and considerations for certain challenges in omics data management
How can OSTHUS help?
Figure 3: OSTHUS consulting approach from vision to implementation
In a recent project, a pharmaceutical company was struggling to manage its genomic and protein sequence data efficiently. We are implementing a bespoke, cloud-based, centralized data lake solution that not only consolidates disparate data and metadata but also offers an intuitive user interface for quickly extracting insights from similar sequences in in-house as well as publicly available resources.
OSTHUS offers end-to-end services, from vision and strategy, to market analysis, to implementation. Recognizing that one size doesn't fit all, we offer technology-agnostic consulting. Our bioinformatics experts understand the available technologies and their strengths and weaknesses, from open-source solutions to commercial offerings, which allows us to recommend and implement the best-fit solution for specific objectives.
To realize the full potential of these approaches and transform raw omics data into meaningful insights, a robust data strategy is critical.
With strategic planning and expert guidance, these challenges can be effectively managed, unlocking the immense potential of integrated omics data for accelerated drug development.
Contact us today to revolutionize your bioinformatics journey and empower data-driven decision-making in your drug development efforts.
OSTHUS GmbH is a subsidiary of AmerisourceBergen Corporation. OSTHUS GmbH and AmerisourceBergen strongly encourage readers to review all available information about the topics contained in this blog and to rely on their own experience and expertise in making decisions related thereto.
If you work within the pharmaceutical industry, chances are you are familiar with IDMP (Identification of Medicinal Products), a set of ISO standards for the unique identification of medicinal products. With IDMP becoming a global regulatory requirement, having IDMP-compliant data assets, systems and business processes is becoming essential for leading pharma companies.
However, diverging implementations of IDMP have led the industry to form a collaborative initiative under the wing of the Pistoia Alliance to define the IDMP Ontology: a common, semantically precise and industry-oriented implementation of the IDMP ISO standards.
The IDMP Ontology is a great opportunity for big pharma to gain a holistic, cross-functional view of product data assets while still achieving regulatory compliance. But where do you start with adopting the IDMP Ontology in your organization? Here are some straightforward steps to get started.
Step 1: Determine the scope and key stakeholders
Identify the scope of your implementation: determine which business functions of your company are affected by IDMP implementation. These could include regulatory affairs, pharmacovigilance, supply chain management, IT and other departments. In addition, it is important to have key stakeholders on board right from the start.
Step 2: Prioritize Use Cases
IDMP domains cover the product lifecycle throughout the complete pharma value chain. Prioritized use cases help keep the implementation focused and ensure that demonstrable results are available early on. Ideally, they should be broken down into concrete competency questions such as "In which manufacturing steps is substance <S> used?".
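A competency question like the one above translates directly into a query over a knowledge graph. The sketch below answers it against a toy set of triples in plain Python; the step and substance names are invented, and a real implementation would issue an equivalent SPARQL query against the IDMP Knowledge Graph.

```python
# Toy knowledge graph as (subject, predicate, object) triples (all names invented).
triples = [
    ("step:Granulation", "uses_substance", "substance:S1"),
    ("step:Coating",     "uses_substance", "substance:S2"),
    ("step:Blending",    "uses_substance", "substance:S1"),
    ("step:Coating",     "follows",        "step:Blending"),
]

def manufacturing_steps_using(substance):
    """Answer the competency question: in which manufacturing steps is <substance> used?"""
    return sorted(s for s, p, o in triples
                  if p == "uses_substance" and o == substance)

print(manufacturing_steps_using("substance:S1"))  # -> ['step:Blending', 'step:Granulation']
```

Phrasing use cases as competency questions is useful precisely because each one has this kind of mechanical, testable answer once the data is in the graph.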
Step 3: Capability Assessment
Assess your current data assets, IT systems, business processes and organization to identify gaps and areas that need improvement. Evaluate the IDMP maturity level at your company and take a holistic view of all existing digitalization programs. Frameworks such as DCAM can be very helpful for conducting a capability assessment in a structured and complete way.
Step 4: Implementation Concept and Plan
Based on the capability assessment results, outline a TO-BE state that addresses the identified gaps. Depending on your organization's maturity level, this step may include refining the technical and business architecture and even vendor selection. Lastly, outline a plan for your IDMP implementation journey, including key milestones, responsibilities and budget.
Step 5: Proof of Concept: IDMP Ontology Alignment
Once the relevant data sources for the prioritized competency questions are identified (e.g. RIM, MDM, substance registry), internal terminology can be aligned to the IDMP Ontology. By integrating the data into the IDMP Knowledge Graph, results from IDMP-aligned data quickly become visible.
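In its simplest form, terminology alignment is a mapping from internal field names to ontology terms, applied before records are loaded into the graph. The field names, ontology terms and values below are invented for illustration and do not reproduce the actual IDMP Ontology vocabulary.

```python
# Hypothetical alignment table: internal source-system fields -> ontology terms.
ALIGNMENT = {
    "prod_name":  "idmp:MedicinalProductName",
    "subst_code": "idmp:SubstanceCode",
    "dose_form":  "idmp:PharmaceuticalDoseForm",
}

def align_record(record):
    """Rename internal fields to ontology-aligned terms; flag anything unmapped."""
    return {ALIGNMENT.get(k, f"unmapped:{k}"): v for k, v in record.items()}

source_row = {"prod_name": "Examplin 50 mg", "subst_code": "SUB-42", "batch": "B-001"}
print(align_record(source_row))
```

Flagging unmapped fields instead of silently dropping them is a deliberate choice: during a PoC, the "unmapped" residue is exactly the gap list that drives the next round of ontology alignment work.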
Step 6: Implement the solution
While the PoC implementation (Step 5) is important to demonstrate value quickly and convince stakeholders, a toolset for managing IDMP data assets that is scalable, easy to use, cost-effective and compatible with your existing systems still needs to be established. Work with your vendor or implementation partner to configure and implement a fit-for-purpose IDMP solution.
Step 7: Establish Data Governance around IDMP
To maintain semantically correct and linked data assets, it is crucial to establish tools and processes for proper metadata, reference data and master data management. Establish roles and responsibilities, define data domain ownership, and train your staff on the new systems and processes.
Step 8: Expand IDMP Knowledge Graph
The established IT infrastructure and governance processes for IDMP data assets allow the IDMP Knowledge Graph to be extended and enriched with further data sources and functional business areas. Continuously onboard additional use cases, stakeholders and systems to leverage the value of linked IDMP data beyond mere compliance.
Keep in mind that IDMP implementation can be a long-term project requiring ongoing commitment and investment, as well as alignment with ongoing digitalization initiatives. It is therefore important to involve key stakeholders and subject matter experts in the planning and implementation process, and to work with a trusted and experienced implementation partner to ensure a successful outcome.
Contact our experts and check out how OSTHUS can support your IDMP Ontology adoption journey!
Register for our upcoming webinar.
Our merger with PharmaLex brings exciting opportunities to expand our global capabilities and build on our commitment to innovation through enhanced tech-enablement and operational excellence and efficiency.
Together, we're making a big difference for life sciences companies by connecting business know-how with IT expertise and integration-architecture best practices to drive digital transformation.
Through our trusted scientific advisors, OSTHUS Services strengthens the PharmaLex Data and Information Technology Strategy, Data Governance and Advanced Analytics portfolio.
With our focus on streamlining the overall development process, as well as our shared cultural values and extensive global subject matter expertise, we will deliver even greater benefits to our customers.
To provide our customers with the best service possible, OSTHUS expands its partner network with Alation, an industry leader in metadata management solutions. By combining our domain knowledge with the expertise of Alation, we will enable our customers to manage their data as an asset and maximize its value.
Alation offers customers a platform for a broad range of data intelligence solutions including data search & discovery, data governance, data stewardship, analytics, and digital transformation.
OSTHUS Subject Matter Experts (SMEs), with their deep domain knowledge, will deliver technical training, empowering our customers to better access and understand their data and to improve their cataloging and metadata capabilities.