Data and metadata organisation

Virtual
Pricing/Discount Options: Call #2
Unique Identifier: baeb819c-05f8-41b1-b758-ae401130955c

Service Description

Overview

This service provides a complete organisation or re-organisation of your datasets, together with guidance on selecting the most appropriate metadata management methodology. We offer data cleaning (e.g., renaming, annotation, formatting, and duplicate removal), quality control, and anonymization tailored to SME needs, with assistance structured around a dataset management plan. We follow established community data and metadata standards and offer curated, up-to-date guidance based on the FAIR principles (Findable, Accessible, Interoperable and Reusable) whenever possible. We have expertise in data and metadata standards across many life science fields, in particular imaging (PET, MRI, MEG, EEG) and structural data, genomics, metabolomics, and proteomics. Expertise in the legal and ethical aspects of dataset management can also be provided.

We provide expertise and technical support in the following areas:

  • Project design & management
  • Legal & ethical support
  • Data cleaning, annotation, anonymisation
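As an illustration of the renaming and duplicate-removal steps mentioned above, the following is a minimal sketch and not the node's actual tooling. The function names, the naming convention, and the example files are all hypothetical; duplicates are assumed to mean files with byte-identical content.

```python
import hashlib
import re


def normalise_name(name: str) -> str:
    """Lower-case a file name and replace runs of non-alphanumerics with '_'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem = re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")
    return f"{stem}.{ext.lower()}" if ext else stem


def find_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by the SHA-256 of their content; keep groups with more than one member."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(name)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}


# Hypothetical in-memory example data (not taken from any real dataset).
example_files = {
    "Sample 01 (final).CSV": b"id,value\n1,0.5\n",
    "sample_01_copy.csv": b"id,value\n1,0.5\n",
    "Sample-02.csv": b"id,value\n2,0.7\n",
}
renamed = {name: normalise_name(name) for name in example_files}
duplicates = find_duplicates(example_files)
```

Hashing content rather than comparing names catches copies that were renamed by hand, which is a common source of duplicates in datasets assembled over several studies.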

How can the service help you?

This service helps SMEs improve the organization and usability of their datasets by providing data cleaning, quality control, and anonymization tailored to their needs. With expert guidance on metadata management and adherence to FAIR principles, it ensures your data is reliable, compliant, and ready for effective use, building trust and confidence in your data-driven initiatives.

How will the service be delivered?

The service will be delivered according to established ethical agreements and guidelines, in collaboration with the SME and researchers from the Swedish TEF-Health node. Data processing is facilitated through a secure virtual environment managed by Karolinska Institutet, ensuring the highest standards of data protection.


Additional information

Provider description

The Swedish TEF-Health node is a collaboration between Karolinska Institutet, SciLifeLab and RISE, and is led by Karolinska Institutet. Together, we offer world-leading services built on our unique collection of core facilities. We provide expert consulting as well as virtual and physical testing, covering in vivo imaging, ex vivo OMICS, pharmaceutical development, simulated healthcare environments, AI-system validation and development, advanced data analysis and other data-driven life science.

Technical description

The service restructures and processes datasets within a secure computing environment, using advanced data curation and metadata management frameworks. Data preprocessing operations, including nomenclature standardization, semantic annotation and format normalization, are executed in compliance with domain-specific ontologies and, where possible, the FAIR principles. Quality assurance protocols are implemented to validate data integrity and compliance. Metadata schemas are optimized for interoperability and reuse, leveraging established standards such as RDF and JSON-LD.
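To give a sense of what a JSON-LD metadata record can look like, here is a minimal sketch. It assumes the schema.org "Dataset" vocabulary, which is one common choice and is not stated by the service; the function name and all field values are hypothetical.

```python
import json


def build_dataset_record(name: str, description: str,
                         keywords: list[str], licence_url: str) -> dict:
    """Return a JSON-LD description of a dataset using schema.org terms (an assumed vocabulary)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "license": licence_url,
    }


# Hypothetical values for illustration only.
record = build_dataset_record(
    name="Example LC-MS/MS study",
    description="Untargeted metabolomics measurements from an example study.",
    keywords=["metabolomics", "LC-MS/MS"],
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)
serialised = json.dumps(record, indent=2)
```

Because JSON-LD is a serialisation of RDF, a record like this can be read both as plain JSON by ordinary tools and as linked data by RDF-aware ones.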

Service customization

The service can be customized according to your specific needs. You may need to combine it with other services on offer.


Use case example

Context

A life science SME specializing in metabolomics has developed a novel pipeline for identifying biomarkers in rare metabolic disorders. They have generated extensive LC-MS/MS datasets from patient samples across multiple studies. However, the datasets are stored in a mix of proprietary formats, lack harmonized metadata and do not meet the submission requirements of repositories like MetaboLights. This situation delays the publication of their findings, limiting their visibility and ability to secure collaborations or funding for further pipeline validation.

Objective

To clean, harmonize and annotate the SME’s LC-MS/MS datasets according to MetaboLights requirements, and to ensure compliance with FAIR principles so that the data can be submitted to the repository immediately and scaled in future work.

Solution

The SME engages with the Swedish TEF-Health node to reorganize and optimize their metabolomics datasets for repository submission. Experts provide tailored data cleaning, format conversion, and metadata harmonization, ensuring compatibility with repository standards and enabling wider reuse of the data.

Implementation

Ethical Agreement

The SME enters into an ethical agreement with researchers from the Swedish TEF-Health node, ensuring that all data collection and usage comply with GDPR and national Swedish regulations.

Secure Access

Usage of the collected data is facilitated through a secure virtual environment managed by Karolinska Institutet, ensuring the highest standards of data protection.

Data Processing and Format Conversion

The SME’s raw datasets, stored in various proprietary formats, are converted into open formats such as mzML and mzTab, which are compatible with repository requirements. During this process, quality control measures, including noise filtering and peak annotation, are applied to improve data integrity and reliability.
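Before conversion, it usually helps to inventory which files are already in an open format and which still need converting. The sketch below is one hypothetical way to do that; the extension lists are assumptions for illustration (common vendor and open mass spectrometry extensions) and should be confirmed per project, and the file paths are invented.

```python
from pathlib import PurePath

# Assumed extension lists for illustration; confirm against the actual instruments used.
PROPRIETARY_SUFFIXES = {".raw", ".wiff", ".d"}
OPEN_SUFFIXES = {".mzml", ".mztab"}


def classify(path: str) -> str:
    """Classify a file by its extension as 'open', 'needs_conversion' or 'unknown'."""
    suffix = PurePath(path).suffix.lower()
    if suffix in OPEN_SUFFIXES:
        return "open"
    if suffix in PROPRIETARY_SUFFIXES:
        return "needs_conversion"
    return "unknown"


# Hypothetical file listing.
paths = ["study1/run_001.RAW", "study1/run_002.mzML", "study2/run_A.d", "notes.txt"]
inventory = {p: classify(p) for p in paths}
to_convert = [p for p, status in inventory.items() if status == "needs_conversion"]
```

Files reported as "unknown" would be reviewed manually rather than guessed at, since an extension alone does not prove a file's format.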

Metadata Harmonization

Metadata schemas based on MetaboLights standards are created. These schemas incorporate essential details about study design, sample preparation and instrument parameters. Ontology-based annotation is applied to harmonize metadata across all datasets, ensuring consistency and compliance with repository guidelines.
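The harmonization step can be pictured as mapping each study's own field names and free-text values onto one shared schema with controlled terms. The following is a minimal sketch under stated assumptions: the field map, the synonym table and the "EXAMPLE:" identifiers are placeholders invented for illustration and are not real ontology terms or MetaboLights field names.

```python
# Hypothetical mapping from study-specific field names to one shared field name.
FIELD_MAP = {
    "sample type": "sample_type",
    "matrix": "sample_type",
    "instrument": "instrument",
    "ms instrument": "instrument",
}

# Hypothetical synonym table; the identifiers are placeholders, not real ontology IDs.
TERM_MAP = {
    "blood plasma": ("plasma", "EXAMPLE:0001"),
    "plasma": ("plasma", "EXAMPLE:0001"),
    "urine": ("urine", "EXAMPLE:0002"),
}


def harmonise(record: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return the harmonised record plus the original keys that could not be mapped."""
    harmonised: dict[str, str] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        field = FIELD_MAP.get(key.strip().lower())
        if field is None:
            unmapped.append(key)  # left for manual review
            continue
        if field == "sample_type":
            # Fall back to the cleaned free text with an empty term ID if no synonym matches.
            label, term_id = TERM_MAP.get(value.strip().lower(), (value.strip(), ""))
            harmonised["sample_type"] = label
            harmonised["sample_type_term"] = term_id
        else:
            harmonised[field] = value.strip()
    return harmonised, unmapped


clean, leftover = harmonise(
    {"Sample Type": "Blood plasma", "MS Instrument": " Q-TOF ", "Operator": "AB"}
)
```

Returning the unmapped keys separately, instead of silently dropping them, keeps a record of what still needs a curator's decision.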

Benefits

  • FAIR Data Management: Enhances SMEs’ ability to manage and share clinical trial data effectively while ensuring interoperability and quality.
  • Stakeholder Credibility: A well-structured data management plan builds trust with regulators, funders, and collaborators.
  • Regulatory Compliance: Ensures adherence to ethical and legal standards, reducing data handling risks.
  • Data Sustainability: Supports long-term usability and scalability, enabling future research opportunities.

Impact

The SME’s biomarker discovery pipeline gains recognition as a reliable and validated tool in the rare disease research community. Their repository-submitted data fosters new collaborations with academic researchers and industry stakeholders, accelerating the translation of their findings into clinical applications.

Offerings: Data Processing (curation, preprocessing & standardization, mining, visualization, anonymization, FAIR, etc.)

Provider & Contact

Provider Organisation: Karolinska Institutet (KI)
Provider Country: Sweden
Published Email: tef-health@ki.se

Pricing is available to registered users. SMEs receive significant state-aid reductions (GBER) or, depending on the call, free services during the funded project. Sign in or register to see the price for your organisation.

Operational Details

Service Inputs: Data set & detailed requirements
Service Outputs: Well-organized and annotated data set
Dependencies & Restrictions: Depending on the needs of the SME, the following may be relevant: ethics vote, GDPR restrictions, other regulations and laws