Data and metadata organisation
Service Description
Overview
This service provides a complete organisation or re-organisation of your datasets, together with guidance on selecting the most appropriate metadata management methodology. We offer data cleaning (e.g., renaming, annotation, formatting, and duplicate removal), quality control, and anonymization tailored to SME needs. Assistance is structured around a dataset management plan. We follow established community data and metadata standards and, whenever possible, offer curated, up-to-date guidance based on the FAIR principles (Findable, Accessible, Interoperable, and Reusable). We have expertise in data and metadata standards across many life science fields, in particular imaging (PET, MRI, MEG, EEG) and structural data, genomics, metabolomics, and proteomics. Expertise in the legal and ethical aspects of dataset management can also be provided.
We provide expertise and technical support in the following areas:
- Project design & management
- Legal & ethical support
- Data cleaning, annotation, anonymisation
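As an illustration of the duplicate-removal step mentioned in the overview, the sketch below flags files whose content is byte-for-byte identical by comparing SHA-256 digests. It is a minimal example under stated assumptions, not the procedure the service necessarily uses: the function name `find_duplicates` and the in-memory `files` mapping are hypothetical and chosen only to keep the example self-contained.

```python
import hashlib


def find_duplicates(files):
    """Return {duplicate_name: original_name} for byte-identical content.

    `files` is a hypothetical mapping of file name -> bytes content.
    Names are processed in sorted order, so the alphabetically first
    name in each group of identical files is treated as the original.
    """
    seen = {}        # digest -> first file name seen with that content
    duplicates = {}  # later file name -> name of the file it duplicates
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
    return duplicates


files = {
    "a.csv": b"x,y\n1,2\n",
    "copy_of_a.csv": b"x,y\n1,2\n",
    "b.csv": b"x,y\n3,4\n",
}
duplicates = find_duplicates(files)
print(duplicates)
```

Content hashing only detects exact copies. Files that hold the same measurements in a different format or with different whitespace would need a format-aware comparison.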
How can the service help you?
This service helps SMEs improve the organization and usability of their datasets by providing data cleaning, quality control, and anonymization tailored to their needs. With expert guidance on metadata management and adherence to FAIR principles, it ensures your data is reliable, compliant, and ready for effective use, building trust and confidence in your data-driven initiatives.
How will the service be delivered?
The service will be delivered according to established ethical agreements and guidelines, in collaboration with the SME and researchers from the Swedish TEF-Health node. Data processing is facilitated through a secure virtual environment managed by Karolinska Institutet, ensuring the highest standards of data protection.
Additional information
Provider description
The Swedish TEF-Health node is a collaboration between Karolinska Institutet, SciLifeLab and RISE, and is led by Karolinska Institutet. Together, we offer world-leading services through our unique collection of core facilities. We provide expert consulting as well as virtual and physical testing across in vivo imaging, ex vivo omics, pharmaceutical development, simulated healthcare environments, AI-system validation and development, advanced data analysis, and other data-driven life science.
Technical description
The service restructures and processes datasets within a secure computing environment, using data curation and metadata management frameworks. Data preprocessing operations, including nomenclature standardization, semantic annotation, and format normalization, are executed in compliance with domain-specific ontologies and, where possible, the FAIR principles. Quality assurance protocols are implemented to validate data integrity and compliance. Metadata schemas are optimized for interoperability and reuse, leveraging established standards such as RDF and JSON-LD.
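To make the JSON-LD mention concrete, the sketch below builds a small dataset description and serializes it. It assumes the schema.org `Dataset` vocabulary as the metadata schema, which is an illustrative choice and not one confirmed by the service description. The function name `build_dataset_jsonld` and the example values are hypothetical.

```python
import json


def build_dataset_jsonld(name, description, keywords, measurement_technique):
    """Build a JSON-LD record using schema.org Dataset properties.

    Assumption: schema.org is used as the vocabulary. A real project
    would pick the schema required by its target repository.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        "measurementTechnique": measurement_technique,
    }


# Hypothetical example values, for illustration only.
record = build_dataset_jsonld(
    name="Example metabolomics study",
    description="Illustrative dataset description.",
    keywords=["metabolomics", "LC-MS/MS"],
    measurement_technique="LC-MS/MS",
)
print(json.dumps(record, indent=2))
```

Because JSON-LD is plain JSON with a `@context`, the same record can be stored alongside the data files and later interpreted as RDF by tools that understand the referenced vocabulary.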
Service customization
The service can be customized according to your specific needs. It may need to be combined with other services on offer.
Use case example
Context
A life science SME specializing in metabolomics has developed a novel pipeline for identifying biomarkers in rare metabolic disorders. They have generated extensive LC-MS/MS datasets from patient samples across multiple studies. However, the datasets are stored in a mix of proprietary formats, lack harmonized metadata and do not meet the submission requirements of repositories like MetaboLights. This situation delays the publication of their findings, limiting their visibility and ability to secure collaborations or funding for further pipeline validation.
Objective
To clean, harmonize and annotate the SME’s LC-MS/MS datasets according to MetaboLights requirements, and to ensure compliance with FAIR principles so that the data can be submitted to the repository immediately and scaled in future.
Solution
The SME engages with the Swedish TEF-Health node to reorganize and optimize their metabolomics datasets for repository submission. Experts provide tailored data cleaning, format conversion, and metadata harmonization, ensuring compatibility with repository standards and enabling wider reuse of the data.
Implementation
Ethical Agreement
The SME enters into an ethical agreement with researchers from the Swedish TEF-Health node, ensuring that all data collection and usage comply with GDPR and national Swedish regulations.
Secure Access
Usage of the collected data is facilitated through a secure virtual environment managed by Karolinska Institutet, ensuring the highest standards of data protection.
Data Processing and Format Conversion
The SME’s raw datasets, stored in various proprietary formats, are converted into open formats such as mzML and mzTab, which are compatible with repository requirements. During this process, quality control measures, including noise filtering and peak annotation, are applied to improve data integrity and reliability.
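Before any conversion runs, it can help to inventory which files still need converting. The sketch below sorts file names into "convert", "keep", and "unknown" by extension and proposes an mzML target name for each proprietary file. It performs no conversion itself. The extension sets are illustrative assumptions (vendor formats vary, and some are directories, not single files), and `plan_conversions` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Illustrative assumption: extensions often associated with vendor formats.
PROPRIETARY_EXTENSIONS = {".raw", ".wiff", ".d"}
# Open formats named in the text above.
OPEN_EXTENSIONS = {".mzml", ".mztab"}


def plan_conversions(filenames):
    """Classify file names and propose mzML targets for proprietary ones."""
    plan = {"convert": {}, "keep": [], "unknown": []}
    for name in filenames:
        path = PurePosixPath(name)
        extension = path.suffix.lower()
        if extension in PROPRIETARY_EXTENSIONS:
            # Same location and stem, with the open-format extension.
            plan["convert"][name] = str(path.with_suffix(".mzML"))
        elif extension in OPEN_EXTENSIONS:
            plan["keep"].append(name)
        else:
            # Left for manual review instead of guessing the format.
            plan["unknown"].append(name)
    return plan


plan = plan_conversions([
    "study1/sample_01.raw",
    "study2/run.wiff",
    "study2/run.mzML",
    "notes.txt",
])
print(plan)
```

Keeping the plan separate from the conversion step makes it possible to review the list of affected files with the data owner before anything is rewritten.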
Metadata Harmonization
Metadata schemas based on MetaboLights standards are created. These schemas incorporate essential details about study design, sample preparation and instrument parameters. Ontology-based annotation is applied to harmonize metadata across all datasets, ensuring consistency and compliance with repository guidelines.
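A simplified view of the harmonization step is sketched below: free-text field names from different studies are mapped to one set of keys, and records missing a required field are flagged. This covers field-name harmonization only. Real ontology-based annotation would additionally map values to term identifiers from a chosen ontology, which is not shown here. The synonym table, the required-field list, and the function name `harmonize_record` are all hypothetical.

```python
# Hypothetical synonym table: free-text field name -> harmonized key.
FIELD_SYNONYMS = {
    "instrument": "instrument",
    "ms instrument": "instrument",
    "instrument model": "instrument",
    "sample prep": "sample_preparation",
    "sample preparation": "sample_preparation",
    "design": "study_design",
    "study design": "study_design",
}
# Assumed required fields, following the three details named above.
REQUIRED_FIELDS = ("study_design", "sample_preparation", "instrument")


def harmonize_record(record):
    """Return (harmonized, unmapped, missing) for one metadata record."""
    harmonized = {}
    unmapped = []
    for key, value in record.items():
        # Normalize case, underscores, and repeated whitespace before lookup.
        normalized = " ".join(key.strip().lower().replace("_", " ").split())
        target = FIELD_SYNONYMS.get(normalized)
        if target is None:
            unmapped.append(key)
        else:
            harmonized[target] = value.strip()
    missing = [field for field in REQUIRED_FIELDS if field not in harmonized]
    return harmonized, unmapped, missing


harmonized, unmapped, missing = harmonize_record({
    "MS Instrument": " Q Exactive ",
    "Sample_Prep": "methanol extraction",
    "operator": "AB",
})
print(harmonized, unmapped, missing)
```

Reporting unmapped and missing fields instead of silently dropping them gives the curator a worklist to resolve with the SME before submission.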
Benefits
- FAIR Data Management: Enhances SMEs’ ability to manage and share study data effectively while ensuring interoperability and quality.
- Stakeholder Credibility: A well-structured data management plan builds trust with regulators, funders, and collaborators.
- Regulatory Compliance: Ensures adherence to ethical and legal standards, reducing data handling risks.
- Data Sustainability: Supports long-term usability and scalability, enabling future research opportunities.
Impact
The SME’s biomarker discovery pipeline gains recognition as a reliable and validated tool in the rare disease research community. Their repository-submitted data fosters new collaborations with academic researchers and industry stakeholders, accelerating the translation of their findings into clinical applications.
Provider & Contact
Pricing is available to registered users. SMEs receive significant state-aid reductions (GBER) or, depending on the call, free services during the funded project. Sign in or register to see the price for your organisation.