Unveiling insights from the Joint FAO/WHO Expert Committee on Food Additives (JECFA) portal

Ocagli, Honoria;Lanera, Corrado;Zgheib, Rebecca;Belluco, Simone;Dacasto, Mauro;Gregori, Dario;Baldi, Ileana
2024

Abstract

This study presents a method for automating the retrieval of key identifiers and links to toxicological data from the Joint FAO/WHO Expert Committee on Food Additives (JECFA) database using web scraping techniques. The method serves primarily as an automated indexing tool: it organizes and facilitates access to the relevant reports, monographs, and specifications, making the extensive JECFA database considerably more efficient to navigate. Researchers can then perform more targeted searches, although additional manual steps are still required to extract and structure the detailed toxicological data. We developed R scripts to extract key information, such as chemical names, identifiers, and evaluation reports, from JECFA web pages. The resulting dataset comprises 6552 records as of May 2024. We validated the dataset through a systematic comparison with manually collected data to confirm its reliability. Instructions and code for accessing and processing the dataset are provided to facilitate its reuse in research. The code and dataset are openly available, enabling researchers to efficiently access and analyze toxicological data from the JECFA database.
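
The indexing step described in the abstract (reading a record page and keeping the chemical name, an identifier, and the links to evaluation documents) can be illustrated with a minimal sketch. This is not the authors' code: their scripts are written in R, while the sketch below uses Python's standard library. The HTML markup, the class names `chemical-name`, `cas-number`, and `document`, the placeholder identifier, and the base URL are all assumptions made for illustration and do not reflect the actual structure of the JECFA pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup and values: the real JECFA pages are structured differently.
SAMPLE_PAGE = """
<div class="record">
  <h1 class="chemical-name">Example additive</h1>
  <span class="cas-number">0000-00-0</span>
  <a class="document" href="/docs/report-1.pdf">Report</a>
  <a class="document" href="/docs/monograph-1.pdf">Monograph</a>
  <a href="/about">About</a>
</div>
"""

BASE_URL = "https://example.org"  # placeholder, not the JECFA address


class RecordParser(HTMLParser):
    """Collect one index record: name, identifier, and document links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.record = {"name": None, "identifier": None, "links": []}
        self._field = None  # record field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if css_class == "chemical-name":
            self._field = "name"
        elif css_class == "cas-number":
            self._field = "identifier"
        elif tag == "a" and css_class == "document" and attrs.get("href"):
            # Store links as absolute URLs so the index is usable on its own.
            self.record["links"].append(urljoin(self.base_url, attrs["href"]))

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the current text field.
        self._field = None


def parse_record(html, base_url=BASE_URL):
    """Return a dict with the name, identifier, and document links of one page."""
    parser = RecordParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.record


record = parse_record(SAMPLE_PAGE)
print(record)
```

The sketch keeps only links to documents rather than their contents, which mirrors the abstract's point that the method is an indexing tool and that extracting the detailed toxicological data remains a manual step.
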
Files in this item:
File: s41597-024-04294-w.pdf
Access: open access
Type: Published (publisher's version)
License: Creative Commons
Size: 1.1 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3543330