
Membership Inference Attack: From Leaking Training Data Privacy to Inspiring Subject Data Auditing / Li, Jiaxin. - (2025 Mar 13).

Membership Inference Attack: From Leaking Training Data Privacy to Inspiring Subject Data Auditing

LI, JIAXIN
2025

Abstract

A large amount of high-quality data is vital for the AI services provided by companies such as Google and Meta, as well as for the training of large language models. Collecting and utilizing valuable data enhances the user experience; at the same time, it makes the leakage of private information possible. When data related to a specific user is included in the training data of a machine learning model, merely knowing this fact can leak the user's private information. For example, if we know that a user's data was used to train a disease classification model that categorizes different fine-grained diseases, we can infer that this user suffers from one of these diseases, which is private information. In the machine learning field, the Membership Inference Attack (MIA), which was initially proposed to infer whether a data point is in the training data of a machine learning model, leaks training data privacy. From the perspective of the data owner or manager, MIA also provides a way to audit the data usage of the final trained model. To better improve, expand, understand, and utilize MIA, this dissertation focuses on two significant aspects: (i) improvement, expansion, and understanding of MIA, and (ii) subject data auditing with lessons from MIA.

The first part presents the improvement, expansion, and understanding of MIA. Considering the ineffectiveness and difficulty of previous label-only MIA methods on Graph Neural Networks (GNNs), this dissertation proposes an effective and straightforward method to implement label-only MIA against node-level GNNs. Given the attractive properties and increasing real-world applications of deep Spiking Neural Networks (SNNs), this dissertation evaluates the membership privacy of deep SNNs based on the outputs of SNNs and on experience from MIAs against Artificial Neural Networks (ANNs). Because neither the intermediate embeddings of a Face Recognition System (FRS) nor the concrete data used to prepare the FRS are available, this dissertation first explores adding Gaussian noise to the given data for user-level MIAs on the FRS and ultimately finds that reloading the model makes user-level MIAs feasible. Since previous works only consider the vulnerability of a data point under a single MIA and a single target model, this dissertation defines new and effective metrics that measure the vulnerability of a data point under multiple MIAs and multiple target models, for a better understanding of MIA.

The second part of the dissertation investigates subject data auditing with lessons from MIA. Considering the limited ability and strong assumptions of previous works in detecting clients that utilize data from a specific subject (subject data), this dissertation draws on lessons from MIA and proposes a novel and effective method to precisely detect all clients that train their local models with subject data.
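
To make the core idea of a membership inference attack concrete, the sketch below illustrates the simplest confidence-threshold variant in Python: unusually high model confidence on a point is taken as evidence that the point was in the training data. This is a generic illustration of the attack concept, not any of the methods proposed in this dissertation; the predict_proba interface, the scikit-learn stand-in model, and the 0.9 threshold are assumptions made only for the example.

# Minimal sketch of a confidence-threshold membership inference attack.
# Intuition: models tend to be more confident on examples they were trained on,
# so a high maximum class probability is taken as a signal of "member".
# The threshold (0.9) and the predict_proba interface are illustrative assumptions.

import numpy as np

def membership_inference(target_model, x, threshold=0.9):
    """Return True if x is inferred to be a training member of target_model."""
    probs = target_model.predict_proba(x.reshape(1, -1))[0]  # class probabilities
    return float(np.max(probs)) >= threshold

if __name__ == "__main__":
    # A scikit-learn classifier stands in for the target model.
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))
    y_train = (X_train[:, 0] > 0).astype(int)
    model = LogisticRegression().fit(X_train, y_train)

    x_member = X_train[0]              # a point the model was trained on
    x_non_member = rng.normal(size=5)  # a fresh point
    print("member?", membership_inference(model, x_member))
    print("non-member?", membership_inference(model, x_non_member))

In practice, stronger attacks replace the fixed threshold with shadow models or per-example calibration, and label-only settings such as those studied in this dissertation cannot observe these probabilities at all.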
Files in this record:
File: PhD_Thesis_Revised_Jiaxin.pdf (open access)
Type: Doctoral thesis
Size: 16.86 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3550402