Papers and Code
Asynchronous Federated Optimization https://arxiv.org/pdf/1903.03934
Towards Federated Learning at Scale: System Design https://arxiv.org/pdf/1902.01046
Robust and Communication-Efficient Federated Learning from Non-IID Data https://arxiv.org/pdf/1903.02891
One-Shot Federated Learning https://arxiv.org/pdf/1902.11175
High Dimensional Restrictive Federated Model Selection with multi-objective Bayesian Optimization over shifted distributions https://arxiv.org/pdf/1902.08999
Federated Machine Learning: Concept and Applications https://arxiv.org/pdf/1902.04885
Agnostic Federated Learning https://arxiv.org/pdf/1902.00146
Peer-to-peer Federated Learning on Graphs https://arxiv.org/pdf/1901.11173
Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System https://arxiv.org/pdf/1901.09888
SecureBoost: A Lossless Federated Learning Framework https://arxiv.org/pdf/1901.08755
Federated Reinforcement Learning https://arxiv.org/pdf/1901.08277
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems https://arxiv.org/pdf/1901.06455
Federated Learning via Over-the-Air Computation https://arxiv.org/pdf/1812.11750
Broadband Analog Aggregation for Low-Latency Federated Edge Learning (Extended Version) https://arxiv.org/pdf/1812.11494
Multi-objective Evolutionary Federated Learning https://arxiv.org/pdf/1812.07478
Federated Optimization for Heterogeneous Networks https://arxiv.org/pdf/1812.06127
Efficient Training Management for Mobile Crowd-Machine Learning: A Deep Reinforcement Learning Approach https://arxiv.org/pdf/1812.03633
No Peek: A Survey of private distributed deep learning https://arxiv.org/pdf/1812.03288
A Hybrid Approach to Privacy-Preserving Federated Learning https://arxiv.org/pdf/1812.03224
Applied Federated Learning: Improving Google Keyboard Query Suggestions https://arxiv.org/pdf/1812.02903
Differentially Private Data Generative Models https://arxiv.org/pdf/1812.02274
Protection Against Reconstruction and Its Applications in Private Federated Learning https://arxiv.org/pdf/1812.00984
Split learning for health: Distributed deep learning without sharing raw patient data https://arxiv.org/pdf/1812.00564
Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning https://arxiv.org/pdf/1812.00535
LoAdaBoost: Loss-Based AdaBoost Federated Machine Learning on Medical Data https://arxiv.org/pdf/1811.12629
Analyzing Federated Learning through an Adversarial Lens https://arxiv.org/pdf/1811.12470
Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data https://arxiv.org/pdf/1811.11479
Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning https://arxiv.org/pdf/1811.09904
Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting https://arxiv.org/pdf/1811.09712
Weekly Dig in Privacy-Preserving Machine Learning
15 February 2019
Papers
- Secure Evaluation of Quantized Neural Networks
- TensorSCONE: A Secure TensorFlow Framework using Intel SGX
- Achieving GWAS with Homomorphic Encryption
Bonus
- A Marauder’s Map of Security and Privacy in Machine Learning, a lecture on security and privacy. By Nicolas Papernot.
8 February 2019
Paper
- CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
Interesting solution for offloading/outsourcing model training to a set of workers while ensuring strong privacy guarantees; based on Lagrange coded computing.
- Towards Federated Learning at Scale: System Design
Bonus
- A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
Some of the greatest minds from cryptography join in on adversarial examples: “We develop a simple mathematical framework which enables us to think about this baffling phenomenon [and] explain why we should expect to find targeted adversarial examples in arbitrarily deep neural networks.”
1 February 2019
Papers
- Privacy-preserving semi-parallel logistic regression training with Fully Homomorphic Encryption
- CaRENets: Compact and Resource-Efficient CNN for Homomorphic Inference on Encrypted Medical Images
Secure predictions using FHE with careful packing.
- Differentially Private Markov Chain Monte Carlo
- Improved Accounting for Differentially Private Learning
News
- Videos from Hacking Deep Learning 2 online, including talks on adversarial attacks and privacy. Via @BIUCrypto.
- Videos from CCS’18 online, including presentation of ABY3. Via @lzcarl.
Bonus
- Deep Learning to Evaluate Secure RSA Implementations
- Turbospeedz: Double Your Online SPDZ! Improving SPDZ using Function Dependent Preprocessing
18 January 2019
News
- Simons Institute program on Data Privacy: Foundations and Applications kicked off this week with several workshops around differential privacy.
11 January 2019
Papers
- Secure Computation for Machine Learning With SPDZ
Looks at regression tasks using the general-purpose reference implementation, with active security.
- Secure Two-Party Feature Selection
Privacy-preserving chi-squared test for binary feature selection based on Paillier encryption.
- Contamination Attacks and Mitigation in Multi-Party Machine Learning
Making models more robust to tainted training data by minimizing the ability to predict which party provided each sample.
News
- Program for SP’19 is out with four accepted papers on differential privacy. Via @IEEESSP.
Bonus
- Excellent summary of what happened last year in the world of privacy-preserving machine learning by Dropout Labs.
- Real World Crypto happened this week, with (temporary?) recordings available on YouTube. Especially the talk on Deploying MPC for Social Good has received significant attention, while the talk on the Foreshadow attack on Intel SGX furthermore reminded us that enclaves are not perfect yet.
31 December 2018
Papers
- Fast Secure Comparison for Medium-Sized Integers and Its Application in Binarized Neural Networks
- Low Latency Privacy Preserving Inference
News
- The Google AI team has released the new TensorFlow Privacy library for training machine learning models with differential privacy guarantees for the training data. Via @NicolasPapernot.
14 December 2018
Papers
- Applied Federated Learning: Improving Google Keyboard Query Suggestions
Update on the concrete use of federated learning at Google; no secure computation or differential privacy, but includes thoughts on dealing with unseen training data.
- When Homomorphic Cryptosystem Meets Differential Privacy: Training Machine Learning Classifier with Privacy Protection
- Differentially Private User-based Collaborative Filtering Recommendation Based on K-means Clustering
- Privacy Partitioning: Protecting User Data During the Deep Learning Inference Phase
Optimising privacy loss at the early layers suggests a pragmatic approach for protecting the privacy of prediction inputs without cryptography or DP; a plaintext sketch of the split appears after this list.
- A Review of Homomorphic Encryption Libraries for Secure Computation
- Private Polynomial Computation from Lagrange Encoding
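The split itself is easy to sketch: the client runs the early layers on-device and ships only the intermediate activation, so the raw input never leaves the device. The layer sizes, random weights, and single split point below are hypothetical placeholders, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-part model: everything here (sizes, weights, split point)
# is a placeholder for illustration only.
W_client = rng.normal(size=(784, 128)) * 0.01   # early layer, stays on device
W_server = rng.normal(size=(128, 10)) * 0.01    # remaining layers, run remotely

def client_forward(x):
    """Run the early layer locally; only this activation leaves the device."""
    return np.maximum(x @ W_client, 0.0)        # ReLU

def server_forward(activation):
    """The server finishes the prediction without ever seeing the raw input."""
    return (activation @ W_server).argmax(axis=-1)

x = rng.normal(size=(1, 784))                   # raw input, e.g. a flattened image
print(server_forward(client_forward(x)))
```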
News
- NeurIPS workshop on Privacy Preserving Machine Learning happened this week with a very interesting selection of papers.
- Intel’s HE Transformer for nGraph released as open source!
Bonus
30 November 2018
Papers
- nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data
“One of the biggest accelerators in deep learning has been frameworks that allow users to describe networks and operations at a high level while hiding details … A key challenge for building large-scale privacy-preserving ML systems using HE has been the lack of such a framework; as a result data scientists face the formidable task of becoming experts in deep learning, cryptography, and software engineering”. Amen!
- CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs
“In many respects, programming FHE applications today is akin to low-level assembly … Our central hypothesis is that future applications will benefit from a compiler and runtime that targets a compact and well-reasoned interface”. Amen! Also describes several ways in which the compiler can optimize encrypted computations.
- Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference
Solid work on using weight quantization and other ML techniques to adapt neural networks to the encrypted setting, significantly improving performance relative to CryptoNets. Interestingly, second-degree approximations of the Swish activation function are used instead of ReLUs and plain squaring (a toy fit is sketched after this list). Gives plenty of references for those not coming from an ML background.
- Privacy-Preserving Collaborative Prediction using Random Forests
Train models locally on independent data sets and apply ensemble techniques to serve private predictions using these.
- FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions
Private predictions via FHE and GC. Interestingly, values are first converted to the frequency domain using the FFT, and there is a protocol for softmax.
- The AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs
- A Fully Private Pipeline for Deep Learning on Electronic Health Records
- Distributed and Secure ML with Self-tallying Multi-party Aggregation
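As a flavour of why low-degree activations matter here: under HE only additions and multiplications are cheap, so a bounded-interval polynomial stands in for the activation. The interval, degree, and least-squares fit below are illustrative choices, not the paper's procedure.

```python
import numpy as np

# Fit a degree-2 polynomial to the Swish activation on a bounded interval,
# the HE-friendly substitute for an exact activation function.
xs = np.linspace(-4.0, 4.0, 1000)
swish = xs / (1.0 + np.exp(-xs))

coeffs = np.polyfit(xs, swish, deg=2)           # [a2, a1, a0]
approx = np.polyval(coeffs, xs)

print("coefficients:", np.round(coeffs, 4))
print("max abs error on the interval:", round(float(np.max(np.abs(approx - swish))), 4))
```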
News
- List of accepted papers for NeurIPS’18 privacy workshop is out! Via @mortendahlcs.
31 October 2018
Papers
28 September 2018
Papers
27 July 2018
Papers
- Efficient Logistic Regression on Large Encrypted Data
- Round-Efficient Protocols for Secure Multiparty Fixed-Point Arithmetic
27 June 2018
- Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
- TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service
- DeepObfuscation: Securing the Structure of Convolutional Neural Networks via Knowledge Distillation
- ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models
25 May 2018
Papers
- Logistic Regression over Encrypted Data from Fully Homomorphic Encryption
- From Keys to Databases – Real-World Applications of Secure Multi-Party Computation
Real-world deployments surveyed include Jana (private SQL databases), Sharemind (secure analytics), Partisia and Sepior (auctions and key management), and Unbound Technology (enterprise secrets).
- Minimising Communication in Honest-Majority MPC by Batchwise Multiplication Verification
- SPDZ2k: Efficient MPC mod 2^k for Dishonest Majority
To improve the efficiency of MPC it is interesting to perform operations over rings that map closely onto native CPU instructions, as opposed to, e.g., a prime field. Doing so is straightforward when the attacker is honest-but-curious; this paper addresses the case where the attacker is fully malicious. A toy sketch of ring-based additive sharing follows.
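A minimal, passively secure sketch of additive secret sharing over the ring Z_{2^64}, where wrap-around is exactly what 64-bit machine words give for free; the paper's actual contribution, authenticated shares that survive a fully malicious adversary, is not shown.

```python
import random

MOD = 2 ** 64   # the ring Z_{2^64}, matching a native 64-bit machine word

def share(x, n_parties=3):
    """Split x into additive shares that sum to x modulo 2^64."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]  # toy randomness only
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

a_shares, b_shares = share(20), share(22)
# Addition of secrets is purely local: party i just adds its two shares.
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 42
```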
News
- GDPR has come into effect!
- Slides from UCL course on privacy enhancing technologies available. Via @emilianoucl.
- Keystone: An Open-source Secure Hardware Enclave. Via @Daeinar.
- The reference SPDZ implementation is being prepared for production. Via @SmartCryptology.
- Next week’s TPMPC workshop will be live-streamed if you happen to be elsewhere than Aarhus! Via @claudiorlandi.
Bonus
18 May 2018
Small but good: we only dug up one paper this week but it comes with very interesting claims.
Papers
- SecureNN: Efficient and Private Neural Network Training
Follows recent approaches but reports significant performance improvements via specialized protocols for the 3- and 4-server settings: the claimed cost of encrypted training is in some cases only 13-33 times that of training on cleartext data. A big factor in this is the avoidance of bit decomposition and garbled circuits when computing comparisons and ReLUs.
11 May 2018
If anyone had any doubt that private machine learning is a growing area then this week might take care of that.
Papers
Secure multiparty computation:
- ABY3: A Mixed Protocol Framework for Machine Learning
One of the big names in secure computation for ML is back with new protocols in the 3-server setting for training linear regression, logistic regression, and neural network models. Impressive performance improvements for both training and prediction.
- EPIC: Efficient Private Image Classification (or: Learning from the Masters)
An update to work from last year on efficient private image classification using SPDZ and support vector machines. Includes great overview of recent related work.
Homomorphic encryption:
- Unsupervised Machine Learning on Encrypted Data
Implements K-means privately using fully homomorphic encryption and a bit-wise rational encoding, with suggestions for tweaking K-means to make it more practical in this setting; a plaintext K-means reference appears after this sub-list. The TFHE library (see below) is used for the experiments.
- TFHE: Fast Fully Homomorphic Encryption over the Torus
Proclaimed as the fastest FHE library currently available, this paper is the extended version of previous descriptions of the underlying scheme and its optimizations.
- Homomorphic Secret Sharing: Optimizations and Applications
Further work on a hybrid scheme between homomorphic encryption and secret sharing: operations can be performed locally by each share holder as in the former, yet a final combination is needed in the end to recover the result as in the latter: “this enables a level of compactness and efficiency of reconstruction that is impossible to achieve via standard FHE”.
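For reference, here is what a few K-means (Lloyd) iterations look like in the clear; the encrypted version has to reproduce the same steps using only additions, multiplications, and costly encrypted comparisons for the nearest-centroid step. Data, dimensions, and cluster count are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
centroids = X[rng.choice(len(X), size=3, replace=False)]

for _ in range(10):
    # assignment step: nearest centroid per point (the comparison-heavy part under FHE)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    # update step: per-cluster mean (additions plus one division per cluster)
    centroids = np.array([X[assign == k].mean(axis=0) for k in range(3)])

print(centroids)
```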
Secure enclaves:
- SecureCloud: Secure Big Data Processing in Untrusted Clouds
An joint European research project to develop a platform for pusing critical applications to untrusted cloud environments, using secure enclaves and supporting big data. Envisioned use cases from finance, health care, and smart grids. - SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing
Presents concrete work done in the above SecureCloud project, namely a high-level Lua-based framework for privately processing streams at scale using dataflow programming and secure enclaves.
Differential privacy:
- Privately Learning High-Dimensional Distributions
Tackles the problem that privacy “comes almost for free when data is low-dimensional but comes at a steep price when data is high-dimensional”, as measured by the number of samples needed. Two mechanisms are presented, for learning a multivariate Gaussian and a product distribution respectively.
- SynTF: Synthetic and Differentially Private Term Frequency Vectors for Privacy-Preserving Text Mining
A differentially private mechanism is used to prevent author re-identification in texts used for training models, where anonymized feature vectors can be used instead of the actual body text. Concrete experiments include topic classification of newsgroup postings.
- Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization
Correlated noise is used to privately perform two common operations, either via a centralized but curious party or directly between data holders. Interestingly, the correlated noise is not uniform as in typical secure aggregation settings; the usual zero-sum masking trick is sketched below.
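For contrast, here is the usual zero-sum correlated noise construction from secure aggregation: each pair of parties shares a random mask that one adds and the other subtracts, so individual updates look random while the masks cancel exactly in the sum. Toy values and non-cryptographic randomness only.

```python
import random

def masked_updates(values, modulus=2 ** 32):
    """Return per-party masked values whose sum (mod modulus) equals the true sum."""
    masked = list(values)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            m = random.randrange(modulus)       # mask shared by parties i and j
            masked[i] = (masked[i] + m) % modulus
            masked[j] = (masked[j] - m) % modulus
    return masked

values = [3, 10, 7]
masked = masked_updates(values)
assert sum(masked) % 2 ** 32 == sum(values)     # individual values hidden, sum intact
```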
Bonus
- An Empirical Analysis of Anonymity in Zcash
A little reminder that anonymity is hard.
27 April 2018
Papers
- Towards Dependable Deep Convolutional Neural Networks (CNNs) with Out-distribution Learning
“in this paper we propose to add an additional dustbin class containing natural out-distribution samples” “We show that such an augmented CNN has a lower error rate in the presence of adversarial examples because it either correctly classifies adversarial samples or rejects them to a dustbin class.”
- Weak labeling for crowd learning
“weak labeling for crowd learning is proposed, where the annotators may provide more than a single label per instance to try not to miss the real label”
- Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
“In this article, we place ourselves in a context where the amount of transferred data must be anticipated but a limited portion of the local training sets can be shared. We also suppose a minimalist topology where each node can only send information unidirectionally to a single central node which will aggregate models trained by the nodes” “Using shared data on the central node, we then train a probabilistic model to aggregate the base classifiers in a second stage.”
- Securing Distributed Machine Learning in High Dimensions
Some results on the issue of input pollution in federated learning, where a fraction of gradient providers may give arbitrarily malicious inputs to an aggregation protocol. “The core of our method is a robust gradient aggregator based on the iterative filtering algorithm for robust mean estimation”. A simpler robust aggregator is sketched after this list.
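As a simpler stand-in for the paper's iterative-filtering estimator, a coordinate-wise trimmed mean already illustrates the idea of bounding the influence of a small fraction of arbitrarily malicious gradient providers.

```python
import numpy as np

# Coordinate-wise trimmed mean: not the paper's algorithm, but in the same
# spirit. Dropping the k largest and k smallest values per coordinate bounds
# the influence of up to k malicious workers.
def trimmed_mean(gradients, k):
    g = np.sort(np.stack(gradients), axis=0)        # sort each coordinate across workers
    return g[k:len(gradients) - k].mean(axis=0)     # drop k extremes on each side

rng = np.random.default_rng(0)
honest = [np.array([1.0, -2.0]) + 0.1 * rng.normal(size=2) for _ in range(8)]
malicious = [np.array([1e6, 1e6]), np.array([-1e6, 1e6])]
print(trimmed_mean(honest + malicious, k=2))        # stays close to [1, -2]
```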
20 April 2018
Papers
- Nothing Refreshes Like a RePSI: Reactive Private Set Intersection
PSI has several applications in private data processing, including object linking in advertising and data augmentation. This paper takes a step towards mitigating exhaustive attacks, where a party learns too much by simply asking for many intersections.
News
- Sharemind, one of the biggest and earliest players pushing MPC to industry, has launched a new privacy service based on secure computation using secure enclaves with the promise that it can handle big data. Via @positium.
- Interesting interview with Lea Kissner, the head of Google’s privacy team NightWatch. Few details are given but “She recently tried to obscure some data using cryptography, so that none of it would be visible to Google upon upload … but it turned out that [it] would require more spare computing power than Google has” sounds like techniques that could be related to MPC or HE. Via @rosa.
- Google had two AI presentations at this year’s RSA conference, one on fraud detection and one on adversarial techniques. Via @goodfellow_ian.
Bonus
- Privacy-Preserving Multibiometric Authentication in Cloud with Untrusted Database Providers
Relevant application of secure computation to authentication using sensitive data. Relatively black-box use of existing protocols, yet experimental performance under one second.
- Private Anonymous Data Access
Interesting mix of private information retrieval and oblivious RAM: “We consider a scenario where a server holds a huge database that it wants to make accessible to a large group of clients while maintaining privacy and anonymity … with the goal of getting the best of both worlds: allow many clients to privately and anonymously access the database as in PIR, while having an efficient server as in ORAM”.
- Adversarial Attacks Against Medical Deep Learning Systems
A discussion around some of the concrete consequences the medical profession may face from adversarial examples in machine learning systems with a warning of “caution in employing deep learning systems in clinical settings”.
13 April 2018
Papers
- Differentially Private Confidence Intervals for Empirical Risk Minimization
Addresses the question of computing confidence intervals in a private manner, using either DP or concentrated DP. Gives concrete examples and experiments using logistic regression and SVM.
News
- Facebook hosted a privacy summit but seems a bit sparse on details. Via @sweis.
Bonus
- PowerHammer: Exfiltrating Data from Air-Gapped Computers through Power Lines
More work on leaking data from air-gapped computers through obscure side-channels, this time through power lines by varying the CPU utilization, achieving bit rates of 10-1000 bit/sec for different attacks.
30 March 2018
Papers
- Private Nearest Neighbors Classification in Federated Databases
Great read on custom MPC protocols allowing k-NN classification of a sample (such as document classification with cosine similarity) using a distributed data set, without leaking either the sample or the data set. This includes feature extraction, similarity computation, and top-k selection; a cleartext version of the pipeline is sketched after this list.
- Chiron: Privacy-preserving Machine Learning as a Service
Interesting look at protecting both the privacy of training data and model specifics via secure enclaves. The technology is promising despite a few recent issues, and, e.g., avoids the use of heavy cryptography.
- Locally Private Bayesian Inference for Count Models
When applying differential privacy one may either ignore the fact that noise has been added to the data or try to take it into account; the latter is done here, with good illustrations of the improvements this can give.
- Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries
Interesting peer-to-peer protocol for privately computing the exact average of a distributed data set by gossiping directly between the peers. No heavy cryptography is used in the case of honest peers, with a PHE-based extension for detecting malicious cheating.
- Comparing Population Means under Local Differential Privacy
- Cloud-based MPC with Encrypted Data
Gives two schemes for private Model Predictive Control by a central authority (who might have a better understanding of the environment than individual sensors), one based on PHE and another on MPC.
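The cleartext pipeline behind the first paper, sketched below: cosine similarity against the data set (which in the protocol is distributed and never revealed), top-k selection, then a majority vote. The data here is random toy data, and under MPC none of the intermediate values would be visible.

```python
import numpy as np

def knn_classify(query, docs, labels, k=5):
    """Cosine similarity, top-k selection, majority vote -- all in the clear."""
    sims = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    top_k = np.argsort(-sims)[:k]               # indices of the k most similar docs
    return int(np.bincount(labels[top_k]).argmax())

rng = np.random.default_rng(0)
docs = rng.random((100, 50))                    # e.g. TF-IDF feature vectors
labels = rng.integers(0, 3, size=100)
print(knn_classify(rng.random(50), docs, labels))
```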
16 March 2018
Papers
- Model-Agnostic Private Learning via Stability
More work on ensuring privacy of training data via differentially private query mechanisms. Compared to the paper from a few weeks ago, this one focuses on “algorithms that are agnostic to the underlying learning problem [with] formal utility guarantees [and] provable accuracy guarantees”.
- Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
The Paillier cryptosystem is used to securely evaluate simplified similarity functions so users don’t leak biometric information during authentication; a toy encrypted dot product is sketched after this list. Performance numbers are included.
- Efficient Determination of Equivalence for Encrypted Data
A reminder that even simpler tasks, such as privately linking identities and records together, are relevant in industry.
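A minimal flavour of the Paillier pattern, using the python-paillier (phe) package: the client encrypts its feature vector, the server combines it with its own plaintext model parameters using ciphertext additions and ciphertext-by-plaintext multiplications, and only the key holder can decrypt the score. The vectors and weights are made-up toy values, not anything from the paper.

```python
from phe import paillier   # python-paillier

public_key, private_key = paillier.generate_paillier_keypair()

features = [0.12, -0.40, 0.93]                  # client-side biometric template
weights = [0.5, -1.0, 2.0]                      # server-side model parameters

enc_features = [public_key.encrypt(x) for x in features]

enc_score = enc_features[0] * weights[0]        # stays encrypted throughout
for e, w in zip(enc_features[1:], weights[1:]):
    enc_score = enc_score + e * w

print(private_key.decrypt(enc_score))           # only the key holder sees the score
```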
Bonus
- The Morning Paper: When coding style survives compilation
Anonymity is hard! Random forests can be trained to identify your coding style from source code as well as compiled programs.
9 March 2018
News
- The 2018 Gödel Prize is awarded to Oded Regev for his paper On lattices, learning with errors, random linear codes, and cryptography. This had a huge influence on later work in cryptography, not least homomorphic encryption. Via @hoonoseme.
- OpenMined is now maintaining a list of papers and tools around private machine learning: https://github.com/OpenMined/awesome-ai-privacy! Via @iamtrask.
- Lab41 has released a Python wrapper around Microsoft’s SEAL homomorphic encryption library: https://github.com/Lab41/PySEAL. Via @mortendahl.
- The list of accepted contributed talks for this year’s Theory and Practice of MPC workshop has been announced. This is the definitive annual event dedicated to secure multi-party computation. Via @claudiorlandi.
Papers
- Generating Differentially Private Datasets Using GANs
Interesting idea of using GANs to produce artificial, differentially private datasets from sensitive data that are safe to release for further training purposes. This is done on the client side, meaning there’s no need for a trusted aggregator.
- Faster Homomorphic Linear Transformations in HElib
The masters are at it again, giving algorithmic improvements to perhaps the most well-known homomorphic encryption library and thereby making it 30-75 times faster.
- Logistic Regression Model Training based on the Approximate Homomorphic Encryption
Private fitting of several logistic regression models on smaller genomic data sets using the HEAAN homomorphic encryption scheme. The approach is the somewhat typical gradient descent with a polynomial sigmoid approximation (sketched after this list), but with significant concrete performance improvements over other work using HEAAN.
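In the clear, the HE-friendly training recipe looks roughly like this: replace the sigmoid with a low-degree polynomial (only additions and multiplications are cheap under HE) and run ordinary gradient descent. The degree, interval, clipping, data, and learning rate below are all illustrative, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Degree-3 polynomial surrogate for the sigmoid on a bounded interval.
xs = np.linspace(-8, 8, 1000)
coeffs = np.polyfit(xs, 1.0 / (1.0 + np.exp(-xs)), deg=3)

def sigmoid_approx(z):
    return np.polyval(coeffs, z)

# Toy data and plain gradient descent with the polynomial activation.
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)

w = np.zeros(5)
for _ in range(100):
    z = np.clip(X @ w, -8, 8)          # stay inside the approximation interval
    w -= 0.1 * X.T @ (sigmoid_approx(z) - y) / len(y)

print("training accuracy:", np.mean((X @ w > 0) == y))
```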
Blogs
- The Building Blocks of Interpretability
Nothing to do with private machine learning, yet this is so neat that it warrants a mention. Go play!
2 March 2018
News
- @mvaria‘s talk about a real-world application of MPC at this year’s ENIGMA conference is online and well worth a watch: https://www.youtube.com/watch?v=d9rMokeYx9I. Via @lcyqn.
Papers
- Scalable Private Learning with PATE
Follow-up work to the celebrated Student-Teacher way of ensuring privacy of training data via differential privacy, now with better privacy bounds and hence less added noise. This is partially achieved by switching to Gaussian noise and more advanced (trusted) aggregation mechanisms; the basic noisy-argmax aggregation is sketched after this list.
- Privacy-Preserving Logistic Regression Training
Fitting a logistic model from homomorphically encrypted data using the Newton-Raphson iterative method, but with a fixed and approximated Hessian matrix. Performance is evaluated on the iDASH cancer detection scenario.
- Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
Presents the SecureBoost framework for mixing boosting algorithms with secure computation. The former uses randomly generated linear classifiers at the base, and the latter comes in three variants: RLWE+GC, Paillier+GC, and SecretSharing+GC. Performance experiments on both the model itself and on the secure versions are provided.
- Machine learning and genomics: precision medicine vs. patient privacy
Non-technical paper illustrating that secure computation techniques are finding their way into otherwise unrelated research areas, hitting a home run with “data access restrictions are a burden for researchers, particularly junior researchers or small labs that do not have the clout to set up collaborations with major data curators”.
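The core PATE aggregation step in miniature: each teacher votes for a label, the aggregator perturbs the vote histogram, and the student only ever sees the noisy winner. Gaussian noise mirrors the paper's switch away from Laplace; the number of teachers and the noise scale are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_argmax(teacher_votes, n_classes, sigma=4.0):
    """Release only the argmax of the noised vote histogram."""
    counts = np.bincount(teacher_votes, minlength=n_classes).astype(float)
    counts += rng.normal(scale=sigma, size=n_classes)   # privacy-inducing noise
    return int(counts.argmax())

# Toy votes: most of the 250 teachers agree on class 3, the rest vote randomly.
votes = np.concatenate([np.full(150, 3), rng.integers(0, 10, size=100)])
print("label released to the student:", noisy_argmax(votes, n_classes=10))
```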
Blogs
- Uber’s differential privacy .. probably isn’t
@frankmcsherry looks at Uber’s SQL differential privacy project and shares experience gained from implementing these things in Microsoft’s PINQ.
23 February 2018
Papers
- The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets
Concrete study of what a model can leak about sensitive information in the training data. Perhaps not surprisingly, “only by developing and training a differential private model are we able to … protect against the extraction of secrets”.
- Doing Real Work with FHE: The Case of Logistic Regression
The heavyweights of homomorphic encryption apply HElib to logistic regression with a focus on implementing “optimized versions of many bread and butter FHE tools. These tools include binary arithmetic, comparisons, partial sorting, and low-precision approximation of complicated functions such as reciprocals and logarithms”; a multiplication-only reciprocal iteration is sketched at the end of this list.
- On the Connection between Differential Privacy and Adversarial Robustness in Machine Learning …
- Reading in the Dark: Classifying Encrypted Digits with Functional Encryption
Develops a functional encryption scheme for “efficient computation of quadratic polynomials on encrypted vectors” and applies this to private MNIST prediction (i.e. using a model trained on unencrypted data) via suitable quadratic models.
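As a taste of the "bread and butter FHE tools" mentioned above: division is not directly available under HE, but Newton's iteration y <- y * (2 - x * y) converges to 1/x using only additions and multiplications. The initial guess and iteration count below are illustrative.

```python
def he_friendly_reciprocal(x, y0=0.1, iterations=6):
    """Approximate 1/x with additions and multiplications only."""
    y = y0
    for _ in range(iterations):
        y = y * (2 - x * y)        # only mul/add, so it maps onto an HE circuit
    return y

print(he_friendly_reciprocal(7.0))  # ~0.142857, i.e. 1/7
```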