Statistics Netherlands, Department of Statistical Methods
 Statistical Methods to Limit Disclosure:
 1995 Annotated Bibliography

 

Barnes, C. (1995). 'Local Perturbation.' MSc. thesis, University of Leiden.

This report studies a particular statistical disclosure control method, namely local perturbation. The method is similar to local suppression, except that values in records are not replaced by missings but by other values. Local perturbation is thus an imputation technique. The basis for the method proposed by the author is a method developed at the Statistisches Landesamt Berlin, called perturbation by compensation. The report was used as an MSc. thesis (University of Leiden). (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

de Waal, A.G. and Pieters, A.J. (1995). 'ARGUS Users Guide.' Statistics Netherlands, Department of Statistical Methods.

In recent years Statistics Netherlands has developed a prototype version of a software package, ARGUS, to protect microdata files against statistical disclosure. In 1995 the present prototype version of ARGUS, ARGUS 1.1, was released. This paper is the user's guide to ARGUS 1.1. The guide explains how Statistics Netherlands protects microdata files in practice. Both the rules, based on checking low-dimensional combinations of values of so-called identifying variables, and the techniques, global recoding and local suppression, used by Statistics Netherlands are described. Subsequently, it is examined how ARGUS incorporates these ideas. After this somewhat theoretical introduction, the focus is entirely on how to operate ARGUS. Firstly, two data files that are needed by ARGUS are described in detail. Secondly, the interface of ARGUS is explained. It is made clear how to execute the possible actions by means of the menus of the interface. Thirdly, the information generated by ARGUS to determine the global recodings is described. Based on this information the user of ARGUS can decide how the variables should be recoded. Finally, the files that are used and created by ARGUS are summarized. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

de Waal, A.G. and Willenborg, L.C.R.J. (1995). 'Global Recoding and Local Suppression in Microdata Sets.' Statistics Netherlands, Department of Statistical Methods.

Statistics Netherlands applies two techniques to safeguard a microdata set against disclosure, namely local suppression and global recoding. When local suppression is applied, some values in some records are replaced by 'missings'. When global recoding is applied, some variables are recoded. Ideally, the local suppressions and global recodings should be determined automatically and optimally, i.e. the information loss due to the local suppressions and global recodings should be minimized. In this paper three problems are examined: finding the optimal local suppressions when a microdata set has to be protected by local suppressions only, finding the optimal global recodings when a microdata set has to be protected by global recodings only, and finding the optimal local suppressions and global recodings when a microdata set has to be protected by a mix of both techniques. For the first problem, the so-called local suppression problem, no complicated information measure is required. Several 0-1 integer programming formulations are given, depending on the aim of the data protector. For the second and third problems, the so-called global recoding problem and the GR&LS-problem respectively, an elaborate information measure is required, however. In this paper we suggest an information measure based on a suitable entropy measure. Moreover, a verbal description of both the global recoding problem and the GR&LS-problem is presented.

This paper was presented at the 12th International Symposium on Methodology Issues: From data to information, Statistics Canada, Ottawa, 1-3 November, 1995. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)
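The abstract above mentions an entropy-based information measure without specifying it. As a rough illustration only, the following sketch assumes plain Shannon entropy of the observed category frequencies and measures the loss caused by collapsing categories. The function names and the example data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def recoding_loss(values, mapping):
    """Entropy lost when categories are collapsed according to mapping.

    Categories that do not appear in mapping are left unchanged.
    """
    recoded = [mapping.get(v, v) for v in values]
    return entropy(values) - entropy(recoded)

# Hypothetical example: collapse two detailed occupations into one group.
occupation = ["baker", "butcher", "teacher", "teacher"]
loss = recoding_loss(occupation, {"baker": "food", "butcher": "food"})
print(loss)
```

Under this assumed measure, a recoding that merges rarely observed categories costs little entropy, while merging large, evenly filled categories costs more.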

 

de Waal, A.G. and Willenborg, L.C.R.J. (1995). 'Local Suppression in Statistical Disclosure Control and Data Editing.' Statistics Netherlands, Department of Statistical Methods.

In this paper we give several integer programming formulations of local suppression problems. Local suppression of values in a microdata set consists of replacing values in certain individual records by missing values, i.e. of suppressing the corresponding values. This technique is applied in statistical disclosure control in order to safeguard a microdata set against disclosure. Generally one wants to suppress as few values as possible, under particular restrictions, in order to retain as much information in the data set as possible. Similar problems arise in the area of data editing. There, values involved in violated edits are replaced by missing values, which may at a later stage be replaced by regular values through some sort of imputation procedure. Again the aim is to preserve as much information in the data set as possible. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

de Waal, A.G. and Willenborg, L.C.R.J. (1995). 'Optimum Global Recoding and Local Suppression.' Statistics Netherlands, Department of Statistical Methods.

Two well-known techniques to safeguard a microdata set against disclosure are local suppression and global recoding. In the present paper a formulation of the 'pure' optimum global recoding problem is given, that is, the problem of how to eliminate a set of unsafe combinations with minimum information loss by using global recodings only. This requires the presence of proximity structures for each key variable. How to obtain such structures is also described in the paper. Although this 'pure' optimum global recoding problem is of interest in its own right, it is mainly introduced as a stepping stone to a more comprehensive problem. This problem concerns the elimination of a set of unsafe combinations by the application of an optimum mix of global recodings and local suppressions. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

de Waal, A.G. and Willenborg, L.C.R.J. (1995). 'Statistical Disclosure Control and Sampling Weights.' Statistics Netherlands, Department of Statistical Methods.

Before a microdata set can be disseminated by a statistical office it has to be checked that sensitive information about individual respondents cannot be disclosed by a potential intruder. The procedure to check whether the dissemination of a microdata set can lead to disclosure of sensitive information usually amounts to examining how much so-called (indirectly) identifying information is contained in the microdata set. When a statistical office releases a microdata set, sampling weights are sometimes included as a service to the public. A description of the auxiliary variables used, their categories and the sampling method is in that case also provided. Unfortunately, the sampling weights, innocent as they may seem, can provide additional identifying information to an intruder. In this paper we demonstrate that in almost any practical case, such an intruder will indeed be able to determine which combination of categories of the auxiliary variables corresponds to a specific weight. We also discuss some measures that can be taken to prevent an intruder from disclosing sensitive information about individual respondents by (mis-)using the identifying information offered by the sampling weights. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)
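To make the risk described above concrete, the following sketch assumes the simplest possible weighting scheme: each record in stratum h carries the weight N_h / n_h (population count over sample count). This scheme, the function name and all figures are assumptions made for illustration; the paper's own weighting models may be more elaborate.

```python
from collections import defaultdict

def strata_by_weight(population_counts, sample_counts):
    """Map each weight N_h / n_h to the list of strata that produce it."""
    by_weight = defaultdict(list)
    for stratum, n_pop in population_counts.items():
        weight = n_pop / sample_counts[stratum]
        # Round so that tiny floating point differences do not split a weight.
        by_weight[round(weight, 6)].append(stratum)
    return dict(by_weight)

# Hypothetical strata defined by two auxiliary variables (sex, urbanization).
population = {
    ("male", "urban"): 5000,
    ("male", "rural"): 1200,
    ("female", "urban"): 5300,
    ("female", "rural"): 1100,
}
sample = {
    ("male", "urban"): 50,
    ("male", "rural"): 20,
    ("female", "urban"): 50,
    ("female", "rural"): 10,
}

lookup = strata_by_weight(population, sample)
for weight, strata in sorted(lookup.items()):
    print(weight, strata)
```

In this made-up example every weight corresponds to exactly one stratum, so a released weight would reveal the sex and urbanization of a record even if those variables had been recoded or suppressed.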

 

de Waal, A.G. and Willenborg, L.C.R.J. (1995). 'A View on Statistical Disclosure Control for Microdata.' Statistics Netherlands, Department of Statistical Methods.

Problems arising from statistical disclosure control, which aims to prevent users of data from disclosing information about individual respondents, have come to the fore rapidly in recent years. The main reason is the growing demand for detailed data from statistical offices, caused by the still increasing use of computers. In former days tables with relatively little information were published. Nowadays the users of data demand much more detailed tables and, moreover, microdata to analyze by themselves. Because of this increase in information content, statistical disclosure control has become much more difficult. In this paper the authors give their view on the problems which one encounters when trying to protect microdata against disclosure. This view is based on their experience with statistical disclosure control acquired at Statistics Netherlands. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

Kardaun, J.W.P.F. and Willenborg, L.C.R.J. (1995). 'Cryptological Applications in Official Statistics.' Statistics Netherlands, Department of Statistical Methods.

Cryptology is an underutilized tool in Official Statistics to enhance the possibilities of communication and collaboration while maintaining good levels of security and privacy. Straightforward applications are securing communications, stored data and e-mail, which allow NSIs to extend their activities beyond the limits of their premises.

Most forms of cryptology make reliable recognition of the sender of a message possible -- much needed now that electronic communication does not carry the 'personality meta-information' of the sender. Cryptology can support EDI-fication, and is a mandatory prerequisite before wireless communication and computing can take off. Special NSI applications are record matching while preserving confidentiality, which is especially important for panel studies, the limited statistical processing of encrypted microdata, and compartmentalization of information.

This paper was presented at the 1995 Seminar on New Techniques and Technologies for Statistics, Bonn, November 20-22, 1995. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

Pannekoek, J. (1995). 'Statistical Methods for Some Simple Disclosure Limitation Rules.' Statistics Netherlands, Department of Statistical Methods.

To guard the confidentiality of information provided by respondents, statistical offices apply disclosure limitation techniques. An often applied technique is to ensure that there are no categories for which the population frequency is presumed to be small ('rare' categories). This is attained by recoding, top-coding or setting values to 'unknown'. Since population frequencies are usually not available, the decision that a category is rare is often based on intuitive considerations. This is a time-consuming process, involving many decisions by the disclosure limitation practitioners. This paper explores to what extent the sample frequencies can be used to make such decisions. This leads to a procedure which makes it possible to scan a data set automatically for rare category combinations, whereby 'rare' is defined by the disclosure limitation policy of the statistical office. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)
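The kind of automatic scan described above can be sketched as follows. This is a simplified illustration that flags a combination as rare when its sample frequency falls below a fixed threshold; it does not reproduce the statistical procedure of the paper, and the function name, the threshold and the records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def rare_combinations(records, variables, threshold, max_dim=2):
    """Return {(variables, values): count} for value combinations that
    occur fewer than `threshold` times in the sample."""
    rare = {}
    for dim in range(1, max_dim + 1):
        for var_set in combinations(variables, dim):
            counts = Counter(tuple(r[v] for v in var_set) for r in records)
            for values, count in counts.items():
                if count < threshold:
                    rare[(var_set, values)] = count
    return rare

# Hypothetical sample of four records with three identifying variables.
records = [
    {"sex": "F", "region": "N", "job": "mayor"},
    {"sex": "F", "region": "N", "job": "clerk"},
    {"sex": "M", "region": "N", "job": "clerk"},
    {"sex": "M", "region": "S", "job": "clerk"},
]

rare = rare_combinations(records, ["sex", "region", "job"], threshold=2)
for (var_set, values), count in rare.items():
    print(var_set, values, count)
```

The `threshold` and `max_dim` parameters stand in for the disclosure limitation policy of the statistical office, which decides what counts as 'rare' and how many variables are combined.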

 

van Gelderen, R. (1995). 'ARGUS Statistical Disclosure Control of Survey Data.' Statistics Netherlands, Department of Statistical Methods.

This report consists of two parts. In the first part, local suppression problems that can be interpreted as set covering problems are studied. Several heuristics are suggested and their performance is empirically tested. In the second part a prototype user interface for ARGUS, the statistical disclosure control package in development at Statistics Netherlands, is presented. This user interface is written in Visual C++ and is intended to run under Windows. This report was in part used as an MSc. thesis (Free University Amsterdam). (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)
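The set covering view of local suppression mentioned above can be illustrated with a standard greedy heuristic. The sketch assumes that each unsafe combination is a set of cells (record number, variable) and becomes safe as soon as one of its cells is suppressed. This modelling choice, the function name and the data are assumptions for illustration; the report's own heuristics are not reproduced here.

```python
def greedy_suppression(unsafe_combinations):
    """Choose cells to suppress so that every unsafe combination loses at
    least one cell, always taking the cell that hits the most remaining
    combinations (greedy set covering heuristic, not guaranteed optimal)."""
    remaining = [set(c) for c in unsafe_combinations if c]
    suppressed = []
    while remaining:
        counts = {}
        for combo in remaining:
            for cell in combo:
                counts[cell] = counts.get(cell, 0) + 1
        # Sort first so that ties are broken deterministically.
        best = max(sorted(counts), key=counts.get)
        suppressed.append(best)
        remaining = [c for c in remaining if best not in c]
    return suppressed

# Hypothetical unsafe combinations: cells are (record number, variable).
unsafe = [
    {(1, "sex"), (1, "job")},
    {(1, "region"), (1, "job")},
    {(2, "sex"), (2, "region")},
]

cells = greedy_suppression(unsafe)
print(cells)
```

Here suppressing the job of record 1 removes two unsafe combinations at once, so only two values need to be set to missing instead of three.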

 

Verboon, P. and Willenborg, L.C.R.J. (1995). 'Comparing Two Methods for Recovering Population Uniques in a Sample.' Statistics Netherlands, Department of Statistical Methods.

If a statistical office wants to release a microdata set it should make sure that the probability that an external user of the data re-identifies a person represented in this data set is within reasonable bounds. What is reasonable depends on the conditions under which the data are to be released, and on the intended user group. In order to check whether a given microdata file is suitable for release, it would be ideal to use a model for individual re-identifications. In this paper an approach is sketched that ultimately should lead to such a model. For the purposes of statistical disclosure control such a model can be used to guide one in producing a 'safe' microdata set from an 'unsafe' one. Such an aim can be realized by the application of certain modification procedures, such as global recoding (i.e. collapsing of categories), local suppression (i.e. setting a value to 'missing') and perturbation (i.e. replacing a value by another value).

In this paper we study two classes of methods for recovering rare cases in a sample. The first class consists of computing the city-block distances between the records. Weights are used to vary the relative importance of the variables in computing the distances. The second class of methods is known under the name of homogeneity analysis. By varying the dimensionality of the solution, different members of this class are obtained. In a simulation study the two methods are compared with each other. It appears that using more dimensions in homogeneity analysis yields a better identification of the rare cases. The results of the city-block distances are comparable with the two-dimensional solution of homogeneity analysis. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)
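The weighted city-block distance mentioned above can be sketched as follows. The example assumes numerically coded variables and uses the distance to the nearest neighbour as an isolation score for spotting rare cases; this scoring rule, the function names and the data are assumptions made for illustration and are not taken from the paper.

```python
def city_block_distance(a, b, weights):
    """Weighted city-block (Manhattan) distance between two records."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))

def most_isolated(records, weights):
    """Index of the record whose nearest neighbour is farthest away."""
    def nearest(i):
        return min(
            city_block_distance(records[i], records[j], weights)
            for j in range(len(records))
            if j != i
        )
    return max(range(len(records)), key=nearest)

# Hypothetical numerically coded records; equal weights for both variables.
records = [(1, 1), (1, 2), (2, 1), (9, 9)]
weights = (1.0, 1.0)

print(city_block_distance(records[0], records[3], weights))
print(most_isolated(records, weights))
```

Raising the weight of a variable makes differences on that variable count more heavily, which corresponds to varying the relative importance of the variables as described in the abstract.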

 

Willenborg, L.C.R.J. (1995). 'Outline of the SDC Project.' Statistics Netherlands, Department of Statistical Methods.

This paper gives an overview of the main activities of the Statistical Disclosure Control (SDC) Project. Statistical Disclosure Control is the area in survey data processing concerned with the production of data that a statistical office considers to be safe enough to be released to (certain) outside users (possibly after creating a suitable legal framework in addition). The SDC project is co-sponsored by the EU through Esprit. Cooperating in this project are statistical offices and universities from three countries: the Netherlands, Italy and the United Kingdom. The main activities of the project are the investigation of certain methodological issues in the area of SDC and the development of specialized software to assist an analyst of a statistical office in producing data safe enough for external release.

This paper was presented at the 1995 Seminar on New Techniques and Technologies for Statistics, Bonn, November 20-22, 1995. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

Willenborg, L.C.R.J., de Waal, A.G., and Keller, W.J. (1995). 'Some Methodological Issues in Statistical Disclosure Control.' Statistics Netherlands, Department of Statistical Methods.

In the last decade the demand for detailed information has increased considerably. This increase is evident from the data that are released by statistical offices. Whereas in the old days relatively small two-way tables were sufficient to satisfy most of the users' demands, nowadays large three- and higher-dimensional tables are no longer an exception. Microdata sets, i.e. data sets containing data on individual respondents, are relatively new products of statistical offices. Such a microdata set contains a wealth of information. However, the release of both large tables and microdata sets leads to considerable problems when trying to protect the privacy of respondents. In this paper we examine how Statistics Netherlands is dealing with the problems of statistical disclosure control. The emphasis is on disclosure control for microdata sets rather than for tables, because disclosure control for microdata is a relatively new, and still controversial, subject. The current rules and techniques for microdata sets that are applied at Statistics Netherlands are examined. Applying these rules and techniques is not a straightforward matter. A number of methodological problems must be solved in order to apply them appropriately. Moreover, a number of potential improvements of our rules are examined. All these potential improvements require further theoretical research, however. Finally, a number of similarities and differences between statistical disclosure control for microdata and for tables are pointed out.

This paper was presented at the Second Cathy Marsh Memorial Seminar, London, November 7th, 1995. (Abstract source: 1995 Statistics Netherlands, DSM Research Paper # 9604)

 

For more information about publications from the Department of Statistical Methods, contact:

 Mrs. Aranka Olgers
 Statistics Netherlands
 P.O. Box 4000
 2270 JM VOORBURG, The Netherlands
 email: aols@clb.nl