Balancing Data Utility and Privacy
As data becomes central to research, analysis, and decision-making, protecting individual privacy while using data is a priority at USU. Our De-Identification Guidelines lay out strategies to anonymize personal information, allowing data to be used without compromising privacy. This process is essential for research and administrative functions, enabling us to meet regulatory standards while still gaining insights from data.
Why De-Identification is Vital to Compliance and Privacy
- De-identification, the removal or alteration of identifiable information, reduces re-identification risk by making it more difficult to trace data back to specific people.
- In a world governed by complex privacy regulations, de-identification is not just best practice; it's often mandatory for certain types of data and uses. For example, in certain scenarios, HIPAA requires de-identification for health-related data to protect individuals’ privacy. These guidelines were created to meet such standards through secure techniques that safeguard personal information. Please note, however, that these guidelines are recommendations, not legal advice. Specific types of personal data may need to be de-identified according to contractual obligations or particular international, federal, or state laws. If you need guidance, the Privacy Office or the University Information Security Officer can provide support on de-identification techniques.
- De-identification preserves data's utility, allowing it to be used for trend analysis, research, and operational improvements that advance USU’s academic and institutional goals. However, de-identification isn’t a one-time effort—it’s an ongoing commitment to privacy. By continuously refining our methods and monitoring de-identified data, we keep privacy at the forefront of our data practices, enabling responsible data use across all university functions.
De-Identification Techniques
The guidelines emphasize a blend of practical techniques and careful assessment to reduce re-identification risks:
- Direct and Indirect Identifier Removal: Information like names or addresses is removed, while less obvious identifiers are generalized or masked to prevent linking.
- Advanced Techniques: Techniques such as pseudonymization, generalization, and noise addition provide extra layers of privacy, ensuring that data remains secure even if combined with other datasets.
- Ongoing Re-Identification Risk Assessment: Privacy risks change over time, so we regularly assess the effectiveness of de-identification methods, especially for datasets used frequently or shared externally.
Suggested Steps for Applying De-identification Techniques
Step 1: Understand Your Data Types and Attributes. Classifying attributes into the categories below helps you decide which data needs masking, which data can stay, and what requires alteration.
- Identify Direct Identifiers: These are attributes that directly identify someone (e.g., names, ID numbers, email addresses, medical record numbers, phone numbers, device identifiers and serial numbers).
- Identify Indirect Identifiers: These are attributes that, when combined, could potentially identify someone (e.g., age, birthday, gender, race, ethnicity, occupation).
- Identify Sensitive Attributes: Recognize any sensitive data that needs extra protection (e.g., health information, biometrics, financial data, political or religious views).
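One lightweight way to record this classification is a simple mapping from column names to identifier types. The column names below are hypothetical examples, not a USU schema:

```python
# Step 1 as data: tag each dataset column with its identifier type.
# Column names and categories here are illustrative only.
COLUMN_CLASSIFICATION = {
    "name": "direct",
    "email": "direct",
    "medical_record_number": "direct",
    "age": "indirect",
    "zip_code": "indirect",
    "gender": "indirect",
    "diagnosis": "sensitive",
}

def columns_by_type(classification: dict, kind: str) -> list:
    """Return the columns tagged with a given identifier type."""
    return sorted(col for col, tag in classification.items() if tag == kind)
```

Keeping the classification in one place makes it easy to audit and to drive the masking steps that follow.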
Step 2: Choose Your De-Identification Techniques. Select methods based on whether you need the ability to re-identify or if complete anonymization is required:
| Type of Identifier | Techniques |
|---|---|
| Direct identifiers | Removal or redaction; masking; pseudonymization (when controlled re-identification must remain possible) |
| Indirect identifiers | Generalization (e.g., replacing exact values with ranges); masking; noise addition |
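Applied to a single record, this might look like the sketch below: direct identifiers are removed outright, indirect identifiers are generalized, and other fields pass through. The field names and generalization rules are illustrative assumptions:

```python
# Hypothetical field names for illustration; adapt to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email"}

def generalize(field: str, value):
    """Coarsen an indirect identifier so it no longer pinpoints one person."""
    if field == "age":
        low = (value // 10) * 10
        return f"{low}-{low + 9}"          # exact age -> decade range
    if field == "zip_code":
        return value[:3] + "**"            # full ZIP -> coarser region
    return value

def deidentify(record: dict, indirect=("age", "zip_code")) -> dict:
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                       # remove direct identifiers outright
        out[field] = generalize(field, value) if field in indirect else value
    return out
```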
Step 3: Apply Aggregation and Sampling for Privacy Enhancement. Aggregation and sampling help protect privacy, especially when detailed personal information is not essential.
- Aggregation: Summarize data at the group level rather than the individual level, such as reporting average values rather than specific responses.
- Sampling: Use a subset of data to reduce risk, which is particularly useful when full data isn’t necessary for analysis.
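Both ideas fit in a few lines. This sketch (using only the Python standard library; function names are my own) reports group-level averages instead of individual values and releases only a random subset of records:

```python
import random
from statistics import mean

def aggregate_by_group(rows: list, group_key: str, value_key: str) -> dict:
    """Report a group-level average instead of individual values."""
    groups: dict = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}

def sample_rows(rows: list, fraction: float, seed: int = 0) -> list:
    """Release only a random subset of records to reduce exposure."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)
```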
Step 4: Implement Formal Privacy Models for Added Protection. These models offer mathematically backed privacy guarantees, which help make data even less identifiable: K-Anonymity (Video), L-Diversity (Video), and T-Closeness (Video).
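As a sketch of the first of these models: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. A minimal check (quasi-identifier column names are illustrative) might look like this:

```python
from collections import Counter

def k_anonymity(rows: list, quasi_identifiers: list) -> int:
    """Return the smallest equivalence-class size over the quasi-identifier
    columns; the dataset is k-anonymous for any k up to this value."""
    classes = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return min(classes.values())
```

L-diversity and t-closeness extend this idea by additionally constraining how sensitive attribute values are distributed within each equivalence class.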
Step 5: Assess Re-Identification Risk and Consider Additional Controls. Evaluate the potential for re-identification, limit who can access de-identified data by implementing access restrictions, and make sure those handling data understand and stay current on de-identification techniques and best practices.
Step 6: Monitor and Update De-Identification Practices Regularly. Data is dynamic; regular evaluations help maintain effective de-identification in the face of evolving technology and regulations.