SQL Server 2016 – Data Masking and Always Encrypted

Two brand new and very exciting security features of SQL Server 2016 are Dynamic Data Masking and Always Encrypted

Two brand new and very exciting security features of SQL Server 2016 are Dynamic Data Masking and Always Encrypted

What Dynamic Data Masking Is

Dynamic data masking limits (DDM) sensitive data exposure by masking it to non-privileged users. It can be used to greatly simplify the design and coding of security in your application. The purpose of dynamic data masking is to limit exposure of sensitive data, preventing users who should not have access to the data from viewing it. Dynamic data masking does not aim to prevent database users from connecting directly to the database and running exhaustive queries that expose pieces of the sensitive data.

In the example below, assume there are only two roles of users. The first role is Sally in charge of fraud investigation. The second role is John who is in customer service. Sally requires access to all data to complete her job while John only needs a few elements and maybe the part of the customer’s email address to verify identity. Both users query SQL Server for the customer information, but Sally is able to see the entire email address while John can only see certain parts of the email address.

Benefits of Data Masking

What are the benefits of data masking? What are some specific reasons it should be used?

Data Masking Does

Data Masking Does NOT

  • Limit sensitive data exposure by masking it to non-privileged users
  • Minimally impact the application layer
  • Allow applications to mask sensitive data without modifying existing queries
  • Complement other SQL Server security features (auditing, encryption, row level security, etc.)

Prevent database users from connecting directly to the database and running exhaustive queries that expose pieces of the sensitive data

How Data Masking Works

  • On-the-fly obfuscation of data in query results
  • Policy-driven on the table and column
  • Multiple masking functions available for various sensitive data categories
  • Flexibility to define a set of privileged logins for un-masked data access
  • By default, database owner is unmasked

Data Masking Platform Options

Supported

Platform

Yes

SQL Server 2016

Yes

Azure SQL Database (SQL DB)

No, but it will be supported in the future

Azure SQL Data Warehouse (SQL DW)

No, but it will be supported in the future

Analytics Platform System (APS)

SQL Server 2016 and Azure SQL DB are the only platforms currently supporting data masking.  Azure SQL DW will support this feature in the near future while APS will be supported with AU6.

Data Masking Formats

The key concern I had looking at data masking was whether it would support all formats that I may need.  The short answer is that I don’t feel that it does. I think they try to solve the flexibility issue with the partial function, but it isn’t very flexible.  I have hopes this will be expanded with future updates.  Below are the data masking formats currently available.

Enabling Dynamic Data Masking

To interact with data masking for SQL Server 2016, use the following syntaxes:

— Enable data masking on dbo.DimCustomer.EmailAddress
ALTER TABLE dbo.DimCustomer ALTER COLUMN EmailAddress
ADD MASKED WITH (FUNCTION = ‘partial(3,”XXXXXX”,4)’);
— Grant DataMaskingDemo permission to see the data
GRANT UNMASK TO DataMaskingDemo;
— Revoke DataMaskingDemo permission to see the data
REVOKE UNMASK TO DataMaskingDemo;
— Drop the data masking
ALTER TABLE dbo.DimCustomer
ALTER COLUMN EmailAddress DROP MASKED;

Dynamic Data Masking Limitations and Considerations

So, this is all fine and dandy and seems like it may work alright, but let’s get to the meat and see what limitations there are.

  • Data Masking format disappears during a cast (or convert)
    CAST([EmailAddress] AS CHAR (10))
    Results in ‘XXXX’

 

In the example above, when converting a masked column to another string, all business value disappears and the column is completely masked.

  • Predicate logic can infer values
    WHERE [EmailAddress] =‘shree.khanal@sqlhelps.com
    Returns the correct row with the masked value

 

In the example above, although the column has applied data masking, remember that the data still resides unmasked in the database.  This means that predicate logic can be applied to infer masked values.

Another example would be John wanting to know Frank’s salary information.  The salary column is masked, but this value can be derived using <, >, etc.  Eventually John can pinpoint what Frank’s salary is.

The bottom line is that data masking is not the most secure option, so if data needs to be secure, there are other options such as Transparent Data Encryption and Row Level Security and Always Encrypted.

Always Encrypted

Always Encrypted is a data encryption technology that helps protect sensitive data at rest on the server, during movement between client and server, and while the data is in use, ensuring that sensitive data never appears as plaintext inside the database system.

After you encrypt data, only client applications or app servers that have access to the keys can access plaintext data.

The following illustration depicts the Always Encryption process.

  1. A user tries to execute a SQL statement from an application
  2. The SSN is encrypted in the database, so it is re-routed to the EnhancedNET Library
  3. The SSN the user specified is then encrypted and sent onto SQL Server
  4. SQL Server creates a SQL statement from the request, plus the SSN replaced with the cipher value
  5. The database then filters all rows that DO NOT match the cipher value
  6. The data is re-routed through the EnhancedNET Library so the cipher value can be decrypted
  7. The result set is returned to the user with the decrypted value

There are 2 encryption types that can be selected, deterministic and random, each of which has different attributes and the business needs will determine which one.

Deterministic Encryption

  • Always generates the same encrypted valuefor any given plain text value
  • Allows grouping, filtering by equality, and joining tables based on encrypted values
  • Allows unauthorized users to guess information about encrypted values by examining patterns in the encrypted column
  • Must use a column collation with a binary2 sort order for character columns

Random

  • Encrypts data in a less predictable manner
  • Randomized encryption is more secure, but prevents equality searches, grouping, indexing, and joining on encrypted columns

Encryption Keys

Once the encryption type is chosen, keys will need to be created.  These will generally be created by a security and/or compliance officer.  These keys are what will be used to encrypt and decrypt the data. The only encryption standard currently available is AEAD_AES_256_CBC_HMAC_SHA_256.

Column Master Keys

  • Used to encrypt column encryption keys
  • Must be stored in a trusted key store
  • Information about column master keys, including their location, is stored in the database in system catalog views

Column Encryption Keys

  • Used to encrypt sensitive data stored in database columns
  • Can be encrypted using a single column encryption key
  • Encrypted values of column encryption keys are stored in the database in system catalog views
  • Store column encryption keys in a secure/trusted location for backup

Enabling Always Encrypted

Now that concepts are a little better understood, it’s time to actually implement our chosen security model.

  1. Use SQL Server Management Studio 2016 (for both SQL Server 2016 and Azure SQL DB)
2. Run the Always Encrypted wizard

2. Run the Always Encrypted wizard

3.	Choose the table column to encrypt

3. Choose the table column to encrypt

4.  Choose the Encryption Type

4. Choose the Encryption Type

• Deterministic

• Randomized

5.  Create/Select Keys

5. Create/Select Keys

  • Column Master Key (CMK)
  • Create a Column Encryption Key (CEK)

Finally, the easy part

  1.  Create an application that inserts, selects, and displays data from the encrypted columns
  • After Always Encrypted is initialized, utilize an application that performs inserts and selects on the encrypted columns
  • To successfully run the sample application, run it on thesame computer where the Always Encrypted wizard is run
  • To run the applicationon another computer, you must deploy Always Encrypted certificates to the computer running the client app
Important Notes:
·        The application must use SqlParameter objects when passing plaintext data to the server with Always Encrypted columns
  • Passing literal values without using SqlParameter objects will result in an exception

Always Encrypted Considerations

Create the appropriate certificates

Limitations and Technologies NOT Supported

  • Encrypted columns do not allow range-like operations such as ‘>, <‘ , ‘LIKE’, etc.
  • Passing encrypted values to functions, user-defined or otherwise, isnot allowed
    (the database doesn’t have access to the unencrypted values)
  • Equality comparisons can only be performed on columns that usedeterministic encryption
  • Indexes can only be applied to deterministic encryption columns
  • Need same column encryption key for columns that are joined
  • Constant expressions that refer to encrypted columns not allowed
    WHERE SSN = ‘111-11-1111’, but WHERE SSN = @SSN is because the driver works with the SqlParameter class
  • Unsupported data types: xml, rowversion, image, ntext, text, sql_variant, hierarchyid, geography, geometry, and UDFs
  • Currently theonly driver that supports this feature is .NET 4.6
  • This isNOT TDE
  • Encrypted columns take significantly more space
  • String-based columns that are encrypted with deterministic encryption must use a _BIN2 collation (e.g. Latin1_General_BIN2)
  • The following data types are NOT supported as encrypted columns, per the documentation:
    text/ntext/image
    XML/hierarchyid/geography/geometry
    alias types/user-defined data types
    SQL_VARIANT
    rowversion (timestamp)
  • Sparse columnset (sparse columns are okay, provided the table doesn’t contain a columnset)
  • Built-in alias types, e.g. SYSNAME
  • Identity columns
  • Computed columns
    Temporal tables
  • Triggers are partially supported
  • Full-text search
  • Replication (need more research)
  • In-Memory OLTP
  • Stretch Database

 

Whew, that’s a mouthful.  It’s clear that this is a new technology and a first iteration. Polybase has been around for years, Query Store is a newer version of Extended Events, etc.  So while some SQL Server 2016 technologies are relatively complete and stable, Always Encrypted still has some work to do.  The limitations will weigh heavily in your architectural approach.

Selecting the Right Technology……or Both?

Is Always Encrypted superior to data masking?  Should I should always choose this?

The short answer is that one provides a higher level of security over the other, BUT each business use case will dictate the technology chosen, which sometimes results in them being COMPLEMENTARY!

Combining deterministic encryption AND data masking allows for the highest FUNCTIONAL level of security for data warehouses and relational databases.

If both technologies are selected collectively, be sure to encrypt all columns FIRST, then add Dynamic Data Masking, otherwise: