Handle private/public data in SQL
Let say you have an entity that you want to be able to save in the database but not necessarily make public yet, for instance pages in a CMS that may be a draft before being approved to be published.
How would you proceed?
Ideas
Using a flag only works for the first version, as once published, every updates will be on the published version.
Using two tables, one for the drafts and the other for the published version, where publish means upsert the draft version into the published table, would work. But you would not be able to revert a published version to a previously published version.
Using a version column can keep track of the different version of an entity but the latest version is often the greater one, which is different for every entity, and we would still need a flag to distinguish a draft from a published version.
Introducing the generation
If instead we use a Gen column, everything becomes easier:
- gen #0 is the draft, this is the only record that may be updated
- gen #1 is the published version
- gen # > 1 are older versions
As an illustration, let's create a Pages
table. The slug is the Url of the page, Gen is the generation of the page for a given slug.
CREATE TABLE cms.Pages(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Gen] [int] DEFAULT 0 NOT NULL,
[Slug] [varchar](256) NOT NULL,
[Title] [nvarchar](2000) NOT NULL,
[Content] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_Pages] PRIMARY KEY CLUSTERED([Id] ASC)
)
GO
CREATE UNIQUE NONCLUSTERED INDEX AK_Slugs
ON cms.Pages ([Gen],[Slug])
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_PublishedPages
ON cms.Pages ([Id])
WHERE [Gen]=1
GO
To select published pages, we only have to get those for which the generation if 1.
Thanks to the index, this query, that is the most used in the front office, will be fast.
-- Select published pages
SELECT * FROM cms.Pages
WHERE [Gen]=1
ORDER BY [Slug];
To publish a page that has a draft, we simply have to increment the generation of all records of the given slug. Generation 0 becomes Generation 1, 1 becomes 2, …
-- Publish /about page
UPDATE cms.Pages
SET [Gen]=[Gen]+1
WHERE [Slug]='/about';
To revert the changes, we decrement the generation of all records of the given slug. Generation 1 becomes generation 0.
If there was a generation 2, it becomes the published version, otherwise the page is now unpublished.
UPDATE cms.Pages
SET [Gen]=[Gen]-1
WHERE [Slug]='/about';
So, if we want to unpublish a page, we just have to increment every generation by generation 0. Therefore we keep a draft, if any, but the generation 1 becomes generation 2.
-- Unpublish /privacy
UPDATE cms.Pages
SET [Gen]=[Gen]+1
WHERE [Slug]='/privacy' AND [Gen]>0;
On the backoffice, we will need to have the most recent version of any page, i.e. the draft if there is any, else the published version, if there is any, or else the latest archived version,
-- Get most recent version
WITH v AS (SELECT MIN([Gen]) AS [Gen], [Slug] FROM cms.Pages WHERE [Gen]<=2 GROUP BY [Slug])
SELECT p.* FROM cms.Pages p
JOIN v ON v.[Gen]=p.[Gen] AND v.[Slug]=p.[Slug]
In fact, a backoffice whould need also the current publication status of the page.
Using the generation, we know the publication status of the record:
- 0 is Draft
- 1 is Published
- 2 is Archived
But, even if the most recent version is a Draft, we are interrested to know if there is a published version or not.
Because we are working with only 3 versions of a given page, if we sum the generations, we can distinguish the 3 cases:
- sum is 0, then there is only a draft version
- sum is 2, then there is an Archived version and maybe a Draft version, but the publication status of the page is Archived
- otherwise, there is Published version
-- Get publication status
WITH v AS (SELECT MIN([Gen]) AS [Gen], SUM([Gen]) AS [Hint], [Slug] FROM cms.Pages WHERE [Gen]<=2 GROUP BY [Slug])
SELECT CASE v.[Hint] WHEN 0 THEN 'Draft' WHEN 2 THEN 'Archived' ELSE 'Published' END AS [Status], p.* FROM cms.Pages p
JOIN v ON v.[Gen]=p.[Gen] AND v.[Slug]=p.[Slug]
;
This last query is probably the most complex: for edition purposes, given a slug and a generation, it upsert a draft version.
Please note that only drafts should be updated on any column. For other version, only the generation value should be updated!
-- Create a draft copy of /privacy Gen #2
MERGE INTO cms.Pages AS tgt
USING (SELECT 0 AS [Gen],[Slug],[Title],[Content] FROM cms.Pages p WHERE p.[Slug]='/privacy' AND p.[Gen]=2) AS src
ON (tgt.[Gen] = src.[Gen] AND tgt.[Slug] = src.[Slug])
WHEN MATCHED THEN
UPDATE SET [Title]=src.[Title],[Content]=src.[Content]
WHEN NOT MATCHED THEN
INSERT ([Gen],[Slug],[Title],[Content])
VALUES (DEFAULT, src.[Slug],src.[Title], src.[Content])
Concluding words
By changing our point of view and thinking in terms of generations instead of versions, we were able to defined a simple set of rules for updating the Generation column, as thus provide a simple solution to a complex problem.
References & Useful Links
- CQRS pitfalls and patterns - Udi Dahan - NDC Oslo 2023
Discussion about Private and Public data starts at the 26th minute