
Quick Start Guide

This guide gives an overview of the Enprivacy 3.0 platform: what it does, how it is architected, how its services communicate, and the deployment models available to you. It reflects the Enprivacy 3.0 Quick Start Guide (Version 6) and is intended for teams evaluating or preparing to deploy the platform.


Guided by the philosophy that every piece of data is a piece of someone’s life story, Enprivacy develops solutions that bring a privacy-first design to data categorisation, redaction, and analysis, as well as right-sized governance for confidential data.

The Enprivacy 3.0 platform brings together Enprivacy’s decades of experience in customer trust building, confidential data management, and data breach crisis resolution to offer a unique and unified platform for applying privacy-focused data analytics to your most sensitive information — your customer and strategic documents and data. It provides a control tower for managing your confidential data according to the policies and procedures you define.

The platform is designed for ease of deployment in multiple environments, both on-premises and with cloud providers. The deployment model is deliberately simple: each component is available as a ready-to-deploy Open Container Initiative (OCI) image, or can be replaced with an equivalent cloud service provider offering as needed.

Enprivacy 3.0 has been proven for the common use cases below; nonetheless, the platform is highly extensible, and new use cases can be supported over time. These use cases include but are not limited to:

  • Retrieval Augmented Generation (“RAG”) with Generative Artificial Intelligence (“AI” or “Gen AI”) solutions without sharing confidential information,
  • Machine Learning (“ML”) using anonymised data,
  • data science and statistical analysis of redacted data,
  • internal training using pseudo-anonymised data,
  • sharing of redacted documents to third parties, and
  • other advanced use cases.

Enprivacy 3.0 unlocks immediate value from your confidential data across a variety of use cases, while simultaneously simplifying compliance with privacy regulations such as (but not limited to) the Singapore Personal Data Protection Act (“PDPA”), the European Union (“EU”) General Data Protection Regulation (“GDPR”), the California Consumer Privacy Act (“CCPA”), the Australian Privacy Principles (“APP”), and other regional and industry-specific rules.

The platform allows non-technical end-users to upload files, run automated detection, review proposed redactions, and export compliant versions while maintaining a full audit trail.

Additionally, administrative users may add document repositories for automated monitoring, such as (but not limited to) Gmail inboxes, Simple Storage Service (S3)-compatible services, Microsoft Azure blob storage, network file shares, Microsoft OneDrive drives and SharePoint sites, NetDocuments repositories, and others. These document repositories may be periodically scanned for PII and CII, with the option to generate redacted or anonymised documents automatically.

The platform uses a proprietary combination of optical character recognition (“OCR”), pattern matching, Named Entity Recognition (“NER”), ML techniques, and other capabilities to generate insights into the confidential data held in structured, semi-structured, and unstructured data sets. Underpinning the analysis is a graph (network) topology, which supports rapid factual and inferred analysis of the confidential data.

The features of Enprivacy 3.0 may be described under three headings: Manage, Monitor, and Explore.

Under Manage, administrative users may configure the general settings of the platform, including segregation of data into workspaces, identified categories, categorisation rule sets, and redaction plans. Furthermore, administrative users may define databases and document repositories, as well as set secure credentials.

Under Monitor, designated users may view reports and analytics of activities.

In future, this will include capabilities to write, publish, and link internal policies, procedures, controls, and evidence to external laws and regulations, helping compliance and legal teams manage their confidential data effectively.

Under Explore, designated users may access a set of tools for exploring the processed data:

  • Graph Explorer — explore the relationships between data, documents, repositories, databases, and more.
  • Documents Explorer — read redacted versions of documents, and upload documents for analysis.
  • Database Explorer — explore database structures.
  • Chat — ‘talk’ to your data, both anonymised and original, as needed.
  • Search — perform general queries across your data.

The Enprivacy 3.0 solution is composed of four services (Web, Job, LLM, and OCR), one database, and one blob (file) store.

                                    ┌──────────────────────┐
          Internet / Intranet       │    Auth Services     │
                   │                │      (via API)       │
                   ▼                └──────────┬───────────┘
          ┌─────────────────┐                  │
          │ Web App Firewall│◄─────────────────┘
          └────────┬────────┘
          ┌─────────────────┐
          │  Load Balancer  │
          └────────┬────────┘
          ┌────────┴────────────┐
          ▼                     ▼
   ┌─────────────┐     ┌────────────────┐
   │ App Service │     │  Batch / Job   │
   │             │     │    Service     │
   └──────┬──────┘     └────────┬───────┘
          └──────────┬──────────┘
    ┌──────────┬─────┴──────┬───────────────┬──────────────────┐
    ▼          ▼            ▼               ▼                  ▼ (GPU)
┌────────┐ ┌────────┐ ┌──────────┐ ┌────────────────┐ ┌─────────────┐
│Postgre │ │ File / │ │   OCR    │ │ External svcs/ │ │ LLM Service │
│  SQL   │ │  blob  │ │ Service  │ │ storage / other│ │             │
└────────┘ └────────┘ └──────────┘ └────────────────┘ └─────────────┘

All services are offered as OCI or Docker container images.

The Web service serves the end-user and administrative interfaces. It also exposes the public interfaces for the Job service.

  • Supports horizontal scaling.
  • Stateless: all state is held within the database or the blob storage.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |

By default, models are downloaded to ~/.cache. Enprivacy recommends mounting this path on durable storage so that models persist across restarts and the service starts faster.

The Job service processes background jobs and tasks. It has no public interface.

  • Supports horizontal scaling.
  • Stateless: all state is held within the database or the blob storage.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |

By default, models are downloaded to ~/.cache. Enprivacy recommends mounting this path on durable storage so that models persist across restarts and the service starts faster.

The LLM service manages all inference workloads, including detection and classification, using vLLM and an open-source Large Language Model (LLM) validated for use with Enprivacy 3.0. In general, a single instance is sufficient.

The service is stateless; however, models are cached to local storage for performance. Enprivacy recommends attaching durable storage for this purpose.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Count | 1 | 1 |
| CPU | 4 cores | 8 cores |
| RAM | 32 GB | 58 GB |
| Disk | 200 GB | 200 GB |
| GPU | T4-equivalent | T4-equivalent |

The OCR service provides optical character recognition (OCR) of documents using the open-source Docling tool, configured for the platform. The service is stateless.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |
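
As a rough planning aid, the per-service figures above can be summed to estimate the total compute footprint of the four services. The sketch below is illustrative only: the `Sizing` class, the `SERVICES` mapping, and the `total` helper are names invented for this example, the database and blob storage are excluded, and the GPU needed by the LLM service is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    """CPU cores, RAM (GB), and disk (GB) for one service instance."""
    cpu_cores: int
    ram_gb: int
    disk_gb: int

    def __add__(self, other):
        return Sizing(
            self.cpu_cores + other.cpu_cores,
            self.ram_gb + other.ram_gb,
            self.disk_gb + other.disk_gb,
        )


# (minimum, recommended) per service, copied from the sizing tables above.
SERVICES = {
    "web": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
    "job": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
    "llm": (Sizing(4, 32, 200), Sizing(8, 58, 200)),
    "ocr": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
}

MINIMUM, RECOMMENDED = 0, 1


def total(tier, counts=None):
    """Sum one tier across all services; `counts` overrides instance counts."""
    counts = counts or {}
    result = Sizing(0, 0, 0)
    for name, tiers in SERVICES.items():
        for _ in range(counts.get(name, 1)):
            result = result + tiers[tier]
    return result


print(total(MINIMUM))      # one instance of each service, minimum tier
print(total(RECOMMENDED))  # one instance of each service, recommended tier
```

Passing `counts={"web": 2}` models a second Web instance when scaling horizontally.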

The Enprivacy 3.0 platform uses an external authentication and authorisation platform for user access control. The following endpoint must be reachable from the Web, Job, and LLM services:

  • auth.enprivacy.com

The LLM service (and, where they download models, the Web and Job services) fetches models on first use from the Hugging Face Hub. At the time of writing, the hosts used for these downloads, including content delivery network (CDN) endpoints, are:

  • huggingface.co
  • cdn-lfs.huggingface.co
  • cdn-lfs-us-1.hf.co
  • cdn-lfs-eu-1.hf.co
  • cdn-lfs.hf.co
  • cas-bridge.xethub.hf.co

Alternatively, the required models can be loaded into a service’s durable storage as a one-off action.
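
Before a first deployment, it can help to confirm that the hosts listed above are reachable from the network where the services will run. The following is a minimal sketch using only the Python standard library. It assumes the endpoints are served over HTTPS on port 443, which this guide does not state, and the function names are illustrative. A successful TCP connection shows only that the network path is open, not that authentication or downloads will succeed.

```python
import socket

AUTH_HOSTS = ["auth.enprivacy.com"]
MODEL_HOSTS = [
    "huggingface.co",
    "cdn-lfs.huggingface.co",
    "cdn-lfs-us-1.hf.co",
    "cdn-lfs-eu-1.hf.co",
    "cdn-lfs.hf.co",
    "cas-bridge.xethub.hf.co",
]


def tcp_connect(host, port, timeout):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_hosts(hosts, port=443, timeout=5.0, connect=tcp_connect):
    """Map each host to True (reachable) or False (not reachable).

    Port 443 is an assumption; `connect` can be swapped out for testing.
    """
    return {host: connect(host, port, timeout) for host in hosts}
```

Calling `check_hosts(AUTH_HOSTS + MODEL_HOSTS)` from inside a service's network returns one entry per host; any `False` value points to a firewall or DNS rule that needs attention.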

An always-on PostgreSQL database is critical for the platform, as it holds state for the Web and Job services. It also requires certain extensions to support the vector and graph queries that are key to the platform. Enprivacy recommends following any corporate standards for database security.

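| Component | Minimum | Recommended |
| --- | --- | --- |
| Engine | PostgreSQL | PostgreSQL |
| Version | 16+ | 18+ |
| Count | 1 | 1 |
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Disk | 100 GB | 200 GB |
| Connections | 50 | 100 |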

Create a database schema and a password-authenticated user with all privileges over that schema. The service uses this user to prepare the schema on first run, to apply schema updates during version upgrades, and for all day-to-day activities. For improved security, rotate this password using either a single-user or a dual-user configuration.
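
The schema and user setup described above might be scripted along the following lines. This is a sketch, not Enprivacy's provisioning procedure: the schema name `enprivacy` and role name `enprivacy_app` are placeholders, and the helpers only render standard PostgreSQL statements for an administrator to review and run. The literal quoting assumes PostgreSQL's default `standard_conforming_strings = on`.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def provisioning_sql(schema, role, password):
    """Statements that create a login role owning a dedicated schema."""
    r, s = quote_ident(role), quote_ident(schema)
    return [
        f"CREATE ROLE {r} LOGIN PASSWORD {quote_literal(password)};",
        f"CREATE SCHEMA {s} AUTHORIZATION {r};",
        f"GRANT ALL PRIVILEGES ON SCHEMA {s} TO {r};",
    ]


def rotation_sql(role, new_password):
    """Single-user rotation: change the password of the existing role."""
    return f"ALTER ROLE {quote_ident(role)} PASSWORD {quote_literal(new_password)};"


# Placeholder names and password, for illustration only.
for statement in provisioning_sql("enprivacy", "enprivacy_app", "change-me"):
    print(statement)
```

In a dual-user configuration, the same `provisioning_sql` output could be generated for a second role so that the two roles alternate between rotations.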

The PostgreSQL database must include the following extensions:

The database may also include the following optional extension:

  • pgRouting — if not available, the platform falls back to less-optimised recursive queries.
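
Installed extensions can be listed from the `pg_extension` catalog and compared against what the platform expects. The sketch below assumes pgRouting is registered under the extension name `pgrouting`. The required extension names are passed in by the caller rather than hard-coded here, and the function name is illustrative.

```python
# Run this query with any PostgreSQL client and pass the resulting names in.
LIST_EXTENSIONS_SQL = "SELECT extname FROM pg_extension;"

# Optional per the guide; the platform falls back to recursive queries without it.
OPTIONAL_EXTENSIONS = {"pgrouting"}


def extension_report(installed, required):
    """Report which required and optional extensions are not installed."""
    installed = {name.lower() for name in installed}
    required = {name.lower() for name in required}
    return {
        "missing_required": sorted(required - installed),
        "missing_optional": sorted(OPTIONAL_EXTENSIONS - installed),
    }
```

An empty `missing_required` list means the database is ready; a non-empty `missing_optional` list is informational only.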

Blob (file) storage is needed for holding document uploads as well as redacted documents. This generally grows over time as documents are uploaded or redacted, but there is no minimum size required.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Storage | 10 GB | 10+ GB |

Services communicate with each other over the following default ports.

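| From | To | Port | Protocol |
| --- | --- | --- | --- |
| Internet | Web | 8080 | HTTP |
| Web | Database | 5432 | TCP |
| Job | Database | 5432 | TCP |
| Web | LLM | 8000 | HTTP |
| Job | LLM | 8000 | HTTP |
| Web | OCR | 5001 | HTTP |
| Job | OCR | 5001 | HTTP |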
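
The port table above translates naturally into an allow-list for security groups or network policies. The following sketch encodes the table as data and derives the inbound rules for each service. The helper names are illustrative, and the ports are the defaults listed above, so they would need adjusting if your deployment overrides them.

```python
# (source, target, port, protocol), copied from the default port table above.
FLOWS = [
    ("Internet", "Web", 8080, "HTTP"),
    ("Web", "Database", 5432, "TCP"),
    ("Job", "Database", 5432, "TCP"),
    ("Web", "LLM", 8000, "HTTP"),
    ("Job", "LLM", 8000, "HTTP"),
    ("Web", "OCR", 5001, "HTTP"),
    ("Job", "OCR", 5001, "HTTP"),
]


def inbound_rules(service):
    """Map each port `service` listens on to the sorted list of allowed sources."""
    rules = {}
    for source, target, port, _protocol in FLOWS:
        if target == service:
            rules.setdefault(port, []).append(source)
    return {port: sorted(sources) for port, sources in rules.items()}


def is_allowed(source, target, port):
    """Return True if the flow appears in the default port table."""
    return any((s, t, p) == (source, target, port) for s, t, p, _ in FLOWS)


for name in ("Web", "Job", "LLM", "OCR", "Database"):
    print(name, inbound_rules(name))
```

The Job service yields no inbound rules, which matches its description as having no public interface.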

Only the Web service needs to be exposed for administrative and end-user access. Enprivacy recommends using HTTPS termination via load balancers or other tools as appropriate for your environment; this requires providing your own certificates for your selected domains.

Alternatively, Enprivacy 3.0 can be configured for HTTPS with certificates provisioned by Let’s Encrypt. Enprivacy can work with your team to select and implement an appropriate approach.

Recommended deployment model with containers

Enprivacy recommends deploying the platform to cloud-based service providers such as AWS, Azure, Google, or Huawei, to make use of the managed-service capabilities these platforms offer.

Enprivacy 3.0 is designed for ease of deployment to many environments and, as such, is packaged as Open Container Initiative (OCI) or Docker container images. Enprivacy can provide the needed container images via the GitHub Container Registry, or by pushing images to your own container repository.

  • Each service can be deployed using the cloud provider’s ‘container application’ service, or via Kubernetes.
  • The database can be deployed using the cloud provider’s managed PostgreSQL service, or through an assembled container with all needed extensions.
  • The blob storage can likewise be deployed using the cloud provider’s managed blob storage service, or through an open-source S3-compatible container (e.g. MinIO or Garage).

The cost of GPUs can be significant. Enprivacy can, in many cases, work with your selected LLM providers to re-use existing commitments for platforms such as AWS Bedrock, Azure OpenAI, or Google Vertex.

The architecture is designed for flexibility. Enprivacy will work with your teams to identify and implement any changes needed for compliance or regulatory purposes.

For demonstration purposes, it is possible to operate the Web, Job, OCR, and Database services in a single environment, with durable storage on a local disk. The LLM service is not included in this design; an external LLM service should be used. Enprivacy can provide a single Docker Compose file to enable this environment.

The table below provides the computing configuration for this all-in-one (AIO) environment, with and without the OCR service. The OCR service is required when demonstrating the content analysis and redaction features of Enprivacy 3.0.

| Component | AIO without OCR | AIO with OCR |
| --- | --- | --- |
| Count | 1 | 1 |
| Operating system | Linux (Ubuntu 22.04+ or Debian 12), 64-bit | Linux (Ubuntu 22.04+ or Debian 12), 64-bit |
| Docker | Docker Engine 24+ with Compose v2 | Docker Engine 24+ with Compose v2 |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Disk | 150 GB | 200 GB |
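
A host intended for the demonstration environment can be compared against these figures before installation. The sketch below is a hypothetical preflight check for Linux hosts using only the Python standard library. The profile names and function names are invented for this example, and it does not verify the operating system or Docker versions. It measures total rather than free disk space.

```python
import os
import shutil

# CPU, RAM, and disk figures copied from the table above.
REQUIREMENTS = {
    "without_ocr": {"cpu_cores": 4, "ram_gb": 8, "disk_gb": 150},
    "with_ocr": {"cpu_cores": 8, "ram_gb": 16, "disk_gb": 200},
}


def measure_host(path="/"):
    """Measure CPU count, physical RAM, and total disk size at `path` (Linux)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_cores": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1024**3,
        "disk_gb": shutil.disk_usage(path).total / 1024**3,
    }


def shortfalls(measured, profile):
    """Return {resource: (measured, required)} for each unmet requirement."""
    required = REQUIREMENTS[profile]
    return {
        key: (measured[key], need)
        for key, need in required.items()
        if measured[key] < need
    }
```

Calling `shortfalls(measure_host(), "with_ocr")` returns an empty dictionary when the host meets every figure. Hosts usually report slightly less RAM than their nominal size, so a machine sold as 16 GB may show a small RAM shortfall that can be judged manually.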