Quick Start Guide
This guide gives an overview of the Enprivacy 3.0 platform: what it does, how it is architected, how its services communicate, and the deployment models available to you. It reflects the Enprivacy 3.0 Quick Start Guide (Version 6) and is intended for teams evaluating or preparing to deploy the platform.
Download the PDF (Version 6)
Executive Summary
Guided by the philosophy that every piece of data is a piece of someone’s life story, Enprivacy develops solutions that bring a privacy-first design to data categorisation, redaction, and analysis, as well as right-sized governance for confidential data.
The Enprivacy 3.0 platform brings together Enprivacy’s decades of experience in customer trust building, confidential data management, and data breach crisis resolution to offer a unique and unified platform for applying privacy-focused data analytics to your most sensitive information — your customer and strategic documents and data. It provides a control tower for managing your confidential data according to the policies and procedures you define.
The platform is designed for ease of deployment in multiple environments, including on-premises platforms and cloud providers. The deployment model is purposefully simple: each component is available as a ready-to-deploy Open Container Initiative (OCI) image, or can be replaced with a cloud service provider offering as needed.
Use Cases
Enprivacy 3.0 has been proven for the common use cases below. The platform is also highly extensible, and new use cases are added over time. The proven use cases include, but are not limited to:
- Retrieval Augmented Generation (“RAG”) with Generative Artificial Intelligence (“AI” or “Gen AI”) solutions without sharing confidential information,
- Machine Learning (“ML”) using anonymised data,
- data science and statistical analysis of redacted data,
- internal training using pseudonymised data,
- sharing of redacted documents to third parties, and
- other advanced use cases.
Enprivacy 3.0 Features
Enprivacy 3.0 unlocks immediate value from your confidential data across a variety of use cases, while simplifying compliance with privacy regulations such as (but not limited to) the Singapore Personal Data Protection Act (“PDPA”), the European Union (“EU”) General Data Protection Regulation (“GDPR”), the California Consumer Privacy Act (“CCPA”), the Australian Privacy Principles (“APP”), and other regional and industry-specific rules.
The platform allows non-technical end-users to upload files, run automated detection, review proposed redactions, and export compliant versions while maintaining a full audit trail.
Additionally, administrative users may add document repositories for automated monitoring, such as (but not limited to) Gmail inboxes, Simple Storage Service (S3)-compatible services, Microsoft Azure blob storage, network file shares, Microsoft OneDrive drives and SharePoint sites, NetDocuments repositories, and others. These document repositories may be periodically scanned for PII and CII, with the option to generate redacted or anonymised documents automatically.
The platform utilises a proprietary combination of optical character recognition (“OCR”), pattern matching, Named Entity Recognition (“NER”), ML techniques, and other capabilities to generate insights into the confidential data held by structured, semi-structured, and unstructured data sets. Underpinning the analysis is a graph or network topology which supports rapid factual and inferred analysis of the confidential data.
The features of Enprivacy 3.0 may be described under three headings: Manage, Monitor, and Explore.
Manage
Administrative users may configure the general settings of the platform, including segregation of data into workspaces, identified categories, categorisation rule sets, and redaction plans. Furthermore, administrative users may define databases and document repositories, as well as set secure credentials.
Monitor
Designated users may view reports and analytics of activities.
In future, this will include capabilities to write, publish, and link internal policies, procedures, controls, and evidence to external law and regulations, helping compliance and legal teams manage their confidential data effectively.
Explore
Designated users may access several tools for exploring the processed data:
- Graph Explorer — explore the relationships between data, documents, repositories, databases, and more.
- Documents Explorer — read redacted versions of documents, and upload documents for analysis.
- Database Explorer — explore database structures.
- Chat — ‘talk’ to your data, both anonymised and original, as needed.
- Search — perform general queries across your data.
Overall Architecture
The Enprivacy 3.0 solution is composed of four services, one database, and one blob (file) storage.
                                    ┌──────────────────────┐
          Internet / Intranet       │    Auth Services     │
                   │                │      (via API)       │
                   ▼                └──────────┬───────────┘
          ┌─────────────────┐                  │
          │ Web App Firewall│◄─────────────────┘
          └────────┬────────┘
                   ▼
          ┌─────────────────┐
          │  Load Balancer  │
          └────────┬────────┘
          ┌────────┴────────────┐
          ▼                     ▼
   ┌─────────────┐     ┌────────────────┐
   │ App Service │     │  Batch / Job   │
   │             │     │    Service     │
   └──────┬──────┘     └────────┬───────┘
          └──────────┬──────────┘
    ┌──────────┬─────┴──────┬───────────────┬──────────────────┐
    ▼          ▼            ▼               ▼                  ▼ (GPU)
┌────────┐ ┌────────┐ ┌──────────┐ ┌────────────────┐ ┌─────────────┐
│Postgre │ │ File / │ │   OCR    │ │ External svcs/ │ │ LLM Service │
│  SQL   │ │  blob  │ │ Service  │ │ storage / other│ │             │
└────────┘ └────────┘ └──────────┘ └────────────────┘ └─────────────┘
Services
All services are offered as OCI or Docker container images.
Web service
Serves the end-user and administrative interfaces. This service also includes the public interfaces for the Job service.
- Supports horizontal scaling.
- Stateless — all state is held within the database or the blob storage.
| Component | Minimum | Recommended |
|---|---|---|
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |
By default, models are downloaded to ~/.cache. Mounting durable storage at this path is recommended so that the service starts faster.
Job service
Processes background jobs and tasks. This service has no public interface.
- Supports horizontal scaling.
- Stateless — all state is held within the database or the blob storage.
| Component | Minimum | Recommended |
|---|---|---|
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |
By default, models are downloaded to ~/.cache. Mounting durable storage at this path is recommended so that the service starts faster.
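As a rough illustration of the cache recommendation, the sketch below reports whether the cache directory is itself a mount point, which is what you would expect when a volume is mounted exactly at that path. The function name is hypothetical, and a mount point alone does not prove the storage is durable; treat it as a quick sanity check only.

```python
import os
from pathlib import Path


def cache_mount_status(cache_dir: str = "~/.cache") -> dict:
    """Report whether the model cache directory sits on its own mount.

    The default path comes from the guide; everything else is illustrative.
    """
    path = Path(cache_dir).expanduser()
    return {
        "path": str(path),
        "exists": path.is_dir(),
        # True only when the directory itself is a mount point, for example
        # a container volume mounted exactly at this path.
        "is_mount_point": os.path.ismount(path),
    }


if __name__ == "__main__":
    print(cache_mount_status())
```

Running this inside the Web or Job container after deployment gives a quick indication of whether the volume mount took effect.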
LLM service
Manages all inference workloads, including detection and classification processes, via vLLM and an open-source Large Language Model validated for use with Enprivacy 3.0. Generally, Enprivacy 3.0 needs only one instance.
The service is stateless; however, models are cached to local storage for performance. Enprivacy recommends attaching durable storage for this purpose.
| Component | Minimum | Recommended |
|---|---|---|
| Count | 1 | 1 |
| CPU | 4 cores | 8 cores |
| RAM | 32 GB | 58 GB |
| Disk | 200 GB | 200 GB |
| GPU | T4-equivalent | T4-equivalent |
Text Extraction (OCR) service
Provides optical character recognition (OCR) of documents using a preconfigured instance of the open-source Docling tool. The service is stateless.
| Component | Minimum | Recommended |
|---|---|---|
| Count | 1 | 1 |
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 20 GB |
| GPU | Nil | Nil |
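To make the combined footprint of the four services easier to see, the following sketch sums the minimum and recommended figures from the tables above. It covers the services only; the database and blob storage are sized separately. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    cpu_cores: int
    ram_gb: int
    disk_gb: int


# (minimum, recommended) per service, copied from the sizing tables.
SERVICES = {
    "web": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
    "job": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
    "llm": (Sizing(4, 32, 200), Sizing(8, 58, 200)),
    "ocr": (Sizing(1, 2, 20), Sizing(2, 4, 20)),
}


def total(index: int) -> Sizing:
    """Sum one column: index 0 is minimum, index 1 is recommended."""
    rows = [pair[index] for pair in SERVICES.values()]
    return Sizing(
        cpu_cores=sum(r.cpu_cores for r in rows),
        ram_gb=sum(r.ram_gb for r in rows),
        disk_gb=sum(r.disk_gb for r in rows),
    )


if __name__ == "__main__":
    print("minimum:    ", total(0))  # 7 cores, 38 GB RAM, 260 GB disk
    print("recommended:", total(1))  # 14 cores, 70 GB RAM, 260 GB disk
```

These totals assume one instance of each service; horizontally scaled Web or Job services add to them.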
Network allow-listing
The Enprivacy 3.0 platform uses an external authentication and authorisation platform for user access control. The following endpoint must be reachable from the Web, Job, and LLM services:
auth.enprivacy.com
The LLM service fetches models from the Hugging Face Hub on first use, as do the Web and Job services where they download models. At the time of writing, the download content delivery network (CDN) hostnames are:
- huggingface.co
- cdn-lfs.huggingface.co
- cdn-lfs-us-1.hf.co
- cdn-lfs-eu-1.hf.co
- cdn-lfs.hf.co
- cas-bridge.xethub.hf.co
Alternatively, the desired models can be loaded into a service’s durable storage as a one-off action, which avoids the need for outbound access to these hosts.
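One way to confirm the allow-list from inside a service's network is a simple TCP probe, sketched below. The hostnames are those listed above; port 443 is an assumption (HTTPS), and the function names are illustrative. A successful TCP connection does not prove that a proxy or TLS inspection layer will pass the traffic, so treat a pass as necessary rather than sufficient.

```python
import socket
from typing import Callable, Dict, Iterable

# Hostnames from the allow-list in this guide. Port 443 is an assumption.
REQUIRED_HOSTS = (
    "auth.enprivacy.com",
    "huggingface.co",
    "cdn-lfs.huggingface.co",
    "cdn-lfs-us-1.hf.co",
    "cdn-lfs-eu-1.hf.co",
    "cdn-lfs.hf.co",
    "cas-bridge.xethub.hf.co",
)


def tcp_probe(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_hosts(
    hosts: Iterable[str], probe: Callable[[str], bool] = tcp_probe
) -> Dict[str, bool]:
    """Probe every host and return a host -> reachable mapping."""
    return {host: probe(host) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_hosts(REQUIRED_HOSTS).items():
        print(f"{'OK  ' if ok else 'FAIL'} {host}")
```

The `probe` parameter is injectable so the check can be exercised without network access, for example in a CI job.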
Database
An always-on PostgreSQL database is critical for the platform, as it holds state for the Web and Job services. It also requires certain extensions to support the vector and graph queries that are key to the platform. Enprivacy recommends following any corporate standards for database security.
| Component | Minimum | Recommended |
|---|---|---|
| Engine | PostgreSQL | PostgreSQL |
| Version | 16+ | 18+ |
| Count | 1 | 1 |
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Disk | 100 GB | 200 GB |
| Connections | 50 | 100 |
First set-up
A database schema should be created, along with a user with a password and all privileges over that schema. This user is used by the service to prepare the schema on first run and to apply schema updates during version upgrades, as well as for all day-to-day activities. For improved security, it is recommended to perform password rotations in either a single-user or dual-user configuration.
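The first set-up step can be sketched as a small generator of standard PostgreSQL statements. The schema and role names used here are placeholders, not names the platform requires, and the password is deliberately left out of the generated SQL so that it can be set through your usual secrets process (for example, the psql `\password` meta-command).

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def first_setup_sql(schema: str, role: str) -> List[str]:
    """Build statements that create a login role owning a dedicated schema.

    The password is intentionally not included; set it separately.
    """
    s, r = quote_ident(schema), quote_ident(role)
    return [
        f"CREATE ROLE {r} LOGIN;",
        f"CREATE SCHEMA {s} AUTHORIZATION {r};",
        f"GRANT ALL PRIVILEGES ON SCHEMA {s} TO {r};",
    ]


if __name__ == "__main__":
    # "enprivacy" and "enprivacy_app" are hypothetical example names.
    for statement in first_setup_sql("enprivacy", "enprivacy_app"):
        print(statement)
```

Making the role the schema owner gives it the rights needed to create and alter objects during first run and upgrades, which matches the single-user model described above.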
Extensions
The PostgreSQL database must include the following extensions:
The database may include the following extension:
- pgRouting — if not available, the platform falls back to less-optimised recursive queries.
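The optional-extension fallback can be pictured with the sketch below. It is not the platform's actual detection logic, which this guide does not describe; it only shows how the list of installed extensions (from the standard `pg_extension` catalog) could drive the choice between pgRouting and recursive queries. The function name and return values are illustrative.

```python
from typing import Iterable

# Standard catalog query listing installed extensions; run it with any
# PostgreSQL client and pass the resulting names to the function below.
INSTALLED_EXTENSIONS_SQL = "SELECT extname FROM pg_extension;"


def graph_query_strategy(installed: Iterable[str]) -> str:
    """Pick a graph traversal strategy from the installed extension names."""
    names = {name.lower() for name in installed}
    # pgRouting registers itself under the extension name "pgrouting".
    return "pgrouting" if "pgrouting" in names else "recursive_cte"


if __name__ == "__main__":
    print(graph_query_strategy(["plpgsql", "pgrouting"]))  # pgrouting
    print(graph_query_strategy(["plpgsql"]))               # recursive_cte
```

Checking this before go-live shows whether the deployment will run on the optimised path or the fallback.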
Blob storage
Blob (file) storage is needed for holding document uploads as well as redacted documents. This generally grows over time as documents are uploaded or redacted, but there is no minimum size required.
| Component | Minimum | Recommended |
|---|---|---|
| Storage | 10 GB | 10+ GB |
Communications
Interservice communications
Services communicate with each other over the following default ports.
| From | To | Port | Protocol |
|---|---|---|---|
| Internet | Web | 8080 | HTTP |
| Web | Database | 5432 | TCP |
| Job | Database | 5432 | TCP |
| Web | LLM | 8000 | HTTP |
| Job | LLM | 8000 | HTTP |
| Web | OCR | 5001 | HTTP |
| Job | OCR | 5001 | HTTP |
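When translating the table above into firewall or security-group rules, it can help to group the flows by destination. The sketch below encodes the table as data and derives the inbound rules for each target; the type and function names are illustrative.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    source: str
    target: str
    port: int
    protocol: str


# Default flows, copied from the interservice communications table.
FLOWS: Tuple[Flow, ...] = (
    Flow("Internet", "Web", 8080, "HTTP"),
    Flow("Web", "Database", 5432, "TCP"),
    Flow("Job", "Database", 5432, "TCP"),
    Flow("Web", "LLM", 8000, "HTTP"),
    Flow("Job", "LLM", 8000, "HTTP"),
    Flow("Web", "OCR", 5001, "HTTP"),
    Flow("Job", "OCR", 5001, "HTTP"),
)


def inbound_rules(flows: Tuple[Flow, ...] = FLOWS) -> Dict[str, Dict[int, Set[str]]]:
    """Group flows by target as {target: {port: {allowed sources}}}."""
    rules: Dict[str, Dict[int, Set[str]]] = {}
    for flow in flows:
        rules.setdefault(flow.target, {}).setdefault(flow.port, set()).add(flow.source)
    return rules


if __name__ == "__main__":
    for target, ports in inbound_rules().items():
        for port, sources in ports.items():
            print(f"{target}:{port} <- {', '.join(sorted(sources))}")
```

The Job service does not appear as a target, which is consistent with it having no public interface.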
Public communication
Only the Web service needs to be exposed for administrative and end-user access. Enprivacy recommends using HTTPS termination via load balancers or other tools as appropriate for your environment; this requires providing your own certificates for your selected domains.
Alternatively, Enprivacy 3.0 can be configured for HTTPS with certificates provisioned by Let’s Encrypt. Enprivacy can work with your team to select and implement an appropriate approach.
Deployment Models
Recommended deployment model with containers
Enprivacy recommends deploying the platform to cloud-based service providers such as AWS, Azure, Google, or Huawei, to make use of the managed-service capabilities these platforms offer.
Enprivacy 3.0 is designed for ease of deployment to many environments and, as such, has adopted Open Container Initiative (OCI) or Docker containers. Enprivacy can provide the needed container images via the GitHub Container Registry, or by pushing images to your own container repository.
- Each service can be deployed using the cloud provider’s ‘container application’ service, or via Kubernetes.
- The database can be deployed using the cloud provider’s managed PostgreSQL service, or through an assembled container with all needed extensions.
- The blob storage can likewise be deployed using the cloud provider’s managed blob storage service, or through an open-source S3-compatible container (e.g. MinIO or Garage).
Alternative options for LLMs
The cost of GPUs can be significant. Enprivacy can, in many cases, work with your selected LLM providers to re-use existing commitments for platforms such as AWS Bedrock, Azure OpenAI, or Google Vertex.
Compliance with client requirements
The architecture is designed for flexibility. Enprivacy will work with your teams to identify and implement any changes needed for compliance or regulatory purposes.
All-In-One
For demonstration purposes, it is possible to operate the Web, Job, OCR, and Database services in a single environment, with durable storage on a local disk. The LLM service is not included in this design; an external LLM service should be used. Enprivacy can provide a single Docker Compose file to enable this environment.
The table below provides the computing configuration for the All-In-One (AIO) environment with and without the OCR service. The OCR service is required if demonstrating the content analysis and redaction features of Enprivacy 3.0.
| Component | AIO without OCR | AIO with OCR |
|---|---|---|
| Count | 1 | 1 |
| Operating system | Linux (Ubuntu 22.04+ or Debian 12), 64-bit | Linux (Ubuntu 22.04+ or Debian 12), 64-bit |
| Docker | Docker Engine 24+ with Compose v2 | Docker Engine 24+ with Compose v2 |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Disk | 150 GB | 200 GB |
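A host can be checked against the table above before the Docker Compose file is run. The sketch below is a minimal pre-flight check under a few assumptions: it is Linux-only (it reads memory through `os.sysconf`), it measures total rather than free disk space, and the profile keys and function names are illustrative.

```python
import os
import shutil
from typing import List, Tuple

# Thresholds copied from the AIO table: (CPU cores, RAM GB, disk GB).
AIO_REQUIREMENTS = {
    "without_ocr": (4, 8, 150),
    "with_ocr": (8, 16, 200),
}


def shortfalls(profile: str, cpu_cores: int, ram_gb: float, disk_gb: float) -> List[str]:
    """Return a list of unmet requirements; an empty list means the host fits."""
    need_cpu, need_ram, need_disk = AIO_REQUIREMENTS[profile]
    problems = []
    if cpu_cores < need_cpu:
        problems.append(f"CPU: {cpu_cores} cores < {need_cpu}")
    if ram_gb < need_ram:
        problems.append(f"RAM: {ram_gb:.1f} GB < {need_ram}")
    if disk_gb < need_disk:
        problems.append(f"Disk: {disk_gb:.1f} GB < {need_disk}")
    return problems


def detect_host(path: str = "/") -> Tuple[int, float, float]:
    """Read CPU count, physical RAM, and total disk size (Linux only)."""
    cpu = os.cpu_count() or 0
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    disk_gb = shutil.disk_usage(path).total / 1024**3
    return cpu, ram_gb, disk_gb


if __name__ == "__main__":
    issues = shortfalls("with_ocr", *detect_host())
    print("Host meets the AIO sizing." if not issues else "\n".join(issues))
```

Note that a machine sold as "16 GB" usually reports slightly less usable memory, so a marginal RAM shortfall from this check may be a false alarm.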