docs(notes): add database documentation

lzwjava · lzwjava · commit f69bb62b4791 · 2025-03-24T02:06:18.000+08:00
diff --git a/notes/2025-03-24-database-acid-en.md b/notes/2025-03-24-database-acid-en.md
@@ -0,0 +1,97 @@
+---
+title: Understanding Full ACID Compliance in Databases
+lang: en
+layout: post
+audio: false
+translated: false
+generated: true
+---
+
+ACID compliance represents one of the fundamental frameworks for ensuring database reliability and data integrity, particularly in transaction processing systems. Let's explore what ACID means in depth and why it matters.
+
+## What is ACID?
+
+ACID is an acronym that stands for:
+
+- **Atomicity**
+- **Consistency**
+- **Isolation**
+- **Durability**
+
+These four properties together guarantee that database transactions are processed reliably, even in the face of errors, power failures, and other issues. Let's examine each property in detail.
+
+## Atomicity
+
+Atomicity ensures that a transaction is treated as a single, indivisible unit of work. This means:
+
+- Either all operations within the transaction complete successfully (commit)
+- Or none of them take effect (rollback)
+
+### Deep Dive:
+When a transaction involves multiple operations (such as debiting one account and crediting another), atomicity guarantees that either both operations succeed or neither does. Databases typically enforce this with write-ahead logging (WAL) and undo records (rollback segments, in Oracle's terminology), which capture enough information about each change for the system to roll back a partially completed transaction.
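+
+A minimal sketch of the debit/credit example, assuming Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The `accounts` table and the `transfer` and `balances` helpers are hypothetical names chosen for illustration:
+
+```python
+import sqlite3
+
+def transfer(conn, src, dst, amount):
+    """Move `amount` from src to dst as one atomic transaction."""
+    try:
+        with conn:  # commits on success, rolls back if an exception escapes
+            conn.execute(
+                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
+                (amount, src),
+            )
+            row = conn.execute(
+                "SELECT balance FROM accounts WHERE id = ?", (src,)
+            ).fetchone()
+            if row[0] < 0:
+                # The debit has already run; raising forces it to be undone.
+                raise ValueError("insufficient funds")
+            conn.execute(
+                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
+                (amount, dst),
+            )
+        return True
+    except ValueError:
+        return False
+
+def balances(conn):
+    """Return {account id: balance} for every account."""
+    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
+conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
+conn.commit()
+
+ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
+failed = transfer(conn, "alice", "bob", 500)  # debit is rolled back
+```
+
+The second call debits the account, detects the overdraft, and raises, so the rollback discards the debit and both balances stay as the first transfer left them.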
+
+## Consistency
+
+Consistency ensures that a transaction brings the database from one valid state to another valid state, maintaining all predefined rules, constraints, and triggers.
+
+### Deep Dive:
+Consistency works on multiple levels:
+- **Database consistency**: Enforcing data integrity constraints, foreign keys, unique constraints, and check constraints
+- **Application consistency**: Ensuring business rules are maintained
+- **Transaction consistency**: Guaranteeing that invariants are preserved before and after transaction execution
+
+A consistent transaction preserves the database's semantic integrity - it cannot violate any defined rules. For example, if a rule states an account balance cannot be negative, a consistent transaction cannot result in a negative balance.
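+
+The non-negative balance rule can be sketched as a `CHECK` constraint. This example assumes Python's built-in `sqlite3` module; the `accounts` table and `withdraw` helper are illustrative names, not part of any particular system:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    "CREATE TABLE accounts ("
+    " id TEXT PRIMARY KEY,"
+    " balance INTEGER NOT NULL CHECK (balance >= 0))"
+)
+conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
+conn.commit()
+
+def withdraw(conn, account_id, amount):
+    """Return True if the withdrawal was applied, False if a rule rejected it."""
+    try:
+        with conn:  # rolls back automatically when the constraint fails
+            conn.execute(
+                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
+                (amount, account_id),
+            )
+        return True
+    except sqlite3.IntegrityError:
+        return False
+
+rejected = withdraw(conn, "alice", 150)  # would leave a balance of -50
+accepted = withdraw(conn, "alice", 40)
+final_balance = conn.execute(
+    "SELECT balance FROM accounts WHERE id = 'alice'"
+).fetchone()[0]
+```
+
+Declaring the rule in the schema means the database rejects the violating update itself, rather than relying on every application code path to check it.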
+
+## Isolation
+
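+Isolation ensures that concurrently executing transactions do not interfere with one another. At the strictest level (serializable), the outcome is the same as if the transactions had run one at a time in some order; weaker isolation levels give up part of this guarantee in exchange for more concurrency.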
+
+### Deep Dive:
+Isolation prevents problems like:
+- **Dirty reads**: Reading uncommitted data from another transaction
+- **Non-repeatable reads**: Getting different results when reading the same data twice in the same transaction
+- **Phantom reads**: When new rows appear in a range scan due to another transaction's insert
+
+Databases implement various isolation levels through techniques like:
+- **Pessimistic concurrency control**: Locking resources to prevent conflicts
+- **Optimistic concurrency control**: Allowing concurrent access but validating before commit
+- **Multiversion concurrency control (MVCC)**: Maintaining multiple versions of data to allow concurrent reads without blocking
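+
+Optimistic concurrency control is often approximated at the application level with a version column: a writer succeeds only if the row still carries the version it read. The sketch below assumes Python's built-in `sqlite3` module; the `items` table and helper names are hypothetical:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
+conn.execute("INSERT INTO items VALUES (1, 10, 1)")
+conn.commit()
+
+def read_item(conn, item_id):
+    """Return (qty, version) as seen by the caller."""
+    return conn.execute(
+        "SELECT qty, version FROM items WHERE id = ?", (item_id,)
+    ).fetchone()
+
+def write_if_unchanged(conn, item_id, new_qty, seen_version):
+    """Validate at write time: update only if nobody bumped the version."""
+    with conn:
+        cur = conn.execute(
+            "UPDATE items SET qty = ?, version = version + 1"
+            " WHERE id = ? AND version = ?",
+            (new_qty, item_id, seen_version),
+        )
+    return cur.rowcount == 1
+
+# Two writers read the same snapshot of the row.
+qty_a, ver_a = read_item(conn, 1)
+qty_b, ver_b = read_item(conn, 1)
+
+first = write_if_unchanged(conn, 1, qty_a - 3, ver_a)   # succeeds
+second = write_if_unchanged(conn, 1, qty_b - 5, ver_b)  # stale version, rejected
+```
+
+The rejected writer would normally re-read the row and retry, which is cheap when conflicts are rare and wasteful when they are frequent.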
+
+## Durability
+
+Durability guarantees that once a transaction has been committed, it remains committed even in the case of system failure.
+
+### Deep Dive:
+Durability is typically achieved through:
+- **Write-ahead logging**: Changes are first recorded in logs before being applied to the actual data
+- **Redundant storage**: Multiple copies of data stored across different locations
+- **Checkpoint mechanisms**: Ensuring changes are periodically flushed from memory to persistent storage
+
+In practical terms, committed transactions survive power failures and system crashes because their changes have been written to non-volatile storage. Surviving the loss of the storage device itself additionally requires redundancy, such as replication or backups.
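+
+The log-first idea can be sketched with a toy key-value store. This is an illustration only, not how any production engine is implemented: `TinyStore` and its JSON-lines log format are assumptions, and it ignores checkpoints and torn writes:
+
+```python
+import json
+import os
+import tempfile
+
+class TinyStore:
+    """Toy store: append each change to a log, fsync, then apply it in memory."""
+
+    def __init__(self, log_path):
+        self.log_path = log_path
+        self.data = {}
+        self._replay()
+
+    def _replay(self):
+        """Rebuild in-memory state from the log after a restart."""
+        if not os.path.exists(self.log_path):
+            return
+        with open(self.log_path, "r", encoding="utf-8") as log:
+            for line in log:
+                record = json.loads(line)
+                self.data[record["key"]] = record["value"]
+
+    def put(self, key, value):
+        with open(self.log_path, "a", encoding="utf-8") as log:
+            log.write(json.dumps({"key": key, "value": value}) + "\n")
+            log.flush()
+            os.fsync(log.fileno())  # force the record to stable storage
+        self.data[key] = value      # only now is the change applied
+
+log_path = os.path.join(tempfile.mkdtemp(), "store.wal")
+store = TinyStore(log_path)
+store.put("balance:alice", 70)
+store.put("balance:bob", 80)
+
+# Simulate a crash: drop the in-memory state and recover from the log alone.
+del store
+recovered = TinyStore(log_path)
+```
+
+Because `put` returns only after `os.fsync`, every acknowledged write is already in the log, which is what lets the recovered instance rebuild the same state.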
+
+## Implementation Challenges and Considerations
+
+Achieving full ACID compliance involves significant tradeoffs:
+
+1. **Performance impact**: Strict ACID properties can reduce throughput and increase latency
+2. **Scalability limitations**: Some ACID guarantees become harder to maintain in distributed systems
+3. **Implementation complexity**: Maintaining these properties requires sophisticated algorithms and mechanisms
+4. **Resource utilization**: Additional storage and memory may be required for logs, lock tables, and multiple data versions
+
+## Real-World Applications
+
+Different database systems provide varying levels of ACID compliance:
+
+- **Traditional RDBMSs** (Oracle, SQL Server, PostgreSQL, MySQL with InnoDB): Full ACID compliance, although the default isolation level is usually weaker than serializable
+- **NoSQL databases**: Often sacrifice some ACID properties for scalability and performance (typically following BASE principles instead)
+- **NewSQL databases**: Attempt to provide both scalability and ACID properties
+
+## Beyond ACID: Modern Developments
+
+While ACID remains fundamental, distributed systems have introduced additional concepts:
+
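+- **CAP Theorem**: States that a distributed system cannot guarantee Consistency, Availability, and Partition tolerance all at once; when a network partition occurs, it must give up either consistency or availability. Consistency here means every read sees the most recent write, which is a different notion from the C in ACID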
+- **BASE** (Basically Available, Soft state, Eventually consistent): An alternative approach for distributed systems
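+- **Saga Pattern**: A sequence of local transactions, each paired with a compensating action that undoes it if a later step fails, used to reach eventual consistency across services without a distributed transaction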
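+
+A saga can be sketched as a list of steps with matching compensations. The `run_saga` coordinator and the order-processing steps below are hypothetical and kept in memory; a real saga would persist its progress and call separate services:
+
+```python
+def run_saga(steps):
+    """Run (name, action, compensation) steps; undo completed ones on failure."""
+    completed = []
+    log = []
+    for name, action, compensate in steps:
+        try:
+            action()
+        except Exception:
+            log.append(f"failed:{name}")
+            # Compensate in reverse order of completion.
+            for done_name, undo in reversed(completed):
+                undo()
+                log.append(f"compensated:{done_name}")
+            return False, log
+        log.append(f"done:{name}")
+        completed.append((name, compensate))
+    return True, log
+
+state = {"reserved": False, "charged": False}
+
+def reserve():
+    state["reserved"] = True
+
+def release():
+    state["reserved"] = False
+
+def charge():
+    state["charged"] = True
+
+def refund():
+    state["charged"] = False
+
+def ship():
+    raise RuntimeError("carrier unavailable")
+
+ok, log = run_saga([
+    ("reserve_stock", reserve, release),
+    ("charge_card", charge, refund),
+    ("ship_order", ship, lambda: None),
+])
+```
+
+Unlike an ACID rollback, the intermediate states (stock reserved, card charged) are visible to other parties until the compensations run, which is why sagas offer eventual rather than immediate consistency.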
+
+Full ACID compliance remains crucial for applications where data integrity is paramount, such as financial systems, healthcare applications, and critical record-keeping systems.
diff --git a/notes/2025-03-24-db2-data-type-en.md b/notes/2025-03-24-db2-data-type-en.md
@@ -0,0 +1,70 @@
+---
+title: IBM Db2 Data Types
+lang: en
+layout: post
+audio: false
+translated: false
+generated: true
+---
+
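+Db2 supports a wide range of data types to accommodate various data storage needs. Here's an overview of Db2 data types by category. Limits and availability differ between Db2 for Linux, UNIX, and Windows (LUW), Db2 for z/OS, and Db2 for i, so check the documentation for your platform: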
+
+## Numeric Data Types
+
+### Integer Types
+- **SMALLINT**: 16-bit integer (-32,768 to 32,767)
+- **INTEGER** or **INT**: 32-bit integer (-2,147,483,648 to 2,147,483,647)
+- **BIGINT**: 64-bit integer (approximately -9.2E18 to 9.2E18)
+
+### Decimal Types
+- **DECIMAL(p,s)** or **DEC(p,s)** or **NUMERIC(p,s)**: Exact decimal values where 'p' is precision (total digits) and 's' is scale (decimal places)
+- **DECFLOAT**: Decimal floating-point that follows the IEEE 754-2008 standard (known as IEEE 754r while in draft), available in 16-digit or 34-digit precision
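+
+Precision, scale, and the two DECFLOAT widths can be illustrated with Python's `decimal` module. This is an analogy rather than Db2 behavior: `fits_decimal` is a hypothetical helper, and Db2's own assignment and rounding rules are not modeled here:
+
+```python
+from decimal import Context, Decimal
+
+def fits_decimal(text, precision, scale):
+    """Return True if `text` is exactly representable as DECIMAL(precision, scale)."""
+    value = Decimal(text)
+    if not value.is_finite():
+        return False
+    _, digits, exponent = value.normalize().as_tuple()
+    if not any(digits):  # zero fits any precision and scale
+        return True
+    fraction_digits = max(0, -exponent)
+    integer_digits = max(0, len(digits) + exponent)
+    return fraction_digits <= scale and integer_digits <= precision - scale
+
+# Binary floats cannot represent 0.1 exactly; decimal types can.
+float_sum_is_exact = (0.1 + 0.2 == 0.3)
+decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
+
+# Contexts with 16 and 34 significant digits mirror the two DECFLOAT widths.
+third_16 = Context(prec=16).divide(Decimal(1), Decimal(3))
+third_34 = Context(prec=34).divide(Decimal(1), Decimal(3))
+```
+
+For example, `fits_decimal("123.45", 5, 2)` holds, while `"1234.5"` needs four integer digits and `"0.005"` needs three decimal places, so neither fits DECIMAL(5,2).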
+
+### Floating-Point Types
+- **REAL** or **FLOAT(1-24)**: 32-bit floating-point number
+- **DOUBLE** or **DOUBLE PRECISION** or **FLOAT(25-53)**: 64-bit floating-point number
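+
+The number in `FLOAT(n)` is the binary precision: up to 24 bits fits a 32-bit float, up to 53 bits a 64-bit float. The practical difference can be sketched in Python, whose `float` is 64-bit, by round-tripping values through a 32-bit encoding with the standard `struct` module. The `as_real` helper is an illustrative name:
+
+```python
+import struct
+
+def as_real(value):
+    """Round a 64-bit Python float to the nearest 32-bit float and back."""
+    return struct.unpack("<f", struct.pack("<f", value))[0]
+
+# 0.1 picks up a larger representation error at 32-bit precision.
+tenth_as_real = as_real(0.1)
+real_error = abs(tenth_as_real - 0.1)
+
+# 2**24 + 1 needs 25 significant bits, so a 32-bit float cannot hold it.
+big_as_real = as_real(16777217.0)
+```
+
+A 24-bit significand gives roughly 7 significant decimal digits and a 53-bit one roughly 15 to 16, which is why REAL is a poor fit for large counts or currency amounts.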
+
+## Character Data Types
+
+### Fixed-Length
+- **CHAR(n)** or **CHARACTER(n)**: Fixed-length character string (1-255 bytes)
+
+### Variable-Length
+- **VARCHAR(n)** or **CHARACTER VARYING(n)**: Variable-length character string (1-32,672 bytes)
+- **CLOB** or **CHARACTER LARGE OBJECT**: Large character objects (up to 2GB)
+
+### Graphic Types (for DBCS characters)
+- **GRAPHIC(n)**: Fixed-length double-byte string
+- **VARGRAPHIC(n)**: Variable-length double-byte string
+- **DBCLOB**: Double-byte character large object
+
+## Binary Data Types
+- **BINARY(n)**: Fixed-length binary data
+- **VARBINARY(n)** or **BINARY VARYING(n)**: Variable-length binary data
+- **BLOB** or **BINARY LARGE OBJECT**: Binary large objects (up to 2GB)
+
+## Date and Time Data Types
+- **DATE**: Calendar date (YYYY-MM-DD)
+- **TIME**: Time of day (HH:MM:SS)
+- **TIMESTAMP**: Date and time (YYYY-MM-DD-HH.MM.SS.nnnnnn)
+- **TIMESTAMP WITH TIME ZONE**: Timestamp that includes time zone information
+
+## Boolean Type
+- **BOOLEAN**: TRUE, FALSE, or NULL values
+
+## XML Data Type
+- **XML**: Stores XML documents in their hierarchical form
+
+## Row Types
+- **ROW**: Structured type comprising a sequence of named fields with associated data types
+
+## User-Defined Types
+- **Distinct Types**: Based on a built-in data type but treated as a separate type, so strongly typed distinct values cannot be compared with or assigned to their source type without a cast
+- **Structured Types**: User-defined types with attributes and methods
+
+## Special Types
+- **ROWID**: Row identifier, a value that uniquely identifies a row
+- **DATALINK**: References to external files
+- **ARRAY**: Ordered collection of elements of the same type
+
+Each data type in Db2 has specific storage requirements, performance characteristics, and usage scenarios. The choice of data type affects database performance, storage efficiency, and data integrity.
diff --git a/notes/2025-03-24-db2-history-en.md b/notes/2025-03-24-db2-history-en.md
@@ -0,0 +1,8 @@
+---
+title: db2-history
+lang: en
+layout: post
+audio: false
+translated: false
+generated: true
+---
diff --git a/notes/2025-03-24-db2-postgresql-en.md b/notes/2025-03-24-db2-postgresql-en.md
@@ -0,0 +1,173 @@
+---
+title: IBM DB2 vs PostgreSQL
+lang: en
+layout: post
+audio: false
+translated: false
+generated: true
+---
+
+Here's a comprehensive comparison between IBM DB2 and PostgreSQL across multiple dimensions:
+
+## Foundational Differences
+
+**IBM DB2:**
+- Commercial database management system developed by IBM
+- Primarily designed for enterprise environments with high-volume transaction processing
+- Available in multiple editions targeting different platforms (z/OS, Linux, UNIX, Windows)
+- Closed-source with proprietary licensing model
+
+**PostgreSQL:**
+- Open-source, community-driven RDBMS
+- Started as an academic project at UC Berkeley
+- Available on virtually all operating systems
+- Free to use and modify under PostgreSQL License (similar to MIT/BSD)
+
+## Architecture
+
+**IBM DB2:**
+- Leverages a shared-disk architecture in some configurations
+- Uses buffer pools, tablespaces, and storage groups for data organization
+- Instance-based architecture with database partitioning capabilities
+- Offers pureScale clustering for high availability
+
+**PostgreSQL:**
+- Process-based architecture with a postmaster process that spawns backend processes
+- Uses a multiversion concurrency control (MVCC) system to handle concurrent transactions
+- Single-node by default; shared-nothing horizontal scaling is available through extensions and forks such as Citus or Postgres-XL
+- Uses a write-ahead log (WAL) for durability and recovery
+
+## Performance Characteristics
+
+**IBM DB2:**
+- Excels at high-volume OLTP workloads, especially on mainframe systems
+- Advanced query optimization with cost-based optimizer
+- Built-in workload management for prioritizing resources
+- Column-organized tables with in-memory optimization (BLU Acceleration)
+- Adaptive compression technologies
+
+**PostgreSQL:**
+- Strong performance for mixed workloads (OLTP and OLAP)
+- Excellent geospatial query performance with PostGIS extension
+- Advanced indexing capabilities (B-tree, GiST, GIN, SP-GiST, BRIN)
+- Parallel query execution for analytical workloads
+- Table partitioning for improved query performance
+
+## SQL Compliance and Extensions
+
+**IBM DB2:**
+- High SQL standards compliance
+- Robust support for SQL PL (procedural language)
+- Integrated XML capabilities with pureXML
+- JSON support with specialized functions
+- Support for temporal data and bi-temporal tables
+
+**PostgreSQL:**
+- Strong ANSI SQL compliance
+- Rich procedural language support (PL/pgSQL, PL/Python, PL/Perl, etc.)
+- Native support for JSON/JSONB with powerful operators
+- Custom data types and operators
+- Advanced array support and range types
+- Full-text search capabilities
+
+## Security Features
+
+**IBM DB2:**
+- Row and column level access control (RCAC)
+- Label-based access control (LBAC)
+- Robust auditing capabilities
+- Integration with enterprise security frameworks
+- Advanced encryption for data at rest and in transit
+
+**PostgreSQL:**
+- Role-based access control
+- Row-level security policies
+- Column-level privileges
+- SSL/TLS support for encrypted connections
+- Password policies and authentication methods
+- Integration with external authentication systems (LDAP, Kerberos)
+
+## High Availability and Disaster Recovery
+
+**IBM DB2:**
+- HADR (High Availability Disaster Recovery)
+- pureScale clustering for near-continuous availability
+- Log shipping and read-on-standby capabilities
+- Q Replication for low-latency replication
+
+**PostgreSQL:**
+- Streaming replication with hot standby
+- Logical replication (built in since version 10)
+- Point-in-time recovery
+- Third-party solutions like Patroni for automated failover
+- Connection pooling with PgBouncer or Pgpool-II
+
+## Management and Administration
+
+**IBM DB2:**
+- IBM Data Studio and IBM Data Server Manager for administration
+- Comprehensive monitoring through IBM tooling
+- Automated maintenance and health monitoring
+- Automated storage management
+
+**PostgreSQL:**
+- Various open-source and commercial tools (pgAdmin, DBeaver, etc.)
+- Command-line utilities (psql, pg_dump, etc.)
+- Extensible statistics collection
+- VACUUM and autovacuum for reclaiming space from dead row versions
+- Manual configuration with high tunability
+
+## Cost Structure
+
+**IBM DB2:**
+- Significant licensing costs based on cores/PVUs
+- Different editions with varying costs
+- Support and maintenance contracts
+- Additional costs for advanced features and tools
+
+**PostgreSQL:**
+- Free to use regardless of scale or application
+- No licensing costs
+- Support available through community or commercial vendors
+- Commercial hosting options available
+
+## Ecosystem and Extensions
+
+**IBM DB2:**
+- Integration with IBM analytics suite
+- Limited third-party extensions
+- Enterprise-focused tooling ecosystem
+
+**PostgreSQL:**
+- Rich ecosystem of extensions (PostGIS, TimescaleDB, etc.)
+- Active community development
+- Wide range of third-party tools and integrations
+- Strong ORM support across programming languages
+
+## Cloud Deployments
+
+**IBM DB2:**
+- Available on IBM Cloud and other major cloud providers
+- DB2 Warehouse for cloud data warehousing
+- Managed service options with limited customization
+
+**PostgreSQL:**
+- Available on all major cloud platforms as managed services
+- Numerous specialized PostgreSQL-as-a-service offerings
+- Highly customizable deployments
+
+## Use Case Fit
+
+**IBM DB2:**
+- Ideal for large enterprises with existing IBM infrastructure
+- Mission-critical applications requiring maximum reliability
+- High-performance OLTP systems, especially on mainframe
+- Legacy system integration
+
+**PostgreSQL:**
+- Well-suited for a wide range of applications from small to enterprise
+- Web applications and services
+- Geospatial applications (with PostGIS)
+- Applications requiring complex data types or JSON/document storage
+- Startups and cost-sensitive organizations
+