Commit 86a2cd3

docs(notes): add 2025-04-14-csv-compare-en.md
1 parent b9ef734 commit 86a2cd3

1 file changed, +143 -0 lines changed

notes/2025-04-14-csv-compare-en.md

@@ -0,0 +1,143 @@
---
title: Comparing Two Single-Row CSV Files
lang: en
layout: post
audio: false
translated: false
generated: true
---

Below is a Python script that compares two single-row CSV files containing strings (e.g., "P0001") and identifies:
- Common items (present in both files)
- Items unique to the first file
- Items unique to the second file

The script uses `argparse` to accept two CSV file paths, reads the first row from each, performs a set-based comparison, and writes the results to `comparison.txt`.

```python
#!/usr/bin/env python3
import argparse
import csv
import os


def read_first_row(path):
    """Return the first row of a CSV file, or an empty list if the file has no rows."""
    # newline='' lets the csv module handle line endings itself
    with open(path, 'r', newline='') as f:
        return next(csv.reader(f), [])


def main():
    # Set up argument parser
    parser = argparse.ArgumentParser(description='Compare two single-row CSV files for common and unique items')
    parser.add_argument('file1_path', help='Path to the first CSV file')
    parser.add_argument('file2_path', help='Path to the second CSV file')

    # Parse arguments
    args = parser.parse_args()

    try:
        # Read the first row of each CSV file and convert to sets for comparison
        set1 = set(read_first_row(args.file1_path))
        set2 = set(read_first_row(args.file2_path))

        # Find common and unique items
        common_items = set1.intersection(set2)
        unique_to_file1 = set1.difference(set2)
        unique_to_file2 = set2.difference(set1)

        # Determine output directory (use directory of first file)
        output_dir = os.path.dirname(args.file1_path) or '.'
        output_file = os.path.join(output_dir, 'comparison.txt')

        # Write results to output file
        with open(output_file, 'w') as file:
            file.write("Comparison Results:\n\n")

            file.write("Common Items (in both files):\n")
            if common_items:
                for item in sorted(common_items):
                    file.write(f"{item}\n")
            else:
                file.write("None\n")

            file.write("\nItems only in first file:\n")
            if unique_to_file1:
                for item in sorted(unique_to_file1):
                    file.write(f"{item}\n")
            else:
                file.write("None\n")

            file.write("\nItems only in second file:\n")
            if unique_to_file2:
                for item in sorted(unique_to_file2):
                    file.write(f"{item}\n")
            else:
                file.write("None\n")

        print(f"Comparison complete. Results saved to {output_file}")

    except FileNotFoundError as e:
        print(f"Error: File not found - {e}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```

### How it works:
1. **Input**: Takes two CSV file paths via command-line arguments.
2. **Reading**: Reads only the first row from each CSV file; any later rows are ignored.
3. **Comparison**:
   - Converts the rows to sets for efficient comparison.
   - Finds:
     - **Common items**: Items present in both rows (set intersection).
     - **Unique to first file**: Items only in the first row (set difference: file1 - file2).
     - **Unique to second file**: Items only in the second row (set difference: file2 - file1).
4. **Output**: Writes results to `comparison.txt` in the directory of the first file, in three sections:
   - Common items
   - Items only in the first file
   - Items only in the second file

   Each section is sorted alphabetically; "None" is written if a section is empty.
5. **Error Handling**: Catches file-not-found errors and other exceptions, and prints the error message instead of a traceback.
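
The comparison in step 3 can be sketched in isolation. This is a minimal illustration rather than part of the script: `compare_rows` is a hypothetical helper name, and it uses the `&` and `-` set operators, which are equivalent to the `intersection` and `difference` method calls.

```python
def compare_rows(row1, row2):
    """Return (common, only_in_first, only_in_second) as sorted lists.

    Hypothetical helper for illustration; the script performs the same
    steps inline inside main().
    """
    set1, set2 = set(row1), set(row2)
    return sorted(set1 & set2), sorted(set1 - set2), sorted(set2 - set1)


common, only_first, only_second = compare_rows(
    ["P0001", "P0002", "P0003"],
    ["P0001", "P0004", "P0003"],
)
print(common)       # ['P0001', 'P0003']
print(only_first)   # ['P0002']
print(only_second)  # ['P0004']
```

Returning sorted lists keeps the result deterministic, since sets themselves have no defined order.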

### Example usage:
Save the script as `compare_csv.py`. Create two CSV files, e.g.:
- `file1.csv`: `P0001,P0002,P0003`
- `file2.csv`: `P0001,P0004,P0003`

Run the script:
```
python compare_csv.py file1.csv file2.csv
```

**Output** (`comparison.txt`):
```
Comparison Results:

Common Items (in both files):
P0001
P0003

Items only in first file:
P0002

Items only in second file:
P0004
```
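
The layout of that report, including the "None" fallback for an empty section, can be sketched as a function that builds the text in memory. `format_report` is a hypothetical name used only for illustration; the script itself writes these lines directly to the file.

```python
def format_report(common, only_first, only_second):
    """Build the report text in memory (hypothetical helper, not in the script)."""
    sections = [
        ("Common Items (in both files):", common),
        ("Items only in first file:", only_first),
        ("Items only in second file:", only_second),
    ]
    blocks = []
    for title, items in sections:
        # An empty section is reported as "None", matching the script
        body = sorted(items) or ["None"]
        blocks.append("\n".join([title, *body]))
    return "Comparison Results:\n\n" + "\n\n".join(blocks) + "\n"


# The third section is empty here, so it is rendered as "None"
print(format_report({"P0001", "P0003"}, {"P0002"}, set()))
```

Building the text as a string first makes the format easy to check without touching the file system.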

### Notes:
- The script ignores the position of items and considers only presence or absence.
- Duplicate items in a row are handled automatically, because sets discard duplicates.
- The output is written to `comparison.txt` in the directory of the first input file; an existing file with that name is overwritten.
- If a file is empty or has no rows, the script treats it as an empty set.
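
The notes on ordering, duplicates, and empty input can be checked with a small sketch that parses CSV text from memory. It assumes the first row is read with `next(reader, [])`, which falls back to an empty list when there is no row; `first_row_items` is a hypothetical name, and `io.StringIO` stands in for a real file.

```python
import csv
import io


def first_row_items(text):
    """Return the set of items in the first CSV row of `text` (empty set if none)."""
    # io.StringIO stands in for an opened file (assumption for illustration)
    reader = csv.reader(io.StringIO(text, newline=""))
    return set(next(reader, []))


# Duplicates collapse into a single item
print(first_row_items("P0001,P0002,P0001\n") == {"P0001", "P0002"})  # True
# Position within the row does not matter
print(first_row_items("P0002,P0001\n") == first_row_items("P0001,P0002\n"))  # True
# No row at all yields an empty set
print(first_row_items("") == set())  # True
```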
