Commit 7b79257

[csv] shorten and move quote rules to first example
1 parent 49be924 commit 7b79257

1 file changed

+40
-72
lines changed

csv.md

Lines changed: 40 additions & 72 deletions
@@ -1,94 +1,62 @@
11
---
2-
language: CSV
2+
name: CSV
33
contributors:
4-
- [Timon Erhart, 'https://github.com/turbotimon/']
4+
- [Timon Erhart, 'https://github.com/turbotimon/']
55
---
66

7-
CSV (Comma-Separated Values) is a lightweight file format used to store tabular
8-
data in plain text, designed for easy data exchange between programs,
9-
particularly spreadsheets and databases. Its simplicity and human readability
10-
have made it a cornerstone of data interoperability. It is often used for
11-
moving data between programs with incompatible or proprietary formats.
12-
13-
While RFC 4180 provides a standard for the format, in practice, the term "CSV"
14-
is often used more broadly to refer to any text file that:
15-
16-
- Can be interpreted as tabular data
17-
- Uses a delimiter to separate fields (columns)
18-
- Uses line breaks to separate records (rows)
19-
- Optionally includes a header in the first row
7+
CSV (Comma-Separated Values) is a file format used to store tabular
8+
data in plain text.
209

2110
```csv
22-
Name, Age, DateOfBirth
23-
Alice, 30, 1993-05-14
24-
Bob, 25, 1998-11-02
25-
Charlie, 35, 1988-03-21
11+
Name,Age,DateOfBirth,Comment
12+
Alice,30,1993-05-14,
13+
Bob,25,1998-11-02,
14+
Eve,,,data might be missing because it's just text
15+
"Charlie Brown",35,1988-03-21,strings can be quoted
16+
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
17+
"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
18+
Joe Smith,33,1990-06-02,"multi line strings
19+
span multiple lines
20+
there are no escape characters"
2621
```
2722

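The quoting rules in the example above can be checked with a standard CSV parser. The following is a minimal sketch using Python's built-in `csv` module; the `SAMPLE` text is a subset of the rows above, and `parse_rows` is a hypothetical helper name chosen for illustration.

```python
import csv
import io

# A subset of the rows from the example above.
SAMPLE = '''Name,Age,DateOfBirth,Comment
Eve,,,data might be missing because it's just text
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines
there are no escape characters"
'''


def parse_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline="" leaves line breaks untouched so that the csv module
    # can handle the ones that occur inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows(SAMPLE)

for row in rows[1:]:
    # Show the first and last field of every data row.
    print(repr(row[0]), "|", repr(row[3]))
```

Missing values come back as empty strings, the surrounding quotes are removed, doubled quotes collapse to a single quote, and the multi-line comment stays in one field with its line breaks preserved.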
28-
## Delimiters for Rows and Columns
23+
The first row may be a header of field names. If there is no header, the first
24+
line is already data.
2925

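Because the file itself does not say whether a header is present, the reader has to be told. As a rough sketch with Python's `csv.DictReader` (the sample strings and the `read_records` helper are made up for illustration), both cases can be handled like this:

```python
import csv
import io

# Hypothetical sample data: the same records with and without a header row.
WITH_HEADER = "Name,Age\nAlice,30\nBob,25\n"
WITHOUT_HEADER = "Alice,30\nBob,25\n"


def read_records(text, fieldnames=None):
    """Return the records of a CSV text as a list of dicts."""
    # With fieldnames=None, DictReader takes the names from the first row.
    # Otherwise the supplied names are used and the first row is data.
    reader = csv.DictReader(io.StringIO(text, newline=""), fieldnames=fieldnames)
    return list(reader)


with_header = read_records(WITH_HEADER)
without_header = read_records(WITHOUT_HEADER, fieldnames=["Name", "Age"])
```

Both calls produce the same two records. Passing the wrong choice would either turn the header into a data record or silently consume the first data row as field names.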
30-
Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
31-
(fields) are separated by a specific delimiter. Although commas are the most
32-
common delimiter for fields, other characters, such as semicolons (`;`), are
33-
commonly used in regions where commas are decimal separators (e.g., Germany).
34-
Tabs (`\t`) are also used as delimiters in some cases, with such files often
35-
referred to as "TSV" (Tab-Separated Values).
26+
## Delimiters
3627

37-
Example using semicolons as delimiter and comma for decimal separator:
28+
Rows are separated by line breaks (`\n` or `\r\n`); columns are separated by commas.
3829

39-
```csv
40-
Name; Age; Grade
41-
Alice; 30; 50,50
42-
Bob; 25; 45,75
43-
Charlie; 35; 60,00
44-
```
30+
Tabs (`\t`) are sometimes used instead of commas; those files are called "TSVs"
31+
(Tab-Separated Values). They are easier to paste into Excel.
4532

46-
## Data Types
47-
48-
CSV files do not inherently define data types. Numbers and dates are stored as
49-
plain text, and their interpretation depends on the software importing the
50-
file. Typically, data is interpreted as follows:
33+
Occasionally other characters are used. For example, semicolons (`;`) are common
34+
in parts of Europe, where the comma is the [decimal separator](https://en.wikipedia.org/wiki/Decimal_separator)
35+
instead of the point.
5136

5237
```csv
53-
Data, Comment
54-
100, Interpreted as a number (integer)
55-
100.00, Interpreted as a number (floating-point)
56-
2024-12-03, Interpreted as a date or a string (depending on the parser)
57-
Hello World, Interpreted as text (string)
58-
"1234", Interpreted as text instead of a number
38+
Name;Age;Grade
39+
Alice;30;50,50
40+
Bob;25;45,75
41+
Charlie;35;60,00
5942
```
6043

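Reading such a file means telling the parser about the delimiter and converting the decimal commas yourself. A minimal sketch in Python, assuming the semicolon-separated sample above and no thousands separators in the numbers (`read_grades` is a hypothetical helper):

```python
import csv
import io

# The semicolon-delimited sample from above, with decimal commas.
SAMPLE = "Name;Age;Grade\nAlice;30;50,50\nBob;25;45,75\nCharlie;35;60,00\n"


def read_grades(text):
    """Map each name to its grade as a float."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    # Swap the decimal comma for a point before converting.
    # Assumption: the values contain no thousands separators.
    return {row["Name"]: float(row["Grade"].replace(",", ".")) for row in reader}


grades = read_grades(SAMPLE)
```

For tab-separated files the same approach should work with `delimiter="\t"`.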
61-
## Quoting Strings and Special Characters
44+
## Data Types
6245

63-
Quoting strings is only required if the string contains the delimiter, special
64-
characters, or otherwise could be interpreted as a number. However, it is
65-
often considered good practice to quote all strings to enhance readability and
66-
robustness.
46+
CSV files do not inherently define data types. Numbers and dates are stored as
47+
text. Interpreting and parsing them is left to the software that reads the file.
48+
Typically, data is interpreted as follows:
6749

6850
```csv
69-
Quoting strings examples,
70-
Unquoted string,
71-
"Optionally quoted string (good practice)",
72-
"If it contains the delimiter, it needs to be quoted",
73-
"Also, if it contains special characters like \n newlines or \t tabs",
74-
"The quoting "" character itself typically is escaped by doubling the quote ("")",
75-
"or in some systems with a backslash \" (like other escapes)",
51+
Data,Comment
52+
100,Interpreted as a number (integer)
53+
100.00,Interpreted as a number (floating-point)
54+
2024-12-03,Interpreted as a date or a string (depending on the parser)
55+
Hello World,Interpreted as text (string)
56+
"1234",Interpreted as text instead of a number
7657
```
7758

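One way a program might apply these interpretations is to try progressively looser conversions on each field. The sketch below is only an illustration of that idea, not how any particular tool works; `interpret` is a hypothetical helper, and the order of attempts (integer, float, ISO date, string) is an assumption.

```python
from datetime import date


def interpret(field):
    """Guess a Python value for one raw CSV field: int, float, date, or str."""
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(field)
        except ValueError:
            # This conversion does not fit; try the next one.
            continue
    # Nothing matched, so keep the field as text.
    return field


values = [interpret(f) for f in ["100", "100.00", "2024-12-03", "Hello World"]]
```

Note that Python's `csv.reader` removes the surrounding quotes by default, so a field written as `"1234"` reaches this helper as `1234` and would be treated as an integer. Keeping quoted numbers as text requires a parser that exposes whether a field was quoted.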
78-
However, make sure that for one document, the quoting method is consistent.
79-
For example, the last two examples of quoting with either "" or \" would
80-
not be consistent and could cause problems.
81-
82-
## Encoding
83-
84-
Different encodings are used. Most modern CSV files use UTF-8 encoding, but
85-
older systems might use others like ASCII or ISO-8859.
86-
87-
If the file is transferred or shared between different systems, it is a good
88-
practice to explicitly define the encoding used, to avoid issues with
89-
character misinterpretation.
90-
91-
## More Resources
59+
## Further reading
9260

93-
+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
94-
+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
61+
* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
62+
* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
