Commit b3b6dc3

[csv] shorten and move quote rules to first example
1 parent 49be924 commit b3b6dc3

1 file changed

+41
-72
lines changed

csv.md

Lines changed: 41 additions & 72 deletions
Original file line number | Diff line number | Diff line change
@@ -1,94 +1,63 @@
11
---
2-
language: CSV
2+
name: CSV
33
contributors:
4-
- [Timon Erhart, 'https://github.com/turbotimon/']
4+
- [Timon Erhart, 'https://github.com/turbotimon/']
55
---
66

7-
CSV (Comma-Separated Values) is a lightweight file format used to store tabular
8-
data in plain text, designed for easy data exchange between programs,
9-
particularly spreadsheets and databases. Its simplicity and human readability
10-
have made it a cornerstone of data interoperability. It is often used for
11-
moving data between programs with incompatible or proprietary formats.
12-
13-
While RFC 4180 provides a standard for the format, in practice, the term "CSV"
14-
is often used more broadly to refer to any text file that:
15-
16-
- Can be interpreted as tabular data
17-
- Uses a delimiter to separate fields (columns)
18-
- Uses line breaks to separate records (rows)
19-
- Optionally includes a header in the first row
7+
CSV (Comma-Separated Values) is a file format used to store tabular
8+
data in plain text. Its simplicity and human readability
9+
have made it a cornerstone of data interoperability.
2010

2111
```csv
22-
Name, Age, DateOfBirth
23-
Alice, 30, 1993-05-14
24-
Bob, 25, 1998-11-02
25-
Charlie, 35, 1988-03-21
12+
Name,Age,DateOfBirth,Comment
13+
Alice,30,1993-05-14,
14+
Bob,25,1998-11-02,
15+
Eve,,,data might be missing because it's just text
16+
"Charlie Brown",35,1988-03-21,strings can be quoted
17+
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
18+
"Walter ""The Danger"" White",52,1958-09-07,quotes in quotes are escaped by doubling them up
19+
Joe Smith,33,1990-06-02,"multi line strings
20+
span multiple lines
21+
there are no escape characters"
2622
```
2723

28-
## Delimiters for Rows and Columns
24+
The first row might be a header of field names or there might be no header and the first
25+
line is just data.
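
The quoting rules in the new first example can be checked with a standard CSV parser. The following is a minimal sketch using Python's built-in `csv` module, not part of the commit itself. `SAMPLE` is a shortened copy of the example, and treating the first row as a header is an assumption, since the file itself does not say whether a header is present.

```python
import csv
import io

# Shortened copy of the example above (assumption: first row is a header).
SAMPLE = '''Name,Age,DateOfBirth,Comment
Eve,,,data might be missing
"Louis XIV, King of France",76,1638-09-05,commas inside quotes
"Walter ""The Danger"" White",52,1958-09-07,doubled quotes
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines"
'''

# csv.reader handles quoted commas, doubled quotes and embedded line breaks.
rows = list(csv.reader(io.StringIO(SAMPLE)))
header, records = rows[0], rows[1:]

for record in records:
    # Missing values come back as empty strings, not as None.
    print(dict(zip(header, record)))
```

Note that the parser removes the surrounding quotes and collapses each doubled quote into a single one, so the third name is read as `Walter "The Danger" White`.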
2926

30-
Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
31-
(fields) are separated by a specific delimiter. Although commas are the most
32-
common delimiter for fields, other characters, such as semicolons (`;`), are
33-
commonly used in regions where commas are decimal separators (e.g., Germany).
34-
Tabs (`\t`) are also used as delimiters in some cases, with such files often
35-
referred to as "TSV" (Tab-Separated Values).
27+
## Delimiters
3628

37-
Example using semicolons as delimiter and comma for decimal separator:
29+
Rows are separated by line breaks (`\n` or `\r\n`); columns are separated by commas.
3830

39-
```csv
40-
Name; Age; Grade
41-
Alice; 30; 50,50
42-
Bob; 25; 45,75
43-
Charlie; 35; 60,00
44-
```
31+
Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
32+
(Tab-Separated Values). They are easier to paste into Excel.
4533

46-
## Data Types
47-
48-
CSV files do not inherently define data types. Numbers and dates are stored as
49-
plain text, and their interpretation depends on the software importing the
50-
file. Typically, data is interpreted as follows:
34+
Occasionally other characters are used. For example, semicolons (`;`) are common
35+
in European locales where the comma is the [decimal separator](https://en.wikipedia.org/wiki/Decimal_separator)
36+
instead of the decimal point.
5137

5238
```csv
53-
Data, Comment
54-
100, Interpreted as a number (integer)
55-
100.00, Interpreted as a number (floating-point)
56-
2024-12-03, Interpreted as a date or a string (depending on the parser)
57-
Hello World, Interpreted as text (string)
58-
"1234", Interpreted as text instead of a number
39+
Name;Age;Grade
40+
Alice;30;50,50
41+
Bob;25;45,75
42+
Charlie;35;60,00
5943
```
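
As a rough illustration of reading the semicolon-delimited example, here is a sketch with Python's `csv` module, where the delimiter is passed explicitly. Converting the decimal comma by string replacement is a simplifying assumption for this sample and would not handle thousands separators.

```python
import csv
import io

# Copy of the semicolon-delimited example above.
SAMPLE = """Name;Age;Grade
Alice;30;50,50
Bob;25;45,75
Charlie;35;60,00
"""

# The delimiter has to be stated; the csv module defaults to a comma.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")

# The parser returns text, so the decimal comma is converted by hand here.
grades = {row["Name"]: float(row["Grade"].replace(",", ".")) for row in reader}
print(grades)
```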
6044

61-
## Quoting Strings and Special Characters
45+
## Data Types
6246

63-
Quoting strings is only required if the string contains the delimiter, special
64-
characters, or otherwise could be interpreted as a number. However, it is
65-
often considered good practice to quote all strings to enhance readability and
66-
robustness.
47+
CSV files do not inherently define data types. Numbers and dates are stored as
48+
text. Interpreting and parsing them is left up to software using them.
49+
Typically, data is interpreted as follows:
6750

6851
```csv
69-
Quoting strings examples,
70-
Unquoted string,
71-
"Optionally quoted string (good practice)",
72-
"If it contains the delimiter, it needs to be quoted",
73-
"Also, if it contains special characters like \n newlines or \t tabs",
74-
"The quoting "" character itself typically is escaped by doubling the quote ("")",
75-
"or in some systems with a backslash \" (like other escapes)",
52+
Data,Comment
53+
100,Interpreted as a number (integer)
54+
100.00,Interpreted as a number (floating-point)
55+
2024-12-03,Interpreted as a date or a string (depending on the parser)
56+
Hello World,Interpreted as text (string)
57+
"1234",Interpreted as text instead of a number
7658
```
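
To make the point about data types concrete, the sketch below reads the example with Python's `csv` module, which returns every field as a string, and then applies a type guess. `guess_type` is a hypothetical helper written for this illustration, not a library function. Other tools apply their own, different rules.

```python
import csv
import io
from datetime import date

# Copy of the example above, with shortened comments.
SAMPLE = '''Data,Comment
100,integer
100.00,floating-point
2024-12-03,date
Hello World,text
"1234",quoted
'''

def guess_type(text):
    """Hypothetical helper: try int, float, ISO date, then keep the string."""
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(text)
        except ValueError:
            continue
    return text

# Every value arrives as a string; any typing is applied afterwards.
values = [row["Data"] for row in csv.DictReader(io.StringIO(SAMPLE))]
guessed = [guess_type(value) for value in values]
print(guessed)
```

With the default settings, this particular parser drops the quotes around `"1234"`, so the helper cannot tell it apart from an unquoted number. Whether quoting forces a text interpretation depends on the software reading the file.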
7759

78-
However, make sure that for one document, the quoting method is consistent.
79-
For example, the last two examples of quoting with either "" or \" would
80-
not be consistent and could cause problems.
81-
82-
## Encoding
83-
84-
Different encodings are used. Most modern CSV files use UTF-8 encoding, but
85-
older systems might use others like ASCII or ISO-8859.
86-
87-
If the file is transferred or shared between different systems, it is a good
88-
practice to explicitly define the encoding used, to avoid issues with
89-
character misinterpretation.
90-
91-
## More Resources
60+
## Further reading
9261

93-
+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
94-
+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
62+
* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
63+
* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
