Commit 17188a6

[csv] shorten and move quote rules to first example
1 parent 49be924 commit 17188a6

1 file changed

+31
-62
lines changed

csv.md

Lines changed: 31 additions & 62 deletions
@@ -1,40 +1,39 @@
11
---
2-
language: CSV
2+
name: CSV
33
contributors:
4-
- [Timon Erhart, 'https://github.com/turbotimon/']
4+
- [Timon Erhart, 'https://github.com/turbotimon/']
55
---
66

7-
CSV (Comma-Separated Values) is a lightweight file format used to store tabular
8-
data in plain text, designed for easy data exchange between programs,
9-
particularly spreadsheets and databases. Its simplicity and human readability
10-
have made it a cornerstone of data interoperability. It is often used for
11-
moving data between programs with incompatible or proprietary formats.
12-
13-
While RFC 4180 provides a standard for the format, in practice, the term "CSV"
14-
is often used more broadly to refer to any text file that:
15-
16-
- Can be interpreted as tabular data
17-
- Uses a delimiter to separate fields (columns)
18-
- Uses line breaks to separate records (rows)
19-
- Optionally includes a header in the first row
7+
CSV (Comma-Separated Values) is a file format used to store tabular
8+
data in plain text. Its simplicity and human readability
9+
have made it a cornerstone of data interoperability.
2010

2111
```csv
22-
Name, Age, DateOfBirth
23-
Alice, 30, 1993-05-14
24-
Bob, 25, 1998-11-02
25-
Charlie, 35, 1988-03-21
12+
Name,Age,DateOfBirth,Comment
13+
Alice,30,1993-05-14,
14+
Bob,25,1998-11-02,
15+
Eve,,,data might be missing because it's just text
16+
"Charlie Brown",35,1988-03-21,strings can be quoted
17+
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
18+
"Walter ""The Danger"" White",52,1958-09-07,quotes in quotes are escaped by doubling them up
19+
Joe Smith,33,1990-06-02,"multi line strings
20+
span multiple lines
21+
there are no escape characters"
2622
```
2723

28-
## Delimiters for Rows and Columns
24+
The first row might be a header of field names or there might be no header and the first
25+
line is just data.
26+
27+
## Delimiters
28+
29+
Rows are separated by line breaks (`\n` or `\r\n`), columns are separated by a comma.
2930

30-
Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
31-
(fields) are separated by a specific delimiter. Although commas are the most
32-
common delimiter for fields, other characters, such as semicolons (`;`), are
33-
commonly used in regions where commas are decimal separators (e.g., Germany).
34-
Tabs (`\t`) are also used as delimiters in some cases, with such files often
35-
referred to as "TSV" (Tab-Separated Values).
31+
Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
32+
(Tab-Separated Values). They are easier to paste into Excel.
3633

37-
Example using semicolons as delimiter and comma for decimal separator:
34+
Occasionally other characters can be used, for example semicolons (`;`) may be used
35+
in Europe because commas are [decimal separators](https://en.wikipedia.org/wiki/Decimal_separator)
36+
instead of the decimal point.
3837

3938
```csv
4039
Name; Age; Grade
@@ -46,8 +45,8 @@ Charlie; 35; 60,00
4645
## Data Types
4746

4847
CSV files do not inherently define data types. Numbers and dates are stored as
49-
plain text, and their interpretation depends on the software importing the
50-
file. Typically, data is interpreted as follows:
48+
text. Interpreting and parsing them is left up to software using them.
49+
Typically, data is interpreted as follows:
5150

5251
```csv
5352
Data, Comment
@@ -58,37 +57,7 @@ Hello World, Interpreted as text (string)
5857
"1234", Interpreted as text instead of a number
5958
```
6059

61-
## Quoting Strings and Special Characters
62-
63-
Quoting strings is only required if the string contains the delimiter, special
64-
characters, or otherwise could be interpreted as a number. However, it is
65-
often considered good practice to quote all strings to enhance readability and
66-
robustness.
67-
68-
```csv
69-
Quoting strings examples,
70-
Unquoted string,
71-
"Optionally quoted string (good practice)",
72-
"If it contains the delimiter, it needs to be quoted",
73-
"Also, if it contains special characters like \n newlines or \t tabs",
74-
"The quoting "" character itself typically is escaped by doubling the quote ("")",
75-
"or in some systems with a backslash \" (like other escapes)",
76-
```
77-
78-
However, make sure that for one document, the quoting method is consistent.
79-
For example, the last two examples of quoting with either "" or \" would
80-
not be consistent and could cause problems.
81-
82-
## Encoding
83-
84-
Different encodings are used. Most modern CSV files use UTF-8 encoding, but
85-
older systems might use others like ASCII or ISO-8859.
86-
87-
If the file is transferred or shared between different systems, it is a good
88-
practice to explicitly define the encoding used, to avoid issues with
89-
character misinterpretation.
90-
91-
## More Resources
60+
## Further reading
9261

93-
+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
94-
+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
62+
* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
63+
* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
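
The quote rules this commit moves into the first example (quoted fields, doubled quotes, embedded commas and line breaks) can be checked against a real parser. The following is a minimal sketch using Python's standard `csv` module, not part of the commit itself; the `SAMPLE` constant and the `parse_rows` helper are hypothetical names introduced only for illustration, and the sample is a subset of the rows added in the diff.

```python
import csv
import io

# A subset of the rows from the first example in csv.md (assumed verbatim).
SAMPLE = '''Name,Age,DateOfBirth,Comment
Alice,30,1993-05-14,
Eve,,,data might be missing because it's just text
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
"Walter ""The Danger"" White",52,1958-09-07,quotes in quotes are escaped by doubling them up
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines
there are no escape characters"
'''


def parse_rows(text):
    """Parse CSV text into a list of rows, each a list of string fields."""
    # newline="" leaves line breaks untouched so the csv module can decide
    # which ones end a record and which ones sit inside a quoted field.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows(SAMPLE)

for row in rows:
    # Every field comes back as a plain string; empty fields are "".
    print(row)
```

With the default dialect, the reader treats a doubled quote inside a quoted field as one literal quote and keeps line breaks inside quoted fields as part of the value, so the last three physical lines of the sample form a single record. Type interpretation (numbers, dates) is still left to the caller, as the Data Types section of the file notes.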
