---
- language: CSV
+ name: CSV
contributors:
- - [Timon Erhart, 'https://github.com/turbotimon/']
+ - [Timon Erhart, 'https://github.com/turbotimon/']
---

- CSV (Comma-Separated Values) is a lightweight file format used to store tabular
- data in plain text, designed for easy data exchange between programs,
- particularly spreadsheets and databases. Its simplicity and human readability
- have made it a cornerstone of data interoperability. It is often used for
- moving data between programs with incompatible or proprietary formats.
-
- While RFC 4180 provides a standard for the format, in practice, the term "CSV"
- is often used more broadly to refer to any text file that:
-
- - Can be interpreted as tabular data
- - Uses a delimiter to separate fields (columns)
- - Uses line breaks to separate records (rows)
- - Optionally includes a header in the first row
+ CSV (Comma-Separated Values) is a file format used to store tabular
+ data in plain text. Its simplicity and human readability
+ have made it a cornerstone of data interoperability.

```csv
- Name, Age, DateOfBirth
- Alice, 30, 1993-05-14
- Bob, 25, 1998-11-02
- Charlie, 35, 1988-03-21
+ Name,Age,DateOfBirth,Comment
+ Alice,30,1993-05-14,
+ Bob,25,1998-11-02,
+ Eve,,,data might be missing because it's just text
+ "Charlie Brown",35,1988-03-21,strings can be quoted
+ "Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
+ "Walter ""The Danger"" White",52,1958-09-07,quotes in quotes are escaped by doubling them up
+ Joe Smith,33,1990-06-02,"multi line strings
+ span multiple lines
+ there are no escape characters"
```

- ## Delimiters for Rows and Columns
+ The first row might be a header of field names or there might be no header and the first
+ line is just data.
+
+ ## Delimiters
+
+ Rows are separated by line breaks (`\n` or `\r\n`), columns are separated by a comma.

- Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
- (fields) are separated by a specific delimiter. Although commas are the most
- common delimiter for fields, other characters, such as semicolons (`;`), are
- commonly used in regions where commas are decimal separators (e.g., Germany).
- Tabs (`\t`) are also used as delimiters in some cases, with such files often
- referred to as "TSV" (Tab-Separated Values).
+ Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
+ (Tab-Separated Values). They are easier to paste into Excel.

- Example using semicolons as delimiter and comma for decimal separator:
+ Occasionally other characters can be used, for example semicolons (`;`) may be used
+ in Europe because commas are [decimal separators](https://en.wikipedia.org/wiki/Decimal_separator)
+ instead of the decimal point.

```csv
Name; Age; Grade
@@ -46,8 +45,8 @@ Charlie; 35; 60,00
## Data Types

CSV files do not inherently define data types. Numbers and dates are stored as
- plain text, and their interpretation depends on the software importing the
- file. Typically, data is interpreted as follows:
+ text. Interpreting and parsing them is left up to software using them.
+ Typically, data is interpreted as follows:

```csv
Data, Comment
@@ -58,37 +57,7 @@ Hello World, Interpreted as text (string)
"1234", Interpreted as text instead of a number
```

- ## Quoting Strings and Special Characters
-
- Quoting strings is only required if the string contains the delimiter, special
- characters, or otherwise could be interpreted as a number. However, it is
- often considered good practice to quote all strings to enhance readability and
- robustness.
-
- ```csv
- Quoting strings examples,
- Unquoted string,
- "Optionally quoted string (good practice)",
- "If it contains the delimiter, it needs to be quoted",
- "Also, if it contains special characters like \n newlines or \t tabs",
- "The quoting "" character itself typically is escaped by doubling the quote ("")",
- "or in some systems with a backslash \" (like other escapes)",
- ```
-
- However, make sure that for one document, the quoting method is consistent.
- For example, the last two examples of quoting with either "" or \" would
- not be consistent and could cause problems.
-
- ## Encoding
-
- Different encodings are used. Most modern CSV files use UTF-8 encoding, but
- older systems might use others like ASCII or ISO-8859.
-
- If the file is transferred or shared between different systems, it is a good
- practice to explicitly define the encoding used, to avoid issues with
- character misinterpretation.
-
- ## More Resources
+ ## Further reading

- + [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
- + [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
+ * [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
+ * [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
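The quoting rules in the new example can be checked against an existing parser. The following is a minimal sketch, assuming Python 3 and its standard-library `csv` module (neither is part of the original file); the `parse` helper is a hypothetical name. With its default settings the module handles the cases the example lists: empty fields, commas inside quoted fields, doubled quotes, and a quoted field spanning several lines.

```python
import csv
import io

# The example table added in this commit, embedded verbatim as a string.
SAMPLE = '''Name,Age,DateOfBirth,Comment
Alice,30,1993-05-14,
Bob,25,1998-11-02,
Eve,,,data might be missing because it's just text
"Charlie Brown",35,1988-03-21,strings can be quoted
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
"Walter ""The Danger"" White",52,1958-09-07,quotes in quotes are escaped by doubling them up
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines
there are no escape characters"
'''


def parse(text):
    """Hypothetical helper: split CSV text into a header row and record rows."""
    # newline="" leaves line breaks untranslated, so the csv module sees
    # the line breaks inside the quoted multi-line field as-is.
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = list(reader)
    return rows[0], rows[1:]


header, records = parse(SAMPLE)
print(header)
for record in records:
    # Each record is a list of strings; empty fields become "".
    print(record)
```

Every value comes back as a string, including `Age`: as the Data Types section says, turning `"30"` into a number is left to the program reading the file.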
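The semicolon variant from the Delimiters section can be read with the same module by changing `delimiter`. This is another sketch under assumptions: only the header and the `Charlie` row are visible in this diff, so the rest of that example is omitted; `skipinitialspace=True` is used because the example puts a space after each `;`; and `parse_decimal_comma` is a hypothetical helper that simply swaps the decimal comma for a point before calling `float`.

```python
import csv
import io

# Header and the one data row visible in this diff; the rest is elided.
SAMPLE = "Name; Age; Grade\nCharlie; 35; 60,00\n"


def parse_decimal_comma(value):
    """Hypothetical helper: convert a decimal-comma number like "60,00" to float."""
    # Illustrative only: assumes no thousands separators such as "1.234,56".
    return float(value.replace(",", "."))


reader = csv.DictReader(
    io.StringIO(SAMPLE, newline=""),
    delimiter=";",          # semicolon separates the fields
    skipinitialspace=True,  # ignore the space that follows each ";"
)
rows = list(reader)  # the first row is used as the field names
for row in rows:
    grade = parse_decimal_comma(row["Grade"])
    print(row["Name"], row["Age"], grade)
```

A real importer would usually take the locale into account instead of a plain string replacement, since values such as `1.234,56` also use a thousands separator.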