 ---
-language: CSV
+name: CSV
 contributors:
-- [Timon Erhart, 'https://github.com/turbotimon/']
+ - [Timon Erhart, 'https://github.com/turbotimon/']
 ---
 
-CSV (Comma-Separated Values) is a lightweight file format used to store tabular
-data in plain text, designed for easy data exchange between programs,
-particularly spreadsheets and databases. Its simplicity and human readability
-have made it a cornerstone of data interoperability. It is often used for
-moving data between programs with incompatible or proprietary formats.
-
-While RFC 4180 provides a standard for the format, in practice, the term "CSV"
- is often used more broadly to refer to any text file that:
-
-- Can be interpreted as tabular data
-- Uses a delimiter to separate fields (columns)
-- Uses line breaks to separate records (rows)
-- Optionally includes a header in the first row
+CSV (Comma-Separated Values) is a file format used to store tabular
+data in plain text.
 
 ```csv
-Name, Age, DateOfBirth
-Alice, 30, 1993-05-14
-Bob, 25, 1998-11-02
-Charlie, 35, 1988-03-21
+Name,Age,DateOfBirth,Comment
+Alice,30,1993-05-14,
+Bob,25,1998-11-02,
+Eve,,,data might be missing because it's just text
+"Charlie Brown",35,1988-03-21,strings can be quoted
+"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
+"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
+Joe Smith,33,1990-06-02,"multi line strings
+span multiple lines
+there are no escape characters"
 ```
 
-## Delimiters for Rows and Columns
+The first row might be a header of field names or there might be no header and
+the first line is already data.
 
-Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
- (fields) are separated by a specific delimiter. Although commas are the most
- common delimiter for fields, other characters, such as semicolons (`;`), are
- commonly used in regions where commas are decimal separators (e.g., Germany).
- Tabs (`\t`) are also used as delimiters in some cases, with such files often
- referred to as "TSV" (Tab-Separated Values).
+## Delimiters
 
-Example using semicolons as delimiter and comma for decimal separator:
+Rows are separated by line breaks (`\n` or `\r\n`), columns are separated by a comma.
 
-```csv
-Name; Age; Grade
-Alice; 30; 50,50
-Bob; 25; 45,75
-Charlie; 35; 60,00
-```
+Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
+(Tab-Separated Values). They are easier to paste into Excel.
 
-## Data Types
-
-CSV files do not inherently define data types. Numbers and dates are stored as
- plain text, and their interpretation depends on the software importing the
- file. Typically, data is interpreted as follows:
+Occasionally other characters can be used, for example semicolons (`;`) may be used
+in Europe because commas are [decimal separators](https://en.wikipedia.org/wiki/Decimal_separator)
+instead of the decimal point.
 
 ```csv
-Data, Comment
-100, Interpreted as a number (integer)
-100.00, Interpreted as a number (floating-point)
-2024-12-03, Interpreted as a date or a string (depending on the parser)
-Hello World, Interpreted as text (string)
-"1234", Interpreted as text instead of a number
+Name;Age;Grade
+Alice;30;50,50
+Bob;25;45,75
+Charlie;35;60,00
 ```
 
-## Quoting Strings and Special Characters
+## Data Types
 
-Quoting strings is only required if the string contains the delimiter, special
- characters, or otherwise could be interpreted as a number. However, it is
- often considered good practice to quote all strings to enhance readability and
- robustness.
+CSV files do not inherently define data types. Numbers and dates are stored as
+text. Interpreting and parsing them is left up to software using them.
+Typically, data is interpreted as follows:
 
 ```csv
-Quoting strings examples,
-Unquoted string,
-"Optionally quoted string (good practice)",
-"If it contains the delimiter, it needs to be quoted",
-"Also, if it contains special characters like \n newlines or \t tabs",
-"The quoting "" character itself typically is escaped by doubling the quote ("")",
-"or in some systems with a backslash \" (like other escapes)",
+Data,Comment
+100,Interpreted as a number (integer)
+100.00,Interpreted as a number (floating-point)
+2024-12-03,Interpreted as a date or a string (depending on the parser)
+Hello World,Interpreted as text (string)
+"1234",Interpreted as text instead of a number
 ```
 
-However, make sure that for one document, the quoting method is consistent.
- For example, the last two examples of quoting with either "" or \" would
- not be consistent and could cause problems.
-
-## Encoding
-
-Different encodings are used. Most modern CSV files use UTF-8 encoding, but
- older systems might use others like ASCII or ISO-8859.
-
-If the file is transferred or shared between different systems, it is a good
- practice to explicitly define the encoding used, to avoid issues with
- character misinterpretation.
-
-## More Resources
+## Further reading
 
-+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
-+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
+* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
+* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
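
Reviewer note: the quoting and delimiter rules on the added side of this diff can be checked against a real parser. The following is a minimal sketch, assuming Python's standard `csv` module as the consumer. The helper `parse_rows` and the two sample constants are hypothetical names introduced for illustration and are not part of the changed file; the sample rows are a shortened copy of the added example.

```python
import csv
import io

# Shortened copy of the sample from the added lines of the diff:
# empty fields, a quoted comma, doubled quotes, and an embedded line break.
SAMPLE = '''Name,Age,DateOfBirth,Comment
Alice,30,1993-05-14,
Eve,,,data might be missing because it's just text
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines"
'''


def parse_rows(text, delimiter=","):
    """Hypothetical helper: return every record as a list of strings."""
    # newline="" passes line breaks through untranslated, so the csv
    # reader itself decides which ones sit inside a quoted field.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


rows = parse_rows(SAMPLE)
header, records = rows[0], rows[1:]

# Semicolon-delimited variant with a comma as the decimal separator.
SEMICOLON_SAMPLE = "Name;Age;Grade\nAlice;30;50,50\n"
semi_rows = parse_rows(SEMICOLON_SAMPLE, delimiter=";")

# Every value arrives as text; converting "50,50" to a number is an
# explicit step that the consuming program has to perform.
grade = float(semi_rows[1][2].replace(",", "."))
```

Every field comes back as a string, including `Age`, which matches the diff's statement that interpreting types is left to the consuming software.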