Commit 0bbd44d

📝 Add csv.Sniffer methods
1 parent 9eee052 commit 0bbd44d

2 files changed: 64 additions and 12 deletions

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ emergencies, when branches have to be created for older versions.
 Added
 ~~~~~
 
+* 📝 Add csv.Sniffer methods
 * 📝 Add the removal of git lfs
 
 `24.3.0 <https://github.com/cusyio/Python4DataScience-de/compare/24.2.0...24.3.0>`_: 2024-11-19

docs/data-processing/serialisation-formats/csv/example.ipynb

Lines changed: 63 additions & 12 deletions
@@ -1514,7 +1514,7 @@
 {
 "data": {
 "text/plain": [
-"<pandas.io.parsers.readers.TextFileReader at 0x13295f2d0>"
+"<pandas.io.parsers.readers.TextFileReader at 0x13442aa10>"
 ]
 },
 "execution_count": 16,
@@ -1745,7 +1745,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "1d72fdfb",
+"id": "0f9db2a8-7291-4db9-89bc-ef5def432dae",
 "metadata": {},
 "source": [
 "## Working with Python's csv module\n",
@@ -1756,7 +1756,7 @@
 {
 "cell_type": "code",
 "execution_count": 25,
-"id": "1207f91c",
+"id": "d4ed9b30-594c-4e83-a5f2-460b36cb6bab",
 "metadata": {},
 "outputs": [
 {
@@ -1782,6 +1782,57 @@
 " print(line)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "0ed726c4-5e09-4676-bcf0-f78e9f7a10e0",
+"metadata": {},
+"source": [
+"[Sniffer.has_header](https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header) analyses your CSV file and returns ``True`` if the first row appears to be a series of column headers.\n",
+"\n",
+"<div class=\"alert alert-block alert-info\">\n",
+"\n",
+"**Note:**\n",
+"\n",
+"This method is only a rough heuristic and can produce both false positives and false negatives.\n",
+"</div>"
+]
+},
+{
+"cell_type": "markdown",
+"id": "a19c05c1-e947-471b-8089-8e36e65b4268",
+"metadata": {},
+"source": [
+"[Sniffer.sniff](https://docs.python.org/3/library/csv.html#csv.Sniffer.sniff) also analyses your CSV file, but returns a Dialect subclass that reflects the parameters it detected; the predefined dialects are described below."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 26,
+"id": "263a8cb4-4ae1-46f0-963f-9d2df2de45ed",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"['', 'Titel', 'Sprache', 'Autor*innen', 'Lizenz', 'Veröffentlichungsdatum', 'doi']\n",
+"['0', 'Python basics', 'en', 'Veit Schiele', '', '2021-10-28', '']\n",
+"['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', '', '2019-06-27', '']\n",
+"['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', '', '2020-10-26', '']\n",
+"['3', 'PyViz Tutorial', 'en', 'Veit Schiele', '', '2020-04-13', '']\n"
+]
+}
+],
+"source": [
+"with open('out.csv') as f:\n",
+"    dialect = csv.Sniffer().sniff(f.read(1024))\n",
+"    f.seek(0)\n",
+"    reader = csv.reader(f, dialect)\n",
+"\n",
+"    for line in reader:\n",
+"        print(line)"
+]
+},
 {
 "cell_type": "markdown",
 "id": "e70392b5",
@@ -1791,7 +1842,7 @@
 "\n",
 "CSV files come in many different variants. The Python csv module already ships with three different dialects:\n",
 "\n",
-"Parameter | excel | excel-tab | unix\n",
+"Parameter | [excel](https://docs.python.org/3/library/csv.html#csv.excel) | [excel-tab](https://docs.python.org/3/library/csv.html#csv.excel_tab) | [unix](https://docs.python.org/3/library/csv.html#csv.unix_dialect)\n",
 ":--- | :--- | :--- | :--- \n",
 "`delimiter` | `','` | `'\\t'` | `','` |\n",
 "`quotechar` | `'\"'` | `'\"'` | ` '\"'` |\n",
@@ -1816,7 +1867,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": 27,
 "id": "8d765adf",
 "metadata": {},
 "outputs": [],
@@ -1840,7 +1891,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": 28,
 "id": "69fff7dd",
 "metadata": {},
 "outputs": [
@@ -1873,7 +1924,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 28,
+"execution_count": 29,
 "id": "e9c0a9c2",
 "metadata": {},
 "outputs": [
@@ -1898,7 +1949,7 @@
 " 'doi': ('', '', '', '')}"
 ]
 },
-"execution_count": 28,
+"execution_count": 29,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1923,7 +1974,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 29,
+"execution_count": 30,
 "id": "5a43af52",
 "metadata": {},
 "outputs": [],
@@ -1937,7 +1988,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 30,
+"execution_count": 31,
 "id": "a65c4cef",
 "metadata": {},
 "outputs": [
@@ -1949,7 +2000,7 @@
 " '2,Jupyter Tutorial,en,Veit Schiele\\n']"
 ]
 },
-"execution_count": 30,
+"execution_count": 31,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1975,7 +2026,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.11.10"
 },
 "widgets": {
 "application/vnd.jupyter.widget-state+json": {
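
The notebook cells added in this commit describe `csv.Sniffer.has_header` in prose only and show `csv.Sniffer.sniff` against `out.csv`. As a rough standalone sketch of how the two methods might be combined, the snippet below runs both on an in-memory sample. The sample data, the semicolon delimiter, the `delimiters` restriction and all variable names are assumptions made for illustration and are not part of the commit.

```python
import csv
import io

# Hypothetical sample data (not taken from out.csv), using ';' as delimiter.
sample = (
    "Title;Language;Year\n"
    "Python basics;en;2021\n"
    "Jupyter Tutorial;de;2020\n"
    "PyViz Tutorial;en;2020\n"
)

sniffer = csv.Sniffer()

# sniff() returns a Dialect subclass describing the detected format.
# Restricting the candidate delimiters is optional; it is done here
# only to make the guess less ambiguous.
dialect = sniffer.sniff(sample, delimiters=",;\t")

# has_header() is a heuristic and may be wrong for other data.
has_header = sniffer.has_header(sample)

rows = list(csv.reader(io.StringIO(sample), dialect))
header = None
if has_header:
    # Split the first row off only if it looks like column headers.
    header, rows = rows[0], rows[1:]

print("delimiter:", repr(dialect.delimiter))
print("header:", header)
for row in rows:
    print(row)
```

Because both methods only guess, it may be safer to treat their results as defaults that can be overridden when the format of a file is already known.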
