|
1514 | 1514 | {
|
1515 | 1515 | "data": {
|
1516 | 1516 | "text/plain": [
|
1517 |
| - "<pandas.io.parsers.readers.TextFileReader at 0x13295f2d0>" |
| 1517 | + "<pandas.io.parsers.readers.TextFileReader at 0x13442aa10>" |
1518 | 1518 | ]
|
1519 | 1519 | },
|
1520 | 1520 | "execution_count": 16,
|
|
1745 | 1745 | },
|
1746 | 1746 | {
|
1747 | 1747 | "cell_type": "markdown",
|
1748 |
| - "id": "1d72fdfb", |
| 1748 | + "id": "0f9db2a8-7291-4db9-89bc-ef5def432dae", |
1749 | 1749 | "metadata": {},
|
1750 | 1750 | "source": [
|
1751 | 1751 | "## Working with Python's csv module\n",
|
|
1756 | 1756 | {
|
1757 | 1757 | "cell_type": "code",
|
1758 | 1758 | "execution_count": 25,
|
1759 |
| - "id": "1207f91c", |
| 1759 | + "id": "d4ed9b30-594c-4e83-a5f2-460b36cb6bab", |
1760 | 1760 | "metadata": {},
|
1761 | 1761 | "outputs": [
|
1762 | 1762 | {
|
|
1782 | 1782 | " print(line)"
|
1783 | 1783 | ]
|
1784 | 1784 | },
|
| 1785 | + { |
| 1786 | + "cell_type": "markdown", |
| 1787 | + "id": "0ed726c4-5e09-4676-bcf0-f78e9f7a10e0", |
| 1788 | + "metadata": {}, |
| 1789 | + "source": [ |
| 1790 | + "[Sniffer.has_header](https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header) analyses a sample of your csv file and returns ``True`` if the first row appears to be a series of column headers.\n", |
| 1791 | + "\n", |
| 1792 | + "<div class=\"alert alert-block alert-info\">\n", |
| 1793 | + "\n", |
| 1794 | + "**Note:**\n", |
| 1795 | + "\n", |
| 1796 | + "This method is only a rough heuristic and can produce both false positives and false negatives.\n", |
| 1797 | + "</div>" |
| 1798 | + ] |
| 1799 | + }, |
| 1800 | + { |
| 1801 | + "cell_type": "markdown", |
| 1802 | + "id": "a19c05c1-e947-471b-8089-8e36e65b4268", |
| 1803 | + "metadata": {}, |
| 1804 | + "source": [ |
| 1805 | + "[Sniffer.sniff](https://docs.python.org/3/library/csv.html#csv.Sniffer.sniff) also analyses a sample of your csv file, but returns a ``Dialect`` subclass that reflects the parameters it detected (delimiter, quote character, etc.)." |
| 1806 | + ] |
| 1807 | + }, |
| 1808 | + { |
| 1809 | + "cell_type": "code", |
| 1810 | + "execution_count": 26, |
| 1811 | + "id": "263a8cb4-4ae1-46f0-963f-9d2df2de45ed", |
| 1812 | + "metadata": {}, |
| 1813 | + "outputs": [ |
| 1814 | + { |
| 1815 | + "name": "stdout", |
| 1816 | + "output_type": "stream", |
| 1817 | + "text": [ |
| 1818 | + "['', 'Titel', 'Sprache', 'Autor*innen', 'Lizenz', 'Veröffentlichungsdatum', 'doi']\n", |
| 1819 | + "['0', 'Python basics', 'en', 'Veit Schiele', '', '2021-10-28', '']\n", |
| 1820 | + "['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', '', '2019-06-27', '']\n", |
| 1821 | + "['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', '', '2020-10-26', '']\n", |
| 1822 | + "['3', 'PyViz Tutorial', 'en', 'Veit Schiele', '', '2020-04-13', '']\n" |
| 1823 | + ] |
| 1824 | + } |
| 1825 | + ], |
| 1826 | + "source": [ |
| 1827 | + "with open('out.csv', newline='') as f:\n", |
| 1828 | + " dialect = csv.Sniffer().sniff(f.read(1024))\n", |
| 1829 | + " f.seek(0)\n", |
| 1830 | + " reader = csv.reader(f, dialect)\n", |
| 1831 | + "\n", |
| 1832 | + " for line in reader:\n", |
| 1833 | + " print(line)" |
| 1834 | + ] |
| 1835 | + }, |
1785 | 1836 | {
|
1786 | 1837 | "cell_type": "markdown",
|
1787 | 1838 | "id": "e70392b5",
|
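A minimal sketch of how ``has_header`` and ``sniff`` can be combined to skip a header row only when one is detected. It assumes a small in-memory sample (hypothetical data modelled on the output above) instead of the notebook's `out.csv`; both methods take a string sample, not a file object.

```python
import csv
import io

# Hypothetical sample standing in for the first kilobyte of a csv file
sample = (
    "Titel,Sprache,Autor\n"
    "Python basics,en,Veit Schiele\n"
    "Jupyter Tutorial,de,Veit Schiele\n"
)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # detect delimiter, quote character, ...
has_header = sniffer.has_header(sample)  # heuristic: can give false positives/negatives

reader = csv.reader(io.StringIO(sample), dialect)
# Consume the first row as header only if the heuristic detected one
header = next(reader) if has_header else None
rows = list(reader)
```

Because ``has_header`` is only a heuristic, code that relies on it should treat the result as a hint rather than a guarantee.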
|
1791 | 1842 | "\n",
|
1792 | 1843 | "csv files come in many different variants. Python's csv module already ships with three predefined dialects:\n",
|
1793 | 1844 | "\n",
|
1794 |
| - "Parameter | excel | excel-tab | unix\n", |
| 1845 | + "Parameter | [excel](https://docs.python.org/3/library/csv.html#csv.excel) | [excel-tab](https://docs.python.org/3/library/csv.html#csv.excel_tab) | [unix](https://docs.python.org/3/library/csv.html#csv.unix_dialect)\n", |
1795 | 1846 | ":--- | :--- | :--- | :--- \n",
|
1796 | 1847 | "`delimiter` | `','` | `'\\t'` | `','` |\n",
|
1797 | 1848 | "`quotechar` | `'\"'` | `'\"'` | `'\"'` |\n",
|
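To illustrate that ``sniff`` builds a new ``Dialect`` subclass rather than returning one of the three predefined dialects, the following sketch sniffs a hypothetical semicolon-separated sample and prints its detected parameters next to those of ``csv.excel``, ``csv.excel_tab`` and ``csv.unix_dialect``. The sample data is an assumption, not the notebook's `out.csv`.

```python
import csv
import io

# Hypothetical semicolon-separated sample (e.g. a spreadsheet export
# with a German locale); it is not the notebook's out.csv
sample = (
    "Titel;Sprache;Veröffentlichungsdatum\n"
    "Python basics;en;2021-10-28\n"
    "Jupyter Tutorial;de;2020-10-26\n"
)

# sniff() creates a new Dialect subclass from the detected parameters
dialect = csv.Sniffer().sniff(sample)

# Compare the detected parameters with the predefined dialects from the table
predefined = {
    "excel": csv.excel,
    "excel-tab": csv.excel_tab,
    "unix": csv.unix_dialect,
}
for name, d in predefined.items():
    print(f"{name:10} delimiter={d.delimiter!r} quotechar={d.quotechar!r}")
print(f"{'sniffed':10} delimiter={dialect.delimiter!r} quotechar={dialect.quotechar!r}")

# The sniffed dialect can be passed to csv.reader just like a predefined one
rows = list(csv.reader(io.StringIO(sample), dialect))
```

None of the three predefined dialects uses `;` as delimiter, so a sniffed dialect is the simplest way to read such a file without specifying `delimiter` by hand.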
|
1816 | 1867 | },
|
1817 | 1868 | {
|
1818 | 1869 | "cell_type": "code",
|
1819 |
| - "execution_count": 26, |
| 1870 | + "execution_count": 27, |
1820 | 1871 | "id": "8d765adf",
|
1821 | 1872 | "metadata": {},
|
1822 | 1873 | "outputs": [],
|
|
1840 | 1891 | },
|
1841 | 1892 | {
|
1842 | 1893 | "cell_type": "code",
|
1843 |
| - "execution_count": 27, |
| 1894 | + "execution_count": 28, |
1844 | 1895 | "id": "69fff7dd",
|
1845 | 1896 | "metadata": {},
|
1846 | 1897 | "outputs": [
|
|
1873 | 1924 | },
|
1874 | 1925 | {
|
1875 | 1926 | "cell_type": "code",
|
1876 |
| - "execution_count": 28, |
| 1927 | + "execution_count": 29, |
1877 | 1928 | "id": "e9c0a9c2",
|
1878 | 1929 | "metadata": {},
|
1879 | 1930 | "outputs": [
|
|
1898 | 1949 | " 'doi': ('', '', '', '')}"
|
1899 | 1950 | ]
|
1900 | 1951 | },
|
1901 |
| - "execution_count": 28, |
| 1952 | + "execution_count": 29, |
1902 | 1953 | "metadata": {},
|
1903 | 1954 | "output_type": "execute_result"
|
1904 | 1955 | }
|
|
1923 | 1974 | },
|
1924 | 1975 | {
|
1925 | 1976 | "cell_type": "code",
|
1926 |
| - "execution_count": 29, |
| 1977 | + "execution_count": 30, |
1927 | 1978 | "id": "5a43af52",
|
1928 | 1979 | "metadata": {},
|
1929 | 1980 | "outputs": [],
|
|
1937 | 1988 | },
|
1938 | 1989 | {
|
1939 | 1990 | "cell_type": "code",
|
1940 |
| - "execution_count": 30, |
| 1991 | + "execution_count": 31, |
1941 | 1992 | "id": "a65c4cef",
|
1942 | 1993 | "metadata": {},
|
1943 | 1994 | "outputs": [
|
|
1949 | 2000 | " '2,Jupyter Tutorial,en,Veit Schiele\\n']"
|
1950 | 2001 | ]
|
1951 | 2002 | },
|
1952 |
| - "execution_count": 30, |
| 2003 | + "execution_count": 31, |
1953 | 2004 | "metadata": {},
|
1954 | 2005 | "output_type": "execute_result"
|
1955 | 2006 | }
|
|
1975 | 2026 | "name": "python",
|
1976 | 2027 | "nbconvert_exporter": "python",
|
1977 | 2028 | "pygments_lexer": "ipython3",
|
1978 |
| - "version": "3.11.4" |
| 2029 | + "version": "3.11.10" |
1979 | 2030 | },
|
1980 | 2031 | "widgets": {
|
1981 | 2032 | "application/vnd.jupyter.widget-state+json": {
|
|