Commit b52382d

some more basic examples with tablecloth
1 parent 880a487 commit b52382d

1 file changed: +55 -3 lines changed

book/chapter_2_input_output/2_1_loading_data.clj

Lines changed: 55 additions & 3 deletions
@@ -1,4 +1,6 @@
-(ns book.chapter-2-input-output.2-1-loading-data)
+(ns book.chapter-2-input-output.2-1-loading-data
+  (:require
+   [tablecloth.api :as tc]))

 ;; # 2.1 How to get data into the notebook

@@ -21,6 +23,7 @@
 ;; TODO: Link to useful explainer on lazy seqs

 ;; #### With tablecloth
+
 ;; For most work involving tabular/columnar data, you'll use tablecloth, Clojure's go-to data
 ;; wrangling library. These all return a `tech.ml.dataset Dataset` object. The implementation
 ;; details aren't important now, but `tech.ml.dataset` is the library that allows for efficient
@@ -69,8 +72,57 @@

 ;; ##### Specify file encoding

-;;
+;; TODO: does this really matter? Test out different file encodings.

 ;; ##### Normalize values into consistent formats and types

-;; Tablecloth makes it easy to apply arbitrary transformations to all values in a given column:
+;; Tablecloth makes it easy to apply arbitrary transformations to all values in a given column.
+
+;; We can inspect the column metadata with tablecloth:
+
+(-> dataset
+    (tc/info :columns))
+
+;; Certain types are built in, so tablecloth already knows how to convert them, e.g. numbers:
+
+(-> dataset
+    (tc/convert-types "CO2" :double)
+    (tc/info :columns))
+
+;; The full list of type keywords tablecloth supports comes from the underlying
+;; `tech.ml.dataset` library:
+(require '[tech.v3.datatype.casting :as casting])
+@casting/valid-datatype-set
+
+;; More details on [supported types here](https://github.com/techascent/tech.ml.dataset/blob/master/topics/supported-datatypes.md).
+
+;; You can also process multiple columns at once, either by specifying a map of columns to data types:
+
+(-> dataset
+    (tc/convert-types {"CO2" :double
+                       "adjusted CO2" :double})
+    (tc/info :columns))
+
+;; Or by changing all columns of a certain type to another:
+
+(-> dataset
+    (tc/convert-types :type/numerical :double)
+    (tc/info :columns))
+
+;; The supported types of columns are:
+
+;; :type/numerical - any numerical type
+;; :type/float - floating point number (:float32 and :float64)
+;; :type/integer - any integer
+;; :type/datetime - any datetime type
+
+;; There is also a `:!type` qualifier, which selects the complement set: all columns that
+;; are _not_ of the specified type.
+
+;; For other types you need to provide a casting function yourself, e.g. for parsing strings:
+(-> dataset
+    ;; (tc/convert-types "Date" :local-date-time)
+    (tc/info :columns))
+
+;; For full details on all the possible options for type conversion of columns see the
+;; [tablecloth API docs](https://scicloj.github.io/tablecloth/index.html#Type_conversion)
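
Note on the last example: the `"Date"` conversion is left commented out in this commit, so the casting function that the comment mentions is never shown. Below is a minimal sketch of what such a function might look like, written in plain Clojure on top of `java.time`. The name `parse-date`, the sample string, and the `yyyy-MM-dd HH:mm:ss` pattern are assumptions for illustration only; the commit does not show what the "Date" column actually contains. The `[[datatype fn]]` argument shape in the final comment is based on a reading of the tablecloth type-conversion docs and has not been run against this dataset.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.format DateTimeFormatter))

;; ASSUMPTION: the "Date" column holds strings like "2021-03-15 08:30:00".
;; The real format is not visible in the commit, so adjust the pattern to match.
(def date-format
  (DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

(defn parse-date
  "Parses a string into a java.time.LocalDateTime using `date-format`.
  Throws java.time.format.DateTimeParseException on malformed input."
  [s]
  (LocalDateTime/parse s date-format))

;; Try it on a single value before applying it to a whole column.
(parse-date "2021-03-15 08:30:00")

;; Hypothetical use with the notebook's `dataset` (unverified here):
;; (tc/convert-types dataset "Date" [[:local-date-time parse-date]])
```

Checking the function on a few raw values first makes it easier to tell a wrong pattern from a problem in the conversion call itself.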
