Commit f0bf04a

Merge branch 'master' into unsupported-data-sources-examples
2 parents 919284d + 8d28a31 commit f0bf04a

6 files changed (+2390 -39 lines)

README.md

Lines changed: 104 additions & 16 deletions
@@ -22,14 +22,26 @@ Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic natur
 * **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires a special subset of columns in a dataframe but doesn't care about other columns.
 In notebooks this works out-of-the-box. In ordinary projects this requires casting (for now).
 
-Integrates with [Kotlin kernel for Jupyter](https://github.com/Kotlin/kotlin-jupyter). Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections and [pandas](https://pandas.pydata.org/)
+Integrates with [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
+Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections and [pandas](https://pandas.pydata.org/)
+
+## 🚀 Quickstart
+
+Looking for a fast and simple way to learn the basics?
+Get started in minutes with our [Quickstart Guide](https://kotlin.github.io/dataframe/quickstart.html).
+
+It walks you through the core features of Kotlin DataFrame with minimal setup and clear examples
+— perfect for getting up to speed in just a few minutes.
+
+[![quickstart_preview](docs/StardustDocs/images/guides/quickstart_preview.png)](https://kotlin.github.io/dataframe/quickstart.html)
 
 ## Documentation
 
 Explore [**documentation**](https://kotlin.github.io/dataframe) for details.
 
 You could find the following articles there:
 
+* [Guides and Examples](https://kotlin.github.io/dataframe/guides-and-examples.html)
 * [Get started with Kotlin DataFrame](https://kotlin.github.io/dataframe/gettingstarted.html)
 * [Working with Data Schemas](https://kotlin.github.io/dataframe/schemas.html)
 * [Setup compiler plugin in Gradle project](https://kotlin.github.io/dataframe/compiler-plugin.html)
@@ -48,31 +60,102 @@ Check out this [notebook with new features](examples/notebooks/feature_overviews
 
 ## Setup
 
-```kotlin
-implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
+> For more detailed instructions on how to get started with Kotlin DataFrame, refer to the
+> [Getting Started](https://kotlin.github.io/dataframe/gettingstarted.html).
+
+### Kotlin Notebook
+
+You can use Kotlin DataFrame in [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html),
+or any other interactive environment with [Kotlin Jupyter Kernel](https://github.com/Kotlin/kotlin-jupyter) support,
+such as [Datalore](https://datalore.jetbrains.com/)
+and [Jupyter Notebook](https://jupyter.org/).
+
+You can include all the necessary dependencies and imports in the notebook using *line magic*:
+
+```
+%use dataframe
 ```
 
-Check out the [custom setup page](https://kotlin.github.io/dataframe/gettingstartedgradleadvanced.html) if you don't need some of the formats as dependencies,
-for Groovy, and for configurations specific to Android projects.
+You can use `%useLatestDescriptors`
+to get the latest stable version without updating the Kotlin kernel:
 
-## Code example
+```
+%useLatestDescriptors
+%use dataframe
+```
 
-```kotlin
-import org.jetbrains.kotlinx.dataframe.*
-import org.jetbrains.kotlinx.dataframe.api.*
-import org.jetbrains.kotlinx.dataframe.io.*
+Or manually specify the version:
+
+```
+%use dataframe($dataframe_version)
 ```
 
+Refer to the
+[Get started with Kotlin DataFrame in Kotlin Notebook](https://kotlin.github.io/dataframe/gettingstartedkotlinnotebook.html)
+for details.
+
+### Gradle
+
+Add the dependency to your `build.gradle.kts` script:
+
 ```kotlin
-val df = DataFrame.read("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
-df["full_name"][0] // Indexing https://kotlin.github.io/dataframe/access.html
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
+}
+```
+
+Make sure that you have `mavenCentral()` in the list of repositories:
 
-df.filter { "stargazers_count"<Int>() > 50 }.print()
+```kotlin
+repositories {
+    mavenCentral()
+}
 ```
 
-## Getting started in Kotlin Notebook
+Refer to the
+[Get started with Kotlin DataFrame on Gradle](https://kotlin.github.io/dataframe/gettingstartedgradle.html)
+for details.
+Also, check out the [custom setup page](https://kotlin.github.io/dataframe/gettingstartedgradleadvanced.html)
+if you don't need all formats as dependencies, if you use Groovy,
+or for configurations specific to Android projects.
+
+## Code example
 
-Follow this [guide](https://kotlin.github.io/dataframe/gettingstartedkotlinnotebook.html)
+Here is an example of Kotlin DataFrame code with
+the [Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html) enabled.
+See [the full project](https://github.com/Kotlin/dataframe/tree/master/examples/kotlin-dataframe-plugin-example).
+See also
+[this example in Kotlin Notebook](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/readme_example.ipynb).
+
+```kotlin
+val repos = DataFrame
+    // Read DataFrame from the CSV file.
+    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+    // And convert it to match the `Repositories` schema.
+    .convertTo<Repositories>()
+
+// Update the DataFrame.
+val reposUpdated = repos
+    // Rename columns to camelCase.
+    .renameToCamelCase()
+    // Rename "stargazersCount" column to "stars".
+    .rename { stargazersCount }.into("stars")
+    // Filter by the number of stars:
+    .filter { stars > 50 }
+    // Convert values in the "topics" column (which were `String` initially)
+    // to the list of topics.
+    .convert { topics }.with {
+        val inner = it.removeSurrounding("[", "]")
+        if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
+    }
+    // Add a new column with the number of topics.
+    .add("topicCount") { topics.size }
+
+// Write the updated DataFrame to a CSV file.
+reposUpdated.writeCsv("jetbrains_repositories_new.csv")
+```
+
+Explore [**more examples here**](https://kotlin.github.io/dataframe/guides-and-examples.html).
 
 ## Data model
 * `DataFrame` is a list of columns with equal sizes and distinct names.
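The README code example above chains two value-level transformations that are easy to check in isolation. `renameToCamelCase()` turns names such as `stargazers_count` into `stargazersCount`. The `convert { topics }.with { ... }` lambda turns a string like `[kotlin, dataframe]` into a list. The sketch below reproduces both using only the Kotlin standard library, without the DataFrame dependency.

`toCamelCase`, `parseTopics`, and `Repo` are hypothetical names introduced here. `toCamelCase` only handles `_`-separated names, so it approximates the library's renaming rather than reproducing it. The sample rows are invented.

```kotlin
// Hypothetical helper approximating renameToCamelCase() for snake_case names only.
fun toCamelCase(name: String): String {
    val parts = name.split('_').filter { it.isNotEmpty() }
    if (parts.isEmpty()) return name
    return parts.first().lowercase() +
        parts.drop(1).joinToString("") { part ->
            part.lowercase().replaceFirstChar { it.uppercase() }
        }
}

// Same logic as the README's convert { topics }.with { ... } lambda.
fun parseTopics(raw: String): List<String> {
    val inner = raw.removeSurrounding("[", "]")
    return if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}

// Hypothetical record type standing in for one row after the pipeline.
data class Repo(val fullName: String, val stars: Int, val topics: List<String>) {
    val topicCount: Int get() = topics.size
}

fun main() {
    println(toCamelCase("stargazers_count")) // stargazersCount
    println(toCamelCase("full_name")) // fullName

    // Invented sample rows: (full_name, stargazers_count, topics as raw text).
    val rawRows = listOf(
        Triple("JetBrains/repo-a", 120, "[kotlin, dataframe]"),
        Triple("JetBrains/repo-b", 10, "[]"),
    )
    val repos = rawRows
        .map { (name, stars, topics) -> Repo(name, stars, parseTopics(topics)) }
        .filter { it.stars > 50 } // like .filter { stars > 50 }
    repos.forEach { println("${it.fullName}: ${it.topics} (${it.topicCount} topics)") }
}
```

The `isEmpty()` check matters because `"".split(',')` returns a one-element list containing an empty string, not an empty list.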
@@ -81,7 +164,12 @@ Follow this [guide](https://kotlin.github.io/dataframe/gettingstartedkotlinnoteb
 * `ColumnGroup` — contains columns
 * `FrameColumn` — contains dataframes
 
-Explore [**more examples here**](https://kotlin.github.io/dataframe/guides-and-examples.html).
+## Visualizations
+
+The [Kandy](https://kotlin.github.io/kandy/welcome.html) plotting library provides seamless visualizations
+for your dataframes.
+
+![kandy_preview](docs/StardustDocs/images/guides/kandy_gallery_preview.png)
 
 ## Kotlin, Kotlin Jupyter, Arrow, and JDK versions
 
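The README's Data model section makes three claims:

* A `DataFrame` is a list of columns with equal sizes and distinct names.
* A `ColumnGroup` contains columns.
* A `FrameColumn` contains dataframes.

The sketch below is a minimal, stdlib-only model of those invariants. All class names (`Frame`, `Col`, `ValueCol`, `GroupCol`, `FrameCol`) are hypothetical and deliberately differ from the library's types. It illustrates the structure, not how Kotlin DataFrame implements it.

```kotlin
// Hypothetical, simplified model of the README data model (not the library's classes).
sealed interface Col {
    val name: String
    val size: Int
}

// Like a value column: one plain value per row.
class ValueCol(override val name: String, val values: List<Any?>) : Col {
    override val size: Int get() = values.size
}

// Like a ColumnGroup: contains columns, stored here as a nested frame.
class GroupCol(override val name: String, val group: Frame) : Col {
    override val size: Int get() = group.rowCount
}

// Like a FrameColumn: contains one dataframe per row.
class FrameCol(override val name: String, val frames: List<Frame>) : Col {
    override val size: Int get() = frames.size
}

// A frame is a list of columns with equal sizes and distinct names.
class Frame(val columns: List<Col>) {
    init {
        require(columns.map { it.size }.distinct().size <= 1) { "columns must have equal sizes" }
        require(columns.map { it.name }.distinct().size == columns.size) { "column names must be distinct" }
    }

    val rowCount: Int get() = columns.firstOrNull()?.size ?: 0
}

fun main() {
    val info = Frame(listOf(ValueCol("age", listOf(23, 27)), ValueCol("height", listOf(175.5, 160.2))))
    val df = Frame(
        listOf(
            ValueCol("name", listOf("Alice", "Bob")),
            GroupCol("info", info),
            FrameCol("visits", listOf(Frame(emptyList()), Frame(emptyList()))), // hypothetical column
        )
    )
    println(df.rowCount) // 2

    // Columns of different sizes are rejected.
    val mismatch = runCatching { Frame(listOf(ValueCol("a", listOf(1)), ValueCol("b", listOf(1, 2)))) }
    println(mismatch.isFailure) // true
}
```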
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+name,info
+Alice,"{""age"":23,""height"":175.5}"
+Bob,"{""age"":27,""height"":160.2}"
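This three-line CSV file stores each `info` value as a JSON object inside a quoted field. Every literal `"` in that field is doubled (`""`), as RFC 4180 quoting requires. The sketch below is a minimal, hypothetical field splitter that shows how such a line decodes back into plain text.

It is not Kotlin DataFrame's CSV parser, and it ignores line breaks inside quoted fields. Turning the decoded JSON text into the nested `info` column group is left to the library.

```kotlin
// Hypothetical, minimal splitter for one CSV line with RFC 4180-style quoting.
// Not Kotlin DataFrame's parser; line breaks inside quoted fields are not handled.
fun splitCsvLine(line: String): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // Inside quotes, a doubled quote ("") stands for one literal quote.
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++ // skip the second quote of the pair
            }
            // A single quote opens or closes a quoted field.
            c == '"' -> inQuotes = !inQuotes
            // A comma outside quotes ends the current field.
            c == ',' && !inQuotes -> {
                fields.add(current.toString())
                current.clear()
            }
            else -> current.append(c)
        }
        i++
    }
    fields.add(current.toString())
    return fields
}

fun main() {
    val line = "Alice,\"{\"\"age\"\":23,\"\"height\"\":175.5}\""
    println(splitCsvLine(line)) // [Alice, {"age":23,"height":175.5}]
}
```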
Lines changed: 156 additions & 15 deletions
@@ -1,24 +1,165 @@
 [//]: # (title: Extension Properties API)
 
-<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
-
-Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md).
-They are generated based on a [dataframe schema](schemas.md),
+When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way
+to access its columns, both in operations and when retrieving column values
+in row expressions, is through *auto-generated extension properties*.
+They are generated based on a [dataframe schema](schemas.md),
 with the name and type of properties inferred from the name and type of the corresponding columns.
+This also works for all kinds of hierarchical dataframes.
+
+> The behavior of data schema generation differs between the
+> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md).
+>
+> * In **Kotlin Notebook**, a schema is generated **only after cell execution** for
+>   `DataFrame` variables defined within that cell.
+> * With the **Compiler Plugin**, a new schema is generated **after every operation**,
+>   but support for all operations is still in progress.
+>   Retrieving the schema of a `DataFrame` read from a file or URL is **not yet supported** either.
+>
+> This behavior may change in future releases. See the [example](#example) below, which demonstrates these differences.
+{style="warning"}
+
+## Example
+
+Consider a simple hierarchical dataframe from
+<resource src="example.csv"></resource>.
+
+This table consists of two columns: `name`, which is a `String` column, and `info`,
+which is a [**column group**](DataColumn.md#columngroup) containing two nested
+[value columns](DataColumn.md#valuecolumn):
+`age` of type `Int` and `height` of type `Double`.
+
+<table>
+<thead>
+<tr>
+<th>name</th>
+<th colspan="2">info</th>
+</tr>
+<tr>
+<th></th>
+<th>age</th>
+<th>height</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Alice</td>
+<td>23</td>
+<td>175.5</td>
+</tr>
+<tr>
+<td>Bob</td>
+<td>27</td>
+<td>160.2</td>
+</tr>
+</tbody>
+</table>
+
+<tabs>
+<tab title="Kotlin Notebook">
+
+Read the [`DataFrame`](DataFrame.md) from the CSV file:
+
+```kotlin
+val df = DataFrame.readCsv("example.csv")
+```
+
+**After cell execution**, a data schema and extensions for this `DataFrame` are generated,
+so you can use the extensions to access columns
+and to refer to them in operations with the [Column Selector DSL](ColumnSelectors.md)
+and the [DataRow API](DataRow.md):
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// These extensions express the exact value in the row
+// with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+If you change the dataframe's schema by [renaming](rename.md) a column,
+[converting](convert.md) its type, or [adding](add.md) a new one, you first need to
+run a cell with a new [`DataFrame`](DataFrame.md) declaration.
+For example, rename the `name` column to "firstName":
 
-Having these, it allows you to work with your dataframe like:
 ```kotlin
-val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
-val nameColumn /* : DataColumn<String> */ = peopleDf.name
-val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
+val dfRenamed = df.rename { name }.into("firstName")
 ```
-and of course
+
+After running the cell with the code above, you can use `firstName` extensions in the following cells:
+
+```kotlin
+dfRenamed.firstName
+dfRenamed.rename { firstName }.into("name")
+dfRenamed.filter { firstName == "Nikita" }
+```
+
+See the [](quickstart.md) in Kotlin Notebook with basic Extension Properties API examples.
+
+</tab>
+<tab title="Compiler Plugin">
+
+For now, if you read a [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually.
+You can do it quickly with [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
+
+Define schemas:
+```kotlin
+@DataSchema
+data class PersonInfo(
+    val age: Int,
+    val height: Double
+)
+
+@DataSchema
+data class Person(
+    val info: PersonInfo,
+    val name: String
+)
+```
+
+Read the [`DataFrame`](DataFrame.md) from the CSV file and specify the schema with
+[`.convertTo()`](convertTo.md) or [`cast()`](cast.md):
+
+```kotlin
+val df = DataFrame.readCsv("example.csv").convertTo<Person>()
+```
+
+Extensions for this `DataFrame` are generated automatically by the plugin,
+so you can use them to access columns
+and to refer to them in operations with the [Column Selector DSL](ColumnSelectors.md)
+and the [DataRow API](DataRow.md).
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// These extensions express the exact value in the row
+// with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+Moreover, new extensions are generated on the fly after each schema change,
+such as [renaming](rename.md) a column,
+[converting](convert.md) its type, or [adding](add.md) a new one.
+For example, after renaming the `name` column to "firstName", you can use the `firstName` extension
+in the following operations:
+
 ```kotlin
-peopleDf.add("lastName") { name.split(",").last() }
-    .dropNulls { personData.age }
-    .filter { survived && home.endsWith("NY") && personData.age in 10..20 }
+// Rename "name" column to "firstName"
+df.rename { name }.into("firstName")
+    // Can use `firstName` extension in the row condition
+    // right after renaming
+    .filter { firstName == "Nikita" }
 ```
 
-To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md)
-or jump straight to [Data Schemas in Gradle projects](schemasGradle.md),
-or [Data Schemas in Jupyter notebooks](schemasJupyter.md).
+See the [Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example)
+IDEA project for basic Extension Properties API examples.
+</tab>
+</tabs>
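The extension properties described on this page can be illustrated without the library. In the simplified, stdlib-only model below, a row is a map from column names to values, tagged with a schema marker type. Each hand-written "generated" property casts the stored value to the column's type. A nested property returns another typed row, which is how the hierarchical `info` group works in this model.

`Row`, `Person`, and `PersonInfo` mirror the names in the example above but are hypothetical here. The properties that Kotlin Notebook or the Compiler Plugin actually generate target the library's own types and look different.

```kotlin
// Hypothetical stand-in for a dataframe row: column name -> value,
// tagged with a schema marker type T that is only used at compile time.
class Row<T>(val values: Map<String, Any?>)

// Marker types playing the role of the PersonInfo and Person schemas.
interface PersonInfo
interface Person

// Hand-written versions of "generated" extension properties:
// each property's name and type follow the column's name and type.
val Row<Person>.name: String get() = values["name"] as String

@Suppress("UNCHECKED_CAST")
val Row<Person>.info: Row<PersonInfo> // nested column group
    get() = Row(values["info"] as Map<String, Any?>)

val Row<PersonInfo>.age: Int get() = values["age"] as Int
val Row<PersonInfo>.height: Double get() = values["height"] as Double

fun main() {
    val people = listOf(
        Row<Person>(mapOf("name" to "Alice", "info" to mapOf("age" to 23, "height" to 175.5))),
        Row<Person>(mapOf("name" to "Bob", "info" to mapOf("age" to 27, "height" to 160.2))),
    )
    // Mirrors df.filter { name.startsWith("A") && info.age >= 16 }
    val filtered = people.filter { it.name.startsWith("A") && it.info.age >= 16 }
    println(filtered.map { it.name }) // [Alice]
}
```

The model also shows why a schema change needs new properties. After renaming `name` to `firstName`, some property called `firstName` must exist for the new schema type. That is what Kotlin Notebook (after the cell runs) and the Compiler Plugin (after each operation) provide.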

docs/StardustDocs/topics/guides/Guides-And-Examples.md

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,9 @@ Explore our structured, in-depth guides to steadily improve your Kotlin DataFram
 
 <img src="quickstart_preview.png" border-effect="rounded" width="705"/>
 
+* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
+  and make working with your data both convenient and type-safe.
+
 * [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
   — explore powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
 * [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)
