Commit 4531972

UDF is updated
1 parent 6c3465c commit 4531972

2 files changed: +16 -5 lines changed

code/bonus_chapters/UDF/README.md

+16 -5
@@ -5,7 +5,7 @@
 Ph.D. in Computer Science
 
 
-Last updated: July 27, 2022
+Last updated: 2/25/2023
 
 ----------
 
@@ -33,7 +33,7 @@ Original Creator of Apache Spark <br>
 
 -----------
 
-## Introduction
+## 1. Introduction
 
 This short article shows how to use Python
 user-defined functions in PySpark applications.
@@ -43,7 +43,18 @@ To use a UDF, we need to do some basic tasks:
 2. Register UDF
 3. Use UDF in Spark SQL
 
-## 1. Define a UDF in Python
+
+## 2. What is a UDF?
+User-defined functions (UDFs) are user-programmable
+functions that act on one row at a time. A Spark UDF
+is a feature of Spark SQL and the DataFrame API
+that extends Spark's built-in capabilities.
+Once a UDF is defined and registered, it can be
+reused across several DataFrames and
+SQL queries.
+
+
+## 3. Define a UDF in Python
 
 Consider a function which triples its input:
 
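The body of the tripling function falls outside this hunk; only its signature, `def tripled(n):`, is visible in the next hunk header. A minimal sketch of what such a function could look like, assuming it simply multiplies its input by 3. The `None` check is an added assumption, not something shown in the README: Spark passes SQL NULL values to Python UDFs as `None`, so returning `None` avoids a `TypeError` on missing data.

```python
def tripled(n):
    """Return three times n; pass None (SQL NULL) through unchanged."""
    # Assumed body: the diff shows only the signature of this function.
    if n is None:
        return None
    return 3 * n
#end-def

print(tripled(7))     # 21
print(tripled(None))  # None
```

At this stage the function is ordinary Python and can be checked without Spark.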
@@ -54,7 +65,7 @@ def tripled(n):
 #end-def
 ~~~
 
-## 2. Register UDF
+## 4. Register UDF
 
 To register a UDF, we can use `SparkSession.udf.register()`.
 The `register()` function takes 3 parameters:
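The parameter list itself is cut off by the hunk boundary. In the PySpark API, `register()` is declared as `register(name, f, returnType=None)`: the name to use in SQL, the Python function, and an optional return type. The sketch below shows one way a registration call might look. The `tripled` body, the application name `udf-sketch`, and the sample `people` rows are illustrative assumptions and are not taken from the README. The Spark imports sit inside the function so that the plain-Python part can be checked without a Spark installation.

```python
def tripled(n):
    # Assumed body: the diff shows only the signature of this function.
    return None if n is None else 3 * n


def register_and_query():
    """Register `tripled` as a SQL function and call it from a query."""
    # Imported here so that `tripled` can be tested without PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # Arguments: SQL-visible name, Python function, return type.
    spark.udf.register("tripled", tripled, IntegerType())

    # Hypothetical sample data, for illustration only.
    df = spark.createDataFrame([("alex", 20), ("jane", 30)], ["name", "age"])
    df.createOrReplaceTempView("people")

    rows = spark.sql(
        "SELECT name, age, tripled(age) AS tripled_age FROM people"
    ).collect()
    spark.stop()
    return rows
```

On a machine with PySpark installed, calling `register_and_query()` should return one `Row` per person, with the tripled age in the `tripled_age` column. Declaring `IntegerType()` explicitly matters here: if `returnType` is omitted for a plain Python function, Spark treats the result as a string.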
@@ -117,7 +128,7 @@ root
 +----+---+-------+
 ~~~
 
-## 3. Use UDF in SQL Query
+## 5. Use UDF in SQL Query
 
 ~~~python
 >>> df.createOrReplaceTempView("people")

code/bonus_chapters/UDF/UDF.pdf

38.2 KB
Binary file not shown.
