Ph.D. in Computer Science
- Last updated: July 27, 2022
+ Last updated: 2/25/2023
----------
@@ -33,7 +33,7 @@ Original Creator of Apache Spark <br>
-----------
- ## Introduction
+ ## 1. Introduction
This short article shows how to use Python
user-defined functions in PySpark applications.
@@ -43,7 +43,18 @@ To use a UDF, we need to do some basic tasks:
2. Register UDF
3. Use UDF in Spark SQL
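
The three tasks above can be sketched end to end as follows. This is a minimal illustration rather than the article's own code: the body of `tripled`, the SQL name `tripled_udf`, the helper `run_sql_demo`, and the sample rows are all assumptions made for the example. PySpark is imported inside the helper so the plain Python function can be checked without a Spark installation.

```python
def tripled(n):
    """Step 1: an ordinary Python function (assumed body: three times the input)."""
    return 3 * n


def run_sql_demo():
    """Register `tripled` for Spark SQL and call it from a query.

    Hypothetical helper; requires PySpark to be installed.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = (
        SparkSession.builder.master("local[1]").appName("udf-demo").getOrCreate()
    )

    # Step 2: register the Python function under a SQL-visible name.
    spark.udf.register("tripled_udf", tripled, IntegerType())

    # Hypothetical sample data exposed to SQL as the view "people".
    df = spark.createDataFrame([("alex", 20), ("jane", 30)], ["name", "age"])
    df.createOrReplaceTempView("people")

    # Step 3: call the registered UDF inside a SQL query.
    result = spark.sql(
        "SELECT name, age, tripled_udf(age) AS tripled_age FROM people"
    )
    rows = result.collect()
    spark.stop()
    return rows
```

Calling `run_sql_demo()` on a machine with PySpark installed returns the collected rows, each carrying the extra `tripled_age` column.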
- ## 1. Define a UDF in Python
+
+ ## 2. What is a UDF?
+ User-Defined Functions (UDFs) are user-programmable
+ functions that act on one row at a time. A Spark UDF is
+ a feature of Spark SQL and the DataFrame API that
+ extends Spark's built-in capabilities: you define a
+ function once and reuse it across several DataFrames.
+
+
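
As a rough illustration of the reuse described above, the sketch below wraps one Python function with `pyspark.sql.functions.udf` and applies it to two unrelated DataFrames. The function `to_upper`, the helper `run_reuse_demo`, and the sample data are hypothetical and not taken from the article; PySpark is imported lazily so the row-level logic can be tested on its own.

```python
def to_upper(s):
    """Row-level logic: called once per input value; None passes through."""
    return s.upper() if s is not None else None


def run_reuse_demo():
    """Apply one UDF to two different DataFrames.

    Hypothetical helper; requires PySpark to be installed.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder.master("local[1]").appName("udf-reuse").getOrCreate()
    )

    # Wrap the Python function once as a DataFrame UDF.
    to_upper_udf = udf(to_upper, StringType())

    # Two unrelated, hypothetical DataFrames.
    people = spark.createDataFrame([("alex",), ("jane",)], ["name"])
    cities = spark.createDataFrame([("paris",), ("lima",)], ["city"])

    # The same UDF object is reused on both.
    people_rows = people.select(
        to_upper_udf(col("name")).alias("name_upper")
    ).collect()
    city_rows = cities.select(
        to_upper_udf(col("city")).alias("city_upper")
    ).collect()

    spark.stop()
    return people_rows, city_rows
```

Spark calls the wrapped function separately for each row, which is why the function body only ever sees a single value.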
+ ## 3. Define a UDF in Python
Consider a function which triples its input:
@@ -54,7 +65,7 @@ def tripled(n):
# end-def
~~~
- ## 2. Register UDF
+ ## 4. Register UDF
To register a UDF, we can use `SparkSession.udf.register()`.
The `register()` function takes 3 parameters:
+----+---+-------+
~~~
- ## 3. Use UDF in SQL Query
+ ## 5. Use UDF in SQL Query
~~~ python
>>> df.createOrReplaceTempView("people")