
ML Task with Spark

Java tasks

The java folder contains the ML tasks to be executed as Spark tasks. To create the uber jar, first run make create-extra-deps-jar. This recipe generates the uber jar containing the dependencies shared by all tasks; this jar must also be copied inside the Spark image. The individual task jars can then be created by running make create-task-jar.

The jars generated by these recipes are copied into the output/java folder.
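The generated jars are then submitted to Spark. As an illustration only — the jar names, the entry-point class, and the Spark setup below are placeholders, not taken from this repository — a submission could look like:

```shell
# Hypothetical example: the class name and jar names are placeholders;
# check output/java for the jars actually produced by the make recipes.
spark-submit \
  --class org.example.SomeMLTask \
  --jars output/java/extra-deps.jar \
  output/java/some-task.jar
```

The shared-dependency jar goes on --jars (or is baked into the Spark image, as noted above), while the task jar is the application jar.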


Create the VectorDisassembler dependency jar

You can find this jar in the ./java folder. The procedure to create it from scratch is described next.

Clone this repository and replace its POM with the following one:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.apache.spark.ml.feature</groupId>
    <artifactId>VectorDisassembler</artifactId>
    <version>0.1</version>

    <properties>
        <spark_version>3.5.3</spark_version>
        <scala_version>2.12</scala_version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.9.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Then run mvn package to create the VectorDisassembler-0.1.jar file. Finally, install the dependency into the local Maven repository:

mvn install:install-file -Dfile=/path/to/VectorDisassembler-0.1.jar \
                         -DgroupId=org.apache.spark.ml.feature \
                         -DartifactId=VectorDisassembler \
                         -Dversion=0.1 \
                         -Dpackaging=jar
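After installation, a task's POM can declare the artifact like any other Maven dependency. A sketch using the coordinates from the install command, with the version matching the jar (0.1 — the -Dversion flag above must use the same value):

```xml
<dependency>
    <groupId>org.apache.spark.ml.feature</groupId>
    <artifactId>VectorDisassembler</artifactId>
    <version>0.1</version>
</dependency>
```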
