
ML Task with Spark

Java tasks

The java folder contains the ML tasks to be executed as Spark tasks. To create the uber jar, first run make create-extra-deps-jar. This recipe generates the uber jar containing the dependencies shared by all tasks; this jar must also be copied inside the Spark image. The individual task jars can then be created by running make create-task-jar.

The jars generated by these recipes are copied into the output/java folder.
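The generated jars are then submitted to Spark. As an illustration only — the jar names, the entry-point class, and the Spark setup below are placeholders, not taken from this repository — a submission could look like:

```shell
# Hypothetical example: the class name and jar names are placeholders;
# check output/java for the jars actually produced by the make recipes.
spark-submit \
  --class org.example.SomeMLTask \
  --jars output/java/extra-deps.jar \
  output/java/some-task.jar
```

The shared-dependency jar goes on --jars (or is baked into the Spark image, as noted above), while the task jar is the application jar.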


Create the VectorDisassembler dependency jar

You can find this jar in the ./java folder. The procedure to create it from scratch is described next.

Clone this repository and replace its POM with the following one:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.apache.spark.ml.feature</groupId>
    <artifactId>VectorDisassembler</artifactId>
    <version>0.1</version>

    <properties>
        <spark_version>3.5.3</spark_version>
        <scala_version>2.12</scala_version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala_version}</artifactId>
            <version>${spark_version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.9.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Then run mvn package to create the VectorDisassembler-0.1.jar file. Finally, install the dependency into the local Maven repository:

mvn install:install-file -Dfile=/path/to/VectorDisassembler-0.1.jar \
                         -DgroupId=org.apache.spark.ml.feature \
                         -DartifactId=VectorDisassembler \
                         -Dversion=0.1 \
                         -Dpackaging=jar
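After installation, a task's POM can declare the artifact like any other Maven dependency. A sketch using the coordinates from the install command, with the version matching the jar (0.1 — the -Dversion flag above must use the same value):

```xml
<dependency>
    <groupId>org.apache.spark.ml.feature</groupId>
    <artifactId>VectorDisassembler</artifactId>
    <version>0.1</version>
</dependency>
```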
