UDF Troubleshooting
Drill is a production, no-nonsense engine that does not waste time with fancy APIs, detailed error messages or niceties for developers. With Drill, you are working with a power tool and must respect it on its own terms, not on yours.
Nowhere is that clearer than when trying to diagnose problems with UDFs. Either you know what you are doing, or you spend hours trying to figure out what you did wrong. The notes here describe the many problems that the author encountered in the hope that they can save you effort if you make the same mistakes.
The Structure of a UDF section mentioned that Drill finds UDFs by their annotations and interface. This is only part of the picture. Drill performs a class scan to locate the classes, but scans only selected packages. To see which packages are scanned, look for this line in the Drill log file:
```
13:11:07.064 [main] INFO o.a.d.common.scanner.BuildTimeScan -
Loaded prescanned packages [org.apache.drill.exec.store.mock,
org.apache.drill.common.logical,
org.apache.drill.exec.expr,
...]
from locations [file:/.../drill/exec/java-exec/target/classes/META-INF/drill-module-scan/registry.json,
...]
```
The above tells us that Drill looks in only selected packages. Thus, when we are experimenting with functions within the Drill source code, we must use one of those packages. We do not want to modify the `registry.json` files: they are complex and appear to be generated. Drill provides no configuration option to extend the package list programmatically. (As we will see, in a production system we add our own packages to the list; but we cannot use those mechanisms here.)
The best solution is to use the package `org.apache.drill.exec.expr.contrib` as a temporary location.
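For example, a minimal sketch of a function dropped into that package might look like the following. The log2 logic matches the examples used later on this page; the exact class name and placement are illustrative:

```java
package org.apache.drill.exec.expr.contrib;

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.Float8Holder;

// Experimental function parked in a package that Drill's class path
// scan already covers.
@FunctionTemplate(
    name = "log2",
    scope = FunctionScope.SIMPLE,
    nulls = NullHandling.NULL_IF_NULL)
public class Log2Function implements DrillSimpleFunc {

  @Param public Float8Holder x;
  @Output public Float8Holder out;

  @Override
  public void setup() { }

  @Override
  public void eval() {
    out.value = Math.log(x.value) / Math.log(2.0D);
  }
}
```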
Unfortunately, Drill does not provide logging for the class path scan mechanism to help you understand why Drill stubbornly ignores your UDF class. However, if you are working from source code, you can insert your own debug code. The place to start is in `ClassPathScanner`:
```java
@Override
public void scan(final Object cls) {
  final ClassFile classFile = (ClassFile) cls;
  System.out.println(classFile.getName()); // Add me
  ...
}
```
The result will be a long list of all the classes that Drill will scan for function definitions. Check that your class appears. If not, check the package name of the class against the list of scanned packages shown above.
It is also helpful to see which classes and functions are actually being loaded. Again, there is no logging, but you can insert debug statements in `LocalFunctionRegistry`:
```java
public List<String> validate(String jarName, ScanResult scanResult) {
  ...
  System.out.println("\n\n-- Function Classes\n\n"); // Add me
  for (AnnotatedClassDescriptor func : providerClasses) {
    System.out.println(func.getClassName()); // Add me
    ...
    for (String name : names) {
      String functionName = name.toLowerCase();
      System.out.println(functionName); // Add me
```
Drill has a very clever mechanism that registers functions by building up a list of function definitions at build time. So, even if you find the right package for your code, Drill will still ignore it unless you run a full build. Unfortunately, that step is done only by Maven, not by your IDE. So, for your function to be seen by Drill when testing from the IDE, you must disable class path caching in your tests:
```java
@BeforeClass
public static void setup() throws Exception {
  ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
      .configProperty("drill.classpath.scanning.cache.enabled", false);
  startCluster(builder);
}
```
## Add Source to Class Path
As we have explained, Drill uses your function *source*, not the compiled class file, to run your function. You must ensure that your *source code* is on the class path, along with the class files. This is a particular problem in Eclipse. You will see an error such as:
```
org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: Failure reading Function class.
Function Class org.apache.drill.exec.expr.contrib.udfExample.Log2Wrapper
```
To solve this problem, add your source folders to the class path as discussed in the [[Debugging UDFs]] section.
## Non-Annotated Fields Not Allowed
Suppose you were to create a field in your class:
```java
public class Log2Function implements DrillSimpleFunc {
  private double LOG_2;

  public void setup() {
    LOG_2 = Math.log(2.0D);
  }
```
Drill will fail to read your function and will report the following error in the log:
```
org.apache.drill.exec.expr.udfExample.Log2Function
00:27:16.706 [main] WARN org.reflections.Reflections - could not scan file
/Users/paulrogers/git/drill/exec/java-exec/target/classes/org/apache/drill/exec/expr/udfExample/Log2Function.class
with scanner AnnotationScanner
org.reflections.ReflectionsException: could not create class file from Log2Function.class
    at org.reflections.scanners.AbstractScanner.scan(AbstractScanner.java:30) ~[reflections-0.9.8.jar:na]
    at org.reflections.Reflections.scan(Reflections.java:217) [reflections-0.9.8.jar:na]
    ...
Caused by: java.lang.NullPointerException: null
    at org.apache.drill.common.scanner.ClassPathScanner$AnnotationScanner.getAnnotationDescriptors(ClassPathScanner.java:286) ~[classes/:na]
    at org.apache.drill.common.scanner.ClassPathScanner$AnnotationScanner.scan(ClassPathScanner.java:278) ~[classes/:na]
    at org.reflections.scanners.AbstractScanner.scan(AbstractScanner.java:28) ~[reflections-0.9.8.jar:na]
```
The cause is a null-pointer exception (NPE) inside the `reflections` library on access to the unannotated field.
If you see this error, the reason is that you omitted the required `@Workspace` annotation. The correct form of the above code is:
```java
public class Log2Function implements DrillSimpleFunc {
  @Workspace private double LOG_2;
```
## Displaying Log Output when Debugging
Drill logging is not readily available when running unit tests from the IDE. However, you can use an alternative mechanism to redirect the log to the console (which is displayed in the IDE). Use the `LogFixture` within the test harness as follows:
```java
@Test
public void testIntegration2() throws Exception {
  LogFixtureBuilder logBuilder = new LogFixtureBuilder()
      .rootLogger(Level.INFO);
  try (LogFixture logFixture = logBuilder.build()) {
    String sql = "SELECT log2(4) FROM (VALUES (1))";
    client.queryBuilder().sql(sql).printCsv();
  }
}
```
The above sets the root log level to `INFO` for the duration of the one test. You can also request logging of a specific subsystem or change the log level.
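For example, here is a sketch that raises the level for a single class instead of the root logger, using the same builder as above. The choice of `ProjectRecordBatch` as the logger and the test body are purely illustrative:

```java
@Test
public void testWithClassLogger() throws Exception {
  // Request DEBUG output from just one subsystem, leaving the
  // root logger at its default level.
  LogFixtureBuilder logBuilder = new LogFixtureBuilder()
      .logger(org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.class,
              Level.DEBUG);
  try (LogFixture logFixture = logBuilder.build()) {
    String sql = "SELECT log2(4) FROM (VALUES (1))";
    client.queryBuilder().sql(sql).printCsv();
  }
}
```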
## Static Fields (Constants) Not Supported
The following compiles, but does not work:
```java
public class Log2Function implements DrillSimpleFunc {
  @Workspace public static final double LOG_2 = Math.log(2.0D);
```
This is classic good programming: declare a constant to hold your special numbers. The above works just fine if you test the function outside of Drill. But when run within Drill, the `LOG_2` constant is not set; it defaults to 0, causing the function to return the wrong results.
Alternatives:
* Place constants in a separate non-function class (see the sketch after this list).
* Put the values in-line in your code.
* Use temporary variables in place of constants:
```java
public class Log2Function implements DrillSimpleFunc {
  @Workspace private double LOG_2;

  public void setup() {
    LOG_2 = Math.log(2.0D);
  }
```
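To illustrate the first alternative, here is a sketch of a separate, non-function constants class; the class name is hypothetical. Note that, as the No Same-Package References and No Imports sections below explain, the function must refer to it by its fully qualified name:

```java
package org.apache.drill.exec.expr.udfExample;

// Hypothetical holder for constants; not itself a Drill function,
// so it is not subject to Drill's source rewriting.
public class Constants {
  public static final double LOG_2 = Math.log(2.0D);
}
```

Then, within the function:

```java
public void setup() {
  // Fully qualified reference: imports are not copied into generated code.
  LOG_2 = org.apache.drill.exec.expr.udfExample.Constants.LOG_2;
}
```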
## No Same-Package References
You might decide that implementing code in a Drill function is too much of a hassle. Why not simply do the "real work" in a separate class?
```java
public class FunctionImpl {
  private static final double LOG_2 = Math.log(2.0D);

  public static double log2(double x) {
    return Math.log(x) / LOG_2;
  }
}
```
Then, the Drill function class need only be a thin wrapper:
```java
@FunctionTemplate(
    name = "log2w",
    scope = FunctionScope.SIMPLE,
    nulls = NullHandling.NULL_IF_NULL)
public class Log2Wrapper implements DrillSimpleFunc {

  @Param public Float8Holder x;
  @Output public Float8Holder out;

  @Override
  public void setup() { }

  @Override
  public void eval() {
    out.value = FunctionImpl.log2(x.value);
  }
}
```
The problem is that Drill does not execute your compiled code. Instead, Drill rewrites your source code and, in so doing, moves it from the package it was in into Drill's own package for generated code:
```java
package org.apache.drill.exec.test.generated;
...
public class ProjectorGen0 {
  ...
  public void doSetup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing)
      throws SchemaChangeException
  {
    ...
    Log2Wrapper_eval: {
      out.value = FunctionImpl.log2(x.value);
    }
    ...
```
There is our source code, plunked down inside Drill's generated code. Our reference to `FunctionImpl`, which was originally in the same package as our code, is not available in Drill's package. The result is a runtime failure:
```
01:17:39.680 [25b0bf82-1dfc-1c92-8460-3bcb6db74f7c:frag:0:0] ERROR o.a.d.e.r.AbstractSingleRecordBatch - Failure during query
org.apache.drill.exec.exception.SchemaChangeException: Failure while attempting to load generated class
...
Caused by: org.apache.drill.exec.exception.ClassTransformationException: java.util.concurrent.ExecutionException:
org.apache.drill.exec.exception.ClassTransformationException: Failure generating transformation classes for value:
```
This comes along with several hundred lines of error messages, including multiple copies of the generated code.
## No Imports
We might think the workaround is to put our implementations and wrappers in separate packages:
```java
package org.apache.drill.exec.expr.udfExample;
...
public class Log2Wrapper implements DrillSimpleFunc {
```
With the implementation one level down:
```java
package org.apache.drill.exec.expr.udfExample.impl;

public class FunctionImpl {
  ...
```
We would hope that Drill would copy the imports. But we'd be wrong; Drill won't do so, and the same error as above will appear.
The only solution is to always use fully qualified class names for all classes except for those in the Java JDK or Drill:
```java
@Override
public void eval() {
  out.value = org.apache.drill.exec.expr.udfExample.FunctionImpl.log2(x.value);
}
```
Presumably, all this extra work for the developer pays off in slightly faster runtime because, again presumably, Drill can generate better code than Java can (a highly dubious proposition, but there we have it).