Skip to content

Add basic CMake support for the iceberg library #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 53 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

cmake_minimum_required(VERSION 3.20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for CMake 3.20? In Arrow C++ there was a discussion lately to try and move to CMake 3.25 to allow using some of the newer feature like:

CMake is easily installable but the default versions on newer distributuions are:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just chose a random one. CMake 3.25 seems to be a good option.


project(
iceberg
VERSION 0.1.0
DESCRIPTION "Iceberg C++ Project"
LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH}
"${CMAKE_CURRENT_SOURCE_DIR}/cmake_modules")

if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()

option(ICEBERG_BUILD_TESTS "Build tests" ON)

include(GNUInstallDirs)
set(ICEBERG_INSTALL_LIBDIR "${CMAKE_INSTALL_LIBDIR}")
set(ICEBERG_INSTALL_BINDIR "${CMAKE_INSTALL_BINDIR}")
set(ICEBERG_INSTALL_INCLUDEDIR "${CMAKE_INSTALL_INCLUDEDIR}")
set(ICEBERG_INSTALL_CMAKEDIR "${CMAKE_INSTALL_LIBDIR}/cmake/iceberg")

add_subdirectory(include)
add_subdirectory(src)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add example too as a subdirectory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it by purpose to verify that the libraries can be found successfully from the installed CMake config files.


if(ICEBERG_BUILD_TESTS)
enable_testing()
add_subdirectory(tests)
endif()

install(FILES LICENSE NOTICE DESTINATION "share/doc/iceberg")
24 changes: 24 additions & 0 deletions example/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Please set CMAKE_PREFIX_PATH to the install prefix of iceberg.

find_package(iceberg CONFIG REQUIRED)

add_executable(example example.cc)

target_link_libraries(example iceberg::iceberg)
27 changes: 27 additions & 0 deletions example/example.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

#include "iceberg/demo.h"

#include <iostream>

int main() {
std::cout << iceberg::hello() << std::endl;
return 0;
}
22 changes: 22 additions & 0 deletions include/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about the purpose of the 'include' folder. IS this for storing all the .h files separately from the .cc files? I find it easier to navigate if the headers are next to the cc files tbh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The include folder is for public header files that will be exported and installed. I'm also fine to move them next to the cc files. In that case, I think it would be better to add _internal.h suffix to non-exported header files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a second thought: I think in the Java implementation there is a purpose of having an api/ module: Whatever is there isn't expected to change out of the Blue. Once you have anything in the spec you have to deprecate first in one of the upcoming releases and you can drop only in the following major release.

I think we can follow this as well: Have an api/ folder for the headers that we will consider the part of the lib the users should interact with. In Java these are interfaces if I'm not mistaken and then the actual implementations are part of the other modules like core/ etc. For instance Table is in the api/ and is a pure interface while inherent classes like BaseTable are in core/
I find this something we should also follow.

# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

install(
DIRECTORY "iceberg/"
DESTINATION "${ICEBERG_INSTALL_INCLUDEDIR}/iceberg"
FILES_MATCHING
PATTERN "*.h")
26 changes: 26 additions & 0 deletions include/iceberg/demo.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

#include <string_view>

namespace iceberg {

std::string_view hello();

} // namespace iceberg
57 changes: 57 additions & 0 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

add_library(iceberg demo.cc)

target_compile_features(iceberg PUBLIC cxx_std_20)

target_include_directories(
iceberg
INTERFACE $<INSTALL_INTERFACE:${ICEBERG_INSTALL_INCLUDEDIR}>
PUBLIC $<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

install(
TARGETS iceberg
EXPORT icebergTargets
LIBRARY DESTINATION ${ICEBERG_INSTALL_LIBDIR}
ARCHIVE DESTINATION ${ICEBERG_INSTALL_LIBDIR}
RUNTIME DESTINATION ${ICEBERG_INSTALL_BINDIR}
INCLUDES
DESTINATION ${ICEBERG_INSTALL_INCLUDEDIR})

install(
EXPORT icebergTargets
FILE icebergTargets.cmake
NAMESPACE iceberg::
DESTINATION ${ICEBERG_INSTALL_CMAKEDIR})

include(CMakePackageConfigHelpers)

write_basic_package_version_file(
"icebergConfigVersion.cmake"
VERSION ${PROJECT_VERSION}
COMPATIBILITY SameMajorVersion)

configure_package_config_file(
"${CMAKE_CURRENT_SOURCE_DIR}/icebergConfig.cmake.in"
"${CMAKE_CURRENT_BINARY_DIR}/icebergConfig.cmake"
INSTALL_DESTINATION ${ICEBERG_INSTALL_CMAKEDIR})

install(FILES "${CMAKE_CURRENT_BINARY_DIR}/icebergConfig.cmake"
"${CMAKE_CURRENT_BINARY_DIR}/icebergConfigVersion.cmake"
DESTINATION ${ICEBERG_INSTALL_CMAKEDIR})
26 changes: 26 additions & 0 deletions src/demo.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd structure the code a bit differently. How I imagined the structure of the c++ lib is something like this:

iceberg-cpp/
      - api/
          - src/
          - test/
...
      - core/
            - src/
            - test/
...
      - example/
            - src/
            - test/

and so on. (I hope the indentation is displayed as I intended to :) )

So in general I think there should be separate libs for each of the modules of the implementation and each of them should have their own src/ test/ folders. As I see now these folders are provided on the top level of the directory structure.
If we want to have an include/ folder for the headers separately, I think we should also have them on the module-level and not at the top level.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need more libraries other than libiceberg (which is the core library in your structure)? In my mind, libiceberg provides the building blocks for types, io, exceptions, metadata, transformations, expressions, table, catalog, reader/writer, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion we'd need at leas api/ and core/ (I'd prefer core/ over libiceberg just to be in line with the Java impl. No point of reinventing the wheel). Also I'd imagine puffin could also get a separate lib. And then if there is a need each engine could put their own connector into their own folders too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand it correctly. Do you mean that the api folder stores all public headers? Then each folder (e.g. core, puffin, example, etc) uses its own subset of headers from the api folder to create library or executable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm not in favor of splitting this repo into multiple libraries, which is an overkill. Originally I thought that the api and core folders from the apache/iceberg repo are all that we need. However, it seems that puffin is a good example which does not need to depend on other modules. Therefore I'm open to discuss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! It is still unclear what it will look like eventually. I'm adding third-party libraries (arrow, avro and simdjson and their indirect dependencies) now but let me finish this patch first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is api in this context? is this the public file headers to include?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, just read the other comment. No strong opinion on calling it include or api, I've seen it call it both on different projects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

iceberg-cpp/
├── api/
│   ├── table.h
│   └── puffin.h
├── example/
│   └── demo.cc
├── src/
│   ├── core/
│   │   ├── base_table.h
│   │   └── base_table.cc
│   └── puffin/
│       └── puffin.cc
└── test/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems great!

* Licensed to the Apache Software Foundation (ASF) under one
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't get now is the include/ folder. Does that meant to hold the header files that clients of this lib could include? Isn't it something that is called 'API' in the Java implementation? If yes, could we rename it to 'api'? If no, then I think there is no point of separating thos headers from the cc files, they could live next to each other.

Could you please help me understand this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments above. I think this relates to how many libraries we are going to add.

* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

#include "iceberg/demo.h"

namespace iceberg {

std::string_view hello() { return "Hello, Iceberg!"; }

} // namespace iceberg
22 changes: 22 additions & 0 deletions src/icebergConfig.cmake.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

@PACKAGE_INIT@

include("${CMAKE_CURRENT_LIST_DIR}/@[email protected]")
check_required_components("@PROJECT_NAME@")
16 changes: 16 additions & 0 deletions tests/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.