diff --git a/docs/howto/index.rst b/docs/howto/index.rst index 3b66e28a9..32c2a5476 100644 --- a/docs/howto/index.rst +++ b/docs/howto/index.rst @@ -9,5 +9,6 @@ This section provides step-by-step guides for specific tasks. :maxdepth: 2 :caption: How-to Guides + neptune-iam-auth sdds \ No newline at end of file diff --git a/docs/howto/neptune-iam-auth.rst b/docs/howto/neptune-iam-auth.rst new file mode 100644 index 000000000..04e6346ba --- /dev/null +++ b/docs/howto/neptune-iam-auth.rst @@ -0,0 +1,388 @@ +.. _neptune-iam-auth: + +Using Neptune with AWS IAM Authentication +========================================== + +This guide explains how to configure your Whyis knowledge graph application to use Amazon Neptune with AWS IAM authentication. + +Overview +-------- + +The Neptune plugin extends Whyis to support AWS IAM authentication for Amazon Neptune databases. It uses AWS SigV4 request signing for all SPARQL operations, including: + +- SPARQL queries (SELECT, ASK, CONSTRUCT, DESCRIBE) +- SPARQL updates (INSERT, DELETE, MODIFY) +- Graph Store Protocol operations (PUT, POST, DELETE) +- Full-text search queries via Neptune FTS + +Prerequisites +------------- + +- A Whyis knowledge graph application (created with ``whyis createapp``) +- Access to an Amazon Neptune database cluster (or see Quick Start below to create one) +- AWS credentials with Neptune access permissions + +Quick Start: Automated Neptune Setup +------------------------------------- + +If you don't have a Neptune cluster yet, your Whyis application includes a CloudFormation template that automatically provisions a complete Neptune environment with Full-Text Search. + +The CloudFormation Template +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Your application's directory contains ``cloudformation-neptune.json``, which creates: + +- **Neptune Serverless Cluster** with IAM authentication enabled +- **OpenSearch Domain** for full-text search capabilities +- **Security Groups** for secure network access +- **IAM Role** with necessary permissions +- **Proper VPC Configuration** for production use + +Using the CloudFormation Template +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1. **Prepare parameters** (edit values for your environment): + + .. code-block:: bash + + aws cloudformation create-stack \ + --stack-name my-kgapp-neptune \ + --template-body file://cloudformation-neptune.json \ + --parameters \ + ParameterKey=VPCId,ParameterValue=vpc-xxxxxxxx \ + ParameterKey=PrivateSubnetIds,ParameterValue="subnet-xxx,subnet-yyy" \ + ParameterKey=AllowedCIDR,ParameterValue=10.0.0.0/16 \ + ParameterKey=IAMRoleName,ParameterValue=my-kgapp-neptune-access \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 + +2. **Wait for completion** (typically 20-30 minutes): + + .. code-block:: bash + + aws cloudformation wait stack-create-complete \ + --stack-name my-kgapp-neptune \ + --region us-east-1 + +3. **Get configuration values**: + + .. code-block:: bash + + aws cloudformation describe-stacks \ + --stack-name my-kgapp-neptune \ + --region us-east-1 \ + --query 'Stacks[0].Outputs' + + The outputs provide all the values you need for ``whyis.conf`` (see Step 3 below). + +.. note:: + For detailed CloudFormation documentation, see ``CLOUDFORMATION.md`` in your application directory. 
+   It includes:
+
+   - Complete parameter descriptions
+   - AWS Console deployment instructions
+   - Cost estimates and optimization tips
+   - Security best practices
+   - Troubleshooting guide
+
+Step 1: Enable the Neptune Plugin
+----------------------------------
+
+Add the Neptune plugin to your application's configuration file (``whyis.conf`` or ``system.conf``):
+
+.. code-block:: python
+
+    # Enable the Neptune plugin
+    PLUGINENGINE_PLUGINS = ['neptune']
+
+    # Or if you already have other plugins enabled:
+    PLUGINENGINE_PLUGINS = ['neptune', 'other_plugin']
+
+Step 2: Install Required Dependencies
+--------------------------------------
+
+The Neptune plugin requires an additional Python package that is **not** included in core Whyis.
+
+Add this package to your application's ``requirements.txt``:
+
+.. code-block:: text
+
+    aws_requests_auth
+
+Then install it in your application environment:
+
+.. code-block:: bash
+
+    pip install -r requirements.txt
+
+.. note::
+   This dependency is only needed when using Neptune with IAM authentication.
+   It is not required for core Whyis functionality or other database backends.
+
+Step 3: Configure Neptune Connection
+-------------------------------------
+
+Configuring the Knowledge Database Endpoint
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Whyis uses a "knowledge database" to store and query RDF data. To use Neptune as your knowledge database, add the following configuration to your application's ``whyis.conf`` or ``system.conf``:
+
+.. code-block:: python
+
+    # Configure Neptune as the knowledge database backend
+    KNOWLEDGE_TYPE = 'neptune'
+
+    # Neptune SPARQL endpoint (required)
+    # This is the main endpoint for SPARQL queries and updates
+    KNOWLEDGE_ENDPOINT = 'https://my-cluster.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/sparql'
+
+    # AWS region where your Neptune cluster is located (required for IAM auth)
+    KNOWLEDGE_REGION = 'us-east-1'
+
+**Finding Your Neptune Endpoint:**
+
+1. Log into the AWS Console
+2. Navigate to Amazon Neptune
+3. Select your Neptune cluster
+4. Copy the "Cluster endpoint" from the cluster details
+5. Append the port and path: ``https://<cluster-endpoint>:8182/sparql``
+
+Example: If your cluster endpoint is ``my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com``, your ``KNOWLEDGE_ENDPOINT`` would be:
+
+.. code-block:: python
+
+    KNOWLEDGE_ENDPOINT = 'https://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/sparql'
+
+Configuring Full-Text Search
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Neptune supports full-text search through Amazon OpenSearch Service (formerly Elasticsearch). To enable full-text search queries in your knowledge graph:
+
+.. code-block:: python
+
+    # Neptune Full-Text Search endpoint (required for FTS queries)
+    # This is your OpenSearch Service domain endpoint
+    neptune_fts_endpoint = 'https://search-my-domain.us-east-1.es.amazonaws.com'
+
+**Finding Your OpenSearch Endpoint:**
+
+1. Log into the AWS Console
+2. Navigate to Amazon OpenSearch Service
+3. Select your domain that's integrated with Neptune
+4. Copy the "Domain endpoint" from the domain overview
+5. Use the HTTPS URL directly (no additional path needed)
+
+**How Full-Text Search Works:**
+
+When you execute SPARQL queries with Neptune FTS SERVICE blocks like this:
+
+.. code-block:: sparql
+
+    PREFIX fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
+    PREFIX neptune-fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
+    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
+
+    SELECT ?resource ?label WHERE {
+        SERVICE fts:search {
+            fts:config neptune-fts:query "search term" .
+            fts:config neptune-fts:endpoint "https://search-my-domain.us-east-1.es.amazonaws.com" .
+ fts:config neptune-fts:field rdfs:label . + fts:config neptune-fts:return ?resource . + } + ?resource rdfs:label ?label . + } + +The Neptune plugin automatically passes AWS IAM authentication to both the Neptune SPARQL endpoint and the OpenSearch endpoint, enabling secure full-text search across your knowledge graph. + +Optional Configuration Parameters +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Additional optional parameters for advanced configurations: + +.. code-block:: python + + # Optional: Custom AWS service name for SigV4 signing (defaults to 'neptune-db') + KNOWLEDGE_SERVICE_NAME = 'neptune-db' + + # Optional: Separate Graph Store Protocol endpoint for graph operations + # If not specified, uses KNOWLEDGE_ENDPOINT + KNOWLEDGE_GSP_ENDPOINT = 'https://my-cluster.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/data' + + # Optional: Default graph URI for RDF data + KNOWLEDGE_DEFAULT_GRAPH = 'http://example.org/default-graph' + +Complete Configuration Example +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Here's a complete configuration example for your ``whyis.conf`` or ``system.conf``: + +.. code-block:: python + + # Enable Neptune plugin + PLUGINENGINE_PLUGINS = ['neptune'] + + # Neptune as knowledge database + KNOWLEDGE_TYPE = 'neptune' + KNOWLEDGE_ENDPOINT = 'https://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/sparql' + KNOWLEDGE_REGION = 'us-east-1' + + # Full-text search endpoint + neptune_fts_endpoint = 'https://search-my-domain.us-east-1.es.amazonaws.com' + + # Optional: Graph Store Protocol endpoint + KNOWLEDGE_GSP_ENDPOINT = 'https://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/data' + +.. important:: + Replace all endpoint URLs and region names with your actual Neptune cluster and OpenSearch domain endpoints. + +Step 4: Configure AWS Credentials +---------------------------------- + +The Neptune driver uses ``boto3`` for AWS credential management. Credentials can be provided in several ways: + +Environment Variables +~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: bash + + export AWS_ACCESS_KEY_ID=your_access_key + export AWS_SECRET_ACCESS_KEY=your_secret_key + export AWS_SESSION_TOKEN=your_session_token # Optional, for temporary credentials + +IAM Roles (Recommended for EC2/ECS) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If your Whyis application runs on EC2 or ECS, the driver will automatically use the instance or task IAM role. This is the recommended approach as it avoids managing credentials directly. + +AWS Credentials File +~~~~~~~~~~~~~~~~~~~~ + +Create or edit ``~/.aws/credentials``: + +.. code-block:: ini + + [default] + aws_access_key_id = your_access_key + aws_secret_access_key = your_secret_key + +And ``~/.aws/config``: + +.. code-block:: ini + + [default] + region = us-east-1 + +Step 5: Configure IAM Permissions +---------------------------------- + +Ensure your AWS credentials or IAM role have the necessary Neptune permissions. Example IAM policy: + +.. code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "neptune-db:connect", + "neptune-db:ReadDataViaQuery", + "neptune-db:WriteDataViaQuery" + ], + "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-XXXXX/*" + } + ] + } + +Step 6: Verify the Configuration +--------------------------------- + +Start your Whyis application and verify the Neptune connection: + +.. code-block:: bash + + cd /apps/your-app + ./run + +Check the application logs for successful Neptune driver registration and database connection. 
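+As an additional check, you can exercise the connection from a Python shell. The following is a minimal sketch based on ``examples/neptune_boto3_store_example.py``; the endpoint URL is a placeholder, and it assumes the plugin exposes ``NeptuneBoto3Store`` as shown there:
+
+.. code-block:: python
+
+    from rdflib import ConjunctiveGraph
+    from whyis.plugins.neptune import NeptuneBoto3Store
+
+    # Placeholder endpoint; substitute your cluster endpoint from Step 3
+    endpoint = 'https://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/sparql'
+
+    store = NeptuneBoto3Store(
+        query_endpoint=endpoint,
+        update_endpoint=endpoint,
+        region_name='us-east-1',
+    )
+    graph = ConjunctiveGraph(store)
+
+    # A trivial count query; a result confirms signing and connectivity
+    for row in graph.query('SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'):
+        print(row.n)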
+
+How It Works
+------------
+
+Request Signing
+~~~~~~~~~~~~~~~
+
+All HTTP requests to Neptune are automatically signed with AWS SigV4:
+
+- The Neptune connector creates a ``requests.Session`` with ``AWS4Auth``
+- AWS credentials are fetched via ``boto3.Session().get_credentials()``
+- Each request includes signed headers for authentication
+- Credentials are automatically refreshed when using IAM roles
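+
+For illustration, the following sketch reproduces this signing flow using ``botocore`` directly. It is not the plugin's actual code; ``neptune-db`` is the default signing service name (see the optional parameters in Step 3), and the endpoint is a placeholder:
+
+.. code-block:: python
+
+    import boto3
+    import requests
+    from botocore.auth import SigV4Auth
+    from botocore.awsrequest import AWSRequest
+
+    def signed_sparql_query(endpoint, region, query):
+        # Resolve credentials through the standard boto3 chain
+        # (environment variables, shared credentials file, or IAM role)
+        creds = boto3.Session().get_credentials().get_frozen_credentials()
+        request = AWSRequest(
+            method='POST',
+            url=endpoint,
+            data=query.encode('utf-8'),
+            headers={'Content-Type': 'application/sparql-query'},
+        )
+        # Adds the AWS4-HMAC-SHA256 Authorization header (and the
+        # X-Amz-Security-Token header for temporary credentials)
+        SigV4Auth(creds, 'neptune-db', region).add_auth(request)
+        return requests.post(endpoint, data=request.body,
+                             headers=dict(request.headers))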
+ """ + print("Example 1: Basic usage with automatic credential discovery") + + store = NeptuneBoto3Store( + query_endpoint='https://my-neptune.us-east-1.neptune.amazonaws.com:8182/sparql', + update_endpoint='https://my-neptune.us-east-1.neptune.amazonaws.com:8182/sparql', + region_name='us-east-1' + ) + + graph = ConjunctiveGraph(store) + + # Example query + results = graph.query(""" + SELECT ?s ?p ?o + WHERE { + ?s ?p ?o . + } + LIMIT 10 + """) + + for row in results: + print(f"Subject: {row.s}, Predicate: {row.p}, Object: {row.o}") + + +# Example 2: Using custom boto3 session +def example_custom_session(): + """ + Advanced example: Use a custom boto3 session with explicit credentials. + """ + import boto3 + + print("Example 2: Custom boto3 session") + + # Create custom session (could use profile, explicit credentials, etc.) + session = boto3.Session( + aws_access_key_id='YOUR_ACCESS_KEY', + aws_secret_access_key='YOUR_SECRET_KEY', + region_name='us-east-1' + ) + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1', + boto3_session=session + ) + + graph = ConjunctiveGraph(store) + print(f"Graph store: {type(graph.store).__name__}") + + +# Example 3: Disable instance metadata +def example_no_instance_metadata(): + """ + Example: Disable instance metadata and use only boto3 session credentials. + + Useful when you want to avoid the instance metadata service or when + running outside of EC2. + """ + print("Example 3: Disable instance metadata") + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-west-2', + use_instance_metadata=False # Only use boto3 session credentials + ) + + graph = ConjunctiveGraph(store) + print(f"Instance metadata disabled: {not store.use_instance_metadata}") + + +# Example 4: SPARQL query with authentication +def example_sparql_query(): + """ + Example: Execute a SPARQL query with automatic AWS authentication. + """ + print("Example 4: SPARQL query") + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1' + ) + + graph = ConjunctiveGraph(store) + + # Define namespaces + FOAF = Namespace("http://xmlns.com/foaf/0.1/") + + # Query for people + query = """ + PREFIX foaf: + + SELECT ?person ?name + WHERE { + ?person a foaf:Person . + ?person foaf:name ?name . + } + LIMIT 10 + """ + + results = graph.query(query) + + for row in results: + print(f"Person: {row.person}, Name: {row.name}") + + +# Example 5: Custom service name (for Neptune alternatives) +def example_custom_service(): + """ + Example: Use custom service name for request signing. 
+ """ + print("Example 5: Custom service name") + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='eu-west-1', + service_name='custom-service' # Custom AWS service name + ) + + print(f"Service name: {store.service_name}") + + +if __name__ == '__main__': + print("NeptuneBoto3Store Examples") + print("=" * 60) + + # Note: These examples require actual Neptune endpoints and credentials + # Uncomment the example you want to run: + + # example_basic() + # example_custom_session() + # example_no_instance_metadata() + # example_sparql_query() + # example_custom_service() + + print("\nNote: Update the endpoint URLs and ensure AWS credentials are configured") + print("before running these examples.") diff --git a/script/build b/script/build index c8dfbd981..8fd42f526 100755 --- a/script/build +++ b/script/build @@ -9,4 +9,4 @@ echo ${VERSION} python setup.py build python setup.py sdist -docker build . --build-arg __version__=${VERSION} -t tetherlessworld/whyis:latest -t tetherlessworld/whyis:${VERSION} +docker build . --build-arg __version__=${VERSION} -t tetherlessworld/whyis:${VERSION} # -t tetherlessworld/whyis:latest diff --git a/script/release b/script/release index fd70c8fc0..91db6d78d 100755 --- a/script/release +++ b/script/release @@ -6,11 +6,11 @@ VERSION=`python whyis/_version.py` echo ${VERSION} -twine upload dist/whyis-${VERSION}.tar.gz +#twine upload dist/whyis-${VERSION}.tar.gz docker push tetherlessworld/whyis:${VERSION} -docker push tetherlessworld/whyis:latest +#docker push tetherlessworld/whyis:latest git tag -f v${VERSION} diff --git a/setup.py b/setup.py index 95c8837ee..1cb34128d 100644 --- a/setup.py +++ b/setup.py @@ -1,5 +1,6 @@ import os from distutils.core import setup +from setuptools import find_packages import distutils.command.build import distutils.command.sdist import subprocess @@ -135,7 +136,7 @@ def run(self): license = "Apache License 2.0", keywords = "rdf semantic knowledge graph", url = "http://tetherless-world.github.io/whyis", - packages=['whyis'], + packages=find_packages(), long_description='''Whyis is a nano-scale knowledge graph publishing, management, and analysis framework. Whyis aims to support domain-aware management and curation of knowledge from many different sources. 
Its primary goal is to enable @@ -184,7 +185,7 @@ def run(self): #'mod-wsgi==4.9.0', 'nltk==3.6.5', 'numpy', - 'oxrdflib==0.3.1', + 'oxrdflib==0.3.7', 'pandas', 'PyJWT', 'pyparsing', @@ -192,8 +193,7 @@ def run(self): 'python-dateutil', 'puremagic==1.14', 'python-slugify', - 'rdflib==6.3.2', - 'rdflib-jsonld==0.6.2', + 'rdflib==7.1.1', 'redislite>=6', 'requests[security]', 'sadi', @@ -231,12 +231,14 @@ def run(self): 'text/turtle = rdflib.plugins.sparql.results.graph:GraphResultParser' ], 'whyis': [ - 'whyis_sparql_entity_resolver = whyis.plugins.sparql_entity_resolver:SPARQLEntityResolverPlugin', + 'whyis_fuseki = whyis.plugins.fuseki:FusekiSearchPlugin', + 'whyis_neptune = whyis.plugins.neptune:NeptuneSearchPlugin', 'whyis_knowledge_explorer = whyis.plugins.knowledge_explorer:KnowledgeExplorerPlugin' ] }, classifiers=[ - "Development Status :: 5 - Production/Stable", +# "Development Status :: 5 - Production/Stable", + "Development Status :: 4 - Beta", "Framework :: Flask", "Environment :: Web Environment", "Topic :: Internet :: WWW/HTTP :: WSGI :: Middleware", diff --git a/tests/unit/test_neptune_boto3_connector_methods.py b/tests/unit/test_neptune_boto3_connector_methods.py new file mode 100644 index 000000000..36d73b358 --- /dev/null +++ b/tests/unit/test_neptune_boto3_connector_methods.py @@ -0,0 +1,429 @@ +""" +Unit tests for NeptuneBoto3Store connector-level methods (_connector_query, _connector_update). + +These tests specifically verify that the low-level connector methods work correctly, +including proper use of response_mime_types() and other inherited methods. +""" + +import pytest +from unittest.mock import Mock, patch, MagicMock + +# Skip all tests if dependencies not available +boto3 = pytest.importorskip("boto3") +pytest.importorskip("botocore") + +from whyis.plugins.neptune.neptune_boto3_store import NeptuneBoto3Store + + +class TestNeptuneBoto3ConnectorQuery: + """Test the _connector_query() method.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_connector_query_uses_response_mime_types(self, mock_boto_session_class, mock_requests_session_class): + """Test that _connector_query properly calls response_mime_types().""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+json'} + mock_response.content = b'{"head": {"vars": []}, "results": {"bindings": []}}' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Verify response_mime_types method exists and works + mime_types = store.response_mime_types() + assert mime_types is not None + assert isinstance(mime_types, str) + + # Execute query through _connector_query + query_string = "SELECT * WHERE { ?s ?p ?o }" + result = 
store._connector_query(query_string) + + # Verify the request was made with Accept header containing MIME types + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + assert 'Accept' in call_args[1]['headers'] + # The Accept header should contain valid MIME types + accept_header = call_args[1]['headers']['Accept'] + assert len(accept_header) > 0 + assert 'sparql-results' in accept_header or 'xml' in accept_header or 'json' in accept_header + + @patch('requests.Session') + @patch('boto3.Session') + def test_connector_query_with_default_graph(self, mock_boto_session_class, mock_requests_session_class): + """Test that _connector_query handles default_graph parameter.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+json'} + mock_response.content = b'{"head": {"vars": []}, "results": {"bindings": []}}' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Execute query with default_graph + query_string = "SELECT * WHERE { ?s ?p ?o }" + default_graph = "http://example.org/graph" + result = store._connector_query(query_string, default_graph=default_graph) + + # Verify the request includes the default-graph-uri parameter + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + url = call_args[1]['url'] + assert 'default-graph-uri=' in url or 'query=' in url # Depends on method + + @patch('requests.Session') + @patch('boto3.Session') + def test_connector_query_error_reporting(self, mock_boto_session_class, mock_requests_session_class): + """Test that _connector_query provides good error messages on failure.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock to return error + mock_response = Mock() + mock_response.ok = False + mock_response.status_code = 400 + mock_response.text = 'Bad Request: Invalid SPARQL syntax' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Execute query that will fail + query_string = "INVALID QUERY" + + with pytest.raises(IOError) as 
exc_info: + store._connector_query(query_string) + + # Verify error message contains useful information + error_msg = str(exc_info.value) + assert 'Neptune SPARQL query failed' in error_msg + assert '400' in error_msg + assert 'Bad Request' in error_msg + + +class TestNeptuneBoto3ConnectorUpdate: + """Test the _connector_update() method.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_connector_update_uses_response_mime_types(self, mock_boto_session_class, mock_requests_session_class): + """Test that _connector_update properly calls response_mime_types().""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'Success' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Verify response_mime_types method exists and works + mime_types = store.response_mime_types() + assert mime_types is not None + + # Execute update through _connector_update + update_string = "INSERT DATA { }" + store._connector_update(update_string) + + # Verify the request was made with Accept header + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + assert 'Accept' in call_args[1]['headers'] + assert 'Content-Type' in call_args[1]['headers'] + assert 'application/sparql-update' in call_args[1]['headers']['Content-Type'] + + +class TestNeptuneBoto3StoreQueryShortcuts: + """Test the _query() shortcut method.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_query_shortcut_increments_counter(self, mock_boto_session_class, mock_requests_session_class): + """Test that _query() properly increments the query counter.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+json'} + mock_response.content = b'{"head": {"vars": []}, "results": {"bindings": []}}' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Check initial counter + initial_count = store._queries + + # Execute query through 
_query shortcut + query_string = "SELECT * WHERE { ?s ?p ?o }" + store._query(query_string) + + # Verify counter was incremented + assert store._queries == initial_count + 1 + + @patch('requests.Session') + @patch('boto3.Session') + def test_update_shortcut_increments_counter(self, mock_boto_session_class, mock_requests_session_class): + """Test that _update() properly increments the update counter.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'Success' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Check initial counter + initial_count = store._updates + + # Execute update through _update shortcut + update_string = "INSERT DATA { }" + store._update(update_string) + + # Verify counter was incremented + assert store._updates == initial_count + 1 + + +class TestNeptuneBoto3ResponseMimeTypes: + """Test the response_mime_types() method.""" + + @patch('boto3.Session') + def test_response_mime_types_method_exists(self, mock_boto_session_class): + """Test that response_mime_types() method is always available.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Verify the method exists + assert hasattr(store, 'response_mime_types') + assert callable(store.response_mime_types) + + # Verify it returns a string + result = store.response_mime_types() + assert isinstance(result, str) + assert len(result) > 0 + + # Verify it contains valid MIME types + assert 'sparql' in result.lower() or 'xml' in result.lower() or 'json' in result.lower() + + +class TestNeptuneBoto3RequestErrorHandling: + """Test the _request() method's error handling.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_request_handles_http_error_without_exception(self, mock_boto_session_class, mock_requests_session_class): + """Test that _request() handles HTTP errors that don't trigger exceptions.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + 
mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock to return HTTP error (non-200 status) + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = 'Internal Server Error: Database timeout' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Attempt to make a request that will get HTTP error + with pytest.raises(IOError) as exc_info: + store._request('GET', 'https://neptune.example.com/sparql?query=test') + + # Verify error message contains HTTP status and response + error_msg = str(exc_info.value) + assert '500' in error_msg + assert 'Internal Server Error' in error_msg or 'Database timeout' in error_msg + + @patch('requests.Session') + @patch('boto3.Session') + def test_request_handles_network_exception(self, mock_boto_session_class, mock_requests_session_class): + """Test that _request() handles network exceptions properly.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock to raise an exception + import requests + mock_requests_session = Mock() + mock_requests_session.request.side_effect = requests.ConnectionError("Network unreachable") + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Attempt to make a request that will raise exception + with pytest.raises(IOError) as exc_info: + store._request('GET', 'https://neptune.example.com/sparql?query=test') + + # Verify error message contains exception info + error_msg = str(exc_info.value) + assert 'ConnectionError' in error_msg or 'Network unreachable' in error_msg diff --git a/tests/unit/test_neptune_boto3_query_update.py b/tests/unit/test_neptune_boto3_query_update.py new file mode 100644 index 000000000..55ea85f8c --- /dev/null +++ b/tests/unit/test_neptune_boto3_query_update.py @@ -0,0 +1,226 @@ +""" +Unit tests for NeptuneBoto3Store query and update methods with authentication. + +Tests that the overridden query() and update() methods properly use AWS authentication. 
+""" + +import pytest +from unittest.mock import Mock, patch, MagicMock +import sys + +# Skip all tests if dependencies not available +boto3 = pytest.importorskip("boto3") +pytest.importorskip("botocore") + +# Import directly from the module file, not through plugin __init__ +from whyis.plugins.neptune.neptune_boto3_store import NeptuneBoto3Store + + +class TestNeptuneBoto3StoreQueryMethod: + """Test the query() method override for authenticated requests.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_query_method_uses_authenticated_request(self, mock_boto_session_class, mock_requests_session_class): + """Test that query() method makes authenticated HTTP requests.""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock to return a valid SPARQL result + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+json'} + mock_response.content = b'{"head": {"vars": []}, "results": {"bindings": []}}' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store with POST method + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False, + method='POST' # Explicitly set POST method + ) + + # Execute a query directly (SPARQLConnector level) + query_string = "SELECT * WHERE { ?s ?p ?o } LIMIT 10" + result = store.query(query_string) + + # Verify request was made with authentication + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + + # Check that headers include authorization + assert 'Authorization' in call_args[1]['headers'] + assert 'AWS4-HMAC-SHA256' in call_args[1]['headers']['Authorization'] + + # Check the query was sent with POST method + assert call_args[1]['method'] == 'POST' + assert call_args[1]['data'] == query_string.encode('utf-8') + + @patch('requests.Session') + @patch('boto3.Session') + def test_query_method_handles_get_method(self, mock_boto_session_class, mock_requests_session_class): + """Test that query() works with GET method.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+xml'} + mock_response.content = b'' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store with GET method + store = NeptuneBoto3Store( + 
query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False, + method='GET' + ) + + # Execute query + query_string = "SELECT * WHERE { ?s ?p ?o }" + store.query(query_string) + + # Verify GET method was used + call_args = mock_requests_session.request.call_args + assert call_args[1]['method'] == 'GET' + assert 'Authorization' in call_args[1]['headers'] + + +class TestNeptuneBoto3StoreUpdateMethod: + """Test the update() method override for authenticated requests.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_update_method_uses_authenticated_request(self, mock_boto_session_class, mock_requests_session_class): + """Test that update() method makes authenticated HTTP requests.""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'Success' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Execute an update directly (SPARQLConnector level) + update_string = "INSERT DATA { }" + store.update(update_string) + + # Verify request was made with authentication + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + + # Check that headers include authorization + assert 'Authorization' in call_args[1]['headers'] + assert 'AWS4-HMAC-SHA256' in call_args[1]['headers']['Authorization'] + + # Check the update was sent + assert call_args[1]['method'] == 'POST' + assert call_args[1]['data'] == update_string.encode('utf-8') + assert 'application/sparql-update' in call_args[1]['headers']['Content-Type'] + + +class TestNeptuneBoto3StoreHighLevelQuery: + """Test that high-level query functionality is preserved.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_high_level_query_with_initBindings(self, mock_boto_session_class, mock_requests_session_class): + """Test that query() works with initBindings through parent class.""" + # Setup mocks + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'Content-Type': 'application/sparql-results+json'} + mock_response.content = b'{"head": {"vars": []}, "results": {"bindings": []}}' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = 
mock_requests_session + + # Create store + from rdflib import ConjunctiveGraph, Literal + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + graph = ConjunctiveGraph(store) + + # Execute query with initBindings (high-level interface) + # This tests that the parent class preprocessing still works + try: + results = graph.query( + "SELECT * WHERE { ?s ?p ?value }", + initBindings={'value': Literal('test')} + ) + # If we get here, the query flow worked (even if results are empty) + assert True + except Exception as e: + # If there's an error, make sure it's not about authentication + assert 'Authorization' not in str(e) + assert 'AWS' not in str(e) diff --git a/tests/unit/test_neptune_boto3_store.py b/tests/unit/test_neptune_boto3_store.py new file mode 100644 index 000000000..d053fd895 --- /dev/null +++ b/tests/unit/test_neptune_boto3_store.py @@ -0,0 +1,532 @@ +""" +Unit tests for NeptuneBoto3Store class. + +Tests the new boto3-based Neptune SPARQL store implementation that provides +AWS IAM authentication using boto3's credential management and instance metadata. +""" + +import pytest +from unittest.mock import Mock, patch, MagicMock, PropertyMock +from io import BytesIO + +# Skip all tests if dependencies not available +boto3 = pytest.importorskip("boto3") +pytest.importorskip("botocore") + +from whyis.plugins.neptune.neptune_boto3_store import NeptuneBoto3Store + + +class TestNeptuneBoto3StoreInit: + """Test NeptuneBoto3Store initialization.""" + + def test_store_requires_region_name(self): + """Test that store requires region_name parameter.""" + with pytest.raises(ValueError, match="region_name is required"): + NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql' + ) + + @patch('boto3.Session') + def test_store_creates_boto3_session(self, mock_session_class): + """Test that store creates a boto3 session if not provided.""" + mock_session = Mock() + mock_credentials = Mock() + mock_credentials.get_frozen_credentials.return_value = Mock( + access_key='test_key', + secret_key='test_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Verify boto3.Session was called + mock_session_class.assert_called_once() + assert store.region_name == 'us-east-1' + assert store.service_name == 'neptune-db' + + @patch('boto3.Session') + def test_store_uses_provided_boto3_session(self, mock_session_class): + """Test that store uses provided boto3 session.""" + mock_session = Mock() + mock_credentials = Mock() + mock_credentials.get_frozen_credentials.return_value = Mock( + access_key='test_key', + secret_key='test_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_credentials + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-west-2', + boto3_session=mock_session + ) + + # Verify provided session was used + mock_session_class.assert_not_called() + assert store.boto3_session is mock_session + + @patch('boto3.Session') + def test_store_accepts_custom_service_name(self, mock_session_class): + 
"""Test that store accepts custom service name.""" + mock_session = Mock() + mock_credentials = Mock() + mock_credentials.get_frozen_credentials.return_value = Mock( + access_key='test_key', + secret_key='test_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='eu-west-1', + service_name='custom-service' + ) + + assert store.service_name == 'custom-service' + + @patch('boto3.Session') + def test_store_raises_error_without_credentials(self, mock_session_class): + """Test that store raises error when no credentials are available and instance metadata disabled.""" + mock_session = Mock() + mock_session.get_credentials.return_value = None + mock_session_class.return_value = mock_session + + with pytest.raises(ValueError, match="No AWS credentials found"): + NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False # Disable instance metadata to ensure error is raised + ) + + @patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataFetcher') + @patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataProvider') + @patch('boto3.Session') + def test_store_initializes_instance_metadata_provider(self, mock_session_class, + mock_provider_class, mock_fetcher_class): + """Test that store initializes instance metadata provider by default.""" + mock_session = Mock() + mock_credentials = Mock() + mock_credentials.get_frozen_credentials.return_value = Mock( + access_key='test_key', + secret_key='test_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + mock_fetcher = Mock() + mock_fetcher_class.return_value = mock_fetcher + + mock_provider = Mock() + mock_provider_class.return_value = mock_provider + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=True + ) + + # Verify instance metadata components were created + mock_fetcher_class.assert_called_once() + mock_provider_class.assert_called_once_with(iam_role_fetcher=mock_fetcher) + assert store._instance_metadata_provider is mock_provider + + @patch('boto3.Session') + def test_store_skips_instance_metadata_when_disabled(self, mock_session_class): + """Test that store skips instance metadata provider when disabled.""" + mock_session = Mock() + mock_credentials = Mock() + mock_credentials.get_frozen_credentials.return_value = Mock( + access_key='test_key', + secret_key='test_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Verify instance metadata provider was not created + assert store._instance_metadata_provider is None + + +class TestNeptuneBoto3StoreInstanceMetadata: + """Test dynamic credential retrieval from instance metadata.""" + + @patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataFetcher') + 
@patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataProvider') + @patch('boto3.Session') + def test_get_credentials_from_instance_metadata(self, mock_session_class, + mock_provider_class, mock_fetcher_class): + """Test that _get_credentials retrieves from instance metadata provider.""" + # Setup mocks + mock_session = Mock() + mock_session_credentials = Mock() + mock_session_credentials.get_frozen_credentials.return_value = Mock( + access_key='session_key', + secret_key='session_secret', + token=None + ) + mock_session.get_credentials.return_value = mock_session_credentials + mock_session_class.return_value = mock_session + + mock_fetcher = Mock() + mock_fetcher_class.return_value = mock_fetcher + + # Mock instance metadata credentials + mock_instance_creds = Mock() + mock_frozen_instance_creds = Mock() + mock_frozen_instance_creds.access_key = 'instance_key' + mock_frozen_instance_creds.secret_key = 'instance_secret' + mock_frozen_instance_creds.token = 'instance_token' + mock_instance_creds.get_frozen_credentials.return_value = mock_frozen_instance_creds + + mock_provider = Mock() + mock_provider.load.return_value = mock_instance_creds + mock_provider_class.return_value = mock_provider + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=True + ) + + # Get credentials + frozen_creds = store._get_credentials() + + # Verify instance metadata provider was called + mock_provider.load.assert_called_once() + + # Verify we got instance metadata credentials (not session credentials) + assert frozen_creds.access_key == 'instance_key' + assert frozen_creds.secret_key == 'instance_secret' + assert frozen_creds.token == 'instance_token' + + @patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataFetcher') + @patch('whyis.plugins.neptune.neptune_boto3_store.InstanceMetadataProvider') + @patch('boto3.Session') + def test_get_credentials_falls_back_to_session(self, mock_session_class, + mock_provider_class, mock_fetcher_class): + """Test that _get_credentials falls back to session when instance metadata fails.""" + # Setup mocks + mock_session = Mock() + mock_session_credentials = Mock() + mock_frozen_session_creds = Mock() + mock_frozen_session_creds.access_key = 'session_key' + mock_frozen_session_creds.secret_key = 'session_secret' + mock_frozen_session_creds.token = None + mock_session_credentials.get_frozen_credentials.return_value = mock_frozen_session_creds + mock_session.get_credentials.return_value = mock_session_credentials + mock_session_class.return_value = mock_session + + mock_fetcher = Mock() + mock_fetcher_class.return_value = mock_fetcher + + # Mock instance metadata provider to return None (not available) + mock_provider = Mock() + mock_provider.load.return_value = None + mock_provider_class.return_value = mock_provider + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=True + ) + + # Get credentials + frozen_creds = store._get_credentials() + + # Verify we got session credentials as fallback + assert frozen_creds.access_key == 'session_key' + assert frozen_creds.secret_key == 'session_secret' + + @patch('boto3.Session') + def test_get_credentials_without_instance_metadata(self, mock_session_class): + """Test that _get_credentials uses session when 
instance metadata is disabled.""" + mock_session = Mock() + mock_session_credentials = Mock() + mock_frozen_creds = Mock() + mock_frozen_creds.access_key = 'session_key' + mock_frozen_creds.secret_key = 'session_secret' + mock_frozen_creds.token = None + mock_session_credentials.get_frozen_credentials.return_value = mock_frozen_creds + mock_session.get_credentials.return_value = mock_session_credentials + mock_session_class.return_value = mock_session + + # Create store with instance metadata disabled + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Get credentials + frozen_creds = store._get_credentials() + + # Verify we got session credentials + assert frozen_creds.access_key == 'session_key' + assert frozen_creds.secret_key == 'session_secret' + + +class TestNeptuneBoto3StoreRequestSigning: + """Test request signing with boto3.""" + + @patch('boto3.Session') + def test_sign_request_adds_signature_headers(self, mock_session_class): + """Test that _sign_request adds AWS signature headers.""" + # Setup mock session and credentials + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_session = Mock() + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Sign a request + headers = {'Content-Type': 'application/sparql-query'} + signed_headers = store._sign_request( + method='POST', + url='https://neptune.example.com/sparql', + headers=headers, + body='SELECT * WHERE { ?s ?p ?o }' + ) + + # Verify signature headers are present + assert 'Authorization' in signed_headers + assert signed_headers['Content-Type'] == 'application/sparql-query' + assert 'AWS4-HMAC-SHA256' in signed_headers['Authorization'] + + @patch('boto3.Session') + def test_sign_request_handles_query_parameters(self, mock_session_class): + """Test that _sign_request properly handles URLs with query parameters.""" + # Setup mock session and credentials + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_session = Mock() + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Sign a request with query parameters + url_with_params = 'https://neptune.example.com/sparql?query=SELECT%20*' + signed_headers = store._sign_request( + method='GET', + url=url_with_params, + headers={} + ) + + # Verify signature was added + assert 'Authorization' in signed_headers + + @patch('boto3.Session') + def test_sign_request_with_session_token(self, mock_session_class): + """Test that _sign_request works with temporary credentials (session token).""" + # Setup mock session with temporary credentials + mock_credentials = Mock() + 
frozen_creds = Mock() + frozen_creds.access_key = 'temp_key' + frozen_creds.secret_key = 'temp_secret' + frozen_creds.token = 'session_token' + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_session = Mock() + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Sign a request + signed_headers = store._sign_request( + method='POST', + url='https://neptune.example.com/sparql', + headers={'Content-Type': 'application/sparql-query'} + ) + + # Verify signature headers are present (including session token) + assert 'Authorization' in signed_headers + assert 'X-Amz-Security-Token' in signed_headers or 'x-amz-security-token' in signed_headers + + +class TestNeptuneBoto3StoreHTTPRequests: + """Test HTTP request methods.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_request_method_signs_and_sends(self, mock_boto_session_class, mock_requests_session_class): + """Test that _request method signs request and sends it.""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.status_code = 200 + mock_response.text = 'OK' + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Make a request + response = store._request( + method='POST', + url='https://neptune.example.com/sparql', + headers={'Content-Type': 'application/sparql-query'}, + body='SELECT * WHERE { ?s ?p ?o }' + ) + + # Verify request was made + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + + # Check that method and URL are correct + assert call_args[1]['method'] == 'POST' + assert call_args[1]['url'] == 'https://neptune.example.com/sparql' + + # Check that headers include authorization + assert 'Authorization' in call_args[1]['headers'] + + # Check response + assert response.status_code == 200 + + +class TestNeptuneBoto3StoreIntegration: + """Test integration with RDFlib.""" + + @patch('boto3.Session') + def test_store_can_be_used_with_conjunctive_graph(self, mock_session_class): + """Test that store can be used with RDFlib's ConjunctiveGraph.""" + from rdflib.graph import ConjunctiveGraph + + # Setup mock session + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_session = Mock() + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + 
update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Create graph with the store + graph = ConjunctiveGraph(store) + + # Verify graph was created + assert isinstance(graph, ConjunctiveGraph) + assert graph.store is store + + @patch('boto3.Session') + def test_store_inherits_from_whyis_sparql_update_store(self, mock_session_class): + """Test that store properly inherits from WhyisSPARQLUpdateStore.""" + from whyis.database.whyis_sparql_update_store import WhyisSPARQLUpdateStore + + # Setup mock session + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_session = Mock() + mock_session.get_credentials.return_value = mock_credentials + mock_session_class.return_value = mock_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1' + ) + + # Verify inheritance + assert isinstance(store, WhyisSPARQLUpdateStore) + assert hasattr(store, '_inject_prefixes') # Method from WhyisSPARQLUpdateStore + + +class TestNeptuneBoto3StoreMissingBoto3: + """Test behavior when boto3 is not installed.""" + + def test_import_error_when_boto3_not_available(self): + """Test that appropriate error is raised when boto3 is not available.""" + # This test is mainly for documentation - in practice, pytest.importorskip + # will skip these tests if boto3 is not available + + # We can't actually test this without uninstalling boto3, + # but we document the expected behavior + pass diff --git a/tests/unit/test_neptune_plugin.py b/tests/unit/test_neptune_plugin.py new file mode 100644 index 000000000..998a32435 --- /dev/null +++ b/tests/unit/test_neptune_plugin.py @@ -0,0 +1,383 @@ +""" +Unit tests for Neptune plugin with IAM authentication. + +Tests the Neptune driver that supports AWS IAM authentication for Amazon Neptune. 
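+
+All AWS credentials in these tests are mocked, so no real AWS access is required.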
+""" + +import pytest +from unittest.mock import Mock, patch, MagicMock +from io import BytesIO + +# Skip all tests if dependencies not available +pytest.importorskip("flask_security") +pytest.importorskip("aws_requests_auth") + +from rdflib import URIRef, Namespace, Literal +from rdflib.graph import ConjunctiveGraph +from whyis.database.database_utils import drivers, node_to_sparql + + +class TestNeptuneDriver: + """Test the Neptune driver registration and functionality.""" + + def test_neptune_driver_function_exists(self): + """Test that neptune driver function exists and is callable.""" + from whyis.plugins.neptune.plugin import neptune_driver + + # Verify the function exists and is callable + assert callable(neptune_driver) + + def test_neptune_driver_registered_via_plugin_init(self): + """Test that neptune driver gets registered in drivers dict during plugin init.""" + from whyis.plugins.neptune.plugin import neptune_driver + from whyis.database.database_utils import drivers + + # Store original state + had_neptune = 'neptune' in drivers + original_neptune = drivers.get('neptune') + + # Clear neptune from drivers if it exists + if 'neptune' in drivers: + del drivers['neptune'] + + # Verify neptune driver is not registered + assert 'neptune' not in drivers + + # Simulate what plugin.init() does - directly register the driver + # This is what happens in NeptuneSearchPlugin.init() + drivers['neptune'] = neptune_driver + + # Verify neptune driver is now registered + assert 'neptune' in drivers + assert callable(drivers['neptune']) + assert drivers['neptune'] is neptune_driver + + # Restore original state + if had_neptune: + drivers['neptune'] = original_neptune + elif 'neptune' in drivers: + del drivers['neptune'] + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + def test_neptune_driver_requires_region(self): + """Test that neptune driver requires region configuration.""" + from whyis.plugins.neptune.plugin import neptune_driver + + config = { + '_endpoint': 'https://neptune.example.com/sparql' + } + + with pytest.raises(ValueError, match="requires '_region'"): + neptune_driver(config) + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + def test_neptune_driver_returns_graph(self): + """Test that neptune driver returns a ConjunctiveGraph.""" + from whyis.plugins.neptune.plugin import neptune_driver + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-east-1' + } + + graph = neptune_driver(config) + + assert isinstance(graph, ConjunctiveGraph) + # Store should have gsp_endpoint set + assert hasattr(graph.store, 'gsp_endpoint') + assert graph.store.gsp_endpoint == 'https://neptune.example.com/sparql' + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + def test_neptune_driver_with_custom_service_name(self): + """Test that neptune driver accepts custom service name.""" + from whyis.plugins.neptune.plugin import neptune_driver + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-west-2', + '_service_name': 'custom-service' + } + + graph = neptune_driver(config) + + # Graph should be created successfully + assert isinstance(graph, ConjunctiveGraph) + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + def 
test_neptune_driver_with_gsp_endpoint(self): + """Test that neptune driver uses separate GSP endpoint if provided.""" + from whyis.plugins.neptune.plugin import neptune_driver + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_gsp_endpoint': 'https://neptune.example.com/data', + '_region': 'us-east-1' + } + + graph = neptune_driver(config) + + assert graph.store.gsp_endpoint == 'https://neptune.example.com/data' + + +class TestNeptuneGSPOperations: + """Test Neptune Graph Store Protocol operations with AWS auth.""" + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + @patch('whyis.plugins.neptune.plugin.requests.Session') + def test_gsp_operations_use_aws_auth(self, mock_requests_session): + """Test that GSP operations (publish, put, post, delete) use AWS auth.""" + from whyis.plugins.neptune.plugin import neptune_driver + + # Mock requests session + mock_session_instance = Mock() + mock_response = Mock() + mock_response.ok = True + mock_session_instance.post.return_value = mock_response + mock_session_instance.put.return_value = mock_response + mock_session_instance.delete.return_value = mock_response + mock_requests_session.return_value = mock_session_instance + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-east-1' + } + + graph = neptune_driver(config) + + # Test that publish method exists and has auth + assert hasattr(graph.store, 'publish') + assert hasattr(graph.store, 'put') + assert hasattr(graph.store, 'post') + assert hasattr(graph.store, 'delete') + + # Call publish to verify it works + graph.store.publish(b'test data') + + # Verify a session was created + assert mock_requests_session.called + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + @patch('whyis.plugins.neptune.plugin.uuid.uuid4') + def test_publish_uses_temp_graph_by_default(self, mock_uuid): + """Test that publish uses temporary UUID graph by default.""" + from whyis.plugins.neptune.plugin import neptune_driver + + # Mock UUID generation + test_uuid = 'test-uuid-1234' + mock_uuid.return_value = test_uuid + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-east-1' + } + + graph = neptune_driver(config) + + # Mock the store's _request method to track calls + original_request = graph.store._request + request_calls = [] + + def mock_request(method, url, headers=None, body=None): + request_calls.append({ + 'method': method, + 'url': url, + 'headers': headers, + 'body': body + }) + # Return a mock response + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'OK' + return mock_response + + graph.store._request = mock_request + + # Call publish + test_data = b' .' 
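+        # publish() should POST the data into a temporary urn:uuid graph and
+        # then DELETE that temporary graph when it is done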
+ graph.store.publish(test_data) + + # Verify POST was called with temporary graph parameter + post_calls = [c for c in request_calls if c['method'] == 'POST'] + assert len(post_calls) == 1 + assert f'urn:uuid:{test_uuid}' in post_calls[0]['url'] + + # Verify DELETE was called to clean up temporary graph + delete_calls = [c for c in request_calls if c['method'] == 'DELETE'] + assert len(delete_calls) == 1 + assert f'urn:uuid:{test_uuid}' in delete_calls[0]['url'] + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + def test_publish_without_temp_graph(self): + """Test that publish uses default graph when use_temp_graph=False.""" + from whyis.plugins.neptune.plugin import neptune_driver + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-east-1', + '_use_temp_graph': False + } + + graph = neptune_driver(config) + + # Mock the store's _request method to track calls + request_calls = [] + + def mock_request(method, url, headers=None, body=None): + request_calls.append({ + 'method': method, + 'url': url, + 'headers': headers, + 'body': body + }) + # Return a mock response + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'OK' + return mock_response + + graph.store._request = mock_request + + # Call publish + test_data = b' .' + graph.store.publish(test_data) + + # Verify POST was called WITHOUT graph parameter in URL + post_calls = [c for c in request_calls if c['method'] == 'POST'] + assert len(post_calls) == 1 + # URL should not contain graph parameter + assert 'graph=' not in post_calls[0]['url'] + + # Verify DELETE was NOT called + delete_calls = [c for c in request_calls if c['method'] == 'DELETE'] + assert len(delete_calls) == 0 + + + @patch('whyis.plugins.neptune.plugin.os.environ', {'AWS_ACCESS_KEY_ID': 'test_key', 'AWS_SECRET_ACCESS_KEY': 'test_secret'}) + @patch('whyis.plugins.neptune.plugin.uuid.uuid4') + def test_temp_graph_cleanup_on_error(self, mock_uuid): + """Test that temporary graph is still deleted even if POST fails.""" + from whyis.plugins.neptune.plugin import neptune_driver + + # Mock UUID generation + test_uuid = 'test-uuid-error' + mock_uuid.return_value = test_uuid + + config = { + '_endpoint': 'https://neptune.example.com/sparql', + '_region': 'us-east-1' + } + + graph = neptune_driver(config) + + # Mock the store's _request method - POST fails but DELETE succeeds + request_calls = [] + + def mock_request(method, url, headers=None, body=None): + request_calls.append({ + 'method': method, + 'url': url, + 'headers': headers, + 'body': body + }) + # Return appropriate response based on method + mock_response = Mock() + if method == 'POST': + mock_response.ok = False + mock_response.status_code = 500 + mock_response.text = 'Internal Server Error' + else: # DELETE + mock_response.ok = True + mock_response.status_code = 200 + mock_response.text = 'OK' + return mock_response + + graph.store._request = mock_request + + # Call publish (should fail but still clean up) + test_data = b' .' 
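+        # The mocked POST responds with 500, but cleanup of the temporary
+        # graph via DELETE should still happen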
+ graph.store.publish(test_data) + + # Verify POST was called + post_calls = [c for c in request_calls if c['method'] == 'POST'] + assert len(post_calls) == 1 + + # Verify DELETE was still called for cleanup despite POST failure + delete_calls = [c for c in request_calls if c['method'] == 'DELETE'] + assert len(delete_calls) == 1 + assert f'urn:uuid:{test_uuid}' in delete_calls[0]['url'] + + + +class TestNeptuneEntityResolver: + """Test the NeptuneEntityResolver class.""" + + def test_escape_sparql_string(self): + """Test that SPARQL string escaping works correctly.""" + from whyis.plugins.neptune.plugin import NeptuneEntityResolver + + resolver = NeptuneEntityResolver() + + # Test basic string + assert resolver._escape_sparql_string("test") == "test" + + # Test string with quotes + assert resolver._escape_sparql_string('test "quoted"') == 'test \\"quoted\\"' + + # Test string with backslashes + assert resolver._escape_sparql_string('test\\path') == 'test\\\\path' + + # Test string with newlines + assert resolver._escape_sparql_string('test\nline') == 'test\\nline' + + # Test string with carriage returns + assert resolver._escape_sparql_string('test\rline') == 'test\\rline' + + # Test complex string with multiple special characters + assert resolver._escape_sparql_string('test "quote" and\\path\nline') == 'test \\"quote\\" and\\\\path\\nline' + + # Test None + assert resolver._escape_sparql_string(None) == "" + + def test_fts_query_format(self): + """Test that the FTS query is correctly formatted.""" + from whyis.plugins.neptune.plugin import NeptuneEntityResolver + + resolver = NeptuneEntityResolver() + + # Check that the query uses full URIs for Neptune FTS + assert '' in resolver.query + assert '' in resolver.query + assert '' in resolver.query + assert '' in resolver.query + + # Check that query uses string substitution for search term (not variable binding) + assert '"%s"' in resolver.query # Search term should be inserted as quoted string + + def test_on_resolve_escapes_search_term(self): + """Test that on_resolve properly escapes the search term and type.""" + from whyis.plugins.neptune.plugin import NeptuneEntityResolver + + resolver = NeptuneEntityResolver() + + # Test that the query will safely escape special characters in search term + term_with_quotes = 'test "injection" attempt' + escaped = resolver._escape_sparql_string(term_with_quotes) + + # Verify the quotes were escaped + assert escaped == 'test \\"injection\\" attempt' + + # Verify that when formatted into the query, it's safe + test_query = 'SELECT * WHERE { ?s ?p "%s" }' % escaped + + # The query should contain the escaped version + assert 'test \\"injection\\" attempt' in test_query + + # And should not contain the unescaped quotes that could break out + assert 'test "injection" attempt' not in test_query + + # Test escaping type parameter as well + type_with_special_chars = 'http://example.org/Test"Type' + escaped_type = resolver._escape_sparql_string(type_with_special_chars) + assert escaped_type == 'http://example.org/Test\\"Type' diff --git a/tests/unit/test_sparql_blueprint_auth.py b/tests/unit/test_sparql_blueprint_auth.py new file mode 100644 index 000000000..1c022e376 --- /dev/null +++ b/tests/unit/test_sparql_blueprint_auth.py @@ -0,0 +1,215 @@ +""" +Unit tests for SPARQL blueprint authentication. + +Tests that the SPARQL blueprint properly uses store authentication methods. 
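+
+Covers NeptuneBoto3Store (AWS SigV4 signing) and WhyisSPARQLUpdateStore (HTTP Basic Auth).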
+""" + +import pytest +from unittest.mock import Mock, patch, MagicMock + +# Skip all tests if dependencies not available +boto3 = pytest.importorskip("boto3") +pytest.importorskip("botocore") + +from whyis.plugins.neptune.neptune_boto3_store import NeptuneBoto3Store +from whyis.database.whyis_sparql_update_store import WhyisSPARQLUpdateStore + + +class TestNeptuneBoto3StoreRawRequest: + """Test the raw_sparql_request method for NeptuneBoto3Store.""" + + @patch('requests.Session') + @patch('boto3.Session') + def test_raw_sparql_request_get(self, mock_boto_session_class, mock_requests_session_class): + """Test raw_sparql_request with GET method.""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'content-type': 'application/sparql-results+json'} + mock_response.content = b'{"results": {}}' + mock_response.iter_content = Mock(return_value=[b'{"results": {}}']) + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Make raw request + params = {'query': 'SELECT * WHERE { ?s ?p ?o }'} + headers = {'Accept': 'application/sparql-results+json'} + + response = store.raw_sparql_request( + method='GET', + params=params, + headers=headers + ) + + # Verify request was made with authentication + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + + # Check that authorization header was added + assert 'Authorization' in call_args[1]['headers'] + assert 'AWS4-HMAC-SHA256' in call_args[1]['headers']['Authorization'] + + # Check response + assert response.ok + assert response.status_code == 200 + + @patch('requests.Session') + @patch('boto3.Session') + def test_raw_sparql_request_post(self, mock_boto_session_class, mock_requests_session_class): + """Test raw_sparql_request with POST method.""" + # Setup boto3 mock + mock_credentials = Mock() + frozen_creds = Mock() + frozen_creds.access_key = 'test_key' + frozen_creds.secret_key = 'test_secret' + frozen_creds.token = None + mock_credentials.get_frozen_credentials.return_value = frozen_creds + + mock_boto_session = Mock() + mock_boto_session.get_credentials.return_value = mock_credentials + mock_boto_session_class.return_value = mock_boto_session + + # Setup requests mock + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'content-type': 'application/sparql-results+xml'} + mock_response.content = b'' + mock_response.iter_content = Mock(return_value=[b'']) + + mock_requests_session = Mock() + mock_requests_session.request.return_value = mock_response + mock_requests_session_class.return_value = mock_requests_session + + # Create store + store = NeptuneBoto3Store( + query_endpoint='https://neptune.example.com/sparql', + 
update_endpoint='https://neptune.example.com/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) + + # Make raw POST request + data = b'query=SELECT * WHERE { ?s ?p ?o }' + headers = {'Content-Type': 'application/x-www-form-urlencoded'} + + response = store.raw_sparql_request( + method='POST', + data=data, + headers=headers + ) + + # Verify request was made with authentication + assert mock_requests_session.request.called + call_args = mock_requests_session.request.call_args + + # Check that authorization header was added + assert 'Authorization' in call_args[1]['headers'] + + # Check response + assert response.ok + + +class TestWhyisSPARQLUpdateStoreRawRequest: + """Test the raw_sparql_request method for WhyisSPARQLUpdateStore.""" + + @patch('requests.request') + def test_raw_sparql_request_with_basic_auth(self, mock_request): + """Test raw_sparql_request with HTTP Basic Auth.""" + # Setup mock response + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'content-type': 'application/sparql-results+json'} + mock_response.content = b'{"results": {}}' + mock_response.iter_content = Mock(return_value=[b'{"results": {}}']) + mock_request.return_value = mock_response + + # Create store with auth + store = WhyisSPARQLUpdateStore( + query_endpoint='http://localhost:3030/test/sparql', + update_endpoint='http://localhost:3030/test/sparql', + auth=('user', 'pass') + ) + store.auth = ('user', 'pass') + + # Make raw request + params = {'query': 'SELECT * WHERE { ?s ?p ?o }'} + headers = {'Accept': 'application/sparql-results+json'} + + response = store.raw_sparql_request( + method='GET', + params=params, + headers=headers + ) + + # Verify request was made with auth + assert mock_request.called + call_args = mock_request.call_args + + # Check that auth was passed + assert 'auth' in call_args[1] + assert call_args[1]['auth'] == ('user', 'pass') + + # Check response + assert response.ok + + @patch('requests.request') + def test_raw_sparql_request_without_auth(self, mock_request): + """Test raw_sparql_request without authentication.""" + # Setup mock response + mock_response = Mock() + mock_response.ok = True + mock_response.status_code = 200 + mock_response.headers = {'content-type': 'application/sparql-results+json'} + mock_response.content = b'{"results": {}}' + mock_response.iter_content = Mock(return_value=[b'{"results": {}}']) + mock_request.return_value = mock_response + + # Create store without auth + store = WhyisSPARQLUpdateStore( + query_endpoint='http://localhost:3030/test/sparql', + update_endpoint='http://localhost:3030/test/sparql' + ) + store.auth = None + + # Make raw request + params = {'query': 'SELECT * WHERE { ?s ?p ?o }'} + + response = store.raw_sparql_request( + method='GET', + params=params + ) + + # Verify request was made without auth + assert mock_request.called + call_args = mock_request.call_args + + # Check that auth was not passed (or is None) + assert 'auth' not in call_args[1] or call_args[1].get('auth') is None + + # Check response + assert response.ok diff --git a/whyis/_version.py b/whyis/_version.py index dfc4ef9d0..2347af12c 100644 --- a/whyis/_version.py +++ b/whyis/_version.py @@ -1,4 +1,4 @@ -__version__='2.3.20' +__version__='2.4.0.beta14' if __name__ == '__main__': print(__version__) diff --git a/whyis/blueprint/sparql/sparql_view.py b/whyis/blueprint/sparql/sparql_view.py index 0195d6979..8336ecccb 100644 --- a/whyis/blueprint/sparql/sparql_view.py +++ 
b/whyis/blueprint/sparql/sparql_view.py @@ -17,31 +17,59 @@ def sparql_view(): has_query = True if request.method == 'GET' and not has_query: return redirect(url_for('.sparql_form')) - #print self.db.store.query_endpoint - if request.method == 'GET': - headers = {} - headers.update(request.headers) - if 'Content-Length' in headers: - del headers['Content-Length'] - print('getting') - req = requests.get(current_app.db.store.query_endpoint, - headers = headers, params=request.args, stream=True) - print("gotten") - elif request.method == 'POST': - if 'application/sparql-update' in request.headers['content-type']: - return "Update not allowed.", 403 - if 'update' in request.values: - return "Update not allowed.", 403 - #print(request.get_data()) - print("posting") - req = requests.post(current_app.db.store.query_endpoint,# data=request.values, - headers = request.headers, data=request.values, stream=True) - print("posted") - #print self.db.store.query_endpoint - #print req.status_code - print("making response") + + # Check if store has raw_sparql_request method (all drivers should now have this) + if hasattr(current_app.db.store, 'raw_sparql_request'): + # Use the store's authenticated request method + try: + if request.method == 'GET': + headers = {} + headers.update(request.headers) + if 'Content-Length' in headers: + del headers['Content-Length'] + + req = current_app.db.store.raw_sparql_request( + method='GET', + params=dict(request.args), + headers=headers + ) + elif request.method == 'POST': + if 'application/sparql-update' in request.headers.get('content-type', ''): + return "Update not allowed.", 403 + if 'update' in request.values: + return "Update not allowed.", 403 + + req = current_app.db.store.raw_sparql_request( + method='POST', + headers=dict(request.headers), + data=request.get_data() + ) + except NotImplementedError as e: + # Local stores don't support proxying - return error + return str(e), 501 + except Exception as e: + # Log and return error + current_app.logger.error(f"SPARQL request failed: {str(e)}") + return f"SPARQL request failed: {str(e)}", 500 + else: + # Fallback for stores without raw_sparql_request (should not happen) + # This is the old behavior - direct HTTP request without authentication + if request.method == 'GET': + headers = {} + headers.update(request.headers) + if 'Content-Length' in headers: + del headers['Content-Length'] + req = requests.get(current_app.db.store.query_endpoint, + headers=headers, params=request.args, stream=True) + elif request.method == 'POST': + if 'application/sparql-update' in request.headers.get('content-type', ''): + return "Update not allowed.", 403 + if 'update' in request.values: + return "Update not allowed.", 403 + req = requests.post(current_app.db.store.query_endpoint, + headers=request.headers, data=request.values, stream=True) + + # Return the response response = Response(FileLikeFromIter(req.iter_content()), - content_type = req.headers['content-type']) - print("returning") - #response.headers[con(req.headers) + content_type=req.headers.get('content-type', 'application/sparql-results+xml')) return response, req.status_code diff --git a/whyis/config-template/{{cookiecutter.project_slug}}/CLOUDFORMATION.md b/whyis/config-template/{{cookiecutter.project_slug}}/CLOUDFORMATION.md new file mode 100644 index 000000000..f515cabae --- /dev/null +++ b/whyis/config-template/{{cookiecutter.project_slug}}/CLOUDFORMATION.md @@ -0,0 +1,349 @@ +# Setting Up AWS Neptune with CloudFormation + +This directory contains a CloudFormation 
template (`cloudformation-neptune.json`) that automates the deployment of AWS Neptune Serverless with Full-Text Search capabilities for your Whyis knowledge graph application. + +## What This Template Creates + +The CloudFormation template provisions: + +1. **Neptune Serverless Cluster**: A scalable Neptune database cluster with IAM authentication enabled +2. **OpenSearch Domain**: For full-text search capabilities integrated with Neptune +3. **Security Groups**: Proper network security for both Neptune and OpenSearch +4. **IAM Role**: With necessary permissions to access both Neptune and OpenSearch +5. **VPC Configuration**: Subnet groups for secure deployment + +## Prerequisites + +Before deploying this template, you need: + +1. **AWS Account** with appropriate permissions to create: + - Neptune clusters + - OpenSearch domains + - IAM roles and policies + - EC2 security groups + - VPC subnet groups + +2. **Existing VPC** with: + - At least 2 private subnets in different Availability Zones + - Proper routing configuration + - NAT Gateway (if your application needs internet access) + +3. **AWS CLI** installed and configured (or use AWS Console) + +## Deployment Steps + +### Option 1: Using AWS CLI + +1. **Prepare your parameters** by creating a `parameters.json` file: + +```json +[ + { + "ParameterKey": "DBClusterIdentifier", + "ParameterValue": "my-kgapp-neptune" + }, + { + "ParameterKey": "VPCId", + "ParameterValue": "vpc-xxxxxxxxx" + }, + { + "ParameterKey": "PrivateSubnetIds", + "ParameterValue": "subnet-xxxxxxxx,subnet-yyyyyyyy" + }, + { + "ParameterKey": "AllowedCIDR", + "ParameterValue": "10.0.0.0/16" + }, + { + "ParameterKey": "IAMRoleName", + "ParameterValue": "my-kgapp-neptune-access" + }, + { + "ParameterKey": "MinNCUs", + "ParameterValue": "2.5" + }, + { + "ParameterKey": "MaxNCUs", + "ParameterValue": "128" + }, + { + "ParameterKey": "OpenSearchInstanceType", + "ParameterValue": "t3.small.search" + }, + { + "ParameterKey": "OpenSearchInstanceCount", + "ParameterValue": "1" + } +] +``` + +2. **Deploy the stack**: + +```bash +aws cloudformation create-stack \ + --stack-name my-kgapp-neptune-stack \ + --template-body file://cloudformation-neptune.json \ + --parameters file://parameters.json \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 +``` + +3. **Monitor the deployment**: + +```bash +aws cloudformation describe-stacks \ + --stack-name my-kgapp-neptune-stack \ + --region us-east-1 \ + --query 'Stacks[0].StackStatus' +``` + +The deployment typically takes 20-30 minutes to complete. + +4. **Get the outputs**: + +```bash +aws cloudformation describe-stacks \ + --stack-name my-kgapp-neptune-stack \ + --region us-east-1 \ + --query 'Stacks[0].Outputs' +``` + +### Option 2: Using AWS Console + +1. Log into the AWS Console +2. Navigate to CloudFormation service +3. Click "Create Stack" → "With new resources" +4. Select "Upload a template file" +5. Upload the `cloudformation-neptune.json` file +6. Fill in the required parameters: + - **DBClusterIdentifier**: Unique name for your Neptune cluster + - **VPCId**: Select your VPC + - **PrivateSubnetIds**: Select at least 2 private subnets in different AZs + - **AllowedCIDR**: IP range that can access Neptune and OpenSearch + - **IAMRoleName**: Name for the IAM role (must be unique) + - **MinNCUs/MaxNCUs**: Capacity settings for Neptune Serverless + - **OpenSearchInstanceType**: Instance type for OpenSearch + - **OpenSearchInstanceCount**: Number of OpenSearch nodes +7. Acknowledge IAM resource creation +8. 
Click "Create Stack" + +## Configuring Your Whyis Application + +After the CloudFormation stack completes, configure your Whyis application: + +### 1. Get Configuration Values from Stack Outputs + +The CloudFormation outputs provide all the values you need. Key outputs: + +- `NeptuneSPARQLEndpoint`: Neptune SPARQL endpoint URL +- `OpenSearchFTSEndpoint`: OpenSearch full-text search endpoint +- `Region`: AWS region +- `NeptuneAccessRoleArn`: IAM role ARN for accessing Neptune +- `WhyisConfigSummary`: Quick reference of all configuration values + +### 2. Update whyis.conf + +Add these lines to your `whyis.conf`: + +```python +# Enable Neptune plugin +PLUGINENGINE_PLUGINS = ['neptune'] + +# Neptune configuration +KNOWLEDGE_TYPE = 'neptune' +KNOWLEDGE_ENDPOINT = 'https://:8182/sparql' # From NeptuneSPARQLEndpoint output +KNOWLEDGE_REGION = 'us-east-1' # From Region output + +# Full-text search configuration +neptune_fts_endpoint = 'https://' # From OpenSearchFTSEndpoint output +``` + +### 3. Add Dependencies to requirements.txt + +``` +aws_requests_auth +``` + +Install dependencies: + +```bash +pip install -r requirements.txt +``` + +### 4. Configure AWS Credentials + +Your application needs AWS credentials to access Neptune. Choose one option: + +#### Option A: Using IAM Role (Recommended for EC2/ECS) + +If running on EC2, attach the instance profile to your instance: + +```bash +# Get the instance profile ARN from CloudFormation outputs +aws ec2 associate-iam-instance-profile \ + --instance-id i-xxxxxxxxx \ + --iam-instance-profile Arn= +``` + +#### Option B: Using Environment Variables (For local development) + +Create an IAM user with permissions to assume the Neptune access role, then: + +```bash +export AWS_ACCESS_KEY_ID=your_access_key +export AWS_SECRET_ACCESS_KEY=your_secret_key +export AWS_REGION=us-east-1 +``` + +#### Option C: Using AWS CLI Profile + +```bash +aws configure --profile neptune +# Enter your credentials +export AWS_PROFILE=neptune +``` + +### 5. Verify the Configuration + +Start your Whyis application and verify Neptune connection: + +```bash +./run +``` + +Check the logs for successful Neptune plugin initialization and database connection. 
+
+## Configuration Parameters Explained
+
+### Required Parameters
+
+- **DBClusterIdentifier**: Unique identifier for your Neptune cluster (3-63 characters, alphanumeric and hyphens)
+- **VPCId**: The VPC where Neptune and OpenSearch will be deployed
+- **PrivateSubnetIds**: At least 2 private subnets in different Availability Zones for high availability
+- **AllowedCIDR**: CIDR block that can access Neptune and OpenSearch (e.g., your VPC CIDR)
+- **IAMRoleName**: Name for the IAM role that grants access to Neptune and OpenSearch
+
+### Optional Parameters (with defaults)
+
+- **MinNCUs**: Minimum Neptune Capacity Units (default: 2.5) - Lowest cost option
+- **MaxNCUs**: Maximum Neptune Capacity Units (default: 128) - Allows scaling to high workloads
+- **OpenSearchInstanceType**: Instance type for OpenSearch (default: t3.small.search) - Good for development
+- **OpenSearchInstanceCount**: Number of OpenSearch instances (default: 1) - Use 2+ for production
+
+## Cost Considerations
+
+### Neptune Serverless Costs
+
+- **NCU-hours**: Charged per NCU-hour while the cluster is active
+- **Storage**: Charged per GB-month
+- **I/O**: Charged per million requests
+- **Backups**: Automated backups included, additional snapshots charged
+
+Estimated monthly cost (with 2.5 NCUs average, 10GB data):
+- ~$150-300/month depending on usage patterns
+
+### OpenSearch Costs
+
+- **Instance hours**: Based on instance type (t3.small.search ~$35/month)
+- **Storage**: Charged per GB (20GB included in template)
+
+### Cost Optimization Tips
+
+1. **Development**: Use MinNCUs=1, t3.small.search, single instance
+2. **Production**: Use MinNCUs=2.5, larger instance types, multiple instances for HA
+3. **Scale down when idle**: Neptune Serverless automatically scales down to MinNCUs during inactivity (it does not scale to zero, so delete unused development stacks)
+4. **Monitor usage**: Use AWS Cost Explorer to track actual costs
+
+## Security Best Practices
+
+1. **Network Security**:
+   - Deploy in private subnets only
+   - Use restrictive security groups
+   - Set AllowedCIDR to minimum required range
+
+2. **IAM Authentication**:
+   - Always use IAM authentication (enabled by default in template)
+   - Rotate credentials regularly
+   - Use IAM roles instead of long-term credentials when possible
+
+3. **Encryption**:
+   - Encryption at rest enabled by default
+   - TLS/HTTPS enforced for all connections
+   - Node-to-node encryption enabled for OpenSearch
+
+4. **Least Privilege**:
+   - Use the provided IAM role with minimal permissions
+   - Create separate roles for different access patterns if needed
+
+## Troubleshooting
+
+### Stack Creation Failed
+
+1. **Check CloudFormation Events**:
+   ```bash
+   aws cloudformation describe-stack-events \
+     --stack-name my-kgapp-neptune-stack \
+     --region us-east-1
+   ```
+
+2. **Common Issues**:
+   - Insufficient IAM permissions
+   - VPC/Subnet configuration issues
+   - Resource naming conflicts
+   - Service limits exceeded
+
+### Connection Issues
+
+1. **Verify Security Groups**: Ensure your application's security group can reach Neptune (port 8182) and OpenSearch (port 443)
+
+2. **Check IAM Permissions**: Verify the IAM role has neptune-db:* and es:* permissions
+
+3. **Test Connectivity**:
+   ```bash
+   # From an instance in the same VPC
+   curl -k https://<neptune-endpoint>:8182/sparql
+   ```
+
+### OpenSearch Access Issues
+
+1. **Fine-grained Access Control**: Ensure the IAM role ARN is configured as master user
+2. **VPC Configuration**: Verify OpenSearch is in the correct subnets
+3. **Domain Policy**: Check the access policy allows your CIDR range
+
+## Updating the Stack
+
+To update configuration (e.g., increase capacity):
+
+```bash
+aws cloudformation update-stack \
+  --stack-name my-kgapp-neptune-stack \
+  --template-body file://cloudformation-neptune.json \
+  --parameters file://updated-parameters.json \
+  --capabilities CAPABILITY_NAMED_IAM \
+  --region us-east-1
+```
+
+## Deleting the Stack
+
+To remove all resources:
+
+```bash
+aws cloudformation delete-stack \
+  --stack-name my-kgapp-neptune-stack \
+  --region us-east-1
+```
+
+**Warning**: This will permanently delete:
+- All data in Neptune
+- All data in OpenSearch
+- Security groups and IAM roles
+
+Create a backup before deletion if you need to preserve data.
+
+## Additional Resources
+
+- [AWS Neptune Documentation](https://docs.aws.amazon.com/neptune/latest/userguide/)
+- [Neptune IAM Authentication](https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth.html)
+- [Neptune Full-Text Search](https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html)
+- [OpenSearch Documentation](https://docs.aws.amazon.com/opensearch-service/)
+- [CloudFormation Best Practices](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html)
diff --git a/whyis/config-template/{{cookiecutter.project_slug}}/cloudformation-neptune.json b/whyis/config-template/{{cookiecutter.project_slug}}/cloudformation-neptune.json
new file mode 100644
index 000000000..0d6d5b20b
--- /dev/null
+++ b/whyis/config-template/{{cookiecutter.project_slug}}/cloudformation-neptune.json
@@ -0,0 +1,505 @@
+{
+  "AWSTemplateFormatVersion": "2010-09-09",
+  "Description": "CloudFormation template for AWS Neptune Serverless cluster with Full-Text Search (OpenSearch) for Whyis Knowledge Graph Application",
+  "Parameters": {
+    "DBClusterIdentifier": {
+      "Type": "String",
+      "Default": "{{cookiecutter.project_slug}}-neptune",
+      "Description": "Neptune DB cluster identifier",
+      "MinLength": 1,
+      "MaxLength": 63,
+      "AllowedPattern": "^[a-zA-Z][a-zA-Z0-9-]*$",
+      "ConstraintDescription": "Must begin with a letter and contain only alphanumeric characters and hyphens"
+    },
+    "MinNCUs": {
+      "Type": "Number",
+      "Default": 2.5,
+      "Description": "Minimum Neptune Capacity Units (NCUs) for serverless cluster",
+      "AllowedValues": [1, 2.5]
+    },
+    "MaxNCUs": {
+      "Type": "Number",
+      "Default": 128,
+      "Description": "Maximum Neptune Capacity Units (NCUs) for serverless cluster",
+      "AllowedValues": [2.5, 128]
+    },
+    "OpenSearchInstanceType": {
+      "Type": "String",
+      "Default": "t3.small.search",
+      "Description": "OpenSearch instance type for Full-Text Search",
+      "AllowedValues": [
+        "t3.small.search",
+        "t3.medium.search",
+        "r6g.large.search",
+        "r6g.xlarge.search"
+      ]
+    },
+    "OpenSearchInstanceCount": {
+      "Type": "Number",
+      "Default": 1,
+      "Description": "Number of OpenSearch instances",
+      "MinValue": 1,
+      "MaxValue": 10
+    },
+    "VPCId": {
+      "Type": "AWS::EC2::VPC::Id",
+      "Description": "VPC ID where Neptune and OpenSearch will be deployed"
+    },
+    "PrivateSubnetIds": {
+      "Type": "List<AWS::EC2::Subnet::Id>",
+      "Description": "List of private subnet IDs for Neptune and OpenSearch (at least 2 in different AZs)"
+    },
+    "AllowedCIDR": {
+      "Type": "String",
+      "Default": "10.0.0.0/8",
+      "Description": "CIDR block allowed to access Neptune and OpenSearch",
+      "AllowedPattern": "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})",
+      "ConstraintDescription": "Must be a valid CIDR range"
+    },
+    "IAMRoleName": {
+      "Type": "String",
+      "Default": 
"{{cookiecutter.project_slug}}-neptune-access-role", + "Description": "Name for the IAM role that will access Neptune", + "MinLength": 1, + "MaxLength": 64 + } + }, + "Resources": { + "NeptuneSecurityGroup": { + "Type": "AWS::EC2::SecurityGroup", + "Properties": { + "GroupDescription": "Security group for Neptune cluster", + "VpcId": { + "Ref": "VPCId" + }, + "SecurityGroupIngress": [ + { + "IpProtocol": "tcp", + "FromPort": 8182, + "ToPort": 8182, + "CidrIp": { + "Ref": "AllowedCIDR" + }, + "Description": "Allow Neptune access from specified CIDR" + } + ], + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${DBClusterIdentifier}-sg" + } + } + ] + } + }, + "OpenSearchSecurityGroup": { + "Type": "AWS::EC2::SecurityGroup", + "Properties": { + "GroupDescription": "Security group for OpenSearch domain", + "VpcId": { + "Ref": "VPCId" + }, + "SecurityGroupIngress": [ + { + "IpProtocol": "tcp", + "FromPort": 443, + "ToPort": 443, + "SourceSecurityGroupId": { + "Ref": "NeptuneSecurityGroup" + }, + "Description": "Allow HTTPS from Neptune security group" + }, + { + "IpProtocol": "tcp", + "FromPort": 443, + "ToPort": 443, + "CidrIp": { + "Ref": "AllowedCIDR" + }, + "Description": "Allow HTTPS from specified CIDR" + } + ], + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${DBClusterIdentifier}-opensearch-sg" + } + } + ] + } + }, + "NeptuneDBSubnetGroup": { + "Type": "AWS::Neptune::DBSubnetGroup", + "Properties": { + "DBSubnetGroupName": { + "Fn::Sub": "${DBClusterIdentifier}-subnet-group" + }, + "DBSubnetGroupDescription": "Subnet group for Neptune cluster", + "SubnetIds": { + "Ref": "PrivateSubnetIds" + }, + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${DBClusterIdentifier}-subnet-group" + } + } + ] + } + }, + "NeptuneDBCluster": { + "Type": "AWS::Neptune::DBCluster", + "Properties": { + "DBClusterIdentifier": { + "Ref": "DBClusterIdentifier" + }, + "Engine": "neptune", + "EngineVersion": "1.3.2.0", + "ServerlessScalingConfiguration": { + "MinCapacity": { + "Ref": "MinNCUs" + }, + "MaxCapacity": { + "Ref": "MaxNCUs" + } + }, + "DBSubnetGroupName": { + "Ref": "NeptuneDBSubnetGroup" + }, + "VpcSecurityGroupIds": [ + { + "Ref": "NeptuneSecurityGroup" + } + ], + "IamAuthEnabled": true, + "BackupRetentionPeriod": 7, + "PreferredBackupWindow": "03:00-04:00", + "PreferredMaintenanceWindow": "mon:04:00-mon:05:00", + "Tags": [ + { + "Key": "Name", + "Value": { + "Ref": "DBClusterIdentifier" + } + } + ] + } + }, + "OpenSearchDomain": { + "Type": "AWS::OpenSearchService::Domain", + "Properties": { + "DomainName": { + "Fn::Sub": "${DBClusterIdentifier}-fts" + }, + "EngineVersion": "OpenSearch_2.11", + "ClusterConfig": { + "InstanceType": { + "Ref": "OpenSearchInstanceType" + }, + "InstanceCount": { + "Ref": "OpenSearchInstanceCount" + }, + "DedicatedMasterEnabled": false, + "ZoneAwarenessEnabled": { + "Fn::If": [ + "MultipleInstances", + true, + false + ] + } + }, + "EBSOptions": { + "EBSEnabled": true, + "VolumeType": "gp3", + "VolumeSize": 20 + }, + "VPCOptions": { + "SubnetIds": [ + { + "Fn::Select": [ + 0, + { + "Ref": "PrivateSubnetIds" + } + ] + } + ], + "SecurityGroupIds": [ + { + "Ref": "OpenSearchSecurityGroup" + } + ] + }, + "AccessPolicies": { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "AWS": "*" + }, + "Action": "es:*", + "Resource": { + "Fn::Sub": "arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/${DBClusterIdentifier}-fts/*" + }, + "Condition": { + "IpAddress": { + "aws:SourceIp": { + "Ref": "AllowedCIDR" + } + } + } 
+ } + ] + }, + "AdvancedSecurityOptions": { + "Enabled": true, + "InternalUserDatabaseEnabled": false, + "MasterUserOptions": { + "MasterUserARN": { + "Fn::GetAtt": [ + "NeptuneAccessRole", + "Arn" + ] + } + } + }, + "NodeToNodeEncryptionOptions": { + "Enabled": true + }, + "EncryptionAtRestOptions": { + "Enabled": true + }, + "DomainEndpointOptions": { + "EnforceHTTPS": true, + "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07" + }, + "Tags": [ + { + "Key": "Name", + "Value": { + "Fn::Sub": "${DBClusterIdentifier}-fts" + } + } + ] + } + }, + "NeptuneAccessRole": { + "Type": "AWS::IAM::Role", + "Properties": { + "RoleName": { + "Ref": "IAMRoleName" + }, + "AssumeRolePolicyDocument": { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Service": [ + "ec2.amazonaws.com", + "ecs-tasks.amazonaws.com", + "lambda.amazonaws.com" + ] + }, + "Action": "sts:AssumeRole" + } + ] + }, + "ManagedPolicyArns": [ + "arn:aws:iam::aws:policy/NeptuneReadOnlyAccess" + ], + "Policies": [ + { + "PolicyName": "NeptuneIAMAccess", + "PolicyDocument": { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "neptune-db:connect", + "neptune-db:ReadDataViaQuery", + "neptune-db:WriteDataViaQuery", + "neptune-db:DeleteDataViaQuery" + ], + "Resource": { + "Fn::Sub": "arn:aws:neptune-db:${AWS::Region}:${AWS::AccountId}:${NeptuneDBCluster}/*" + } + } + ] + } + }, + { + "PolicyName": "OpenSearchAccess", + "PolicyDocument": { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "es:ESHttpGet", + "es:ESHttpPost", + "es:ESHttpPut", + "es:ESHttpDelete", + "es:ESHttpHead" + ], + "Resource": { + "Fn::Sub": "arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/${DBClusterIdentifier}-fts/*" + } + } + ] + } + } + ], + "Tags": [ + { + "Key": "Name", + "Value": { + "Ref": "IAMRoleName" + } + } + ] + } + }, + "NeptuneAccessInstanceProfile": { + "Type": "AWS::IAM::InstanceProfile", + "Properties": { + "InstanceProfileName": { + "Fn::Sub": "${IAMRoleName}-instance-profile" + }, + "Roles": [ + { + "Ref": "NeptuneAccessRole" + } + ] + } + } + }, + "Conditions": { + "MultipleInstances": { + "Fn::Not": [ + { + "Fn::Equals": [ + { + "Ref": "OpenSearchInstanceCount" + }, + 1 + ] + } + ] + } + }, + "Outputs": { + "NeptuneClusterEndpoint": { + "Description": "Neptune cluster endpoint", + "Value": { + "Fn::GetAtt": [ + "NeptuneDBCluster", + "Endpoint" + ] + }, + "Export": { + "Name": { + "Fn::Sub": "${AWS::StackName}-NeptuneEndpoint" + } + } + }, + "NeptuneClusterPort": { + "Description": "Neptune cluster port", + "Value": { + "Fn::GetAtt": [ + "NeptuneDBCluster", + "Port" + ] + }, + "Export": { + "Name": { + "Fn::Sub": "${AWS::StackName}-NeptunePort" + } + } + }, + "NeptuneSPARQLEndpoint": { + "Description": "Neptune SPARQL endpoint URL for Whyis configuration", + "Value": { + "Fn::Sub": "https://${NeptuneDBCluster.Endpoint}:${NeptuneDBCluster.Port}/sparql" + } + }, + "OpenSearchDomainEndpoint": { + "Description": "OpenSearch domain endpoint", + "Value": { + "Fn::GetAtt": [ + "OpenSearchDomain", + "DomainEndpoint" + ] + }, + "Export": { + "Name": { + "Fn::Sub": "${AWS::StackName}-OpenSearchEndpoint" + } + } + }, + "OpenSearchFTSEndpoint": { + "Description": "OpenSearch FTS endpoint URL for Whyis configuration", + "Value": { + "Fn::Sub": "https://${OpenSearchDomain.DomainEndpoint}" + } + }, + "NeptuneAccessRoleArn": { + "Description": "ARN of the IAM role for accessing Neptune and OpenSearch", + "Value": { + "Fn::GetAtt": [ + "NeptuneAccessRole", + 
"Arn" + ] + }, + "Export": { + "Name": { + "Fn::Sub": "${AWS::StackName}-AccessRoleArn" + } + } + }, + "NeptuneAccessInstanceProfileArn": { + "Description": "ARN of the instance profile for EC2 instances", + "Value": { + "Fn::GetAtt": [ + "NeptuneAccessInstanceProfile", + "Arn" + ] + }, + "Export": { + "Name": { + "Fn::Sub": "${AWS::StackName}-InstanceProfileArn" + } + } + }, + "Region": { + "Description": "AWS Region where resources are deployed", + "Value": { + "Ref": "AWS::Region" + } + }, + "WhyisConfigSummary": { + "Description": "Configuration values for whyis.conf", + "Value": { + "Fn::Sub": [ + "KNOWLEDGE_TYPE=neptune | KNOWLEDGE_ENDPOINT=${Endpoint} | KNOWLEDGE_REGION=${Region} | neptune_fts_endpoint=${FTSEndpoint}", + { + "Endpoint": { + "Fn::Sub": "https://${NeptuneDBCluster.Endpoint}:${NeptuneDBCluster.Port}/sparql" + }, + "Region": { + "Ref": "AWS::Region" + }, + "FTSEndpoint": { + "Fn::Sub": "https://${OpenSearchDomain.DomainEndpoint}" + } + } + ] + } + } + } +} diff --git a/whyis/config/default.py b/whyis/config/default.py index 285c582c5..1f4b52577 100644 --- a/whyis/config/default.py +++ b/whyis/config/default.py @@ -87,7 +87,7 @@ class Config: MULTIUSER = True PLUGINENGINE_NAMESPACE = "whyis" - PLUGINENGINE_PLUGINS = ['whyis_sparql_entity_resolver'] + PLUGINENGINE_PLUGINS = ['whyis_fuseki'] SECURITY_EMAIL_SENDER = "Name " SECURITY_FLASH_MESSAGES = True diff --git a/whyis/database/database_utils.py b/whyis/database/database_utils.py index a36573558..dbd40e1c4 100644 --- a/whyis/database/database_utils.py +++ b/whyis/database/database_utils.py @@ -51,11 +51,22 @@ def post(graph): def delete(c): store.remove((None, None, None), c) + + def raw_sparql_request(self, method, params=None, data=None, headers=None): + """ + Local stores don't support raw SPARQL endpoint requests. + This method is here for interface compatibility but raises an error. + """ + raise NotImplementedError( + "Local stores (memory, oxigraph) don't support raw SPARQL endpoint requests. " + "Use the store's query() method instead." 
+ ) store.publish = publish store.put = put store.delete = delete store.post = post + store.raw_sparql_request = raw_sparql_request return store @driver(name="memory") @@ -93,7 +104,7 @@ def _remote_sparql_store_protocol(store): Returns: The store object with GSP methods attached """ - def publish(data, format='text/trig;charset=utf-8'): + def publish(data, format='application/trig'): s = requests.session() s.keep_alive = False @@ -102,7 +113,10 @@ def publish(data, format='text/trig;charset=utf-8'): ) if store.auth is not None: kwargs['auth'] = store.auth - r = s.post(store.gsp_endpoint, data=data, **kwargs) + r = s.post(store.gsp_endpoint, + params=dict(default='true'), + data=data, + **kwargs) if not r.ok: print(f"Error: {store.gsp_endpoint} publish returned status {r.status_code}:\n{r.text}") @@ -114,7 +128,7 @@ def put(graph): s.keep_alive = False kwargs = dict( - headers={'Content-Type':'text/turtle;charset=utf-8'}, + headers={'Content-Type':'text/turtle'}, ) if store.auth is not None: kwargs['auth'] = store.auth @@ -134,11 +148,11 @@ def post(graph): s.keep_alive = False kwargs = dict( - headers={'Content-Type':'text/trig;charset=utf-8'}, + headers={'Content-Type':'application/trig'}, ) if store.auth is not None: kwargs['auth'] = store.auth - r = s.post(store.gsp_endpoint, data=data, **kwargs) + r = s.post(store.gsp_endpoint, params=dict(default="true"), data=data, **kwargs) if not r.ok: print(f"Error: {store.gsp_endpoint} POST returned status {r.status_code}:\n{r.text}") @@ -209,24 +223,36 @@ def sparql_driver(config): return graph def create_query_store(store): - new_store = WhyisSPARQLStore(endpoint=store.query_endpoint, - query_endpoint=store.query_endpoint, -# method="POST", -# returnFormat='json', - node_to_sparql=node_to_sparql) + """ + Create a read-only query store from an existing store. + + This function creates a query-only store that can be used for read operations + without update capabilities. + + Args: + store: The source store object + + Returns: + A new store configured for queries only + """ + new_store = WhyisSPARQLStore( + endpoint=store.query_endpoint, + query_endpoint=store.query_endpoint, + node_to_sparql=node_to_sparql + ) return new_store # memory_graphs = collections.defaultdict(ConjunctiveGraph) def engine_from_config(config): engine = None - if "_endpoint" in config: + if '_type' in config: + t = config['_type'] + engine = drivers[t](config) + elif "_endpoint" in config: engine = drivers['sparql'](config) elif '_store' in config: engine = drivers['oxigraph'](config) - elif '_memory' in config: - engine = drivers['memory'](config) else: - t = config['_type'] - engine = drivers[t](config) + engine = drivers['memory'](config) return engine diff --git a/whyis/database/whyis_sparql_update_store.py b/whyis/database/whyis_sparql_update_store.py index bd61c6edb..c926786a3 100644 --- a/whyis/database/whyis_sparql_update_store.py +++ b/whyis/database/whyis_sparql_update_store.py @@ -31,3 +31,48 @@ def query(self, query, initNs=None, initBindings=None, queryGraph=None, DEBUG=Fa query = re.sub(r'where\s+{', 'WHERE {%s' % values, query, count=1, flags=re.I) return SPARQLUpdateStore.query(self, query, initNs=initNs, initBindings=None, queryGraph=queryGraph, DEBUG=DEBUG) + + def raw_sparql_request(self, method, params=None, data=None, headers=None): + """ + Make a raw SPARQL endpoint request with authentication. + + This method is used by the SPARQL blueprint to proxy requests. + For stores with HTTP Basic Auth, it uses the store's auth attribute. 
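+
+        For example, a simple GET query against the configured endpoint::
+
+            store.raw_sparql_request(
+                'GET',
+                params={'query': 'ASK { ?s ?p ?o }'},
+                headers={'Accept': 'application/sparql-results+json'})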
+ + Args: + method (str): HTTP method ('GET' or 'POST') + params (dict): Query parameters + data: Request body data + headers (dict): Request headers + + Returns: + Response object from requests library + """ + import requests + from urllib.parse import urlencode + + # Use query_endpoint for all SPARQL requests + url = self.query_endpoint + + # Build URL with parameters if any + if params: + qsa = "?" + urlencode(params) + url = url + qsa + + # Prepare headers + request_headers = dict(headers) if headers else {} + + # Make request with authentication if available + kwargs = {} + if hasattr(self, 'auth') and self.auth is not None: + kwargs['auth'] = self.auth + + response = requests.request( + method=method.upper(), + url=url, + headers=request_headers, + data=data, + **kwargs + ) + + return response diff --git a/whyis/default_vocab.ttl b/whyis/default_vocab.ttl index 67687ca9e..5386a3d9a 100644 --- a/whyis/default_vocab.ttl +++ b/whyis/default_vocab.ttl @@ -481,22 +481,8 @@ np:Nanopublication a owl:Class; whyis:hasDescribe "nanopub_describe.json"; whyis:hasView "nanopublication_view.html". -# a whyis:searchView. -# whyis:searchView whyis:hasView "search.html". - -# a whyis:searchView. - -# whyis:searchView whyis:hasView "search-view.html". - - a whyis:searchApi . - -whyis:searchApi whyis:hasView "search-api.json". - - a whyis:search . - -whyis:HomePage whyis:searchView "search.html"; - whyis:searchData "search.json". +whyis:HomePage whyis:searchView "search.html". whyis:searchView rdfs:subPropertyOf whyis:hasView; dc:identifier "search". diff --git a/whyis/filters.py b/whyis/filters.py index ecef11c30..695966ff1 100644 --- a/whyis/filters.py +++ b/whyis/filters.py @@ -115,6 +115,14 @@ def query_filter(query, graph=app.db, prefixes=None, values=None, limit=None, of if offset is not None: query = query + '\n offset %s' % int(offset) results = graph.query(query, **params) + if type(results) == tuple: + raise rdflib.exceptions.ParserError(f'''{results[1]}: +{results[2] if results[2] is not None else''} +query: +{query} +bindings: +{values} +''') if raw: return results elif as_graph: diff --git a/whyis/plugins/sparql_entity_resolver/__init__.py b/whyis/plugins/fuseki/__init__.py similarity index 100% rename from whyis/plugins/sparql_entity_resolver/__init__.py rename to whyis/plugins/fuseki/__init__.py diff --git a/whyis/plugins/sparql_entity_resolver/plugin.py b/whyis/plugins/fuseki/plugin.py similarity index 87% rename from whyis/plugins/sparql_entity_resolver/plugin.py rename to whyis/plugins/fuseki/plugin.py index e2659a3a0..1bedd779a 100644 --- a/whyis/plugins/sparql_entity_resolver/plugin.py +++ b/whyis/plugins/fuseki/plugin.py @@ -1,6 +1,7 @@ from whyis.plugin import Plugin, EntityResolverListener import rdflib from flask import current_app +from flask_pluginengine import PluginBlueprint, current_plugin prefixes = dict( @@ -14,7 +15,7 @@ dc = rdflib.URIRef("http://purl.org/dc/terms/") ) -class SPARQLEntityResolver(EntityResolverListener): +class FusekiEntityResolver(EntityResolverListener): context_query=""" optional { @@ -69,6 +70,7 @@ def __init__(self, database="knowledge"): self.database = database def on_resolve(self, term, type=None, context=None, label=True): + print(f'Searching {self.database} for {term}') graph = current_app.databases[self.database] context_query = '' if context is not None: @@ -93,14 +95,18 @@ def on_resolve(self, term, type=None, context=None, label=True): results.append(result) return results +plugin_blueprint = PluginBlueprint('fuseki', __name__) -class 
SPARQLEntityResolverPlugin(Plugin): +class FusekiSearchPlugin(Plugin): resolvers = { - "sparql" : SPARQLEntityResolver, - "fuseki" : SPARQLEntityResolver + "sparql" : FusekiEntityResolver, + "fuseki" : FusekiEntityResolver } + def create_blueprint(self): + return plugin_blueprint + def init(self): resolver_type = self.app.config.get('RESOLVER_TYPE', 'fuseki') resolver_db = self.app.config.get('RESOLVER_DB', "knowledge") diff --git a/whyis/templates/search.json b/whyis/plugins/fuseki/templates/search.json similarity index 100% rename from whyis/templates/search.json rename to whyis/plugins/fuseki/templates/search.json diff --git a/whyis/plugins/fuseki/vocab.ttl b/whyis/plugins/fuseki/vocab.ttl new file mode 100644 index 000000000..b02684bf5 --- /dev/null +++ b/whyis/plugins/fuseki/vocab.ttl @@ -0,0 +1,3 @@ +@prefix whyis: . + +whyis:HomePage whyis:searchData "whyis_fuseki:search.json". diff --git a/whyis/plugins/neptune/NeptuneBoto3Store.md b/whyis/plugins/neptune/NeptuneBoto3Store.md new file mode 100644 index 000000000..8edf2419c --- /dev/null +++ b/whyis/plugins/neptune/NeptuneBoto3Store.md @@ -0,0 +1,265 @@ +# NeptuneBoto3Store + +A subclass of RDFlib's `SPARQLUpdateStore` that uses boto3 for AWS Neptune authentication with dynamic credential retrieval from EC2 instance metadata. + +## Overview + +`NeptuneBoto3Store` extends `WhyisSPARQLUpdateStore` to provide robust AWS authentication for Amazon Neptune databases. It leverages boto3's credential management system and supports dynamic credential retrieval from EC2 instance metadata via `InstanceMetadataProvider` and `InstanceMetadataFetcher`. + +## Features + +- **boto3 Credential Management**: Uses boto3's full credential chain (environment variables, credential files, IAM roles) +- **Dynamic Instance Metadata Retrieval**: Automatically fetches credentials from EC2 instance metadata service when running on EC2 +- **AWS SigV4 Request Signing**: All HTTP requests are signed with AWS SigV4 signatures +- **Automatic Credential Refresh**: Credentials are dynamically retrieved for each request, ensuring they're always current +- **Fallback Mechanism**: Falls back to boto3 session credentials if instance metadata is unavailable +- **RDFlib Compatible**: Works seamlessly with RDFlib's `ConjunctiveGraph` + +## Installation + +Install the required dependencies: + +```bash +pip install boto3 botocore +``` + +## Usage + +### Basic Usage + +```python +from rdflib import ConjunctiveGraph +from whyis.plugins.neptune import NeptuneBoto3Store + +# Create the store +store = NeptuneBoto3Store( + query_endpoint='https://my-neptune.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/sparql', + update_endpoint='https://my-neptune.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/sparql', + region_name='us-east-1' +) + +# Create a graph with the store +graph = ConjunctiveGraph(store) + +# Use the graph for SPARQL operations +results = graph.query(""" + SELECT ?s ?p ?o + WHERE { + ?s ?p ?o . 
+ } + LIMIT 10 +""") +``` + +### Advanced Configuration + +```python +import boto3 +from whyis.plugins.neptune import NeptuneBoto3Store + +# Use a custom boto3 session +session = boto3.Session( + aws_access_key_id='YOUR_KEY', + aws_secret_access_key='YOUR_SECRET', + region_name='us-east-1' +) + +store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1', + service_name='neptune-db', # Custom service name + boto3_session=session, # Use custom session + use_instance_metadata=True # Enable instance metadata (default) +) +``` + +### Disabling Instance Metadata + +If you don't want to use instance metadata and prefer only boto3 session credentials: + +```python +store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1', + use_instance_metadata=False # Disable instance metadata +) +``` + +## Credential Retrieval + +`NeptuneBoto3Store` retrieves credentials in the following order: + +1. **Instance Metadata Provider** (if `use_instance_metadata=True`): + - Uses `InstanceMetadataFetcher` to fetch credentials from EC2 instance metadata service + - Automatically used when running on EC2 with an IAM role + - Credentials are dynamically refreshed + +2. **boto3 Session Credentials** (fallback): + - Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) + - AWS credentials file (`~/.aws/credentials`) + - IAM roles (when running on EC2/ECS/Lambda) + +## Parameters + +### `__init__` Parameters + +- **`query_endpoint`** (str, required): SPARQL query endpoint URL +- **`update_endpoint`** (str, required): SPARQL update endpoint URL +- **`region_name`** (str, required): AWS region where Neptune is located +- **`service_name`** (str, optional): AWS service name for signing (default: `'neptune-db'`) +- **`boto3_session`** (boto3.Session, optional): Custom boto3 session. If not provided, a new session is created. +- **`use_instance_metadata`** (bool, optional): Enable dynamic credential retrieval from EC2 instance metadata (default: `True`) +- **`**kwargs`**: Additional arguments passed to `WhyisSPARQLUpdateStore` + +## Methods + +### `_get_credentials()` + +Dynamically retrieves current AWS credentials, prioritizing instance metadata when available. + +**Returns**: Frozen credentials object with `access_key`, `secret_key`, and `token` attributes. + +### `_sign_request(method, url, headers=None, body=None)` + +Signs an HTTP request using AWS SigV4 with dynamically retrieved credentials. + +**Parameters**: +- `method` (str): HTTP method (GET, POST, etc.) +- `url` (str): Full URL including query parameters +- `headers` (dict): HTTP headers +- `body`: Request body (str or bytes) + +**Returns**: Dictionary of signed headers including AWS signature. + +### `_request(method, url, headers=None, body=None)` + +Makes an authenticated HTTP request to Neptune. + +**Returns**: Response object from requests library. + +## EC2 Instance Metadata + +When running on EC2 instances with IAM roles, `NeptuneBoto3Store` automatically: + +1. Discovers the IAM role attached to the instance +2. Fetches temporary credentials from the instance metadata service +3. Refreshes credentials automatically when they expire +4. 
Falls back to other credential sources if metadata is unavailable + +This is ideal for production deployments where you want to avoid managing long-lived credentials. + +## Example: Running on EC2 + +```python +# When running on an EC2 instance with an IAM role, +# no credential configuration is needed! +from rdflib import ConjunctiveGraph +from whyis.plugins.neptune import NeptuneBoto3Store + +store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1' + # Credentials will be automatically retrieved from instance metadata +) + +graph = ConjunctiveGraph(store) +# Use the graph... +``` + +## Comparison with Existing Neptune Driver + +The Neptune plugin includes two authentication mechanisms: + +### 1. `neptune_driver` (Existing) +- Uses `aws-requests-auth` library +- Requires explicit AWS credentials in environment variables +- Uses `WhyisSPARQLUpdateStore` with custom requests session + +### 2. `NeptuneBoto3Store` (New) +- Uses boto3's comprehensive credential management +- Supports dynamic instance metadata retrieval +- Better integration with AWS SDK patterns +- Automatic credential refresh +- Recommended for new projects + +## Security Considerations + +- **No Hardcoded Credentials**: Never hardcode AWS credentials in your code +- **IAM Roles**: Use IAM roles when possible (EC2, ECS, Lambda) +- **Temporary Credentials**: Instance metadata provides temporary, auto-rotating credentials +- **Least Privilege**: Ensure IAM policies grant only necessary permissions +- **HTTPS Only**: Always use HTTPS endpoints for Neptune + +## IAM Policy Example + +Your IAM role or user should have permissions like: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "neptune-db:connect", + "neptune-db:ReadDataViaQuery", + "neptune-db:WriteDataViaQuery" + ], + "Resource": "arn:aws:neptune-db:region:account-id:cluster-id/*" + } + ] +} +``` + +## Error Handling + +The store raises appropriate errors for common issues: + +```python +# Missing region +try: + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql' + ) +except ValueError as e: + print(f"Error: {e}") # "region_name is required for NeptuneBoto3Store" + +# No credentials available +try: + store = NeptuneBoto3Store( + query_endpoint='https://neptune.amazonaws.com:8182/sparql', + update_endpoint='https://neptune.amazonaws.com:8182/sparql', + region_name='us-east-1', + use_instance_metadata=False + ) +except ValueError as e: + print(f"Error: {e}") # "No AWS credentials found..." +``` + +## Testing + +The implementation includes comprehensive unit tests. 
Run them with: + +```bash +pytest tests/unit/test_neptune_boto3_store.py -v +``` + +Test coverage includes: +- Initialization with various configurations +- Instance metadata provider setup +- Dynamic credential retrieval +- Request signing with different credential types +- Integration with RDFlib +- Error handling + +## References + +- [AWS Neptune IAM Authentication](https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth.html) +- [boto3 Credentials Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) +- [EC2 Instance Metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) +- [AWS SigV4 Signing](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html) +- [RDFlib Documentation](https://rdflib.readthedocs.io/) diff --git a/whyis/plugins/neptune/README.md b/whyis/plugins/neptune/README.md new file mode 100644 index 000000000..ef23b99d0 --- /dev/null +++ b/whyis/plugins/neptune/README.md @@ -0,0 +1,281 @@ +# Neptune Plugin - AWS IAM Authentication Support + +## Overview + +This plugin provides AWS IAM authentication support for Amazon Neptune databases. It includes two authentication mechanisms: + +1. **neptune_driver**: Uses aws-requests-auth for authentication (existing implementation) +2. **NeptuneBoto3Store**: RDFlib store subclass using boto3 with dynamic instance metadata support (new) + +The plugin registers a "neptune" database driver that uses AWS SigV4 request signing for all SPARQL queries, updates, and Graph Store Protocol operations. It also extends Neptune full-text search capabilities. + +## Features + +- **AWS IAM Authentication**: Uses AWS SigV4 request signing for secure access to Neptune databases +- **Automatic Credential Management**: Leverages boto3 for AWS credential discovery (environment variables, IAM roles, etc.) +- **Dynamic Instance Metadata**: Automatically retrieves credentials from EC2 instance metadata when available +- **Full Text Search Support**: Passes authentication through to Neptune's full-text search queries +- **Graph Store Protocol**: Supports authenticated PUT, POST, DELETE, and publish operations +- **Configuration-Based**: Easy setup via Flask configuration + +## Authentication Options + +### Option 1: neptune_driver (Existing) + +Uses the existing `neptune_driver` function with aws-requests-auth. Suitable for applications already using this approach. + +**Dependencies**: `aws_requests_auth` + +### Option 2: NeptuneBoto3Store (New - Recommended) + +A new RDFlib SPARQL store subclass that uses boto3 for credential management with automatic instance metadata support. Recommended for new projects. + +**Dependencies**: `boto3`, `botocore` + +**See**: [NeptuneBoto3Store.md](NeptuneBoto3Store.md) for detailed documentation. + +## Installation and Setup + +### 1. Enable the Neptune Plugin + +To enable the Neptune plugin in your Whyis knowledge graph application, add it to your application's configuration file (typically `whyis.conf` or `system.conf`): + +```python +# Enable the Neptune plugin +PLUGINENGINE_PLUGINS = ['neptune'] + +# Or if you already have other plugins enabled: +PLUGINENGINE_PLUGINS = ['neptune', 'other_plugin'] +``` + +### 2. 
Install Required Dependencies + +#### For neptune_driver (Existing): + +``` +aws_requests_auth +``` + +#### For NeptuneBoto3Store (New): + +``` +boto3 +botocore +``` + +Then install them in your application environment: + +```bash +pip install -r requirements.txt +``` + +**Note**: These dependencies are only needed if you're using Neptune with IAM authentication. They are not required for core Whyis functionality. + +## Configuration + +After enabling the plugin and installing dependencies, configure your Whyis application to use Neptune with IAM authentication: + +### System Configuration (system.conf) + +```python +# Neptune SPARQL endpoint +KNOWLEDGE_TYPE = 'neptune' +KNOWLEDGE_ENDPOINT = 'https://my-cluster.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/sparql' + +# AWS region (required for Neptune driver) +KNOWLEDGE_REGION = 'us-east-1' + +# Optional: Custom service name (defaults to 'neptune-db') +KNOWLEDGE_SERVICE_NAME = 'neptune-db' + +# Optional: Separate Graph Store Protocol endpoint +KNOWLEDGE_GSP_ENDPOINT = 'https://my-cluster.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/data' + +# Optional: Default graph URI +KNOWLEDGE_DEFAULT_GRAPH = 'http://example.org/default-graph' + +# Optional: Use temporary UUID graphs for GSP operations (defaults to True) +# When True, ensures graph-aware semantics for RDF data with named graphs +KNOWLEDGE_USE_TEMP_GRAPH = True + +# Neptune Full-Text Search endpoint +neptune_fts_endpoint = 'https://search-my-domain.us-east-1.es.amazonaws.com' +``` + + +### AWS Credentials + +The Neptune driver uses environment variables for AWS credential management. Credentials can be provided via: + +1. **Environment Variables** (required): + ```bash + export AWS_ACCESS_KEY_ID=your_access_key + export AWS_SECRET_ACCESS_KEY=your_secret_key + export AWS_SESSION_TOKEN=your_session_token # Optional, for temporary credentials + ``` + +2. **IAM Roles**: If running on EC2 or ECS with an IAM role, set the environment variables from the role's credentials + +3. **AWS Credentials File** (`~/.aws/credentials`): + ```ini + [default] + aws_access_key_id = your_access_key + aws_secret_access_key = your_secret_key + ``` + Then export them: + ```bash + export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id) + export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key) + ``` + +## How It Works + +### Driver Registration + +The Neptune plugin automatically registers a "neptune" database driver when initialized. This driver: + +1. Creates Neptune SPARQL stores with AWS IAM authentication +2. Signs all HTTP requests with AWS SigV4 signatures +3. Passes authentication to full-text search queries +4. Provides authenticated Graph Store Protocol operations + +### Graph-Aware Semantics with Temporary UUID Graphs + +By default (when `KNOWLEDGE_USE_TEMP_GRAPH = True`), the Neptune driver ensures graph-aware semantics for all Graph Store Protocol (GSP) operations: + +- **Problem**: Without this feature, Neptune's GSP implementation inserts triples into an explicit default graph (using `?default` parameter), causing all RDF data to lose its graph structure even when using graph-aware formats like TriG. + +- **Solution**: The driver generates a temporary UUID-based graph URI (e.g., `urn:uuid:...`) for each GSP operation, posts/puts data to that temporary graph, and then deletes it. 
This ensures that:
+  - Named graphs from TriG data are preserved correctly
+  - Graph-aware RDF data maintains its structure
+  - Union semantics are properly applied instead of explicit default graph semantics
+
+- **Configuration**: Set `KNOWLEDGE_USE_TEMP_GRAPH = False` to disable this behavior and use legacy default graph semantics.
+
+### Request Signing
+
+All requests to Neptune are automatically signed with AWS SigV4:
+
+- **SPARQL Queries**: SELECT, ASK, CONSTRUCT, DESCRIBE queries
+- **SPARQL Updates**: INSERT, DELETE, MODIFY operations
+- **Graph Store Protocol**: GET, PUT, POST, DELETE on named graphs
+- **Full-Text Search**: Neptune FTS queries via SERVICE blocks
+
+### Usage in SPARQL Queries
+
+Full-text search queries work seamlessly with authentication:
+
+```sparql
+PREFIX fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
+PREFIX dc: <http://purl.org/dc/terms/>
+
+SELECT ?node ?label WHERE {
+    SERVICE fts:search {
+        fts:config fts:query "search term" .
+        fts:config fts:endpoint "https://your-fts-endpoint" .
+        fts:config fts:field dc:title .
+        fts:config fts:return ?node .
+    }
+    ?node dc:title ?label .
+}
+```
+
+The Neptune driver ensures that AWS credentials are attached to the full-text search requests.
+
+## API
+
+### Using NeptuneBoto3Store Directly (Recommended for New Projects)
+
+For direct use of the boto3-based store with instance metadata support:
+
+```python
+from rdflib import ConjunctiveGraph
+from whyis.plugins.neptune import NeptuneBoto3Store
+
+# Create store with automatic instance metadata retrieval
+store = NeptuneBoto3Store(
+    query_endpoint='https://neptune.amazonaws.com:8182/sparql',
+    update_endpoint='https://neptune.amazonaws.com:8182/sparql',
+    region_name='us-east-1'
+)
+
+# Create graph
+graph = ConjunctiveGraph(store)
+
+# Use the graph for SPARQL operations
+results = graph.query("""
+    SELECT ?s ?p ?o WHERE {
+        ?s ?p ?o .
+    } LIMIT 10
+""")
+```
+
+**Key Features**:
+- Automatic credential retrieval from EC2 instance metadata
+- Fallback to boto3 session credentials
+- Dynamic credential refresh
+- No explicit credential configuration needed on EC2
+
+See [NeptuneBoto3Store.md](NeptuneBoto3Store.md) for complete documentation.
+
+### Neptune Driver Function (Existing)
+
+```python
+from whyis.plugins.neptune.plugin import neptune_driver
+
+config = {
+    '_endpoint': 'https://neptune.amazonaws.com:8182/sparql',
+    '_region': 'us-east-1',
+    '_service_name': 'neptune-db',  # Optional
+    '_gsp_endpoint': 'https://neptune.amazonaws.com:8182/data',  # Optional
+    '_default_graph': 'http://example.org/graph'  # Optional
+}
+
+graph = neptune_driver(config)
+```
+
+## Security Considerations
+
+- **Credentials**: Never commit AWS credentials to source control
+- **IAM Policies**: Ensure Neptune IAM policies grant only necessary permissions
+- **Temporary Credentials**: Use STS temporary credentials or IAM roles when possible
+- **HTTPS**: Always use HTTPS endpoints for Neptune
+- **VPC**: Consider using VPC endpoints for Neptune access within AWS
+
+## Troubleshooting
+
+### Authentication Errors
+
+If you see authentication errors:
+
+1. Verify AWS credentials are properly configured
+2. Check that the IAM policy grants Neptune access:
+   ```json
+   {
+     "Effect": "Allow",
+     "Action": [
+       "neptune-db:connect",
+       "neptune-db:ReadDataViaQuery",
+       "neptune-db:WriteDataViaQuery"
+     ],
+     "Resource": "arn:aws:neptune-db:region:account:cluster-id/*"
+   }
+   ```
+3. Ensure the region is correctly specified
+4. 
Verify the Neptune endpoint URL is correct + +### Connection Errors + +If you cannot connect to Neptune: + +1. Check VPC security groups allow access +2. Verify network connectivity to Neptune endpoint +3. Ensure the endpoint URL includes the port (typically 8182) +4. Check that Neptune cluster is available + +## References + +- [AWS Neptune IAM Authentication](https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth.html) +- [AWS Neptune Full-Text Search](https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html) +- [AWS SigV4 Signing](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html) +- [boto3 Credentials](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) diff --git a/whyis/plugins/neptune/__init__.py b/whyis/plugins/neptune/__init__.py new file mode 100644 index 000000000..639424e72 --- /dev/null +++ b/whyis/plugins/neptune/__init__.py @@ -0,0 +1,2 @@ +from .plugin import * +from .neptune_boto3_store import NeptuneBoto3Store diff --git a/whyis/plugins/neptune/neptune_boto3_store.py b/whyis/plugins/neptune/neptune_boto3_store.py new file mode 100644 index 000000000..61c90de89 --- /dev/null +++ b/whyis/plugins/neptune/neptune_boto3_store.py @@ -0,0 +1,473 @@ +# -*- coding:utf-8 -*- +""" +Neptune Boto3 SPARQL Store + +This module provides a subclass of RDFlib's SPARQLUpdateStore that uses boto3 +for AWS credential management and request signing. This provides more robust +credential handling compared to manual AWS authentication. + +The NeptuneBoto3Store class automatically signs all HTTP requests to Neptune +using AWS SigV4 signatures, leveraging boto3's built-in credential discovery +mechanisms including InstanceMetadataProvider for EC2 instances. +""" + +import logging +from urllib.parse import urlparse, parse_qs +from io import BytesIO + +try: + import boto3 + from botocore.auth import SigV4Auth + from botocore.awsrequest import AWSRequest + from botocore.credentials import InstanceMetadataProvider, InstanceMetadataFetcher +except ImportError: + boto3 = None + SigV4Auth = None + AWSRequest = None + InstanceMetadataProvider = None + InstanceMetadataFetcher = None + +from whyis.database.whyis_sparql_update_store import WhyisSPARQLUpdateStore + +logger = logging.getLogger(__name__) + + +class NeptuneBoto3Store(WhyisSPARQLUpdateStore): + """ + A SPARQL store that uses boto3 credentials for AWS Neptune authentication. + + This store extends WhyisSPARQLUpdateStore and automatically signs all HTTP + requests using AWS SigV4 signatures via boto3's request signing capabilities. + + Credentials are dynamically retrieved using boto3's credential chain, with + special support for EC2 instance metadata via InstanceMetadataProvider and + InstanceMetadataFetcher for IAM role credentials. + + Attributes: + region_name (str): AWS region where Neptune is located + service_name (str): AWS service name for signing (default: 'neptune-db') + boto3_session: Boto3 session for credential management + use_instance_metadata (bool): Whether to prioritize instance metadata credentials + + Example: + >>> from rdflib import ConjunctiveGraph + >>> store = NeptuneBoto3Store( + ... query_endpoint='https://neptune.amazonaws.com:8182/sparql', + ... update_endpoint='https://neptune.amazonaws.com:8182/sparql', + ... region_name='us-east-1' + ... 
) + >>> graph = ConjunctiveGraph(store) + """ + + def __init__(self, query_endpoint=None, update_endpoint=None, + region_name=None, service_name='neptune-db', + boto3_session=None, use_instance_metadata=True, **kwargs): + """ + Initialize the Neptune Boto3 Store. + + Args: + query_endpoint (str): SPARQL query endpoint URL + update_endpoint (str): SPARQL update endpoint URL + region_name (str): AWS region name (required) + service_name (str): AWS service name for signing (default: 'neptune-db') + boto3_session: Optional boto3 Session object. If not provided, + a new session will be created using default credentials. + use_instance_metadata (bool): If True (default), dynamically fetch credentials + from EC2 instance metadata when available. + **kwargs: Additional arguments passed to WhyisSPARQLUpdateStore + + Raises: + ValueError: If region_name is not provided + ImportError: If boto3 is not installed + """ + # Import boto3 here so it's only required when this store is used + if boto3 is None: + raise ImportError( + "boto3 is required for NeptuneBoto3Store. " + "Install it with: pip install boto3" + ) + + if not region_name: + raise ValueError("region_name is required for NeptuneBoto3Store") + + # Store AWS configuration + self.region_name = region_name + self.service_name = service_name + self.use_instance_metadata = use_instance_metadata + + # Create or use provided boto3 session + if boto3_session is None: + self.boto3_session = boto3.Session() + else: + self.boto3_session = boto3_session + + # Set up instance metadata provider if requested + self._instance_metadata_provider = None + if self.use_instance_metadata: + try: + # Create instance metadata fetcher and provider + fetcher = InstanceMetadataFetcher() + self._instance_metadata_provider = InstanceMetadataProvider( + iam_role_fetcher=fetcher + ) + logger.info("Instance metadata provider initialized for dynamic credential retrieval") + except Exception as e: + logger.warning(f"Could not initialize instance metadata provider: {e}") + self._instance_metadata_provider = None + + # Get initial credentials from boto3 session + self.credentials = self.boto3_session.get_credentials() + if self.credentials is None and self._instance_metadata_provider is None: + raise ValueError( + "No AWS credentials found. Configure credentials using " + "environment variables, ~/.aws/credentials, or IAM roles." + ) + + # Initialize parent class without custom_requests + # We'll override the methods that make HTTP requests + super().__init__( + query_endpoint=query_endpoint, + update_endpoint=update_endpoint, + **kwargs + ) + + logger.info( + f"Initialized NeptuneBoto3Store with region={region_name}, " + f"service={service_name}, use_instance_metadata={use_instance_metadata}" + ) + + def _get_credentials(self): + """ + Get current AWS credentials, dynamically fetching from instance metadata if configured. + + This method attempts to get credentials in the following order: + 1. If use_instance_metadata is True, try InstanceMetadataProvider first + 2. 
Fall back to boto3 session credentials + + Returns: + Frozen credentials object with access_key, secret_key, and token + """ + credentials = None + + # Try instance metadata provider first if configured + if self.use_instance_metadata and self._instance_metadata_provider: + try: + credentials = self._instance_metadata_provider.load() + if credentials: + logger.debug("Using credentials from instance metadata provider") + return credentials.get_frozen_credentials() + except Exception as e: + logger.debug(f"Could not load credentials from instance metadata: {e}") + + # Fall back to boto3 session credentials + credentials = self.boto3_session.get_credentials() + if credentials: + logger.debug("Using credentials from boto3 session") + return credentials.get_frozen_credentials() + + raise ValueError( + "Unable to locate credentials. Configure credentials using " + "environment variables, ~/.aws/credentials, IAM roles, or instance metadata." + ) + + def _sign_request(self, method, url, headers=None, body=None): + """ + Sign an HTTP request using AWS SigV4 with boto3 credentials. + + Dynamically retrieves credentials (potentially from instance metadata) + for each request to ensure credentials are always current. + + Args: + method (str): HTTP method (GET, POST, etc.) + url (str): Full URL including query parameters + headers (dict): HTTP headers + body: Request body (str or bytes) + + Returns: + dict: Updated headers with AWS signature + """ + # Get current credentials (handles credential refresh and instance metadata) + frozen_credentials = self._get_credentials() + + # Parse URL to separate path and query string + parsed = urlparse(url) + + # Create AWS request object + request = AWSRequest( + method=method, + url=url, + headers=headers or {}, + data=body + ) + + # Sign the request + signer = SigV4Auth(frozen_credentials, self.service_name, self.region_name) + signer.add_auth(request) + + return dict(request.headers) + + def response_mime_types(self): + """ + Return the MIME types to use in Accept headers for SPARQL queries. + + This method ensures that NeptuneBoto3Store always has this method available, + even if there are issues with the MRO or inheritance chain. + + Returns: + str: Comma-separated list of MIME types + """ + # Try to use parent class implementation first + try: + return super().response_mime_types() + except AttributeError: + # Fallback to default MIME types if parent doesn't have it + return "application/sparql-results+xml, application/sparql-results+json, application/rdf+xml" + + def _request(self, method, url, headers=None, body=None): + """ + Make an authenticated HTTP request to Neptune. + + This method signs the request using AWS SigV4 and sends it. 
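+
+        A fresh requests.Session is created per call, and the SigV4 signature
+        covers the final URL (including any query string) and the request body,
+        so both are fully assembled before signing.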
+ + Args: + method (str): HTTP method + url (str): Request URL + headers (dict): Request headers + body: Request body + + Returns: + Response object from requests library + + Raises: + IOError: If the request fails with detailed error information + """ + import requests + + try: + # Sign the request + signed_headers = self._sign_request(method, url, headers, body) + + # Make the request + session = requests.Session() + response = session.request( + method=method, + url=url, + headers=signed_headers, + data=body + ) + + # Handle HTTP errors (non-200 status codes don't trigger exceptions) + if response.status_code != 200: + error_msg = f"Neptune request failed: {method} {url}\n" + error_msg += f"Status code: {response.status_code}\n" + error_msg += f"Response: {response.text[:1000]}" + logger.error(error_msg) + raise IOError(error_msg) + + return response + except IOError: + # Re-raise IOError as-is (from our HTTP error handling above) + raise + except Exception as e: + # Handle other exceptions (network errors, etc.) + error_msg = f"Neptune request failed: {method} {url}\n" + error_msg += f"Error: {type(e).__name__}: {str(e)}" + logger.error(error_msg) + raise IOError(error_msg) from e + + def _connector_query(self, query, default_graph=None, named_graph=None): + """ + Execute a SPARQL query at the connector level with AWS SigV4 authentication. + + This is the low-level method that makes the actual HTTP request. + It's called by _query() which is called by the high-level query() method. + + Args: + query (str): SPARQL query string + default_graph: Default graph URI + named_graph: Named graph URI + + Returns: + Query results from rdflib Result.parse() + """ + from urllib.parse import urlencode + from io import BytesIO + from rdflib.query import Result + + if not self.query_endpoint: + raise ValueError("Query endpoint not set!") + + # Build query parameters + params = {} + if default_graph is not None: + from rdflib.term import BNode + if not isinstance(default_graph, BNode): + params["default-graph-uri"] = default_graph + + # Build headers + headers = {"Accept": self.response_mime_types()} + + # Make authenticated request based on method + if self.method == "POST": + headers["Content-Type"] = "application/sparql-query" + qsa = "?" + urlencode(params) if params else "" + url = self.query_endpoint + qsa + + response = self._request( + method="POST", + url=url, + headers=headers, + body=query.encode('utf-8') + ) + else: # GET or POST_FORM + params["query"] = query + qsa = "?" 
+ urlencode(params) + url = self.query_endpoint + qsa + + if self.method == "GET": + response = self._request( + method="GET", + url=url, + headers=headers + ) + else: # POST_FORM + headers["Content-Type"] = "application/x-www-form-urlencoded" + response = self._request( + method="POST", + url=self.query_endpoint, + headers=headers, + body=urlencode(params).encode('utf-8') + ) + + # Handle HTTP errors with improved reporting + if not response.ok: + error_msg = f"Neptune SPARQL query failed\n" + error_msg += f"Status: {response.status_code}\n" + error_msg += f"URL: {url}\n" + error_msg += f"Query: {query[:200]}...\n" if len(query) > 200 else f"Query: {query}\n" + error_msg += f"Response: {response.text[:500]}" + logger.error(error_msg) + raise IOError(error_msg) + + # Parse the response + content_type = response.headers.get('Content-Type', 'application/sparql-results+xml') + if ';' in content_type: + content_type = content_type.split(';')[0] + + return Result.parse(BytesIO(response.content), content_type=content_type) + + def _query(self, *args, **kwargs): + """ + Override SPARQLStore._query to use authenticated requests. + + This method increments the query counter and calls the connector-level query. + """ + self._queries += 1 + return self._connector_query(*args, **kwargs) + + def _connector_update(self, query, default_graph=None, named_graph=None): + """ + Execute a SPARQL update at the connector level with AWS SigV4 authentication. + + This is the low-level method that makes the actual HTTP request. + It's called by _update() which is called by the high-level update() method. + + Args: + query (str): SPARQL update string + default_graph: Default graph URI + named_graph: Named graph URI + """ + from urllib.parse import urlencode + + if not self.update_endpoint: + raise ValueError("Update endpoint not set!") + + # Build parameters + params = {} + if default_graph is not None: + params["using-graph-uri"] = default_graph + if named_graph is not None: + params["using-named-graph-uri"] = named_graph + + # Build headers + headers = { + "Accept": self.response_mime_types(), + "Content-Type": "application/sparql-update; charset=UTF-8" + } + + # Build URL with parameters + qsa = "?" + urlencode(params) if params else "" + url = self.update_endpoint + qsa + + # Make authenticated request + response = self._request( + method="POST", + url=url, + headers=headers, + body=query.encode('utf-8') + ) + + # Handle HTTP errors with improved reporting + if not response.ok: + error_msg = f"Neptune SPARQL update failed\n" + error_msg += f"Status: {response.status_code}\n" + error_msg += f"URL: {url}\n" + error_msg += f"Update: {query[:200]}...\n" if len(query) > 200 else f"Update: {query}\n" + error_msg += f"Response: {response.text[:500]}" + logger.error(error_msg) + raise IOError(error_msg) + + def _update(self, update): + """ + Override SPARQLUpdateStore._update to use authenticated requests. + + This method increments the update counter and calls the connector-level update. + """ + self._updates += 1 + self._connector_update(update) + + def raw_sparql_request(self, method, params=None, data=None, headers=None): + """ + Make a raw authenticated SPARQL endpoint request. + + This method is used by the SPARQL blueprint to proxy requests with authentication. + It handles both GET and POST requests with proper AWS SigV4 signing. 
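+
+        Unlike the base-class implementation, which attaches HTTP Basic Auth
+        from the store's auth attribute, this override routes the request
+        through _request() so it is signed with AWS SigV4.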
+
+        Args:
+            method (str): HTTP method ('GET' or 'POST')
+            params (dict): Query parameters
+            data: Request body data
+            headers (dict): Request headers
+
+        Returns:
+            Response object from requests library
+        """
+        from urllib.parse import urlencode
+
+        # Use query_endpoint for all SPARQL requests
+        url = self.query_endpoint
+
+        # Build URL with parameters if any
+        if params:
+            qsa = "?" + urlencode(params)
+            url = url + qsa
+
+        # Prepare headers
+        request_headers = dict(headers) if headers else {}
+
+        # Make authenticated request
+        try:
+            response = self._request(
+                method=method.upper(),
+                url=url,
+                headers=request_headers,
+                body=data
+            )
+            return response
+        except Exception as e:
+            error_msg = f"Neptune raw SPARQL request failed\n"
+            error_msg += f"Method: {method}\n"
+            error_msg += f"URL: {url}\n"
+            error_msg += f"Error: {str(e)}"
+            logger.error(error_msg)
+            raise
diff --git a/whyis/plugins/neptune/plugin.py b/whyis/plugins/neptune/plugin.py
new file mode 100644
index 000000000..1045a5392
--- /dev/null
+++ b/whyis/plugins/neptune/plugin.py
@@ -0,0 +1,589 @@
+from whyis.plugin import Plugin, EntityResolverListener
+from whyis.namespace import NS
+import rdflib
+from flask import current_app
+from flask_pluginengine import PluginBlueprint, current_plugin
+from rdflib import URIRef
+from rdflib.graph import ConjunctiveGraph
+import requests
+import logging
+import os
+import uuid
+from aws_requests_auth.aws_auth import AWSRequestsAuth
+
+logger = logging.getLogger(__name__)
+
+
+prefixes = dict(
+    skos = rdflib.URIRef("http://www.w3.org/2004/02/skos/core#"),
+    foaf = rdflib.URIRef("http://xmlns.com/foaf/0.1/"),
+    text = rdflib.URIRef("http://jena.apache.org/fulltext#"),
+    schema = rdflib.URIRef("http://schema.org/"),
+    owl = rdflib.OWL,
+    rdfs = rdflib.RDFS,
+    rdf = rdflib.RDF,
+    dc = rdflib.URIRef("http://purl.org/dc/terms/"),
+    fts = rdflib.URIRef('http://aws.amazon.com/neptune/vocab/v01/services/fts#')
+)
+
+class NeptuneEntityResolver(EntityResolverListener):
+
+    context_query="""
+  optional {
+    (?context ?cr) text:search ('''%s''' 100 0.4).
+    ?node ?p ?context.
+  }
+"""
+    type_query = """
+?node rdf:type <%s> .
+"""
+
+    query = """
+select distinct
+?node
+?label
+(group_concat(distinct ?type; separator="||") as ?types)
+(0.9 as ?score)
+where {
+    SERVICE fts:search {
+        fts:config fts:query "%s" .
+        fts:config fts:endpoint "%s" .
+        fts:config fts:queryType "match" .
+        fts:config fts:field dc:title .
+        fts:config fts:field rdfs:label .
+        fts:config fts:field skos:prefLabel .
+        fts:config fts:field skos:altLabel .
+        fts:config fts:field foaf:name .
+        fts:config fts:field dc:identifier .
+        fts:config fts:field schema:name .
+        fts:config fts:field skos:notation .
+        fts:config fts:return ?node .
+    }
+
+    optional {
+      ?node rdf:type ?type.
+    }
+
+    %s
+
+    filter not exists {
+      ?node a
+    }
+    filter not exists {
+      ?node a
+    }
+    filter not exists {
+      ?node a
+    }
+    filter not exists {
+      ?node a
+    }
+    filter not exists {
+      ?node a
+    }
+} group by ?node ?label limit 10"""
+
+    def __init__(self, database="knowledge"):
+        self.database = database
+
+    def _escape_sparql_string(self, s):
+        """
+        Escape a string for safe inclusion in a SPARQL query.
+
+        This prevents SPARQL injection by escaping special characters.
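+
+        For example, a term like ``10" pipe`` becomes ``10\" pipe`` before it
+        is interpolated into the double-quoted literal in the query template.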
+ """ + if s is None: + return "" + # Escape backslashes first, then quotes, then newlines/returns + s = str(s).replace('\\', '\\\\') + s = s.replace('"', '\\"') + s = s.replace('\n', '\\n') + s = s.replace('\r', '\\r') + return s + + def on_resolve(self, term, type=None, context=None, label=True): + logger.info(f'Searching {self.database} for {term}') + graph = current_app.databases[self.database] + fts_endpoint = current_app.config['NEPTUNE_FTS_ENDPOINT'] + #context_query = '' + + # Safely escape the search term for inclusion in SPARQL query + escaped_term = self._escape_sparql_string(term) + escaped_endpoint = self._escape_sparql_string(fts_endpoint) + + type_query = '' + if type is not None: + # Escape the type URI to prevent SPARQL injection + escaped_type = self._escape_sparql_string(type) + type_query = self.type_query % escaped_type + + query = self.query % (escaped_term, escaped_endpoint, type_query) + + results = [] + for hit in graph.query(query, initNs=prefixes): + result = hit.asdict() + result['types'] = [{'uri':x} for x in result.get('types','').split('||')] + if label: + current_app.labelize(result,'node','preflabel') + result['types'] = [ + current_app.labelize(x,'uri','label') + for x in result['types'] + ] + results.append(result) + return results + +plugin_blueprint = PluginBlueprint('neptune', __name__) + + +def neptune_driver(config): + """ + Create an AWS Neptune SPARQL-based RDF graph store with IAM authentication. + + Uses NeptuneBoto3Store with boto3 for credential management and AWS SigV4 auth. + + Configuration options (via Flask config with prefix like KNOWLEDGE_ or ADMIN_): + - _endpoint: Neptune SPARQL query/update endpoint (required) + - _gsp_endpoint: Graph Store Protocol endpoint (optional, defaults to _endpoint) + - _region: AWS region where Neptune instance is located (required) + - _service_name: AWS service name for signing (optional, default: 'neptune-db') + - _default_graph: Default graph URI (optional) + - _use_temp_graph: Use temporary UUID graphs for GSP operations (optional, default: True) + When True, publish/put/post operations use a temporary UUID-based graph URI + to ensure graph-aware semantics instead of using the default graph. + - _use_instance_metadata: Use EC2 instance metadata for credentials (optional, default: True) + + Example configuration in system.conf: + KNOWLEDGE_ENDPOINT = 'https://my-neptune.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/sparql' + KNOWLEDGE_REGION = 'us-east-1' + KNOWLEDGE_GSP_ENDPOINT = 'https://my-neptune.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/data' + KNOWLEDGE_USE_TEMP_GRAPH = True # Default, ensures graph-aware semantics + + Authentication: + Uses boto3 for AWS credential discovery (environment variables, IAM roles, + instance metadata, etc.). All requests are signed with SigV4, including + full text search queries. 
+ """ + from whyis.database.database_utils import node_to_sparql + from whyis.plugins.neptune.neptune_boto3_store import NeptuneBoto3Store + + defaultgraph = None + if "_default_graph" in config: + defaultgraph = URIRef(config["_default_graph"]) + + # Get AWS region (required for Neptune) + region_name = config.get("_region") + if not region_name: + raise ValueError("Neptune driver requires '_region' configuration parameter") + + + service_name = config.get("_service_name", "neptune-db") + endpoint_url = config["_endpoint"] + print(f"creating neptune driver against {endpoint_url}") + + # Get configuration options + use_temp_graph = config.get("_use_temp_graph", True) + use_instance_metadata = config.get("_use_instance_metadata", True) + + # Create store with NeptuneBoto3Store (uses boto3 for authentication) + store = NeptuneBoto3Store( + query_endpoint=endpoint_url, + update_endpoint=endpoint_url, + region_name=region_name, + service_name=service_name, + use_instance_metadata=use_instance_metadata, + method="POST", + returnFormat='json', + node_to_sparql=node_to_sparql + ) + + # Set GSP endpoint + store.gsp_endpoint = config.get("_gsp_endpoint", endpoint_url) + + # Add GSP protocol methods with boto3 authentication + store = _add_gsp_methods_to_boto3_store(store, use_temp_graph=use_temp_graph) + + graph = ConjunctiveGraph(store, defaultgraph) + return graph + +def _add_gsp_methods_to_boto3_store(store, use_temp_graph=True): + """ + Add Graph Store Protocol (GSP) operations to a NeptuneBoto3Store. + + This adds authenticated GSP methods (publish, put, post, delete) to the store + using the store's built-in boto3 authentication via _request method. + + When use_temp_graph is True (default), publish/put/post operations use a + temporary UUID-based graph URI to ensure graph-aware semantics. This prevents + triples from being inserted into an explicit default graph and instead maintains + the graph structure from the RDF data (e.g., TriG format). 
+ + Args: + store: A NeptuneBoto3Store object with gsp_endpoint attribute and _request method + use_temp_graph: If True, use temporary UUID graphs for GSP operations (default: True) + + Returns: + The store object with GSP methods attached + """ + + def publish(data, format='text/trig;charset=utf-8'): + kwargs = dict( + headers={'Content-Type': format}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # POST to the temporary graph using authenticated request + params = dict(graph=temp_graph_uri) + url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + + try: + r = store._request( + method='POST', + url=url, + headers=kwargs['headers'], + body=data + ) + + # Always delete the temporary graph to clean up, even if POST failed + delete_url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + delete_r = store._request( + method='DELETE', + url=delete_url, + headers={} + ) + + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log error if POST failed + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} publish returned status {r.status_code}:\n{r.text}") + except Exception as e: + logger.error(f"Error in publish: {e}") + else: + # Legacy behavior: POST without graph parameter + try: + r = store._request( + method='POST', + url=store.gsp_endpoint, + headers=kwargs['headers'], + body=data + ) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} publish returned status {r.status_code}:\n{r.text}") + except Exception as e: + logger.error(f"Error in publish: {e}") + + def put(graph): + g = ConjunctiveGraph(store=graph.store) + data = g.serialize(format='turtle') + + kwargs = dict( + headers={'Content-Type': 'text/turtle;charset=utf-8'}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # PUT to the temporary graph using authenticated request + url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + + try: + r = store._request( + method='PUT', + url=url, + headers=kwargs['headers'], + body=data.encode('utf-8') if isinstance(data, str) else data + ) + + # Always delete the temporary graph to clean up + delete_url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + delete_r = store._request( + method='DELETE', + url=delete_url, + headers={} + ) + + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log result + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} PUT returned status {r.status_code}:\n{r.text}") + else: + logger.debug(f"{r.text} {r.status_code}") + except Exception as e: + logger.error(f"Error in put: {e}") + else: + # Legacy behavior: PUT with specified graph identifier + url = f"{store.gsp_endpoint}?graph={graph.identifier}" + try: + r = store._request( + method='PUT', + url=url, + headers=kwargs['headers'], + body=data.encode('utf-8') if isinstance(data, str) else data + ) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} PUT returned status {r.status_code}:\n{r.text}") + else: + logger.debug(f"{r.text} {r.status_code}") + except Exception as e: + logger.error(f"Error in put: {e}") + + def post(graph): + g = ConjunctiveGraph(store=graph.store) + data = g.serialize(format='trig') + + kwargs = dict( + headers={'Content-Type': 'text/trig;charset=utf-8'}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + 
temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # POST to the temporary graph using authenticated request + url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + + try: + r = store._request( + method='POST', + url=url, + headers=kwargs['headers'], + body=data.encode('utf-8') if isinstance(data, str) else data + ) + + # Always delete the temporary graph to clean up + delete_url = f"{store.gsp_endpoint}?graph={temp_graph_uri}" + delete_r = store._request( + method='DELETE', + url=delete_url, + headers={} + ) + + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log error if POST failed + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} POST returned status {r.status_code}:\n{r.text}") + except Exception as e: + logger.error(f"Error in post: {e}") + else: + # Legacy behavior: POST without graph parameter + try: + r = store._request( + method='POST', + url=store.gsp_endpoint, + headers=kwargs['headers'], + body=data.encode('utf-8') if isinstance(data, str) else data + ) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} POST returned status {r.status_code}:\n{r.text}") + except Exception as e: + logger.error(f"Error in post: {e}") + + def delete(c): + url = f"{store.gsp_endpoint}?graph={c}" + try: + r = store._request( + method='DELETE', + url=url, + headers={} + ) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} DELETE returned status {r.status_code}:\n{r.text}") + except Exception as e: + logger.error(f"Error in delete: {e}") + + store.publish = publish + store.put = put + store.post = post + store.delete = delete + + return store + + +def _remote_sparql_store_protocol_with_aws(store, aws_auth, use_temp_graph=True): + """ + Add Graph Store Protocol (GSP) operations with AWS authentication. + + DEPRECATED: This function is kept for backward compatibility with the old + aws_requests_auth approach. New code should use NeptuneBoto3Store with + _add_gsp_methods_to_boto3_store instead. + + This is similar to _remote_sparql_store_protocol but uses AWS SigV4 auth + instead of basic auth. + + When use_temp_graph is True (default), publish/put/post operations use a + temporary UUID-based graph URI to ensure graph-aware semantics. This prevents + triples from being inserted into an explicit default graph and instead maintains + the graph structure from the RDF data (e.g., TriG format). 
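+
+    A single requests.Session with the AWSRequestsAuth handler attached is
+    created once and shared across all of the GSP operations defined below.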
+ + Args: + store: A SPARQL store object with gsp_endpoint attribute + aws_auth: AWSRequestsAuth object for request signing + use_temp_graph: If True, use temporary UUID graphs for GSP operations (default: True) + + Returns: + The store object with GSP methods attached + """ + # Create a reusable session with AWS auth for all GSP operations + session = requests.Session() + session.auth = aws_auth + session.keep_alive = False + + def publish(data, format='text/trig;charset=utf-8'): + kwargs = dict( + headers={'Content-Type': format}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # POST to the temporary graph + r = session.post(store.gsp_endpoint, + params=dict(graph=temp_graph_uri), + data=data, + **kwargs) + + # Always delete the temporary graph to clean up, even if POST failed + delete_r = session.delete(store.gsp_endpoint, + params=dict(graph=temp_graph_uri)) + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log error if POST failed + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} publish returned status {r.status_code}:\n{r.text}") + else: + # Legacy behavior: POST without graph parameter + r = session.post(store.gsp_endpoint, data=data, **kwargs) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} publish returned status {r.status_code}:\n{r.text}") + + def put(graph): + g = ConjunctiveGraph(store=graph.store) + data = g.serialize(format='turtle') + + kwargs = dict( + headers={'Content-Type': 'text/turtle;charset=utf-8'}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # PUT to the temporary graph + r = session.put(store.gsp_endpoint, + params=dict(graph=temp_graph_uri), + data=data, + **kwargs) + + # Always delete the temporary graph to clean up, even if PUT failed + delete_r = session.delete(store.gsp_endpoint, + params=dict(graph=temp_graph_uri)) + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log result + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} PUT returned status {r.status_code}:\n{r.text}") + else: + logger.debug(f"{r.text} {r.status_code}") + else: + # Legacy behavior: PUT with specified graph identifier + r = session.put(store.gsp_endpoint, + params=dict(graph=graph.identifier), + data=data, + **kwargs) + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} PUT returned status {r.status_code}:\n{r.text}") + else: + logger.debug(f"{r.text} {r.status_code}") + + def post(graph): + g = ConjunctiveGraph(store=graph.store) + data = g.serialize(format='trig') + + kwargs = dict( + headers={'Content-Type': 'text/trig;charset=utf-8'}, + ) + + if use_temp_graph: + # Generate a temporary UUID-based graph URI + temp_graph_uri = f"urn:uuid:{uuid.uuid4()}" + + # POST to the temporary graph + r = session.post(store.gsp_endpoint, + params=dict(graph=temp_graph_uri), + data=data, + **kwargs) + + # Always delete the temporary graph to clean up, even if POST failed + delete_r = session.delete(store.gsp_endpoint, + params=dict(graph=temp_graph_uri)) + if not delete_r.ok: + logger.warning(f"Warning: Failed to delete temporary graph {temp_graph_uri}: {delete_r.status_code}:\n{delete_r.text}") + + # Log error if POST failed + if not r.ok: + logger.error(f"Error: {store.gsp_endpoint} POST returned status 
{r.status_code}:\n{r.text}")
+        else:
+            # Legacy behavior: POST without graph parameter
+            r = session.post(store.gsp_endpoint, data=data, **kwargs)
+            if not r.ok:
+                logger.error(f"Error: {store.gsp_endpoint} POST returned status {r.status_code}:\n{r.text}")
+
+    def delete(c):
+        kwargs = dict()
+        r = session.delete(store.gsp_endpoint,
+                           params=dict(graph=c),
+                           **kwargs)
+        if not r.ok:
+            logger.error(f"Error: {store.gsp_endpoint} DELETE returned status {r.status_code}:\n{r.text}")
+
+    store.publish = publish
+    store.put = put
+    store.post = post
+    store.delete = delete
+
+    return store
+
+
+class NeptuneSearchPlugin(Plugin):
+
+    resolvers = {
+        "neptune" : NeptuneEntityResolver
+    }
+
+    def create_blueprint(self):
+        return plugin_blueprint
+
+    def init(self):
+        """
+        Initialize the Neptune plugin.
+
+        This registers the Neptune database driver and entity resolver.
+        """
+        # Import and register the Neptune driver
+        from whyis.database.database_utils import driver, drivers
+
+        logger.info('Initializing Neptune plugin')
+        # Register the Neptune driver
+        if 'neptune' not in drivers:
+            drivers['neptune'] = neptune_driver
+
+        # Set up namespace
+        NS.fts = rdflib.Namespace('http://aws.amazon.com/neptune/vocab/v01/services/fts#')
+
+        # Set up entity resolver
+        resolver_type = self.app.config.get('RESOLVER_TYPE', 'neptune')
+        resolver_db = self.app.config.get('RESOLVER_DB', "knowledge")
+        resolver = self.resolvers[resolver_type](resolver_db)
+        self.app.add_listener(resolver)
diff --git a/whyis/plugins/neptune/templates/search.json b/whyis/plugins/neptune/templates/search.json
new file mode 100644
index 000000000..6164410f5
--- /dev/null
+++ b/whyis/plugins/neptune/templates/search.json
@@ -0,0 +1,20 @@
+{{"
+    SELECT ?identifier (sample(?d) as ?description) (0.9 as ?score)
+    WHERE {
+
+        SERVICE fts:search {
+            fts:config fts:query '''"+args['query']+"''' .
+            fts:config fts:endpoint '"+app.config.get('NEPTUNE_FTS_ENDPOINT')+"' .
+            fts:config fts:queryType 'match' .
+            fts:config fts:field '*' .
+            fts:config fts:return ?identifier .
+        }
+
+        ?identifier ?p ?o .
+        filter(!isBlank(?identifier))
+        OPTIONAL {
+            ?identifier dc:description|skos:definition|rdfs:comment|sioc:content|dc:abstract|dc:summary|dcelements:description|prov:value|sio:hasValue ?d.
+            filter(lang(?d) = '' || langMatches(lang(?d), 'en'))
+        }
+    } group by ?identifier
+    LIMIT 1000" | query | iter_labelize("identifier","label") | tojson }}
diff --git a/whyis/plugins/neptune/vocab.ttl b/whyis/plugins/neptune/vocab.ttl
new file mode 100644
index 000000000..13664b8b1
--- /dev/null
+++ b/whyis/plugins/neptune/vocab.ttl
@@ -0,0 +1,3 @@
+@prefix whyis: <http://vocab.rpi.edu/whyis/> .
+
+whyis:HomePage whyis:searchData "whyis_neptune:search.json".
diff --git a/whyis/static/js/whyis_vue/components/album.vue b/whyis/static/js/whyis_vue/components/album.vue
index 556c22ce0..8272c63cb 100644
--- a/whyis/static/js/whyis_vue/components/album.vue
+++ b/whyis/static/js/whyis_vue/components/album.vue
@@ -1,9 +1,9 @@