vs using sitemap.xml fork branch #389

Open: wants to merge 5 commits into base: develop
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -23,6 +23,7 @@ script prior to upgrading to minimize the downtime.
- In the docker container the folder /home/clowder/data is now whitelisted by default for uploading by reference.
This can be changed using the environment variable CLOWDER_SOURCEPATH.
- The current CLA for developers of clowder.
- sitemap.xml route that lists dataset pages so they can be crawled for their embedded JSON-LD, for Google Dataset Search.

### Fixed
- Send email to all admins in a single email when a user submits 'Request access' for a space
56 changes: 55 additions & 1 deletion app/controllers/Application.scala
@@ -18,7 +18,8 @@ import scala.collection.mutable.ListBuffer
@Singleton
class Application @Inject()(files: FileService, collections: CollectionService, datasets: DatasetService,
spaces: SpaceService, events: EventService, comments: CommentService,
sections: SectionService, users: UserService, selections: SelectionService) extends SecuredController {
sections: SectionService, users: UserService, selections: SelectionService,
tree: TreeService) extends SecuredController {
/**
* Redirect any url's that have a trailing /
*
@@ -84,6 +85,59 @@ class Application @Inject()(files: FileService, collections: CollectionService,
}
}

/**
 * Returns the sitemap.xml listing the dataset pages, so they can be scraped for their JSON-LD scripts.
 * It was suggested to start this like the swagger route. If the sitemap is never cached, the resource
 * lookup should be changed; until then a filler file has to exist at /public/sitemap.xml, which could
 * later hold the cached sitemap.
 */
import play.api.libs.json._ // TODO: move to the top of the file
import api.Permission.Permission // TODO: move to the top of the file
import models.User

def sitemap = Action { implicit request =>
  Play.resource("/public/sitemap.xml") match { // filler file, in case we cache the sitemap here someday
    case Some(resource) => {
      val clowderurl = new URL(Utils.baseUrl(request))
      // Build the list as the anonymous user (needs the models.User import above).
      val user = User.anonymous
      //val dd = tree.getDatasets(true, user) // not owned by anon
      val dd = tree.getDatasets(false, user)
      val top = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">"""
      // One <url> entry per dataset page.
      val urls = dd.map { dd_ =>
        val dd_id = (dd_ \ "id").as[String]
        "\n<url><loc>" + clowderurl + "/datasets/" + dd_id + "</loc></url>"
      }
      // Earlier approach, reading the ids from the API route; to be removed once getDatasets replaces it:
      //val d = scala.io.Source.fromURL(clowderurl + "/api/datasets")
      //val idl = (Json.parse(d.mkString) \\ "id")
      //val urls = idl.map(id => "\n<url><loc>" + clowderurl + "/datasets/" + id.as[String] + "</loc></url>")
      val resultStr = top + urls.mkString + "\n</urlset>"
      // The result could be cached for reuse and the cache checked before regenerating,
      // but that may be skipped because permissions would have to be rechecked as well.
      Ok(resultStr).as("application/xml")
    }
    case None => NotFound("Could not find sitemap.xml")
  }
}

/**
* Main page.
*/
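
The string assembly in `sitemap` can be separated from the controller so it is testable without Play. The following is only a sketch under stated assumptions: `SitemapSketch`, `escapeXml`, and `buildSitemap` are hypothetical names that are not part of this PR, and the dataset ids are assumed to already be available as plain strings. It also entity-escapes the URL, which the sitemap protocol requires and which the controller above does not do.

```scala
// Hypothetical helper, not part of Clowder: builds sitemap XML from dataset ids.
object SitemapSketch {
  // Entity-escape the characters the sitemap protocol asks for; '&' must be replaced first.
  def escapeXml(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")
      .replace("'", "&apos;")

  // baseUrl and datasetIds are assumed inputs; in the PR they come from the request and TreeService.
  def buildSitemap(baseUrl: String, datasetIds: Seq[String]): String = {
    val header = Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">"""
    )
    val entries = datasetIds.map { id =>
      "<url><loc>" + escapeXml(baseUrl.stripSuffix("/") + "/datasets/" + id) + "</loc></url>"
    }
    (header ++ entries ++ Seq("</urlset>")).mkString("\n")
  }
}
```

Keeping the builder free of request and service types means the controller only has to supply the base URL and the ids, and the output format can be checked in a plain unit test.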
6 changes: 6 additions & 0 deletions conf/routes
@@ -297,6 +297,12 @@ GET /javascriptRoutes
# ----------------------------------------------------------------------
GET /swagger @controllers.Application.swagger
GET /swaggerUI @controllers.Application.swaggerUI

# ----------------------------------------------------------------------
# SITEMAP
# ----------------------------------------------------------------------
GET /sitemap.xml @controllers.Application.sitemap
GET /sitemap @controllers.Application.sitemap

# ----------------------------------------------------------------------
# RESTful API
4 changes: 4 additions & 0 deletions conf/sitemap.xml
@@ -0,0 +1,4 @@
Placeholder for now:
the route is set up to read from this cached file, so it expects the file to exist,
even though the caching has not been implemented yet
and the route currently returns the generated sitemap directly.
6 changes: 6 additions & 0 deletions public/sitemap.xml
@@ -0,0 +1,6 @@
Filler that will be replaced with the cached sitemap,
though that idea might be on hold if we have to
worry about who has access to this sitemap,
as Clowder v1 only has public and hidden
while v2 might get a private setting where
you can see that it is there but not download it without auth.
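
One way the caching idea could be approached without writing to a file is a small in-memory cache with a time limit, holding only the sitemap as generated for the anonymous user so that no per-user permissions are mixed into the cached copy. This is a sketch only: `TimedCache` and `getOrUpdate` are hypothetical names, nothing like this exists in the PR, and whether an anonymous-only view is enough for the access concern described above is an open question.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper, not part of Clowder: keeps one generated value for ttlMillis.
// The clock is injectable (assumption made for testing); it defaults to the system time.
class TimedCache[A](ttlMillis: Long, now: () => Long = () => System.currentTimeMillis()) {
  // Holds the creation time and the value, or None before the first generation.
  private val cached = new AtomicReference[Option[(Long, A)]](None)

  // Returns the cached value while it is fresh; otherwise evaluates `generate` and stores the result.
  // Concurrent callers may both regenerate; the last write wins, which is harmless for a sitemap.
  def getOrUpdate(generate: => A): A = cached.get() match {
    case Some((createdAt, value)) if now() - createdAt < ttlMillis => value
    case _ =>
      val value = generate
      cached.set(Some((now(), value)))
      value
  }
}
```

In the controller this might be used as `sitemapCache.getOrUpdate(buildTheSitemapString)`, where both names are placeholders. A short time limit bounds how long a dataset that has since been hidden can remain listed.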