Skip to content

Custom Domains for Territories #1958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 90 commits into
base: master
Choose a base branch
from

Conversation

Soxasora
Copy link
Member

@Soxasora Soxasora commented Mar 9, 2025

Description

Closes #1942
Introduces custom domains for territories, featuring one-click sign-in and customizable branding

Additional Context

Several parts are at the core of custom domains functionality.

Middleware

We use the main middleware to handle redirects and rewrites for custom domains.
As usual we run referrerMiddleware to get referral cookies but we pass it on to custom domains

To recognize an allowed custom domain we check req.headers.host and verify it against a cached /api/domains endpoint, if there's a match we'll proceed to the next step: customDomainMiddleware with the referral cookies

This middleware handles various paths differently

/login /signup

The custom domain, callbackUrl, multiAuth flag, gets applied to the path to be later used.

/ or TERRITORY_PATHS

As we got redirected to a path without /~subName we still need to know what territory are we on for paths that requires it, so this step applies a hidden rewrite to /~subName

/~subName

To minimize changes to the codebase, every path that makes us of /~subName gets redirected to a path without it, applying searchParams and running then the / rewrite we just talked about.

Remember the referral cookies we passed at the start? For every redirect or rewrite, we apply those referral cookies, maintaining the usual behavior.
And we apply security headers, too.

Auth Sync

To provide a seamless experience between custom domains and SN, all auth actions gets handled by stacker.news.
During the middleware stage we recognize if we're going to a login or signup page and pass our custom domain.
We can then build a callbackUrl to /api/auth/sync containing the custom domain as final callback.

/api/auth/sync

If the login is successful or we're already logged in, we create a verification token that lasts a minute for the user they want to login with.
We then redirect back to the custom domain we passed via /login /signup, with the token and the callback using the ?type=sync parameter.

a TEST DomainProvider will catch this parameter and will start a next auth signIn
A CredentialsProvider called Sync will verify this token and provide the user object if everything goes smoothly.
We're in!

Setup and Verification UI/UX

On edits, a new collapsed section called advanced pops-up
Here we can submit our custom domain
image

Upon submitting, DNS verification starts automatically and informs the user of the steps they need to follow
image

Entering this part of the page or submitting a domain starts a poll that stops whenever the domain is fully VERIFIED (dnsState and sslState).

And on verified DNS we trigger SSL verification
image

We're all set!
image

DNS and SSL verification

We saw how it appears on the user side, let's dive in the worker that's behind these verifications

NOTE: The verification process has changed from routine worker to a boss.send based approach

verifyDomainDNS

We check both TXT and CNAME records that we asked the user to set, specifically if the domain they provided has a CNAME that points at stacker.news and a TXT with our randomly generated string

Adding to this verification, we use a DNS Resolver set from the ENV or we fallback to CloudFlare's 1.1.1.1, for consistency.

If both records are set correctly, assuming everything goes smoothly, we set dnsState to VERIFIED and go onto the next step.

Issue and verify SSL certificates

If dnsState is VERIFIED, we can start our SSL issuance and verification process.
We contact ACM to issue a domain certificate for the domain the user has provided, eg. forum.pizza.com

If we got a certificateArn from ACM then we can already check its status to see if we got the validation values with it. If so, let's put them in our database so the user can set point a CNAME record to ACM's validation values.

Since we need to wait for the user to complete this step, we continue to check the sslState by contacting ACM.

If ACM declares the domain as valid we can then set sslState to VERIFIED and the user will be notified.
We are all set!

Custom Branding

On full positive domain verification a new section pops up: Custom Branding
This sub-feature gives the possibility to choose colors, logo, text and description of the community, further distinguishing a community powered by SN from SN itself.

TODO: better styling
image

It works by saving these settings to the CustomBranding model which is connected to the Sub model instead of CustomDomain to make it independent from any domain change.

TODO: better way

At the moment, colors are applied via a BrandingProvider that fetches them during SSR.

Navigation and UI

Custom domains won't have the territory picker or informations about the territory they're in, leaving these to SN.
Subject to change

Default branding:
image

This is handled by the DomainProvider, a context that recognizes a custom domain and applies the related subName via SSR and client-side, also gets the changed styles via an SSR fetch to the customBranding model, using the subName that we obtained before.

image

SEO and colors are applied dynamically, overwriting SN's style with the territory owner's choices.

This is a title bars comparison, the difference here is that we don't show the territory name and stacker news, instead we just show the title chosen by the territory owner.
image

TBD

Checklist

Are your changes backwards compatible? Please answer below:
Custom Domains are bolted on top of SN codebase, it shouldn't interfere, at all, with stacker.news usual behavior

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
5, iterative testing

For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, remains the SCSS style application that has a slight delay

Did you introduce any new environment variables? If so, call them out explicitly here:
DNS_RESOLVER -> with a fallback to 1.1.1.1 in worker/domainVerification.js
LOCALSTACK_ENDPOINT -> for test atm
DOMAIN_DEBUG -> for custom domain logger, for test atm

MVP Done Tasks

APIs

  • LocalStack ACM (issue certificates for custom domains)

SSL

  • Show validation CNAME to the user
  • Issue SSL certificates via ACM API

Custom branding

  • Color picker
  • Logo upload
  • Form
  • Dynamic branding, custom colors
  • Branding context
  • Branding db model

Territory Edit

  • Poll every 30 seconds for validation status
  • Info with SSL validation CNAME to be set // depends on ACM
  • Info with TXT records to be set
  • every 5 minutes DNS/SSL verification and validation via node:dns and worker
  • Popover respectively for DNS and SSL lastVerified
  • WIP save custom domain in DB via Territory Edit // wip UI
  • show values from customDomain

Domains

  • delete domain after X failed verifications
  • dnsState and sslState to track domain's functionality
  • cached API endpoint for custom domains map
  • custom domain table

Navigation and UI

  • fix layout shifts
  • better isCustomDomain conditional UI
  • fix sorts not being highlighted due to non-expected url // not clean

Middleware

  • consider Referral Cookies, do we want them? // not clean
  • better management of territory paths
  • WIP redirect and rewrite of territories

Auth Sync

  • MultiAuth handling
  • WIP one click login if visiting a custom domain // todo: give an appropriate location for verification
  • switch to verification request
  • redirect to custom domain with main domain's cookies and save them
  • sync auth only if coming from main domain

Take care of

  • check cookies validity across domains
  • check security risks in auth sync
  • custom domains SEO
  • Site title
  • Favicon
  • hard coded 'stacker.news'
  • use boss.send on domain submission to instant verify
  • going back causes isCustomDomain to be lost

After MVP TODOs

Wallet Sync

  • Autosync?
  • Device Sync as a placeholder

APIs

  • LocalStack ELBv2 (Load Balancer)
  • LocalStack CloudFront CDN) 

More

  • Consider push notifications being possibly duplicated
  • Consider hiding Footer for custom domains or redirect them to main domain
  • use constants
  • Total cleanup

Take care of

  • don't redirect to SN if not strictly necessary // WIP, searching for paths to be redirected
  • CORS for a.stacker.news and m.stacker.news
  • check for expired or transferred domains
  • test unusual DNS propagation times
  • Use Device Sync to sync wallets infos
  • prevent abuse on setCustomDomain
  • Webmanifest

Change requests progress

  • Middleware change requests
  • ACM certificate synergy with territory status
  • Better handling of ACM statuses
  • DNS/SSL verification UI/UX (statuses, copy values)
  • Handle TXT and CNAME records in another way because of RFC 1034
  • TEST Move 'advanced' dropdown before the territory edit save button
  • TEST Allow re-saving the same domain
  • TEST Don't change TXT records after HOLD status or on re-saving
  • Show clipboards regardless of hover
  • Show last verification regardless of hover
  • Use CopyInput for clipboards
  • Switch to a continue/abort system for domain verification, don't mutate data
  • TEST Don't use exponential backoff for domain verification, switch to 5-minute job for 48 hours
  • Remove log of domain ID on domain verification
  • Don't use dnsPromises.setServers() for DNS queries, use dnsPromises.Resolver
  • Fix localstack endpoint in env vars
  • WAIT Normalize domainVerification statuses with a dedicated schema
  • Custom Domains info popover

@Soxasora Soxasora marked this pull request as ready for review April 9, 2025 20:09
Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only looked at the middleware and the stuff it calls so far.

Makes sense to me what's going on (not much fortunately haha) but left some comments!

middleware.js Outdated
}

// If we're on a custom domain, handle that next
const host = request.headers.get('host')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also check for X-Forwarded-Host here or is that not needed because it's not a reverse proxy that's going to call us but a browser because DNS will resolve to our IP address?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think that if we give instructions to point at our load balancer or CDN, X-Forwarded-Host may be needed to know the starting domain (which is what we need). For best practice, I think we should check X-Forwarded-Host and if that's not present (i.e. there's not a reverse proxy, browsers don't create this header) then we'll fallback to the Host header.

Copy link
Member Author

@Soxasora Soxasora Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I think we can go ahead with X-Forwarded-Host to check for reverse proxies and then fallback to Host if that header doesn't exist. I'll leave this comment unresolved to remind myself to do further checks on reverse proxies.

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting review now since we'll meet in ~15 minutes

We can go over my comments in the meeting, you don't need to read them before the meeting

]
}

const certificate = await acm.requestCertificate(params).promise()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For how long are we requesting/issuing a certificate?

Certificates should expire when their territory expires or have to be renewed every year.

I tried to find how this is determined in the AWS docs but I couldn't find it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super question, I didn't think of that, the way I described it to you during the meeting got me confused so I'll explain better here:

ACM issues certificates with a fixed validity of 13 months or 395 days, to have some kind of control here we could send a DeleteCertificate request to ACM after the territory period of grace ends without successful renewal. I'll annotate this to implement it.

But! If we don't ever delete a certificate (good territory owner), after 13 months, ACM will automatically renew it so we don't have to do anything.

{verification?.ssl?.state === 'PENDING' && (
<>
<h5>Step 2: Prepare your domain for SSL</h5>
<p>We issued an SSL certificate for your domain. To validate it, add the following CNAME record:</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state says PENDING but here you mention we already issued it. That's confusing. Do you mean with PENDING that we're waiting for the user or for the issuer?

I think if it's pending for the issuer, we shouldn't tell them to do anything if they can only wait.

If we're waiting for the user to do something, this might need another PENDING state.

I also don't know what "validating a SSL certificate" means. Do they have to validate it in some way if they don't trust the one we issued?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mapped ACM's statuses to something we can digest better. But it's clear that we might want to differentiate DNS and SSL states.

In the case of SSL validation, PENDING does mean PENDING_VALIDATION in ACM's language, and will stay in this status until the user has put the correct DNS validation records issued by ACM.

ACM statuses: PENDING_VALIDATION | ISSUED | INACTIVE | EXPIRED | VALIDATION_TIMED_OUT | REVOKED | FAILED

ACM DNS validation requires DNS records to be set to validate the domain before enabling a SSL certificate. if the domain is, e.g., forum.pizza.com we will prompt the user to add:

CNAME record: _a79865eb4cd1a6ab990a45779b4e0b96.forum.pizza.com.
Value: _424c7224e9b0146f9a8808af955727d0.acm-validations.aws.

^info gathered from ACM docs

I didn't handle that much of ACM edge cases because we don't have the ability to test with Localstack its behavior, so I left it for last

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another partial review. Mostly UX stuff, but I am confused regarding CNAME+TXT.

According to the DNS spec, this won't work, see comment.

Soxasora and others added 2 commits April 23, 2025 08:40
Domain mappings:
- use direct db calls with cachedFetcher to map domains

Middleware:
- pass main domain, custom domain, subName to the customDomainMiddleware;
- retrieve main domain's hostname from env vars;
- only get subName from domain mappings cache (domain is the key);
- only check cache if we're not on the main domain; correct function declarations;
- light cleanup

Schema:
- use Citext for domain names as they're case-insensitive
Copy
- Uses CopyButton from Form to copy DNS record values

Domain Verification UI/UX
- Can trigger a re-verification by re-submitting the domain
- slight cleanup

Validation hints
= Better validation hint
Comment on lines 130 to 146
const dnsRecord = (host, value) => (
<div className='d-flex align-items-center gap-2'>
<span className={`${styles.record}`}>
<small className='fw-bold text-muted d-flex align-items-center gap-1 position-relative'>
host
<CopyButton
value={host}
append={
<ClipboardLine
className={`${styles.clipboard}`}
style={{ width: '1rem', height: '1rem' }}
/>
}
/>
</small>
<pre>{host}</pre>
</span>
Copy link
Member Author

@Soxasora Soxasora Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revamped the DomainGuidelines subcomponent to include copy buttons on every field (on hover) and a better layout

image

In my opinion it does seem better to look at and better to interact with ^^
In cleanup phase I would want to remove that extra space before <hr />

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the columns!

The clipboard on hover also looks cool but unfortunately, this won't be obvious on mobile that this is how you can copy since there's no cursor. In general, any feature that requires the hover state is limited to desktop. So hover should only be used for stuff that is just nice-to-have. A copy button is pretty important for UX though.

The same reasoning applies to DNS pending etc. I think the timestamp it shows is when the last check was made (this also needs to be clarified) which is pretty useful, but it's limited to desktop.

Why not use the existing CopyInput that we also use for the QR code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! I didn't think of mobile in this instance. I didn't want to bloat every field with a clipboard button, but by not doing so mobile can't actually use the feature. I'll try to find a way to make it permanent but less obtrusive!

You're also right on the last check popover, I'll find an appropriate place for it.

I didn't use CopyInput because of its big design that doesn't match the one for the various fields, but let's see if I can still use it by modifying it a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey it does still look cool 😎
image

Note: I applied w-100 to the Input field

</span>
)
}

Copy link
Member Author

@Soxasora Soxasora Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To include a copy button on DomainGuidelines I exported what we already have, adding an append prop that perhaps needs to be properly named in cleanup phase

edit: realized that GitHub crapped my selection, this references the CopyButton component in form.js

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got all DNS and SSL verification to work!

But I couldn't test the custom domain because I think my DNS setup is weird:

$ dig snx.ekzy.is

; <<>> DiG 9.18.33 <<>> snx.ekzy.is
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35942
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;snx.ekzy.is.                   IN      A

;; ANSWER SECTION:
snx.ekzy.is.            3600    IN      CNAME   sn.ekzy.is.
sn.ekzy.is.             3600    IN      CNAME   ekzy.is.
ekzy.is.                3600    IN      A       172.105.95.124

;; Query time: 482 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Fri Apr 25 02:46:17 CEST 2025
;; MSG SIZE  rcvd: 87

https://sn.ekzy.is/ has the following nginx config on my server running at the given IP address:

nginx config
upstream sn {
  server 127.0.0.1:3333;
}

map $http_upgrade $connection_upgrade {
  default upgrade;
  ''      close;
}

server {
    server_name sn.ekzy.is;
    listen      80;
    listen      [::]:80;

    include letsencrypt.conf;

    return 301 https://sn.ekzy.is$request_uri;
}

server {
    server_name         sn.ekzy.is;
    listen              443;
    listen              [::]:443 ssl;

    ssl_certificate     /etc/letsencrypt/live/sn.ekzy.is/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/sn.ekzy.is/privkey.pem;

    client_header_timeout 30;
    client_body_timeout   30;
    keepalive_timeout     30;

    location / {
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-Host $host;
      proxy_read_timeout 30s;
      proxy_send_timeout 30s;
      proxy_pass http://sn$request_uri;
    }

    location /_next/webpack-hmr {
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-Host $host;
      proxy_set_header X-Forwarded-Prefix /;
      proxy_pass http://sn$request_uri;
    }

    include letsencrypt.conf;
}

I then run ssh -fnNR 3333:localhost:3000 sn.ekzy.is to tunnel to SN running on my laptop.

But since I don't have a nginx config for snx.ekzy.is, it then shows me my default nginx site which is ekzy.is.

But I think it would work if sn.ekzy.is wouldn't have a CNAME record itself so I think we don't need to worry about this case.

Just need to figure out how I can test this further.

Requesting changes mostly because of the schema stuff.

Comment on lines +292 to +299
{sub && !customDomain &&
<div className='w-100'>
<AccordianItem
header={<div style={{ fontWeight: 'bold', fontSize: '92%' }}>advanced</div>}
body={<TerritoryDomains sub={sub} />}
/>
</div>}
{sub && customDomain && <Link className='text-muted w-100' href={`${process.env.NEXT_PUBLIC_URL}/~${sub.name}/edit`}>domain settings on stacker.news</Link>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025-04-24-232904_745x881_scrot

I think everything should be above the 'save' button. I think you put this below the 'save' button because the custom domain has its own save button which makes this a non-trivial decision but I think it should still be above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it's weird to see a dropdown after the save button on territory edit, but I put it there because while the options dropdown will refer to that save button, the advanced dropdown won't.

It's a UI/UX dilemma, I don't necessarily think that people will confuse save buttons but at the same time there will be lots of save buttons. I think I have to see it before deciding, I'll try in a bit ^^

if (domain) {
const existing = await models.customDomain.findUnique({ where: { subName } })
if (existing && existing.domain === domain && existing.status !== 'HOLD') {
throw new GqlInputError('domain already set')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be an error for some reason? If I try to save the domain with the same value, I would expect it to just work™ but this throws.

Next to not sure if this has to be an error, the client only shows the generic error message "failed to update domain."

Copy link
Member Author

@Soxasora Soxasora Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a way to prevent users to trigger useless verifications that could lead to an offline state, changing TXT values etc.
It's possible to allow domain re-saving but this singlehandedly prevents mistakes!

About the error: I think I forgot to update the toasts after the latest commits, fixing it asap!

Comment on lines 130 to 146
const dnsRecord = (host, value) => (
<div className='d-flex align-items-center gap-2'>
<span className={`${styles.record}`}>
<small className='fw-bold text-muted d-flex align-items-center gap-1 position-relative'>
host
<CopyButton
value={host}
append={
<ClipboardLine
className={`${styles.clipboard}`}
style={{ width: '1rem', height: '1rem' }}
/>
}
/>
</small>
<pre>{host}</pre>
</span>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the columns!

The clipboard on hover also looks cool but unfortunately, this won't be obvious on mobile that this is how you can copy since there's no cursor. In general, any feature that requires the hover state is limited to desktop. So hover should only be used for stuff that is just nice-to-have. A copy button is pretty important for UX though.

The same reasoning applies to DNS pending etc. I think the timestamp it shows is when the last check was made (this also needs to be clarified) which is pretty useful, but it's limited to desktop.

Why not use the existing CopyInput that we also use for the QR code?

Comment on lines 72 to 73
// schedule domain verification in 5 seconds, apply exponential backoff and keep it for 2 days
// 12 retries, 42 seconds of delay between retries will fit 48 hours of trying for DNS propagation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this.

There are 12 retries and 42 seconds of delay between retries. How does this fit 48 hours? Does this depend on the exponential backoff?

I think it should be easier to know when this job will run so it's also predictable for the user.

Is it an expensive job? Can we not run it every 5 minutes with no backoff or something for 2 days?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrycount, retrydelay, retrybackoff are all part of the pgboss' exponential backoff system.

By retrying 12 times, with a delay of 42 seconds each and with exponential backoff enabled, it basically does:
42 * 2^12 which equates to 172.032 seconds, roughly 48 hours.

The main reason why I did this is because we use exponential backoff on other jobs too and wanted to give some kind of continuity with the rest of the code.
But another reason was the fact that I was afraid of running an expensive job by resolving records for domains that could also not be properly set at all.

I mean, I don't think it's actually an expensive job, before I was checking EVERY domain at the same time every 5 minutes, which was undoubtedly worse.

I think that a mix of both can be good: still pgboss job scheduling at domain upsert but check it individually every 5 minutes for a maximum of 48 hours, if it fails we will mark it as HOLD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to a domain verification job triggered at domain update, that renews every time the domain is still in PENDING state. If during the verification process we're still in PENDING and 48 hours has passed since the domain has been created, it puts the domain in HOLD and stops further verification processes.

status String @default("PENDING")
failedAttempts Int @default(0)
lastVerifiedAt DateTime?
verification Json @default("{}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of status, failedAttempts, lastVerifiedAt by having another table that stores each verification attempt that links to a domain via foreign keys and then resolving these columns via JOINS.

This way, we also know what happened during each verification attempt and not just the latest one.

I hope it won't require too many changes to do this but I think it's pretty important to keep our schema as normalized as possible. @huumn is currently working on fixing existing issues like this in our current schema in #2084.

Before you do this though, we should consult @huumn to confirm the schema I described is the correct one so you don't have to change it again after his review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I kinda forgot about this schema after a long time and It shouldn't require lots of changes at all to normalize it, I'll wait before implementing it then ^^

@@ -177,6 +177,7 @@ AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
PERSISTENCE=1
SKIP_SSL_CERT_DOWNLOAD=1
LOCALSTACK_ENDPOINT=http://localhost:4566
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be this:

Suggested change
LOCALSTACK_ENDPOINT=http://localhost:4566
LOCALSTACK_ENDPOINT=http://aws:4566

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought I pushed it during a cleanup commit, so sorry, it remained in local.
I'll push it with the next commit!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I deleted my comment because I wanted to make it part of my review lol

Oh, I thought I pushed it during a cleanup commit, so sorry, it remained in local.

Cool, no worries!

error: uses the error thrown by the domains resolver
hover: clipboards and verification time are shown by default instead of being hidden by hover
cleanup: clearer dnsRecord subcomponent
- job will be scheduled after 30 seconds and will be maintained for 48 hours
- step-by-step verification process (to be modularized)
- don't throw an error if it's not critical
- if verification process results in a still PENDING domain we schedule it again after 5 minutes
- if the status is still NOT ACTIVE (e.g. PENDING) we put the status on HOLD to avoid re-scheduling
- deletes certificates, if any, when a domain is in HOLD
- better initial domain structure
- also deletes related jobs on domain deletion
- fix ACM validation values gathering
- put domain in HOLD after 3 consecutive critical errors
- independent verification steps
- fix typo on domain update via domain verification
Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed 4bfda81 because I didnt' realize that git fetch didn't fetch the new commits like 5130057. I assume that's the case because they have dates in the future:

2025-04-30-003238_1422x526_scrot

Please make sure that your commits are not created in the future so git isn't confused.


I looked at all the code for authentication now.

If I understood the authentication flow for custom domains right, this is how it works:

  1. Go to /login on custom domain
  2. Middleware for custom domain redirects us to stacker.news/login and sets domain query parameter

// Auth sync redirects with domain and optional callbackUrl and multiAuth params
if (pathname === '/login' || pathname === '/signup') {
const redirectUrl = new URL(pathname, mainDomain)
redirectUrl.searchParams.set('domain', domain)
if (query.get('callbackUrl')) {
redirectUrl.searchParams.set('callbackUrl', query.get('callbackUrl'))
}
if (query.get('multiAuth')) {
redirectUrl.searchParams.set('multiAuth', query.get('multiAuth'))
}
const redirectResp = NextResponse.redirect(redirectUrl, {
headers: requestHeaders
})
// apply referrer cookies to the redirect
return applyReferrerCookies(redirectResp, referrerResp)
}

  1. User authenticates themselves on stacker.news as usual
  2. When we have a session for stacker.news, we get redirected to /api/auth/sync with redirectUrl set to the custom domain inside callbackUrl

// If we're coming from a custom domain, set as callbackUrl the auth sync endpoint
if (domain) {
const params = new URLSearchParams()
params.set('redirectUrl', encodeURIComponent(callbackUrl))
if (multiAuth) { // take care of multiAuth if requested
params.set('multiAuth', multiAuth)
}
callbackUrl = '/api/auth/sync?' + params.toString()
}
if (session && callbackUrl && !multiAuth) {
// in the case of auth linking we want to pass the error back to settings
// in the case of multi auth, don't redirect if there is already a session
if (error) {
const url = new URL(callbackUrl, process.env.NEXT_PUBLIC_URL)
url.searchParams.set('error', error)
callbackUrl = url.pathname + url.search
}
return {
redirect: {
destination: callbackUrl,
permanent: false
}
}
}

  1. /api/auth/sync then creates a token in the db and redirects us back to the root of the custom domain (via redirectUrl) with type=sync and token as a query parameter

export default async function handler (req, res) {
const { redirectUrl, multiAuth } = req.query
if (!redirectUrl) {
return res.status(400).json({ status: 'ERROR', reason: 'missing redirectUrl parameter' })
}
// redirectUrl parse
const decodedRedirectUrl = decodeURIComponent(redirectUrl)
let customDomain
try {
customDomain = new URL(decodedRedirectUrl)
// not cached because we're handling sensitive data
const domain = await models.customDomain.findUnique({ where: { domain: customDomain.host, status: 'ACTIVE' } })
if (!domain) {
return res.status(400).json({ status: 'ERROR', reason: 'custom domain not found' })
}
} catch (error) {
return res.status(400).json({ status: 'ERROR', reason: 'invalid redirectUrl parameter' })
}
const mainDomain = process.env.NEXT_PUBLIC_MAIN_DOMAIN
// get the user's session
const session = await getServerSession(req, res, getAuthOptions(req, res))
if (!session) {
// redirect to the login page, middleware will handle the rest
return res.redirect(mainDomain + '/login?callbackUrl=' + encodeURIComponent(decodedRedirectUrl))
}
try {
const token = generateRandomString(32)
// create a sync token
await models.verificationToken.create({
data: {
identifier: `sync:${session.user.id}`,
token,
expires: new Date(Date.now() + 1 * 60 * 1000) // 1 minute
}
})
if (process.env.NODE_ENV === 'production') {
res.setHeader('Set-Cookie', [
'SameSite=Lax; Secure; HttpOnly'
])
}
// domain provider will handle this sync request
const customDomainCallback = new URL('/?type=sync', decodedRedirectUrl)
customDomainCallback.searchParams.set('token', token)
customDomainCallback.searchParams.set('callbackUrl', decodedRedirectUrl)
if (multiAuth) {
customDomainCallback.searchParams.set('multiAuth', multiAuth)
}
// TODO: security headers?
// redirect to the custom domain callback
return res.redirect(customDomainCallback.toString())
} catch (error) {
console.error('Error generating token:', error)
return res.status(500).json({ error: 'Failed to generate token' })
}
}

  1. This is then caught in a useEffect of DomainProvider and calls signIn:

// temporary auth sync
useEffect(() => {
if (router.query.type === 'sync') {
console.log('signing in with sync', router.query)
signIn('sync', {
token: router.query.token,
multiAuth: router.query.multiAuth,
redirect: false
})
router.push(router.query.callbackUrl) // next auth redirect only supports the main domain
}
}, [router.query.type])

  1. The provider for sync checks that the token exists and did not expire. If so, we return the user object which issues the cookies for the custom domain.

// Authentication Provider for syncing a user's session to a custom domain
const syncAuth = async (token, req, res) => {
try {
const verificationToken = await prisma.verificationToken.findUnique({ where: { token } })
if (!verificationToken) return null
// has to be a sync token
if (!verificationToken.identifier.startsWith('sync:')) return null
// sync has user id
const userId = parseInt(verificationToken.identifier.split(':')[1], 10)
if (!userId) return null
// delete the token to prevent reuse
await prisma.verificationToken.delete({
where: { id: verificationToken.id }
})
if (new Date() > verificationToken.expires) return null
return await prisma.user.findUnique({ where: { id: userId } })
} catch (error) {
console.error('auth sync error:', error)
return null
}
}

I wonder if we can simplify this flow if we never redirect to stacker.news but issue cookies for (only) the custom domain in step 3?

Since the browser sees the custom domain (and isn't aware that it's connecting to stacker.news), our backend should be able to set cookies for this custom domain.

Afaict, the only issue with that is how to tell that next-auth since it by default issues cookies for the domain set by NEXTAUTH_URL. Maybe we can configure the cookie domain via the advanced cookie options?

The three redirects until we call signIn makes this a little hard to follow and thus hard to be confident that there isn't anything to exploit in here like open redirects or worse, but so far, it looks good to me. Will take a closer look to find security issues tomorrow, but I understand now how the authentication works in detail.


update: I just realized that in step 7, next-auth already issues cookies for the custom domain without any additional configuration. Why are all these redirects needed?


// Auth sync redirects with domain and optional callbackUrl and multiAuth params
if (pathname === '/login' || pathname === '/signup') {
const redirectUrl = new URL(pathname, mainDomain)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this throws invalid URL because mainDomain is only the host

I fixed it to continue the review like this:

diff --git a/middleware.js b/middleware.js
index 8badffd9..a368c47f 100644
--- a/middleware.js
+++ b/middleware.js
@@ -231,8 +231,8 @@ export async function middleware (request) {
   // If we're on a custom domain, handle that next
   const domain = request.headers.get('x-forwarded-host') || request.headers.get('host')
   // get the main domain from the env vars
-  const mainDomain = new URL(process.env.NEXT_PUBLIC_URL).host
-  if (domain !== mainDomain) {
+  const mainDomain = new URL(process.env.NEXT_PUBLIC_URL)
+  if (domain !== mainDomain.host) {
     // get the subName from the domain mappings cache
     const cachedDomainMappings = await getDomainMappingsCache()
     const domainMapping = cachedDomainMappings?.[domain?.toLowerCase()]

Comment on lines +46 to +50
if (process.env.NODE_ENV === 'production') {
res.setHeader('Set-Cookie', [
'SameSite=Lax; Secure; HttpOnly'
])
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhh, this looks wrong

data: {
identifier: `sync:${session.user.id}`,
token,
expires: new Date(Date.now() + 1 * 60 * 1000) // 1 minute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually use datePivot from lib/time.js for stuff like this

Copy link
Member

@ekzyis ekzyis Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is caused by this API handler, but I see API handler should not return a value, received object. during login.

app  | 2025-04-30T04:31:26.730315000Z  GET /login?domain=snx.ekzy.is&callbackUrl=https%3A%2F%2Fsnx.ekzy.is%2F 307 in 55ms
app  | 2025-04-30T04:31:27.305190000Z API handler should not return a value, received object.
app  | 2025-04-30T04:31:27.305349000Z  GET /api/auth/sync?redirectUrl=https%253A%252F%252Fsnx.ekzy.is%252F 307 in 8ms
app  | 2025-04-30T04:31:28.263846000Z  GET /?type=sync&token=3nvx7ndr6vuyedhe6z9f4fq5l7dzrf38&callbackUrl=https%3A%2F%2Fsnx.ekzy.is%2F 200 in 58ms

Comment on lines +276 to +285
CredentialsProvider({
id: 'sync',
name: 'Sync',
credentials: {
token: { label: 'token', type: 'text' }
},
authorize: async ({ token }, req) => {
return await syncAuth(token, req, res)
}
}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, this seems to get called twice:

app  | 2025-04-30T04:31:38.175024000Z  POST /api/auth/callback/sync 200 in 21ms
app  | 2025-04-30T04:31:38.744352000Z  POST /api/auth/callback/sync 401 in 5ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new product features that weren't there before territories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow territories to be hosted at custom domains
2 participants