bookmark: fix up double request and rework extractor (#3856)
This code has started to bitrot a bit and this freshens it up a bit.
- there's a double request happening for every bookmark paste at the
moment, yikes! One request originates from the paste logic, and the
other originates from the `onBeforeCreate` in `BookmarkShapeUtil`. They
both see that an asset is missing and race to make the request at the
same time. It _seems_ like we don't need the `onBeforeCreate` anymore.
But, if I'm mistaken on some edge case here lemme know and we can
address this in a different way.
- the extractor is really crusty (the grabity code is from 5 yrs ago and
hasn't been updated) and we don't have control over it. i've worked on
unfurling stuff before with Paper and my other projects and this reworks
things to use Cheerio, which is a more robust library.
- this adds `favicon` to the response request which should usually
default to the apple-touch-icon. this helps with some better bookmark
displays (e.g. like Wikipedia if an image is empty)
In general, this'll start to make this more maintainable and improvable
on our end.
Double request:
<img width="1496" alt="Screenshot 2024-05-31 at 17 54 49"
src="https://github.com/tldraw/tldraw/assets/469604/22033170-caaa-4fd2-854f-f19b61611978">
Before:
<img width="355" alt="Screenshot 2024-05-31 at 17 55 02"
src="https://github.com/tldraw/tldraw/assets/469604/fd272669-ee52-4cc7-bed7-72a8ed8d53a0">
After:
<img width="351" alt="Screenshot 2024-05-31 at 17 55 44"
src="https://github.com/tldraw/tldraw/assets/469604/87d27342-0d49-4cfc-a811-356370562d19">
### Change Type
<!-- ❗ Please select a 'Scope' label ❗️ -->
- [ ] `sdk` — Changes the tldraw SDK
- [x] `dotcom` — Changes the tldraw.com web app
- [ ] `docs` — Changes to the documentation, examples, or templates.
- [ ] `vs code` — Changes to the vscode plugin
- [ ] `internal` — Does not affect user-facing stuff
<!-- ❗ Please select a 'Type' label ❗️ -->
- [x] `bugfix` — Bug fix
- [ ] `feature` — New feature
- [x] `improvement` — Improving existing features
- [ ] `chore` — Updating dependencies, other boring stuff
- [ ] `galaxy brain` — Architectural changes
- [ ] `tests` — Changes to any test code
- [ ] `tools` — Changes to infrastructure, CI, internal scripts,
debugging tools, etc.
- [ ] `dunno` — I don't know
### Test Plan
1. Test pasting links in, and pasting again.
### Release Notes
- Bookmarks: fix up double request and rework extractor code.
---------
Co-authored-by: Steve Ruiz <steveruizok@gmail.com>
2024-06-10 10:50:49 +00:00
|
|
|
import cheerio from 'cheerio'
|
|
|
|
|
|
|
|
export async function unfurl(url: string) {
|
|
|
|
const response = await fetch(url)
|
|
|
|
if (response.status >= 400) {
|
|
|
|
throw new Error(`Error fetching url: ${response.status}`)
|
|
|
|
}
|
|
|
|
const contentType = response.headers.get('content-type')
|
|
|
|
if (!contentType?.includes('text/html')) {
|
|
|
|
throw new Error(`Content-type not right: ${contentType}`)
|
|
|
|
}
|
|
|
|
|
|
|
|
const content = await response.text()
|
|
|
|
const $ = cheerio.load(content)
|
|
|
|
|
|
|
|
const og: { [key: string]: string | undefined } = {}
|
|
|
|
const twitter: { [key: string]: string | undefined } = {}
|
|
|
|
|
|
|
|
$('meta[property^=og:]').each((_, el) => (og[$(el).attr('property')!] = $(el).attr('content')))
|
|
|
|
$('meta[name^=twitter:]').each((_, el) => (twitter[$(el).attr('name')!] = $(el).attr('content')))
|
|
|
|
|
|
|
|
const title = og['og:title'] ?? twitter['twitter:title'] ?? $('title').text() ?? undefined
|
|
|
|
const description =
|
|
|
|
og['og:description'] ??
|
|
|
|
twitter['twitter:description'] ??
|
|
|
|
$('meta[name="description"]').attr('content') ??
|
|
|
|
undefined
|
2024-06-11 11:04:29 +00:00
|
|
|
let image = og['og:image:secure_url'] ?? og['og:image'] ?? twitter['twitter:image'] ?? undefined
|
|
|
|
let favicon =
|
bookmark: fix up double request and rework extractor (#3856)
This code has started to bitrot a bit and this freshens it up a bit.
- there's a double request happening for every bookmark paste at the
moment, yikes! One request originates from the paste logic, and the
other originates from the `onBeforeCreate` in `BookmarkShapeUtil`. They
both see that an asset is missing and race to make the request at the
same time. It _seems_ like we don't need the `onBeforeCreate` anymore.
But, if I'm mistaken on some edge case here lemme know and we can
address this in a different way.
- the extractor is really crusty (the grabity code is from 5 yrs ago and
hasn't been updated) and we don't have control over it. i've worked on
unfurling stuff before with Paper and my other projects and this reworks
things to use Cheerio, which is a more robust library.
- this adds `favicon` to the response request which should usually
default to the apple-touch-icon. this helps with some better bookmark
displays (e.g. like Wikipedia if an image is empty)
In general, this'll start to make this more maintainable and improvable
on our end.
Double request:
<img width="1496" alt="Screenshot 2024-05-31 at 17 54 49"
src="https://github.com/tldraw/tldraw/assets/469604/22033170-caaa-4fd2-854f-f19b61611978">
Before:
<img width="355" alt="Screenshot 2024-05-31 at 17 55 02"
src="https://github.com/tldraw/tldraw/assets/469604/fd272669-ee52-4cc7-bed7-72a8ed8d53a0">
After:
<img width="351" alt="Screenshot 2024-05-31 at 17 55 44"
src="https://github.com/tldraw/tldraw/assets/469604/87d27342-0d49-4cfc-a811-356370562d19">
### Change Type
<!-- ❗ Please select a 'Scope' label ❗️ -->
- [ ] `sdk` — Changes the tldraw SDK
- [x] `dotcom` — Changes the tldraw.com web app
- [ ] `docs` — Changes to the documentation, examples, or templates.
- [ ] `vs code` — Changes to the vscode plugin
- [ ] `internal` — Does not affect user-facing stuff
<!-- ❗ Please select a 'Type' label ❗️ -->
- [x] `bugfix` — Bug fix
- [ ] `feature` — New feature
- [x] `improvement` — Improving existing features
- [ ] `chore` — Updating dependencies, other boring stuff
- [ ] `galaxy brain` — Architectural changes
- [ ] `tests` — Changes to any test code
- [ ] `tools` — Changes to infrastructure, CI, internal scripts,
debugging tools, etc.
- [ ] `dunno` — I don't know
### Test Plan
1. Test pasting links in, and pasting again.
### Release Notes
- Bookmarks: fix up double request and rework extractor code.
---------
Co-authored-by: Steve Ruiz <steveruizok@gmail.com>
2024-06-10 10:50:49 +00:00
|
|
|
$('link[rel="apple-touch-icon"]').attr('href') ??
|
|
|
|
$('link[rel="icon"]').attr('href') ??
|
|
|
|
undefined
|
|
|
|
|
2024-06-26 11:36:57 +00:00
|
|
|
if (image && !image?.startsWith('http')) {
|
2024-06-11 11:04:29 +00:00
|
|
|
image = new URL(image, url).href
|
|
|
|
}
|
2024-06-26 11:36:57 +00:00
|
|
|
if (favicon && !favicon?.startsWith('http')) {
|
2024-06-11 11:04:29 +00:00
|
|
|
favicon = new URL(favicon, url).href
|
|
|
|
}
|
|
|
|
|
bookmark: fix up double request and rework extractor (#3856)
This code has started to bitrot a bit and this freshens it up a bit.
- there's a double request happening for every bookmark paste at the
moment, yikes! One request originates from the paste logic, and the
other originates from the `onBeforeCreate` in `BookmarkShapeUtil`. They
both see that an asset is missing and race to make the request at the
same time. It _seems_ like we don't need the `onBeforeCreate` anymore.
But, if I'm mistaken on some edge case here lemme know and we can
address this in a different way.
- the extractor is really crusty (the grabity code is from 5 yrs ago and
hasn't been updated) and we don't have control over it. i've worked on
unfurling stuff before with Paper and my other projects and this reworks
things to use Cheerio, which is a more robust library.
- this adds `favicon` to the response request which should usually
default to the apple-touch-icon. this helps with some better bookmark
displays (e.g. like Wikipedia if an image is empty)
In general, this'll start to make this more maintainable and improvable
on our end.
Double request:
<img width="1496" alt="Screenshot 2024-05-31 at 17 54 49"
src="https://github.com/tldraw/tldraw/assets/469604/22033170-caaa-4fd2-854f-f19b61611978">
Before:
<img width="355" alt="Screenshot 2024-05-31 at 17 55 02"
src="https://github.com/tldraw/tldraw/assets/469604/fd272669-ee52-4cc7-bed7-72a8ed8d53a0">
After:
<img width="351" alt="Screenshot 2024-05-31 at 17 55 44"
src="https://github.com/tldraw/tldraw/assets/469604/87d27342-0d49-4cfc-a811-356370562d19">
### Change Type
<!-- ❗ Please select a 'Scope' label ❗️ -->
- [ ] `sdk` — Changes the tldraw SDK
- [x] `dotcom` — Changes the tldraw.com web app
- [ ] `docs` — Changes to the documentation, examples, or templates.
- [ ] `vs code` — Changes to the vscode plugin
- [ ] `internal` — Does not affect user-facing stuff
<!-- ❗ Please select a 'Type' label ❗️ -->
- [x] `bugfix` — Bug fix
- [ ] `feature` — New feature
- [x] `improvement` — Improving existing features
- [ ] `chore` — Updating dependencies, other boring stuff
- [ ] `galaxy brain` — Architectural changes
- [ ] `tests` — Changes to any test code
- [ ] `tools` — Changes to infrastructure, CI, internal scripts,
debugging tools, etc.
- [ ] `dunno` — I don't know
### Test Plan
1. Test pasting links in, and pasting again.
### Release Notes
- Bookmarks: fix up double request and rework extractor code.
---------
Co-authored-by: Steve Ruiz <steveruizok@gmail.com>
2024-06-10 10:50:49 +00:00
|
|
|
return {
|
|
|
|
title,
|
|
|
|
description,
|
|
|
|
image,
|
|
|
|
favicon,
|
|
|
|
}
|
|
|
|
}
|