tldraw/apps/dotcom-bookmark-extractor/lib/unfurl.ts

48 lines
1.4 KiB
TypeScript

bookmark: fix up double request and rework extractor (#3856) This code has started to bitrot a bit and this freshens it up a bit. - there's a double request happening for every bookmark paste at the moment, yikes! One request originates from the paste logic, and the other originates from the `onBeforeCreate` in `BookmarkShapeUtil`. They both see that an asset is missing and race to make the request at the same time. It _seems_ like we don't need the `onBeforeCreate` anymore. But, if I'm mistaken on some edge case here lemme know and we can address this in a different way. - the extractor is really crusty (the grabity code is from 5 yrs ago and hasn't been updated) and we don't have control over it. i've worked on unfurling stuff before with Paper and my other projects and this reworks things to use Cheerio, which is a more robust library. - this adds `favicon` to the response request which should usually default to the apple-touch-icon. this helps with some better bookmark displays (e.g. like Wikipedia if an image is empty) In general, this'll start to make this more maintainable and improvable on our end. Double request: <img width="1496" alt="Screenshot 2024-05-31 at 17 54 49" src="https://github.com/tldraw/tldraw/assets/469604/22033170-caaa-4fd2-854f-f19b61611978"> Before: <img width="355" alt="Screenshot 2024-05-31 at 17 55 02" src="https://github.com/tldraw/tldraw/assets/469604/fd272669-ee52-4cc7-bed7-72a8ed8d53a0"> After: <img width="351" alt="Screenshot 2024-05-31 at 17 55 44" src="https://github.com/tldraw/tldraw/assets/469604/87d27342-0d49-4cfc-a811-356370562d19"> ### Change Type <!-- ❗ Please select a 'Scope' label ❗️ --> - [ ] `sdk` — Changes the tldraw SDK - [x] `dotcom` — Changes the tldraw.com web app - [ ] `docs` — Changes to the documentation, examples, or templates. 
- [ ] `vs code` — Changes to the vscode plugin - [ ] `internal` — Does not affect user-facing stuff <!-- ❗ Please select a 'Type' label ❗️ --> - [x] `bugfix` — Bug fix - [ ] `feature` — New feature - [x] `improvement` — Improving existing features - [ ] `chore` — Updating dependencies, other boring stuff - [ ] `galaxy brain` — Architectural changes - [ ] `tests` — Changes to any test code - [ ] `tools` — Changes to infrastructure, CI, internal scripts, debugging tools, etc. - [ ] `dunno` — I don't know ### Test Plan 1. Test pasting links in, and pasting again. ### Release Notes - Bookmarks: fix up double request and rework extractor code. --------- Co-authored-by: Steve Ruiz <steveruizok@gmail.com>
2024-06-10 10:50:49 +00:00
import cheerio from 'cheerio'

/** Fetch a page and extract bookmark metadata: title, description, image, and favicon. */
export async function unfurl(url: string) {
	const response = await fetch(url)
	if (response.status >= 400) {
		throw new Error(`Error fetching url: ${response.status}`)
	}
	// Only HTML responses can be parsed for meta tags.
	const contentType = response.headers.get('content-type')
	if (!contentType?.includes('text/html')) {
		throw new Error(`Content-type not right: ${contentType}`)
	}
	const content = await response.text()
	const $ = cheerio.load(content)

	// Collect Open Graph and Twitter card meta tags, keyed by their property/name.
	const og: { [key: string]: string | undefined } = {}
	const twitter: { [key: string]: string | undefined } = {}
	$('meta[property^=og:]').each((_, el) => (og[$(el).attr('property')!] = $(el).attr('content')))
	$('meta[name^=twitter:]').each((_, el) => (twitter[$(el).attr('name')!] = $(el).attr('content')))

	// Prefer Open Graph values, then Twitter card values, then plain HTML tags.
	const title = og['og:title'] ?? twitter['twitter:title'] ?? $('title').text() ?? undefined
	const description =
		og['og:description'] ??
		twitter['twitter:description'] ??
		$('meta[name="description"]').attr('content') ??
		undefined
	let image = og['og:image:secure_url'] ?? og['og:image'] ?? twitter['twitter:image'] ?? undefined
	let favicon =
		$('link[rel="apple-touch-icon"]').attr('href') ??
		$('link[rel="icon"]').attr('href') ??
		undefined

	// Resolve relative image and favicon paths against the page URL.
	if (image && !image.startsWith('http')) {
		image = new URL(image, url).href
	}
	if (favicon && !favicon.startsWith('http')) {
		favicon = new URL(favicon, url).href
	}

	return {
		title,
		description,
		image,
		favicon,
	}
}
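
The relative-URL handling in `unfurl` relies on the two-argument form of the standard `URL` constructor, `new URL(relative, base)`. The following is a minimal sketch of that behavior in isolation. The helper name `resolveAgainstPage` and the `example.com` URLs are hypothetical and are not part of this file. Unlike `unfurl`, the sketch skips the `startsWith('http')` check, on the assumption that passing an already absolute URL through the constructor is acceptable.

```typescript
// Hypothetical helper for illustration only; it does not exist in unfurl.ts.
// Resolves a possibly relative href against the URL of the page it came from.
function resolveAgainstPage(href: string | undefined, pageUrl: string): string | undefined {
	if (!href) return undefined
	// With a base argument, the URL constructor resolves relative references
	// and returns absolute URLs as they are (apart from normalization).
	return new URL(href, pageUrl).href
}

const pageUrl = 'https://example.com/articles/post.html'

// Root-relative path: resolved against the origin.
const rootRelative = resolveAgainstPage('/favicon.ico', pageUrl)
// Path-relative reference: resolved against the page's directory.
const pathRelative = resolveAgainstPage('images/cover.png', pageUrl)
// Protocol-relative reference: inherits the page's scheme.
const protocolRelative = resolveAgainstPage('//cdn.example.com/a.png', pageUrl)
// Absolute URL: the base is ignored.
const absolute = resolveAgainstPage('https://cdn.example.com/a.png', pageUrl)

console.log({ rootRelative, pathRelative, protocolRelative, absolute })
```

Resolving every value this way would also cover a relative path that happens to begin with the letters `http`, which the prefix check in `unfurl` leaves unresolved. Whether that edge case matters for real bookmarks is an open question rather than a confirmed bug.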