
Reduce Docusaurus JS bundle Pt3 - Use less pages, links & tags

· 5 min read
Roger Jiang

As discussed in prior posts, Docusaurus (2.4.0) relies on react-router-config to statically serve up a monolithic route config object containing EVERY ROUTE for the WHOLE site. This gets stuffed down to the client on first page load. There's no code splitting - just a big fat TBT (Total Blocking Time) surprise for the client - not what you'd want for a seamless browsing experience. See for yourself: inspect your own Docusaurus site's bundle and it's likely bursting at the seams with plaintext routes.
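
To get a feel for how quickly a monolithic route object grows, here is a minimal sketch that counts routes and measures their serialized length. The `RouteNode` shape is only loosely modelled on react-router-config's nested `routes` arrays, and `sampleRoutes` is made-up data, not output from a real Docusaurus build.

```typescript
// Assumed, simplified route shape; the object Docusaurus generates carries more fields.
interface RouteNode {
  path: string
  exact?: boolean
  routes?: RouteNode[]
}

// Count every route in the tree, including nested ones.
function countRoutes(routes: RouteNode[]): number {
  return routes.reduce(
    (total, route) => total + 1 + countRoutes(route.routes ?? []),
    0
  )
}

// Rough size of the plaintext shipped to the client
// (string length, which equals bytes only for ASCII paths).
function serializedLength(routes: RouteNode[]): number {
  return JSON.stringify(routes).length
}

// Hypothetical sample data.
const sampleRoutes: RouteNode[] = [
  {
    path: "/docs",
    routes: [
      { path: "/docs/intro", exact: true },
      { path: "/docs/tags/react", exact: true },
    ],
  },
  { path: "/blog", exact: true },
]

console.log(countRoutes(sampleRoutes), serializedLength(sampleRoutes))
```

Every doc page, tag page and paginated listing adds another entry like these, so the serialized length scales with the size of the whole site rather than with the page being viewed.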

It's the Docusaurus Way or the Highway

The best (and only simple) solution is to USE LESS PAGES, LINKS & TAGS.

Otherwise, look for a better framework which implements more modern routing (React Router 6+, Next.js or anything that attempts to code split).

Skip Leaf Nodes & Consolidate Docs

Helper function to check map for whether doc ID has any references or children - whether it is a terminating 'leaf' node
export function isLeaf(id: string) {
  const no_refs = !hasRefs(id)
  const no_children = !hasChildren(id)
  return no_refs && no_children
}
Helper function to return a modified path for a leaf doc - attached to its parent as an anchor segment. This helps dedup the number of routes in routeConfig.
export function get_path_from_id(id: string) {
  const path = path_map.get(id)
  if (!path) return path
  // Leaf docs are folded into their parent page, so they resolve to a parent anchor
  return isLeaf(id) ? return_anchor_from_parent(path) : path
}
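
The helper that builds the anchor path is not shown in this post. As a rough sketch of the idea, assuming paths are slash-separated and the last segment is the leaf's slug, something like the hypothetical `anchorFromParent` below would turn a leaf path into an anchor on its parent page:

```typescript
// Hypothetical stand-in for return_anchor_from_parent (the real one is not shown).
// "/docs/a/b/leaf" -> "/docs/a/b#leaf"
function anchorFromParent(path: string): string {
  const trimmed = path.replace(/\/+$/, "") // drop trailing slashes
  const cut = trimmed.lastIndexOf("/")
  if (cut <= 0) return path // no parent segment to attach to
  const parent = trimmed.slice(0, cut)
  const leaf = trimmed.slice(cut + 1)
  return `${parent}#${leaf}`
}

// Two sibling leaves now share a single route: the parent page.
const leafPaths = ["/docs/a/b/one", "/docs/a/b/two"]
const distinctRoutes = new Set(
  leafPaths.map((p) => anchorFromParent(p).split("#")[0])
)
console.log(distinctRoutes.size)
```

Because the fragment after `#` never reaches the router, every leaf folded this way removes one entry from the route config.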

Dedup & Omit "orphan" tags

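Refactored tag getter function - it now also loops over each tag and filters out leaf docs before checking for orphans
export function getNonOrphanTags(id: string) {
  const tags = map_id_to_tags.get(id)
  if (!tags) return []
  tags.forEach((tag) => {
    // Only count non-leaf docs, since leaf docs no longer get their own page
    const non_leaf_ids = getTagIds(tag)?.filter((tag_id) => !isLeaf(tag_id))
    // A tag with a single non-leaf doc is reset to just the current doc
    if (non_leaf_ids?.length === 1) map_all_tags_to_ids.set(tag, [id])
  })
  // Keep only tags that are shared by more than one doc
  return tags.filter((tag) => (getTagIds(tag)?.length ?? 0) > 1)
}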
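
To make the "orphan" idea concrete, here is a simplified, self-contained sketch. It ignores the leaf check and uses made-up data; `tagToIds` and `nonOrphanTags` are illustrative names, not part of the actual script.

```typescript
// Hypothetical data: tag -> IDs of docs carrying that tag.
const tagToIds = new Map<string, string[]>([
  ["React", ["doc1", "doc2"]],
  ["Routing", ["doc1"]], // only one doc: an orphan tag
  ["Bundling", ["doc2", "doc3", "doc2"]], // duplicate ID on purpose
])

// Keep only tags shared by at least `minDocs` distinct docs.
function nonOrphanTags(
  tags: string[],
  index: Map<string, string[]>,
  minDocs = 2
): string[] {
  return tags.filter((tag) => new Set(index.get(tag) ?? []).size >= minDocs)
}

console.log(nonOrphanTags(["React", "Routing", "Bundling"], tagToIds))
```

A tag page that lists a single doc adds a route without helping navigation, which is why such tags are dropped.
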
Refactored tag creation logic - infer tags from the path, include parent slugs, escape unsafe YAML and dedup
// Assumes these helpers come from lodash
import { kebabCase, startCase, uniq, uniqWith } from "lodash"

function map_to_tags(id: string, dirpath: string) {
  // The doc title doubles as a tag; swap double quotes so the YAML stays valid
  const title = id_to_plaintext(id)?.replace(/"/g, `'`)
  // Rewrite literal backslashes, which are unsafe in the generated YAML
  const title_yaml = title?.replace(/\\/g, "\\u005C\\u2028").trim() || ""
  const existing_tags = (id_to_tags(id) as string[] | undefined) ?? []
  const init_tags = [...existing_tags, title_yaml]
  // Parent folder names become tags too
  const prev_slugs = dirpath
    .split("/")
    .slice(1, -2)
    .map((str) => str.replace(/-/g, " ").trim())
  let tags = uniq(
    [...prev_slugs, ...init_tags]
      .filter((tag) => tag.length < 30)
      .map((str) => startCase(str))
      .filter((str) => str.length > 0)
  )
  // Treat tags as duplicates when they kebab-case to the same slug
  tags = uniqWith(tags, (a, b) => kebabCase(a) === kebabCase(b))
  map_id_to_tags.set(id, tags)
  // Register this doc under each of its tags
  tags.forEach((tag) => {
    num_tags += 1
    const prev_ids = map_all_tags_to_ids.get(tag)
    map_all_tags_to_ids.set(tag, prev_ids ? [...prev_ids, id] : [id])
  })
}
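
For readers without lodash at hand, the path-to-tags and dedup steps can be sketched in plain TypeScript. The `tagKey` normaliser below is a rough stand-in for `kebabCase` (it will not match lodash on every input), and the sample path is hypothetical.

```typescript
// Normalise a tag for comparison: lower-case, runs of non-alphanumerics become "-".
function tagKey(tag: string): string {
  return tag
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
}

// Parent folders of a doc path become tags (same slice(1, -2) as in the post).
function parentSlugTags(dirpath: string): string[] {
  return dirpath
    .split("/")
    .slice(1, -2)
    .map((segment) => segment.replace(/-/g, " ").trim())
    .filter((segment) => segment.length > 0)
}

// Keep the first spelling of each tag and drop later variants.
function dedupTags(tags: string[]): string[] {
  const seen = new Set<string>()
  return tags.filter((tag) => {
    const key = tagKey(tag)
    if (key.length === 0 || seen.has(key)) return false
    seen.add(key)
    return true
  })
}

// Hypothetical path layout: root folder / parents... / doc folder / file.
const fromPath = parentSlugTags("docs/web-dev/react-router/my-note/index.mdx")
console.log(dedupTags([...fromPath, "Web-Dev", "React Router"]))
```

Deduping on a normalised key matters because "Web Dev" and "web-dev" would otherwise produce two tag pages, and therefore two routes.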

The hasChildren helper used above relies on getChildren to check whether any of the listed child IDs exist. However, I discovered that RemNote also uses the children field to embed metadata alongside actual docs. This includes tags, color highlight attributes, TODO status & external URL source links - fields that should be ignored when deciding whether a node is a terminating leaf.

getChildren therefore needed to be modified to filter those out, so that leaf nodes render properly.

Modified getChildren helper - with an added map check to memoize filtered children
export function getChildren(id: string) {
  // Return the memoized result if this doc was already filtered
  const cached = map_children.get(id)
  if (cached) return cached
  const children = map_all.get(id)?.children?.filter((child_id) => {
    const key = map_all.get(child_id)?.key
    return (
      key?.[0]?._id !== "KWSN4xHJXyvxWX2Px" && // Sources
      key?.[0]?._id !== "AvyJPAFLACRPsmBGW" && // Status
      key?.[1]?._id !== "AvyJPAFLACRPsmBGW" && // Status
      key?.[0] !== "contains:" && // wtf is contains: ?!
      key?.[1]?._id !== id && // contains: pointing back at its own parent - seems like a pointless circular self-reference?!
      key?.[0]?._id !== "RHoPcFwuXHTt89FK6" // Color
    )
  })
  if (children) map_children.set(id, children)
  return children
}
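
As the list of metadata IDs grows, one alternative is to keep them in a `Set` and move the check into its own predicate. The sketch below is self-contained and makes assumptions about the export format: the `Rem` and `KeyPart` types are guesses based on the filter above, and the sample data is invented.

```typescript
// IDs copied from the filter above; labels follow the original comments.
const STATUS_ID = "AvyJPAFLACRPsmBGW"
const METADATA_IDS = new Set([
  "KWSN4xHJXyvxWX2Px", // Sources
  STATUS_ID, // Status
  "RHoPcFwuXHTt89FK6", // Color
])

// Assumed shape: a key part is either plain text or a reference object.
type KeyPart = string | { _id: string }
interface Rem {
  key: KeyPart[]
  children?: string[]
}

function refId(part: KeyPart | undefined): string | undefined {
  return typeof part === "object" ? part._id : undefined
}

// True when a child entry is metadata rather than a real doc.
function isMetadata(rem: Rem | undefined, parentId: string): boolean {
  if (!rem) return false // unknown IDs are kept, as in the original filter
  const [first, second] = rem.key
  const firstId = refId(first)
  const secondId = refId(second)
  return (
    first === "contains:" ||
    (firstId !== undefined && METADATA_IDS.has(firstId)) ||
    secondId === STATUS_ID ||
    secondId === parentId
  )
}

// Invented sample data.
const rems = new Map<string, Rem>([
  ["parent", { key: ["Parent"], children: ["note", "status", "color"] }],
  ["note", { key: ["A real child note"] }],
  ["status", { key: [{ _id: STATUS_ID }] }],
  ["color", { key: [{ _id: "RHoPcFwuXHTt89FK6" }] }],
])

const realChildren = (rems.get("parent")?.children ?? []).filter(
  (childId) => !isMetadata(rems.get(childId), "parent")
)
console.log(realChildren)
```

With metadata filtered out, a doc whose only "children" are status or color entries is correctly treated as a leaf.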

Cut down on long nested slugs

Currently, the JSON to MDX transform script uses doc keys to assign slug paths. However, a lot of my older notes included bullet points with long paragraphs (>50 chars) with nested content. I had opted to not

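Added snippet to debug longer slug keys
// fs is assumed to be fs-extra (outputFileSync creates missing folders)
const long_slugs_arr: string[] = []

async function loop_docs_mkdir(__props) {
  //...
  // Collect slug keys over 40 chars while debugging
  if (!skip_next && slug_key && debug_slug && slug_key.length > 40)
    long_slugs_arr.push(slug_key)
  // After the last doc, dump the collected keys for manual review
  if (num === __DOC_LENGTH && debug_slug) {
    try {
      const long_slugs_mdx = long_slugs_arr.join("\n")
      fs.outputFileSync(`test/long-slugs.mdx`, long_slugs_mdx)
    } catch (error) {
      console.error("Failed to write test/long-slugs.mdx", error)
    }
  }
}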
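
Once the long keys are listed, one possible way to shorten them is to clip each slug at a hyphen boundary. This is only a sketch of that idea, not necessarily what the transform script does; `shortenSlug` and the 40-character default are assumptions chosen to match the debug threshold above.

```typescript
// Shorten a slug to at most `maxLength` characters, cutting at a hyphen where possible.
function shortenSlug(slug: string, maxLength = 40): string {
  if (slug.length <= maxLength) return slug
  const clipped = slug.slice(0, maxLength)
  const lastHyphen = clipped.lastIndexOf("-")
  // Fall back to a hard cut when there is no hyphen to break on
  return lastHyphen > 0 ? clipped.slice(0, lastHyphen) : clipped
}

console.log(
  shortenSlug("this-is-a-very-long-bullet-point-that-became-a-slug-key")
)
```

Clipped slugs can collide when two long bullets share the same opening words, so a real implementation would need a uniqueness check (for example, appending part of the doc ID).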

Current improvements & Way Forward

~8MB main bundle reduced by ~66% to under 3MB

Build times down from 3hr+ (with guaranteed crash) to 10 mins with cache, or 26 mins without (no crash).

Quick docusaurus build times (after wiping docs folder) down to ~1min.

This is an okay improvement, but still unacceptable since it largely came down to cutting content, which merely sidesteps the issue. The reduced bundle is STILL bloated with plaintext routes, which kicks mobile Lighthouse in the TBT. Without this crap, it should be under 500KB at most. It's a looming time bomb for future expansion.

I think this implementation of client-side routing is a good example of "one step forward, two steps back". Perhaps it's better to opt out of client-side <Link> routing completely. It's a question of whether the initial laggy load time (for slow mobiles) is worth the smoother page transitions.

Docusaurus needs to migrate away from react-router-config. Looking at the code, there seems to be no obvious way to code-split the routeConfig object. For sidebar/tag generation, @docusaurus/core is deeply coupled to react-router-config, which has been abandonware since 2019. Updating that dependency would require a lot of work - far too much effort compared to abandoning the dinosaur-ship for Next.js/Nextra.