
Optimizing Web UI for Large Datasets

If you’re unfamiliar with bitdrift, our mission is to craft better observability tools. During development of Capture, we faced the challenge of efficiently handling potentially hundreds of thousands of logs in a single viewing session, and created an optimized web strategy for displaying and navigating those vast datasets. This post dives into what we did behind the scenes to make this a reality.

Imagine this: a developer at [insert mobile app here] is poised to tackle the latest bug. They’re armed with sample application sessions, each brimming with tens of thousands of logs, all of which need to be sifted through to uncover insights.

Traditionally, the debug journey might involve submitting a query to retrieve logs for each session, then embarking on the painstaking process of sifting through these logs.

Mobile app sessions, especially in a complex environment like ride-sharing apps, can span over multiple hours, generating hundreds of thousands of logs. The critical challenge is managing this immense volume of data without compromising the performance or user experience of the web view.

Honey, I shrunk the telemetry
Note: All code snippets in this post are in vanilla JavaScript/TypeScript. In reality you’d use your view library of choice to optimize writing to the DOM and to clean up any observers/listeners. For the sake of readability, I’m using a very simple tagged template function to construct elements:
typescript

/**
 * A tagged template literal function
 * for creating HTML fragments.
 * @usage
 *
 * const li = html`<li>${item.id}</li>`;
 */
const html = (
    strings: TemplateStringsArray,
    ...values: unknown[]
): DocumentFragment =>
    document
        .createRange()
        .createContextualFragment(
            // Use ?? so falsy-but-valid values such as 0 still interpolate
            strings.reduce((acc, str, i) => `${acc}${str}${values[i] ?? ''}`, ''),
        );

Challenges

In scenarios involving high-volume data, such as reviewing logs from extensive app sessions, the user experience hinges on data appearing instantly accessible without loading the entire dataset upfront. Traditional pagination, which requires users to manually navigate through pages, interrupts the flow of investigation and makes the process tedious and less intuitive. A more efficient approach to pagination is therefore essential: one that reduces initial load times and conserves memory, preventing browser slowdowns or crashes and keeping the application responsive as the user works through the data.

Efficient Management of Extensive Log Data

Our Pagination Approach

To accommodate users who may skip through the session logs non-sequentially, we’ve passed on cursor-based pagination in favor of offset pagination. This lets us target a specific page to load without having to traverse the preceding data, so pages can be loaded dynamically based on the user’s position within the session rather than strictly in sequence. We have a couple of key bits of data that we can leverage for our implementation: we know the current total number of logs in the session, we know the number of logs per page, and we know that a log is always rendered as a fixed-height row within our list.

Note: In our case, logs for a session are written to cold storage and, when requested, are hydrated into a SQLite database. It is this hydrated data which has a known number of logs. For sessions which are ongoing, there may be a need to rehydrate in the background in order to access the latest data. More on this in a future blog post!
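The pagination snippets below reference a few pieces of shared state and helpers. Here’s a minimal sketch of what those might look like; the names, values, and endpoint are illustrative assumptions, not our production code:
typescript

// Illustrative constants, state, and helpers assumed by the pagination snippets below.
const CONSTANTS = {
    pages: 50, // total pages in the session
    pageSize: 100, // logs per page
    listItemHeight: 40, // fixed row height in px
    listGap: 4, // vertical gap between rows in px
    listItemWithGapHeight: 44,
};

// The container we render into (assumed to exist in the page markup).
const app = document.querySelector('#app') as HTMLElement;

// Pages that have been fetched so far.
const loadedPages = new Set<number>();

// Every page is pre-filled with placeholder items so unloaded pages still occupy space.
const data: { id: string }[][] = Array.from({ length: CONSTANTS.pages }, (_, p) =>
    Array.from({ length: CONSTANTS.pageSize }, (_, i) => ({
        id: `log-${p * CONSTANTS.pageSize + i}`,
    })),
);

const isLoaded = (page: number) => loadedPages.has(page);

// Map a pixel offset within the list to a 1-based page number.
const getPageFromPosition = (position: number) =>
    Math.min(
        CONSTANTS.pages,
        Math.floor(position / (CONSTANTS.listItemWithGapHeight * CONSTANTS.pageSize)) + 1,
    );

// Fetch a page of logs and mark it as loaded (the endpoint here is hypothetical).
const fetchPage = async (page: number) => {
    if (isLoaded(page)) return;
    loadedPages.add(page);
    data[page - 1] = await fetch(`/api/logs?page=${page}`).then((res) => res.json());
};
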
typescript

const render = () => {
    const ul = document.createElement('ul');
    data.forEach((page, pageIndex) => {
        // Do not render if page isn't "loaded"
        if (!loadedPages.has(pageIndex + 1)) return;

        page.forEach((item, index) => {
            const saturation = pageIndex / CONSTANTS.pages;
            const itemIndex = pageIndex * CONSTANTS.pageSize + index;
            const y = itemIndex * (CONSTANTS.listItemHeight + CONSTANTS.listGap);

            const li = html`<li style=" --list-item-y:${y}; --page-saturation:${saturation};">${item.id}</li>`;

            ul.appendChild(li);
        });
    });

    app.innerHTML = ul.outerHTML;
};

window.addEventListener('scroll', () => {
    const { y } = app.getBoundingClientRect();

    // Check which pages intersect the top and bottom edges of the viewport
    const topPage = getPageFromPosition(Math.abs(y));
    const bottomPage = getPageFromPosition(Math.abs(y - window.innerHeight));

    if (topPage && !isLoaded(topPage)) {
        fetchPage(topPage);
    }

    if (bottomPage && !isLoaded(bottomPage)) {
        fetchPage(bottomPage);
    }

    render();
});

The approach above uses a simple scroll event listener to determine which page of data is in the viewport and “loads” it if needed. We could also use IntersectionObserver and drop the calculation of the page position within the screen.
typescript

const render = () => {
    const ul = document.createElement('ul');
    const observer = new IntersectionObserver((entries) => {
        entries.forEach((entry) => {
            if (entry.isIntersecting && entry.target instanceof HTMLElement) {
                fetchPage(Number(entry.target.dataset.page));
            }
        });
    });

    data.forEach((page, pageIndex) => {
        page.forEach((item, index) => {
            const saturation = pageIndex / CONSTANTS.pages;
            const itemIndex = pageIndex * CONSTANTS.pageSize + index;
            const y = itemIndex * (CONSTANTS.listItemHeight + CONSTANTS.listGap);
            let li;

            if (!isLoaded(pageIndex + 1)) {
                // Render a placeholder element to observe
                li = html`<li style=" --list-item-y:${y};" data-unloaded="true" data-page="${pageIndex + 1}">
                    not loaded
                </li>`;
            } else {
                li = html`<li style=" --list-item-y:${y}; --page-saturation:${saturation};">${item.id}</li>`;
            }
            ul.appendChild(li);
        });
    });

    list.innerHTML = ul.outerHTML;
    // Observe all unloaded list items
    document.querySelectorAll('[data-unloaded]').forEach((el) => {
        observer.observe(el);
    });
    document.querySelector('[data-pages]').innerHTML = [...loadedPages].join(', ');
};
The end result works like this: as the user navigates the list, they eventually reach a point where the data has not yet been fetched. By either scroll position or intersection observation, reaching that point triggers a fetch for the missing page of data. A readout of the loaded pages in the right-hand corner of the demo shows that pages can be loaded out of order.
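The --page-saturation custom property set in the snippets above is purely a presentational aid for the demo: it tints each page’s rows slightly differently so page boundaries are easy to spot. A minimal sketch of how the demo styles might consume it (the specific colors and selectors are assumptions):
css

/* Demo-only: tint each row based on which page it belongs to */
ul li {
  background-color: hsl(210deg calc(var(--page-saturation, 0) * 100%) 88%);
}

/* Rows belonging to pages that haven't loaded yet */
ul li[data-unloaded] {
  background-color: transparent;
  color: gray;
}
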

Seamless and Instantaneous Data Navigation

The implementation of list virtualization (or windowing) is another pivotal tool. By rendering only a small portion of the data at any given time, we can significantly reduce the resources required to display the logs. Coupled with predictive data loading, based on user interaction patterns, we can ensure that subsequent data is ready before the user needs it, creating an illusion of instant access to the entire dataset.

Custom Virtualization Solution

There are third-party solutions to list virtualization out there, and they do their job well. However, during our evaluation we didn’t find one solution that ticked all the boxes we were looking for:
  • Lightweight bundle size: There are lightweight solutions out there, but with that comes an opinionated feature set which may or may not fit your needs.
  • Use of transform instead of top: With list virtualization, not every element of a list is rendered at any given moment. This means we can’t rely on document flow to position the rendered elements correctly; instead, they have to be offset from the top of the container. Many libraries use absolute positioning with top to do this. Leveraging the GPU by using translate3d instead can make rendering more performant, especially when animating.
  • Ability to dynamically adjust list window size: We want to be able to manipulate the number of elements rendered (window size) dynamically to tweak the performance of our list.
  • Use of page scroll, instead of being bound to a scroll container: To give the impression that the entire list is available to a user, we want our page length to reflect the length of this list. This means the virtualization needs to be relative to window scroll rather than a scrollable container.
  • Fine grained control over list re-renders when data source changes: Given our pagination strategy, the source of the data used to populate the virtualized list will change every time a new page is loaded. We want to be able to tightly control how and when the virtual list reacts to those changes so as not to unnecessarily re-render the portion of the list which is currently on the screen.
Given these specific needs, we’ll build our own list virtualization solution targeting them.
typescript

const renderVirtualList = () => {
    // Clear list
    const ul = document.querySelector('ul') as HTMLUListElement;
    ul.innerHTML = '';

    // Calculate visible range
    const rowHeight = CONSTANTS.listItemWithGapHeight;
    const offsetTop = 0;
    const rangeStart = Math.max(0, Math.min(data.length, Math.floor((window.scrollY - offsetTop) / rowHeight)));
    const visibleCount = Math.ceil(window.innerHeight / rowHeight);
    const rangeEnd = rangeStart + visibleCount;

    // Render visible range
    data.slice(rangeStart, rangeEnd).forEach((item, index) => {
        const itemIndex = rangeStart + index;
        const y = itemIndex * (CONSTANTS.listItemHeight + CONSTANTS.listGap);
        const li = html`<li style=" --list-item-y:${y};">${item.id}</li>`;
        ul.appendChild(li);
    });
};

window.addEventListener('scroll', renderVirtualList);

css

ul {
  gap: var(--list-gap);
  height: var(--list-height);

  & li {
    position: absolute;
    height: var(--list-item-height);
    transform: translate3d(0, calc(var(--list-item-y) * 1px), 0);
  }
}
In the basic implementation above, we take a known row height and offset (i.e. where the list sits relative to the top of the document) and calculate the index range of the array that we’re rendering. Once calculated, we slice the array at those indexes and render only those elements, ensuring they are positioned relative to the list.

There are a couple of shortcomings to this basic implementation:
  • By only rendering the number of rows which fit in the viewport, we can end up with the situation where at the top/bottom of the viewport as a user scrolls, items jump in and out of existence. This is particularly obvious when scrolling quickly.
  • If the user jumps about frequently, or is scrolling through the session very quickly, we can waste resources attempting to render rows which the user is blasting past.
We’ll address these issues with a couple of additional features.

To address the first issue, we’ll implement dynamic and incremental window sizes. As a user scrolls a session, we render the rows within the viewport as well as N rows either side of the visible section. When the user jumps to a new location within the session, we render only the rows which are visible, then incrementally increase that window size until we reach the desired overlap. This provides an optimized experience both for users who scroll through the session in a linear fashion and for those who jump around a lot. Adding this to the render function handles the range padding:
typescript

  // Pad the range if needed
  const start = Math.max(0, rangeStart - rangePadding);
  const end = Math.min(data.length, rangeEnd + rangePadding);
  // ...then slice with the padded range: data.slice(start, end)
And then we can add a tick function to handle the gradual increase of the padding, up to a maximum:
typescript

let rafId = 0;

const tick = () => {
    // Increase the padding and render the list
    rangePadding = Math.min(rangePadding + 1, RANGE_PADDING_MAX);
    renderVirtualList();

    // Keep calling tick until the padding has reached the maximum
    if (rangePadding < RANGE_PADDING_MAX) {
        rafId = requestAnimationFrame(tick);
    }
};

window.addEventListener('scroll', () => {
    // Reset rangePadding when scrolling; this could be optimized to only happen when scrolling in larger "jumps"
    rangePadding = 0;
    // Cancel any in-flight tick loop so repeated scroll events don't stack loops
    cancelAnimationFrame(rafId);
    tick();
});
Finally, for the second issue we’ll implement scroll threshold detection. When a user is scrolling very quickly through the session (i.e. in large “jumps”), we temporarily disable all rendering, and only re-enable it once the scrolling speed drops below a certain threshold. To keep things looking nice while “speed scrolling”, we apply a repeated background graphic to the session list that looks like an outline of the eventually rendered rows, giving the impression of log data without actually rendering anything.

There’s no browser-native way to determine the speed of scrolling, so we’ll approximate it by comparing the size of each scroll step to the last. This gives us the size of the change in position; from there we can check the delta against some kind of threshold and toggle the scrolling state based on that comparison.
typescript

const TIMEOUT_LENGTH = 50; // How long to delay before toggling the scrolling state back
const THRESHOLD = 100; // In pixels, but this could be calculated as a percentage of the list height
let previousScrollY = 0;
let timeout = 0;
let shouldRender = true; // Flag read by the render function; false while "speed scrolling"

const state = new Proxy(
    { isScrolling: false },
    {
        set(target, prop: 'isScrolling', value: boolean) {
            if (target[prop] !== value) {
                // Set some kind of flag to prevent list from rendering
                shouldRender = !value;
            }
            target[prop] = value;
            return true;
        },
    },
);

// Clears the current timeout
const clear = () => {
    window.clearTimeout(timeout);
    timeout = 0;
};

const setScrolling = (scrolling: boolean) => {
    if (scrolling) {
        state.isScrolling = true;
    }

    // Set back to false after a delay
    timeout = window.setTimeout(() => {
        state.isScrolling = false;
    }, TIMEOUT_LENGTH);
};

window.addEventListener('scroll', () => {
    const scrollDelta = Math.abs(window.scrollY - previousScrollY);
    const exceedsThreshold = scrollDelta > THRESHOLD;

    clear(); // Clears any existing timeout
    setScrolling(exceedsThreshold && !state.isScrolling);

    previousScrollY = window.scrollY;
});
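With the flag in place, the virtualized render can bail out while the user is speed scrolling. A minimal sketch of that guard (exactly where you short-circuit is up to you):
typescript

const renderVirtualList = () => {
    // Bail out while the user is speed scrolling; the placeholder
    // background (below) stands in for the unrendered rows.
    if (!shouldRender) return;

    // ...calculate the visible range and render rows as before
};
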
For the placeholder background we can construct an SVG which represents an abstract outline of our eventual row. For our use case we used a combination of shapes, but the snippet below shows a simplified version.
typescript

  html`<ul
    style="background: url(data:image/svg+xml;utf8,${encodeURIComponent(
      '<svg xmlns="http://www.w3.org/2000/svg" height="44" viewBox="0 0 400 44"><rect width="400" height="40" x="0" y="0" fill="#fff"/></svg>'
    )});"
  ></ul>`

Future Enhancements

In scenarios involving exceptionally large datasets, such as sessions generating hundreds of thousands of logs, our current system might still strain under the sheer volume of data, especially if a user navigates through an entire session from start to finish. The accumulation of session data in memory, in these cases, could lead to performance bottlenecks or even browser crashes. To mitigate this, we're exploring more robust memory management strategies. One approach is the adoption of a more dynamic First-In-First-Out (FIFO) buffering system for pages. This would involve selectively purging less frequently accessed pages from memory, thereby ensuring that the system's performance remains optimal, even as the dataset size increases. This refinement aims to balance data accessibility with efficient resource utilization, ensuring that our application can scale to meet the demands of even the most data-intensive sessions.
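As a rough illustration of the direction we’re exploring, here’s a minimal sketch of a FIFO page buffer; the class name, capacity, and eviction details are assumptions for illustration, not a shipped feature:
typescript

// A simple FIFO buffer for pages: once capacity is reached, the oldest
// loaded page is evicted to keep memory bounded.
class PageBuffer<T> {
    private pages = new Map<number, T[]>();

    constructor(private capacity: number) {}

    get(page: number): T[] | undefined {
        return this.pages.get(page);
    }

    has(page: number): boolean {
        return this.pages.has(page);
    }

    set(page: number, rows: T[]) {
        // Evict the oldest page when we're at capacity
        if (!this.pages.has(page) && this.pages.size >= this.capacity) {
            const oldest = this.pages.keys().next().value;
            if (oldest !== undefined) this.pages.delete(oldest);
        }
        this.pages.set(page, rows);
    }
}

// Usage: evicted pages simply render as "not loaded" placeholders again,
// and are re-fetched if the user scrolls back to them.
const buffer = new PageBuffer<{ id: string }>(20);
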

Conclusion

In crafting a solution to navigate and manage extensive log data efficiently, we’ve integrated dynamic pagination and list virtualization into a cohesive system that addresses both performance and usability challenges. This comprehensive approach not only ensures our application remains responsive across vast datasets of hundreds of thousands of logs, but also enhances the user experience by providing seamless navigation and instant access to crucial information. Moving forward, we’re excited to explore further advancements, continually pushing the boundaries to meet and exceed the expectations of our users.

If this sounds interesting to you, please get a demo and/or come and chat with us in our community Slack. We would love to hear from you! If you’re excited about what we’re building, and would like to build it with us, check out our open positions. Let’s create something amazing together!
