Fast Websites on Slow Networks

A look at how cellular network speed really can affect website performance.

Time is Money

In 1748 Benjamin Franklin wrote, in an essay titled Advice to a Young Tradesman, that time is money. The maxim is as relevant now as it was then, as we press our web development teams to deliver not only highly functional websites but ever faster ones. To achieve this, developers implement proven best practices, such as image compression, content delivery networks and techniques to avoid render blocking, to meet an almost insatiable demand for fast-loading content and a better user experience.

The complexity of the technology required to deliver fast websites means there are many reasons why user experience can suffer through slow-loading or slow-responding webpages. Yet even though considerable effort is spent optimising how quickly webpages load, the potential impact of a slow network connection is often overlooked.

This may be because we develop websites on fast desktop devices connected to the internet through fast network backbones, or because pre-delivery testing does not include proving performance non-functional requirements (NFRs) across all network speeds and device platforms.

Unfortunately the end result is not as envisioned, except for those using the fastest devices on the fastest networks!

In this article we look at what happens to user experience when networks slow down or fast network coverage is not available at all.

What Devices Are Used to Access Websites?

Network issues are generally experienced by smart devices, such as tablets and smartphones, connected through cellular networks; they are rarely experienced on devices with cabled connections.

An overview of the extent of the potential problem is shown in Figure 1. The chart shows the breakdown of about 100 million page impressions, organised by operating system (OS), and reveals that over 65% of traffic occurs on smart devices (iOS and Android). These devices will be connected by either a cellular network or Wi-Fi.

Of these smart devices, three-quarters run iOS, with only 25% on Android.

This breakdown confirms why many website owners consider the Apple iOS market the one they should be targeting, as it is possibly their most lucrative.

Webpage impressions by OS for May 2019
Figure 1: Breakdown by OS Family: May 2019

The Mobile Marketplace

However, when we look at the market share of iOS against Android devices, a different picture emerges.

Figure 2 shows that, over time, the share of Android devices in the UK has risen to be on a par with iOS devices, and since 2015 their market shares have been broadly similar. In the same period competing operating systems have fallen away, leaving only iOS and Android.

Breakdown by OS/Device family in UK
Figure 2: Breakdown by OS/Device Family UK

The global market is significantly different though. Figure 3 shows that Android devices have held a dominant market share of around 80% for the last six years.

This skew reflects the high number of Android devices in developing countries, where price can be a key factor in the purchase; consequently, many of these devices may not be feature rich or particularly powerful in terms of CPU and network connectivity.

Breakdown by OS/Device family globally
Figure 3: Breakdown by OS/Device Family Global

Historically Apple’s iOS-based devices have had significantly higher performance than Android phones, and it is only with recent Android handsets, such as the Samsung Galaxy S9 and S10, that some degree of competitive performance has been delivered. However, recent Apple announcements (June 2019) have re-established the performance superiority of Apple’s iOS-based devices.

Even though faster devices are continually brought to market, they remain a minority compared with the high number of Android devices already in use today.

Consequently, it is not unreasonable to suppose that considerable latent demand may exist among Android users, if only they are able to connect to and use websites at acceptable speeds.

Cellular Network Advancement

As the demand for newer and faster smart devices continues, so does the need for rapid evolution and improvement in cellular network speed and availability.

The hexagonal cellular network was originally proposed by Bell Labs in 1947, but it was not until the late 1970s that technological advances made a commercially viable service possible. This manifested itself as an analogue service and was effectively the first generation (1G) of mobile telephony. It was in the 1990s, with 2G, that fully digital services became a reality.

Since then we have seen many iterations and developments of the standards (Figure 4), which ensure a common approach to delivering effective communication services. These technical advancements have driven communication services on a global scale at speed, and fifth-generation (5G) technology is now being delivered, with pilot programmes under way in many countries.

Figure 4 shows how the standards have evolved and the theoretical network speeds associated with each standard.

We use smart devices for so many different things now that it is interesting to see how far and how fast these developments have brought us, and it is easy to see why an instant response is expected.

Cellular network generations and standards
Figure 4: Cellular Network Generations and Standards

What Is a Good Connection Speed?

Raw download bandwidth is a blunt instrument for judging the performance of a network service. Advertisements by network providers promote fast download speeds of 50 to 300 Mbps as the primary reason to buy their services, and the faster the bandwidth, the more expensive the service.

Data bandwidth is the bit rate of a data transmission medium, expressed in bits per second (bps). In plain English, it is how much data can be sent in a second, so 50 Mbps means 50 million bits per second. It takes about 5 Mbps to stream a film in HD on Netflix, so 50 Mbps is a considerable amount of bandwidth.

Network Measurement Tool from speedtest.org
Figure 5: Network Connection on my Desktop

Figure 5 shows that my office desktop receives a pretty good service: the speed test (speedtest.org) reports a 77 Mbps download speed and a very impressive 87 Mbps upload speed.

At this level of service a near-instantaneous page load could be expected. Looking at the maths, the median transfer size of a webpage is 1.9 MB (source: HTTP Archive, June 2019), and a download speed of 77 Mbps can deliver roughly 9.6 MB per second. However, actual webpage loads can take seconds to complete, so there must be other factors affecting the load of a webpage that we should be aware of.
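
As a sanity check on that arithmetic, here is a quick back-of-the-envelope calculation (a simple sketch, not a measurement) of the ideal transfer time for a 1.9 MB page at 77 Mbps, ignoring latency and protocol overhead:

    # Ideal transfer time for the median webpage at the measured desktop bandwidth,
    # ignoring latency and protocol overhead.
    page_size_mb = 1.9      # median webpage transfer size in MB (HTTP Archive, June 2019)
    bandwidth_mbps = 77     # measured download bandwidth in Mbps (megabits per second)

    throughput_mb_per_s = bandwidth_mbps / 8
    ideal_seconds = (page_size_mb * 8) / bandwidth_mbps

    print(f"Throughput: {throughput_mb_per_s:.1f} MB per second")   # ~9.6 MB/s
    print(f"Ideal transfer time: {ideal_seconds:.2f} seconds")      # ~0.20 s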

Looking at Latency

Latency is the delay that data experiences as it makes a round trip through the network. It is measured in milliseconds and for a webpage resource is the time taken for a request from the client browser through the network to the webserver and back again.

Generation    Typical Latency
2G            500 ms (0.5 seconds)
3G            100 ms (0.1 seconds)
4G            50 ms (0.05 seconds)
5G            1 ms (0.001 seconds)*

Figure 6: Latency standards by Network Generation

Each of the cellular networking standards sets out the latency requirements that should be met, and as Figure 6 shows, the expected latency has improved substantially with each new standard.

Consequently, latency is a key component of network performance.

In 2009 Mike Belshe, a computer scientist with Google, investigated the impact of latency on performance. His paper, More Bandwidth Doesn’t Matter (Much), documents that, with latency held constant at 60 ms while bandwidth was increased across 25 selected webpages, only a minimal reduction in page load time (PLT) occurred beyond about 5 Mbps. Figure 7 shows his published results.

Maintaining Latency While Increasing Bandwidth
Figure 7: Maintaining Latency While Increasing Bandwidth

However, in a second set of tests on the same candidate websites he fixed the bandwidth at 5 Mbps and progressively reduced the latency. Figure 8 shows that a linear improvement in page load time was observed. He concluded from this that lower latency will improve webpage performance.

Maintaining Bandwidth While Reducing Latency
Figure 8: Maintaining Bandwidth While Reducing Latency
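
To get an intuition for these two results, the sketch below models a simplified page load under assumed conditions: a hypothetical page of 75 resources totalling 1.9 MB, fetched over 6 parallel connections, with the load time split between transfer time and round-trip waits. It is an illustrative model, not a reproduction of Belshe's test setup.

    # Simplified, illustrative page-load model (hypothetical figures: 75 resources,
    # 1.9 MB in total, 6 parallel connections). Extra bandwidth only shrinks the
    # transfer term, while lower latency shrinks every round trip.
    def page_load_time(bandwidth_mbps, latency_ms, total_mb=1.9, resources=75, connections=6):
        transfer_s = (total_mb * 8) / bandwidth_mbps      # time spent moving bytes
        round_trips = resources / connections             # requests queued per connection
        waiting_s = round_trips * (latency_ms / 1000)     # time spent on round trips
        return transfer_s + waiting_s

    for mbps in (1, 2, 5, 10, 20, 50):
        print(f"{mbps:>3} Mbps at 60 ms latency: {page_load_time(mbps, 60):.2f} s")

    for ms in (200, 100, 60, 40, 20):
        print(f"  5 Mbps at {ms:>3} ms latency: {page_load_time(5, ms):.2f} s")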

The Truth is Out There

However, when we move away from fibre, cable and Wi-Fi connections and become reliant on cellular networks, a different level of experience is received. Cellular technology is unable to deliver consistently high performance, such as 77 Mbps download speeds; although the specifications document the theoretical capabilities of the technology, those speeds are rarely delivered and service levels are inconsistent.

Download and Upload Survey Results
Figure 9: Download and Upload Survey Results

Figure 9 shows the download and corresponding upload bandwidth speeds collected from a survey across southern and eastern England, including London. The left-hand Y-axis provides the scale in Mbps, and the histogram is ordered by ascending download speed. Consisting of over 100 tests using the speedtest.org app on an Apple iPhone 6+ on the Vodafone network, the survey measured the native connection speeds.

The survey shows that a wide range of services was observed, including 2.75G, 3G, 3.5G and 4G. The data points represent the average speeds achieved during the candidate tests.

A key observation is that none of the candidate records show anything close to the theoretically available bandwidth for each of the services delivered.

The results also show that there is a wide spectrum of service delivered, even in built-up areas, and that upload speed does not appear to be a function of download speed.

Figure 10 overlays the latency for each test, with the right-hand Y-axis providing the latency scale in milliseconds. It is significant that latency is also shown to be variable and cannot be relied on to be a set value at any given download speed.

At higher download speeds, above 10 Mbps, latency drops to 50 ms and below, and in many cases even below 40 ms. However, the observations also show that latency can rise to over 100 ms, especially on slower connections.

The 2018 annual review by Opensignal found that 4G coverage is now widely accessible in the UK, with an average of over 75% of connections served on a 4G network. However, even though the connection may be 4G, the network may scale it back to a lower standard to alleviate capacity or performance issues. Consequently, at certain times 3.5G or even 3G is still in use.

It is therefore necessary to consider how the UX of a website is delivered at different bandwidths, as your website may be accessed over slow-performing network connections.

Download and Upload Survey Results With Latency Overlaid
Figure 10: Download and Upload Survey Results With Latency Overlaid

Can 5G Help?

As the new kid on the block, all of the talk and hype is about 5G. It is undoubtedly a technology with considerable potential to change how many services are delivered and to enable new ones. 5G boasts very high bandwidth with latency as low as 1 ms, but little is really known about how it will actually affect web performance and user experience.

Global Intelligence, as in Figure 11, has predicted that by 2025 5G will account for under 20% of all network connections, and that 3G and 4G technologies will still be critical, delivering 80% of cellular network connections.

This is borne out by Computer Weekly, who believe that early adopters can expect to pay a 32% premium for 5G, but that both the roll-out of the technology and acceptance by the general public will delay its replacement of 3G and 4G.

5G is a promising technology that will open up many new avenues of business opportunity and usage, but although it offers almost zero latency, smart devices will have to become considerably more powerful to keep up with the much faster delivery of web content and programming functionality.

Global Share of Mobile Connections
Figure 11: Global Share of Mobile Connections

What Are the Next Steps?

From a cellular network perspective it may be possible to exploit 5G for loading webpages, but there is currently little or no experience of how this could be achieved, as mobile devices will also need to become considerably more powerful to process data that arrives considerably more quickly.

Faster networks are therefore not currently a panacea, as it will be many years before 5G is ubiquitous and its effects on webpage loading become clearer.

User experience can be improved now, though, by aligning UX testing with the major devices and networks used by your cellular-connected online visitors. This is not a small task, but it is a necessary one, especially if your target market is predominantly based on 3G technology and low-powered Android devices.

Other technologies are emerging or becoming established that can help improve user experience on mobile devices and over cellular networks, including HTTP/2, QUIC, Progressive Web Apps (PWA) and Accelerated Mobile Pages (AMP). Each of these can bring performance gains, but as with all technologies, their use and implementation require careful planning and testing.

The Performance Implications Of Web Security

Over the past few years all of the browser vendors have substantially enhanced the security of their products, and with it the security of the internet. This has led to much wider adoption of the encrypted hypertext transfer protocol, HTTPS, and browsers will now mark websites as ‘not secure’, especially when a form, for example for a login or credit card details, is present.

Specifically, pages that included forms where login credentials or credit card details could be entered would be labelled as not secure. 

This approach makes perfect sense and has long been part of Google’s intentions; eventually, we can expect to see all non-HTTPS (HTTP) pages flagged as insecure.

For now though, a website that hasn’t yet upgraded from HTTP to HTTPS and contains an input form, such as a search box at the top of every page, will see warnings triggered for every page on the site.

However, while this type of warning has no performance implication in itself, there are many other security measures you should be aware of, along with how they can affect the delivery performance of web pages and, with it, the customer experience.

This article takes a look at some of the security measures and their potential performance impact as a precursor to later articles that cover each of the technologies in greater depth.

HTTPS protocol displayed in the browser

Performance Impact

Other things being equal, an HTTPS website will almost inevitably be slower than its less secure counterpart, thanks to the extra round trips required for the TLS (Transport Layer Security) handshake that makes HTTP secure. While this may amount to less than a hundred milliseconds, users can perceive that the website is slower, and the effect will be more noticeable on high-latency mobile networks.
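
If you want to see the handshake cost for yourself, a minimal sketch such as the one below (using Python's standard socket and ssl modules, with example.com as a placeholder host) separates the time spent establishing the TCP connection from the additional time spent on the TLS handshake:

    import socket
    import ssl
    import time

    host, port = "example.com", 443   # placeholder host

    # Time the TCP connection on its own.
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=10)
    tcp_ms = (time.perf_counter() - start) * 1000

    # Time the TLS handshake layered on top of the established TCP connection.
    context = ssl.create_default_context()
    start = time.perf_counter()
    tls_sock = context.wrap_socket(sock, server_hostname=host)
    tls_ms = (time.perf_counter() - start) * 1000

    print(f"TCP connect:   {tcp_ms:.1f} ms")
    print(f"TLS handshake: {tls_ms:.1f} ms")
    tls_sock.close()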

Fortunately, there are plenty of ways to make TLS fast. Here are a few:

OCSP Stapling

OCSP stands for Online Certificate Status Protocol. It is a way to ensure that a site’s TLS certificate is valid; the client completes this check by querying the certificate authority about the certificate’s status. However, this is far from ideal, as it means the client has to retrieve information from a third party before it can even start getting content from the website.

OCSP stapling works around this delay by passing responsibility for certificate verification from the client, on each request, to the server. Instead of the client doing the look-up when it accesses the site, the server periodically verifies the status of its certificate with the authority and then stores, or ‘staples’, the signed OCSP response alongside the certificate it presents. The client can then verify the certificate status directly from the server’s response, removing the extra third-party look-up and the time it would normally take.
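
As a rough way to check whether a server is stapling OCSP responses, you can inspect the handshake with the openssl command-line tool; the sketch below (assuming openssl is installed, with example.com as a placeholder host) does this from Python:

    import subprocess

    host = "example.com"   # placeholder host
    result = subprocess.run(
        ["openssl", "s_client", "-connect", f"{host}:443",
         "-servername", host, "-status"],
        input=b"",                 # close stdin so s_client exits after the handshake
        capture_output=True,
        timeout=30,
    )
    output = result.stdout.decode(errors="replace")

    if "OCSP Response Status: successful" in output:
        print("Stapled OCSP response received")
    else:
        print("No stapled OCSP response seen")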

TLS Session Resumption

TLS session resumption works by storing information about a TLS connection that has already been established. This allows a client to reconnect to a host with an abbreviated TLS handshake, cutting the time it takes to make the connection. Consequently, should the client open a further connection to the web server, it no longer needs to repeat the full certificate exchange.
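
The sketch below (Python's ssl module, example.com as a placeholder host) illustrates the idea: the session negotiated on the first connection is offered on the second, which can then complete an abbreviated handshake. Note that with TLS 1.3 the session ticket may arrive after the handshake, so reuse is not guaranteed in this simple form.

    import socket
    import ssl

    host = "example.com"   # placeholder host
    context = ssl.create_default_context()

    # First connection: full handshake, then remember the negotiated session.
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            session = tls.session

    # Second connection: offer the saved session for an abbreviated handshake.
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host, session=session) as tls:
            print("Session reused:", tls.session_reused)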

HSTS

HSTS stands for HTTP Strict Transport Security. It is designed as an important security enhancement to help prevent man-in-the-middle attacks on the HTTP stream, and it comes with a knock-on benefit for web performance.

Essentially, HSTS means telling the browser or user agent that it should only ever access your website over HTTPS. This saves a costly redirect to HTTPS when a visitor to your website requests the HTTP version.

To implement HSTS, the server issues a response header on the browser or user agent’s first call; from then on the browser enforces connection over HTTPS and refuses any other request, such as one made via a script, to load over HTTP. The disadvantage is that this only takes effect after someone’s first visit to your site.
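
Implementation is usually a one-line response header. As an illustration only (not a recommendation for any particular framework), a minimal Flask sketch might add the header like this:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Hello over HTTPS"

    @app.after_request
    def add_hsts_header(response):
        # One-year max-age; includeSubDomains extends the policy to subdomains.
        response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
        return response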

HTTP/2

HTTP/2, now more commonly known simply as H2, offers a range of performance enhancements that complement the improvements in security. A prerequisite for H2 is HTTPS, which enforces a high security regime for the fastest HTTP protocol.

As H2 allows multiple requests and responses to be multiplexed over a single connection, the risk that one slow-loading asset will block other resources is reduced. Headers are also compressed, reducing the size of both requests and responses.

Other features, such as server push, are evolving, but should offer even more performance benefits.
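
To experiment with H2 from the client side, the sketch below (using the third-party httpx library, installed with pip install httpx[http2], and example.com as a placeholder) issues several requests over a single connection and reports the negotiated protocol version:

    import httpx

    urls = [f"https://example.com/?page={i}" for i in range(5)]

    # A single Client reuses one connection; http2=True enables HTTP/2 negotiation.
    with httpx.Client(http2=True) as client:
        for url in urls:
            response = client.get(url)
            # http_version reads "HTTP/2" when the server negotiated H2.
            print(url, response.status_code, response.http_version)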

A More Secure Future

As we edge ever closer to an HTTPS-only web, delivering greater privacy and better security for all web users, browser vendors seem intent on accelerating the pace of change. An understanding of how security affects web performance can help you prepare and ensure the customer experience does not suffer.

Exposing the vagaries of ORM in load testing

Taking a look at database issues and why an ORM-accessed RDBMS may cause web performance issues

After a long journey through design, development and functional testing, rarely is a new website or application completed without a performance load test as one of its final milestones. Project, and sometimes regulatory, sign-off depends on certain criteria being met, so there is often much riding on a successful outcome. Because of this there can be considerable trepidation in the team as it starts the load testing process, since load testing is designed to exercise both the software and hardware configurations to their limits.

Although the clear objective is to achieve the load test milestone, enabling sign-off for the project, real benefits come from a well-managed load testing programme that exercises different performance aspects, such as normal and peak operations together with stress, spike and endurance scenarios. For new applications the load test can often be the first time more than a handful of end-to-end interactions occur concurrently, and it is also the time when APIs and endpoints, potentially located in different environments, are tested for stability and scalability.

It's Not Just the Software

In our previous article, A Cloudy Day in Performance Load Testing, we covered some aspects of how cloud-based environments can influence an application's scalability, but other components, such as databases, can be just as influential.

Our chart above shows how influential poor-performing databases can be. These results, from a 4-hour endurance load test, show that after an hour of running at 500 concurrent users, successful test completions begin to degrade. At this point server response slows, causing page load failures, identified by the black lines, for a significant number of tests.

During the test the client was able to determine that this was caused by a specific database query that was scanning a table for a result rather than using an index. Although the table scan gave the correct result, load was building on the database server that it could not serve in the time expected as the test continued. The client resolved the issue on the fly by creating an index, enabling very fast index searches and alleviating the bottleneck for the remainder of the test.
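
The sketch below (a self-contained SQLite example with made-up table and column names, not the client's schema) shows the same effect in miniature: the query plan switches from a full table scan to an index search once the index exists.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 1000, i * 1.5) for i in range(100_000)],
    )

    query = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"

    # Before the index: the plan reports a full table scan.
    print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

    # Create the index and check the plan again: it now searches the index.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())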

ORM and RDBMS

Resolving database scan issues is not uncommon in load testing, and when working directly with native databases, schema design, index and foreign key issues are easy to identify and resolve. However, things become more complicated when the database is abstracted through object relational mapping (ORM) technology.

ORM is a technology that sits between the database (RDBMS) and the application code. As it creates a level of abstraction between the two, it reduces the amount and complexity of code needed to interact with the database. It is not a new technology, and there are many commercial frameworks that support ORM; because of the abstraction and programming benefits, its take-up in application development is increasing.

However, ORM comes at a cost, especially in its effect on databases and their performance. A quick search on the internet will unearth many good resources covering ORM performance, but sometimes the performance tuning available within the ORM is insufficient.

In a recent load test the client found that a particular ORM-accessed database resource was serialising and long queues were developing. This in turn slowed response, as use of the resource was pervasive across the application. Having run out of ORM performance tuning options, a quick refactoring that took the resource out of ORM control and used traditional direct SQL-based transactions resolved the issue. Database utilisation dropped from 70% to 7% at peak load, and further load testing proved the environment could now run on 50% fewer configured database resources, saving further costs in the cloud budget.
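
As an illustration of that kind of refactor (not the client's actual code; the model, table and column names below are hypothetical), here is how an ORM query might be replaced by a direct SQL statement using SQLAlchemy:

    from sqlalchemy import Column, Integer, create_engine, text
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class StockItem(Base):                        # hypothetical mapped class
        __tablename__ = "stock_items"
        id = Column(Integer, primary_key=True)
        product_id = Column(Integer, index=True)
        quantity = Column(Integer)

    engine = create_engine("sqlite:///:memory:")  # placeholder connection string
    Base.metadata.create_all(engine)

    def stock_level_orm(session: Session, product_id: int) -> int:
        # ORM version: convenient, but the generated SQL is largely out of your hands.
        item = session.query(StockItem).filter(StockItem.product_id == product_id).one()
        return item.quantity

    def stock_level_sql(session: Session, product_id: int) -> int:
        # Direct SQL version: the exact statement sent to the database is explicit.
        return session.execute(
            text("SELECT quantity FROM stock_items WHERE product_id = :pid"),
            {"pid": product_id},
        ).scalar_one()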

Closing Observation

Correctly structured, load testing can deliver a wealth of performance benefits to any web application project, and as this article shows, sometimes it is only when you really exercise the application that you find real opportunities for performance enhancement and genuine cost savings.

A Cloudy Day in Performance Load Testing

Deploying websites on cloud infrastructure may be a panacea for many, but problems may still exist.

Nowadays it is rare for a website or web application to go live without passing predetermined performance criteria defined in advance by the business. These normally come down to a number of webpages or transactions per second, each loaded within a specific download time. Testing the website against this type of performance criteria is necessary to ensure the end-user experience is as intended.

However, as more websites move onto cloud infrastructure, the benefits of cloud, such as the dynamic addition of infrastructure through auto-scaling and the setting of predictable performance levels with database transaction units (DTUs, in Microsoft's Azure), must also be proven in load testing. This is not only to ensure that these features work as expected, but also to ensure that costs are not incurred unnecessarily, or business lost, through misconfiguration or under-assignment.

Auto-scaling and Performance Load Testing

As it is a major benefit of cloud environments, it is important to get auto-scaling working correctly as the load on your website fluctuates, otherwise you may end up paying for services you are not using. Fortunately, cloud vendors have made the configuration of auto-scaling simple, so the service is quick and easy to use, but when dealing with dynamic loads on a system you need to ensure that the rules will work when you need them most.

In a recent load test the client began a user journey (UJ) test that would continually increase load up to 500 concurrent users over a 30-minute period. They were confident that their application would perform well, but what we found was that, with the way auto-scaling had been configured, end-user experience would have been extremely poor. In our results chart we can see that, 2.5 minutes into the test, the download time of the UJ started to increase at an alarming rate.

Performance Load Test Results with Auto-scaling

From our previous benchmark test, the UJ should complete in approximately 47 seconds, but in the auto-scaling test, as the load went over 50 users, the download time started to climb very quickly. We observed that CPU was also moving towards 100% at this point, yet the planned auto-scaling had not kicked in.

Auto-scaling had been configured to bring more resources to bear from 70% CPU utilisation, but only if CPU stayed at 70% or greater for 10 minutes, so we had found the reason why more servers had not spun up. This proved to be the case: about 10 minutes in, once auto-scaling had finally occurred, download times started to improve thanks to the extra resources in the environment.

The client quickly configured a new rule stating that if CPU is 80% or greater, the extra resources should spin up immediately. This dynamic change to the configuration proved its worth as CPU started to climb again at about 18 minutes; the new rule was sufficient to contain the increasing workload on the CPU resources and start to bring download times back down.
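
The difference between the two rules is easy to model. The sketch below (synthetic CPU figures, one sample per minute, chosen purely for illustration) shows how much earlier the revised rule reacts:

    # Synthetic CPU utilisation, one sample per minute, rising under load.
    cpu_per_minute = [30, 45, 60, 72, 78, 85, 90, 93, 95, 96, 97, 97, 98, 98, 98]

    def sustained_rule_trigger(samples, threshold=70, window=10):
        """Minute at which CPU has been at or above threshold for `window` consecutive minutes."""
        run = 0
        for minute, cpu in enumerate(samples, start=1):
            run = run + 1 if cpu >= threshold else 0
            if run >= window:
                return minute
        return None

    def immediate_rule_trigger(samples, threshold=80):
        """Minute at which CPU first reaches the threshold."""
        for minute, cpu in enumerate(samples, start=1):
            if cpu >= threshold:
                return minute
        return None

    print("Original rule scales out at minute:", sustained_rule_trigger(cpu_per_minute))  # 13
    print("Revised rule scales out at minute:", immediate_rule_trigger(cpu_per_minute))   # 6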

In the graph we can see that the UJ never recovers back to the target time of 47 seconds. Further investigation showed that this was due to a secondary problem identified through performance load testing.

Database Transaction Units (DTUs)

Because the CPU resources had now settled at an acceptable level, this secondary problem was something new.

The client's databases are hosted in Microsoft's Azure, where it is possible to procure a specific performance level for your database: Microsoft commits a certain level of resources to your database with the intention of delivering a predictable level of performance. The Azure platform implements this approach through the concept of Database Transaction Units (DTUs), and the price charged depends on how performant you want your databases to be, so it is important to get these settings right.

Further analysis showed that the number of DTUs assigned to one of the three databases used in this UJ had reached its maximum.  Consequently, the database was unable to work any faster. 

Of course, the simple thing to do at this point would be to add more DTUs, which over time could prove expensive, as this could become a never-ending cost to the operating budget. However, the client decided it was worth reviewing how this specific database was being used in the UJ. Using the knowledge gained of how the application works under load, the client was able to make some minor enhancements that resolved the issue without incurring further infrastructure costs.

It’s Not a Cloudy Day Though

As a mainstream technology, cloud delivers many benefits and advanced infrastructure features that enable consistent performance and the dynamic scaling of services. However, regular and effective performance load testing is essential to maximising the cost benefits while ensuring that your website or application will perform for your end-users as expected when most needed.