Desktop multiprocessing: Not so fast

Until recently, you could reasonably expect this year's software to run faster on next year's machines, but that's not necessarily true going forward. For the foreseeable future, significant performance improvements are likely to be achieved only through arduous reprogramming.

Some time ago, computer vendors passed the point of diminishing returns concerning processor clock speeds, and could no longer keep hiking frequency rates. To maintain continued performance improvements, suppliers turned to installing multiple instances of the processor — multiple cores — on a processor chip, and as a result, multicore processors are now mainstream for desktops. But to realize any performance improvements the software has to be able to use those multiple cores.

And to do that, most software will need to be rewritten.

“We have to reinvent computing, and get away from the fundamental premises we inherited from von Neumann,” says Burton Smith, technical fellow at Microsoft Corp., referring to the theories of computer science pioneer John von Neumann (1903 – 1957). “He assumed one instruction would be executed at a time, and we are no longer even maintaining the appearance of one instruction at a time.”

But software cannot always keep up with the advances in hardware, says Tom Halfhill, senior analyst for the Microprocessor Report newsletter in Scottsdale, Ariz. “If you have a task that cannot be parallelized and you are currently on a plateau of performance in a single-processor environment, you will not see that task getting significantly faster in the future.”

New law in town

For four decades, computer performance progress was defined by Moore's Law, which said that the number of devices that could economically be placed on a chip would double every other year. A side effect was that the smaller circuits allowed faster clock speeds, meaning software would run faster without any effort from programmers. But overheating problems on CPU chips have changed everything.

“The industry has hit the wall when it comes to increasing clock frequency and power consumption,” says Halfhill. There are some chips edging above 4GHz, “but those are extreme cases,” he says. The mainstream is still below 3GHz. “The main way forward is through multiple processors.”

By adding more cores to the CPU, vendors offer the possibility of higher performance. But realizing higher performance through multiple cores assumes that the software knows about those cores, and will use them to run code segments in parallel.

Even when the software does that, the results are gated by Amdahl's Law. Sometimes called Amdahl's Curse, and named for computer pioneer Gene Amdahl, it lacks the upbeat outlook of Moore's Law. It says that the expected improvement from parallelization is 1 divided by the percentage of the task that cannot be parallelized plus the improved run time of the parallelized segment.

In other words, “It says that the serial portion of a computation limits the total speedup you can get through parallelization,” says Russell Williams, chief architect for Photoshop at Adobe Systems in San Jose, Calif. “If 10% of a computation is serial and can't be parallelized, then even if you have an infinite number of infinitely fast processors, you could only get the computation to run 10 times faster.”

People in the know often refer to Photoshop as a model desktop application in terms of multi-core support and parallelization. Williams says that Photoshop has been supporting multi-processor operations since about 1995, but adds that, even so, much of Photoshop's code is devoted to opening and closing dialog boxes, and therefore is not subject to parallelization.

Lots of algorithms have “significant chunks” of serial code, Williams notes. “People with PhDs have been working on this problem for 20 years — it is not a matter of solving it by sitting at your desk and thinking hard for a few minutes. Typically, the way around Amdahl's Law is to simply find embarrassingly parallel problems, but you can't escape the fact you're limited by the serial portion of your calculations.”

Even with parallelization, Williams explains, performance does not scale linearly — two cores can give nearly 2X acceleration, but four cores gives less than 4X acceleration. This is due to memory bandwidth issues (i.e., the RAM being slower than the processor) and delays imposed by inter-processor communications.

“We can't take advantage of eight cores without improved memory bandwidth, and I know of no application that could take advantage of 16 cores,” he says. “Memory bandwidth is a huge issue because after a while you are just waiting for the memory.” New processors with on-board memory controllers are offering some help, he adds. On-board memory controllers speed up RAM access; however, they also lock the CPU into using a specific type of memory.

When Microsoft first shipped Windows, most programs were still written for DOS, and it was a good 10 years until the industry saw more Windows than DOS software. Similarly, “most of the software on the shelf now is not parallel and some, like word processors, never will be,” says Halfhill.

On the other hand, “We are talking about a similar thing here, but the presence of parallelization APIs in Windows 7 and in the Macintosh Snow Leopard operating systems will speed up the process, and the low-hanging fruit may be done in three to five years,” Halfhill says. Further, not every program needs to be rewritten.

Microsoft's Smith agrees. “Not all software will be converted in five years, but we will have made significant progress. This is a more profound change than has ever been seen before in computing.”

Microsoft's current desktop operating systems, Windows XP and Vista, “like most other systems,” use the kernel to schedule threads on the multiple cores of the system, Smith explains. A thread is a code segment that the computer will execute entirely before executing another thread, which may be from another application entirely.

“When a thread needs to wait for something, like I/O or another thread's output, the kernel runs some other ready-to-go thread on the freed-up core,” Smith explains. “When the first thread's wait is over and it becomes eligible to run again, it will eventually get a core assigned to it.”

In general, consumer operating systems “don't do anything very smart” with multiple cores, says Jim Turley, head analyst with Silicon Insider, a consulting service and newsletter in Pacific Grove, Calif. Vista is “reasonably aware” of multiple cores, and is “fairly smart about dividing up background tasks and foreground tasks.” Vista can run games on one or two cores while housekeeping tasks run the other cores.

Rob Enderle, principal at the Enderle Group in San Jose, Calif., says that Windows 7 does an even better job of it. “Windows 7 is designed to use as many cores as the machine has, and will partition an application among the multiple cores — but that does not give as much benefit as if the application used the cores directly.”

Windows 7 has an alternative mechanism called User Mode Scheduling (UMS), which lets thread multiplexing onto cores take place within the application itself instead of in the kernel. Multiplexing of threads is the process of deciding which thread is executed next. Handling this multiplexing within the application instead of in the operating system kernel “makes thread scheduling more efficient,” Smith says.

A Microsoft blog link he supplied indicates that programmer access to UMS is possible through Visual Studio 2010, currently in beta, and involves use of the operating system's Concurrency Runtime facility. Windows 7 will also be able to use 256 cores, arranged in four groups of 64.

Meanwhile, most applications will run on only one core, “so you get the benefit of having multiple cores only when running multiple applications,” Enderle says. Virus checkers and utilities that run in the background “tend to not visibly drag down your machine, whereas on a single-core processor they definitely do,” he says. Two cores seem to be optimum and a third “gives you headroom.” When watching the performance meter in Windows “you can light up two cores really easily, three occasionally and four hardly ever. Four cores are for video games, heavily threaded applications or DNA analysis.”

Some Intel processors additionally offer a form of on-chip dual processing, called Hyper-Threading Technology, where each core can run two threads in parallel, so the software sees twice as many cores as there really are. It is not as good as having two separate cores, and the boost you get varies greatly, but most people get a 20% to 30% boost through Hyper-Threading, according to George Alfs, an Intel spokesman.

Enderle notes that the Windows performance meter displays each core with Hyper-Threading as if it were two cores.

Re-coding

“What we'd all like is a magic compiler that takes yesterday's source code and spreads it across multiple cores, and that is just not happening,” Turley says. “There are C compilers that make a modest dent, but a lot of research indicates that C will never take you very far since the fundamental problem is C itself — it is inherently serial. There is no easy way to program in parallel; it's like writing poetry in Klingon.”

Turley says that the world does not need yet another programming language. “Any third-year student worth his salt has invented one, but the trouble is getting people to adopt it — no one wants to learn a new language.” Since there are so many alternative approaches, “no one wants to commit,” he says. If some authority would declare for one approach people would rally around it, but in the meantime there is widespread confusion and competing claims. “We may have to wait for the current generation of programmers to die off and be replaced by programmers brought up on a new paradigm,” Turley laments.

The easiest way to add parallelism is to call code that is already parallelized, from a library, says Williams at Adobe. The next easiest is to use bottleneck routines, or separate little routines that only know about specific pixels. “That is the way we did it for a long time,” he says. A third way is to write a parallel version of a complicated algorithm. But “that can easily take twice as much work [as writing a non-parallel version]. We're not talking about 10% more work here.”

A fourth approach is functional parallelism, “where you let the user do different things simultaneously, such as getting thumbnail images while changing meta-images,” Williams explains. “Photoshop was written before system software supported that, so we don't do a lot of that. Modern operating system facilities let you do functional threading without a huge amount of effort — maybe 50% more — but converting a large algorithm written before such stuff was available is a big effort,” he says.

What is needed is not more code but different code — and a different way to organize the application, adds Smith at Microsoft. “You must understand parallelism and that is not always obvious.”

A first step is to minimize the use of variables. “Variables are artifacts of sequential execution,” Smith says. “If it is always true that A+B=C, what if someone gets in the middle of that and adds something to B so that the equation no longer holds true? You must have a consistent state where that is prevented.” Traditionally this prevention is done by locking the variables, but he advocates the use of transactional memory, which does much the same thing automatically by isolating the variables from other code that is running at the same time.

Hefring Marine launches new app for comprehensive fleet management

MoIAT, e& ink MoU to empower ICV-certified small and medium-sized enterprises

TAMM wins best E-Government project at United Nations-backed WSIS Prizes 2025

Progressive regulation and zero tax policy drive UAE’s $34 billion crypto boom

Emirati entrepreneurs learn, sell, and grow in a digital world, says new GoDaddy data

MFTA launches Saudi Chapter, co-chaired by Mona Alsemayen and Sophie Guibaud

Cluster 2 signs agreement to advance smart airport operations in Saudi Arabia

Nokia drives cloud-native, AI-driven, secure networks for hyperconnected world

Belkin unveils new gaming portfolio featuring power-packed charging accessories, gaming essentials

Smart security adoption rises in Saudi homes with a digital-first approach

Microsoft AI Tour showcases groundbreaking AI innovations for Oman

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

KROHNE delivers insights to inspire the next generation of engineers in Oman

Oracle supports major project to accelerate Oman digital economy

Ooredoo accelerates cybersecurity in Oman with new deal

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

BDB launches “tijara” platform for SMEs

Bahrain achieves full nationwide 5G coverage

Batelco, SonicWall launch integrated security solutions for SMEs in Bahrain

Bahrain to offer COVID-19 test results on WhatsApp, Facebook Messenger

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

Infopercept opens its first Middle East office in Kuwait

Microsoft Compliance Manager now available in Kuwait

Commercial Bank of Kuwait gets mobile payments moving with Thales Digital Solutions

Ooredoo chooses Fortinet to deliver secure SD-WAN managed services in Kuwait

Dubai’s Omining unveils first African site in Kenya’s Special Economic Zone

Rise of Fearless unites 2,500+ gamers through African heritage, battle royale

Rise of Fearless launches $700K investment round to advance Web3 mobile gaming in Africa

e& enterprise and RAIN Technology to revolutionise Operating Room efficiency in hospitals across MEA

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

NTT DATA launches AI powered software defined infrastructure services for Cisco

Belkin unveils new gaming portfolio featuring power-packed charging accessories, gaming essentials

TwitchCon 10th anniversary brings new products and language expansion

Dynatrace drives real-time AI governance, data sovereignty in enterprise landscape

UAE takes lead in AI-driven digital transformation with Dynatrace’s Observability vision

EU expresses interest in developing AI gigafactories

UK operators seek to connect rural areas

U.S proposes ban on Chinese AI models

China responds to Taiwan’s tech blockade

U.S. on alert for Iranian cyberattacks

American University of Sharjah, Ghaf Labs partner to boost student industry exposure

emt, DESC to deliver Cyberspace Leaders Program 2025

MBZUAI’s MAILIS, AD Gaming to spotlight AI’s role in future of game development

ASUS examines the use of AI in Education at ‘The Tech Social’ Event

UAE sets pace in GenAI-powered upskilling and inclusive digital transformation

Amazon enters nuclear energy partnership to power data centres

DOE inks agreement with Presight, AIQ for AI solutions and digital transformation

Solis poised to transform Dubai’s skyline and deserts into beacons of sustainability

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

Huawei launches ground-breaking solar inverter at World Future Energy Summit

American University of Sharjah, Ghaf Labs partner to boost student industry exposure

American based insurance giant suffers cyber breach

Qi, K2 Integrity join forces to align Iraq’s financial sector with global standards

ruya unveils AI-generated brand film: “You’ve Got Better Things to Do”

MENA Fintech Association launches Türkiye Chapter in collaboration with Insha Ventures

MBZUAI’s MAILIS, AD Gaming to spotlight AI’s role in future of game development

China introduces stricter online control with internet ID

OpenAI enters into lucrative deal with U.S. government

Trump launches smartphone mobile service

Trump-Musk feud leads to reevaluation of SpaceX contracts

Aster Clinics introduce Smyl AI – UAE’s first AI dental tool

Genomics company fined over data breach

SandboxAQ improves drug discovery with data creation

Aster DM Healthcare recognised for Workplace quality

DHA to leverage AI-powered ‘Genesys’ system in contact centre services

DLD boosts transparency with AI-enabled real estate advertising governance

MBRHE and Beyond Limits AI MoU to enhance digital transformation

Huspy launches GCC’s first AI-powered mortgage chatbot to transform home financing

DLD, VARA collaborate to boost leadership in realty and virtual assets regulation

Open Innovation AI collaborates with Intel to revolutionize AI orchestration with Gaudi

Global second-hand smartphone market sees annual drop

Hushday enters UAE market with private luxury sales and steep discounts

Jacky’s Business Solutions unveils Agentic AI offering to accelerate GCC’s autonomous business future

Skills gap, data hurdles, and ethics key to unlocking AI in GCC retail, says Al-Futtaim

ASUS unveils latest ExpertBook P1 models

Google launches Flow & Gemini’s photo-to-video capability in Middle East North Africa

Hytera, WAFA ink contract to transform UAE’s energy sector communications

NTT DATA appoints director of Smart Manufacturing and Industry for MEA

du, Huawei renew partnership to accelerate Emiratisation and digital talent development

Emirates, Crypto.com ink MoU to integrate Crypto.com Pay as payment option