Nvidia touts power, efficiency of 200M GPUs
While the GeForce GTX 280M and GTX 260M remain Nvidia's top dogs, you're now going to find a whole bunch of flavors filling out the line. They range from the GTS 260M (with 1GB of GDDR5 RAM) down to the G210M (with 512MB of GDDR3 RAM). These new 40nm chips support DirectX 10.1 and run CUDA applications, and all except the G210M offer built-in Nvidia PhysX tech for GPU-bound physics calculations. For a full breakdown of what these new GPUs offer, look below.
But rather than get razzle-dazzled by the spokespeople, let's cut to the chase. These GPUs, certainly the lower-powered ones, could make a big difference in cracking the mainstream market. That is, we're already hearing about Ion-based notebooks coming out later this summer, but how about something with a little more oomph? You could soon see an affordable all-purpose laptop that's capable of running an HD movie without breaking a sweat. And, yeah, playing more than solitaire or a game from 16 years ago (not that there's anything wrong with X-Com) wouldn't hurt, either.
Who will provide those laptops down the line, and how much will they cost? Expect the usual suspects--Nvidia claims "100 new design wins." Matt Wuebbling, senior product manager for notebook GPUs at Nvidia, says that we should "imagine a notebook in the $600-to-$700 price range six months ago with a discrete GPU [the G210M] that now has twice the graphics power versus the G110M." But the price, obviously, is up to the laptop maker. Wuebbling expects to see notebooks showing up as early as July; Asus and Acer are confirmed to be the first players bringing the 200M series to market in new notebooks. While I can't extol the performance virtues of these chips just yet, I can at least eyeball the specs below and look forward to kicking the tires on a couple of 200M-fueled laptops soon enough.
By the Numbers
GTS 260M: 396 gigaflops; 96 processor cores; 550MHz graphics clock; 1375MHz processor clock; 1800MHz memory clock; 1GB GDDR5 RAM; 128-bit memory width; 38-watt TDP
GTS 250M: 360 gigaflops; 96 processor cores; 500MHz graphics clock; 1250MHz processor clock; 1600MHz memory clock; 1GB GDDR5 RAM; 128-bit memory width; 28-watt TDP
GT 240M: 174 gigaflops; 48 processor cores; 550MHz graphics clock; 1210MHz processor clock; 800MHz memory clock; 1GB GDDR3 RAM; 128-bit memory width; 23-watt TDP
GT 230M: 158 gigaflops; 48 processor cores; 500MHz graphics clock; 1100MHz processor clock; 800MHz memory clock; 1GB GDDR3 RAM; 128-bit memory width; 23-watt TDP
G210M: 72 gigaflops; 16 processor cores; 625MHz graphics clock; 1500MHz processor clock; 800MHz memory clock; 512MB GDDR3 RAM; 64-bit memory width; 14-watt TDP
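For anyone who wants to sanity-check the gigaflops column, here is a quick Python sketch. It assumes each processor core retires three floating-point operations per processor clock. That factor is not stated anywhere in Nvidia's figures above; it is an assumption that happens to reproduce them. The gigaflops-per-watt figure simply divides the listed gigaflops by the listed TDP.

```python
# (model, processor cores, processor clock in MHz, listed gigaflops, TDP in watts)
SPECS = [
    ("GTS 260M", 96, 1375, 396, 38),
    ("GTS 250M", 96, 1250, 360, 28),
    ("GT 240M", 48, 1210, 174, 23),
    ("GT 230M", 48, 1100, 158, 23),
    ("G210M", 16, 1500, 72, 14),
]

# Assumption (not from the article): 3 floating-point operations
# per core per processor clock.
FLOPS_PER_CORE_PER_CLOCK = 3


def peak_gigaflops(cores, clock_mhz):
    """Theoretical peak: cores x clock (MHz) x flops per clock, in gigaflops."""
    return cores * clock_mhz * FLOPS_PER_CORE_PER_CLOCK / 1000


for model, cores, clock_mhz, listed, tdp in SPECS:
    computed = peak_gigaflops(cores, clock_mhz)
    print(f"{model}: computed {computed:.1f} gigaflops, listed {listed}, "
          f"{listed / tdp:.1f} gigaflops per watt")
```

These are theoretical peaks, so they say nothing about how the chips will fare in real games or video playback.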
AMD to drop clock speed in 12-core chips
AMD's upcoming 12-core server chips, code-named Magny-Cours, put two six-core chips in one package. The same silicon is used in its six-core chips, code-named Istanbul, which are part of the Opteron line of server processors. AMD designed Magny-Cours chips to draw the same power as Istanbul chips, said Pat Conway, a member of AMD's technical staff, in a presentation at the Hot Chips conference at Stanford University.
Responding to an audience question about how Magny-Cours, with two chips, will use the same power as one Istanbul chip, Conway said that AMD is reducing the clock speeds of the Magny-Cours chips and adding power management features.
However, Conway declined to comment on potential clock speeds of 12-core chips in response to a question. "That's a detail we're going to save for the product launch," Conway said. The chips are aimed at servers and are due out in the first quarter of 2010.
Chip makers like Intel and AMD turned to adding cores to boost chip performance earlier in the decade, as cranking up clock speed led to excessive heat dissipation and power consumption.
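To see why lower clocks can buy room for more cores, consider a first-order sketch in Python. It uses the textbook approximation that dynamic power scales with voltage squared times frequency, and it ignores leakage, cache and interconnect power. The function names are hypothetical, and the numbers are illustrative only; they are not AMD's figures and do not predict Magny-Cours clock speeds.

```python
def relative_dynamic_power(freq_scale, voltage_scale=1.0):
    """Dynamic power relative to baseline, assuming P ~ C * V^2 * f."""
    return voltage_scale ** 2 * freq_scale


def freq_scale_for_budget(power_scale, voltage_tracks_frequency=True):
    """Clock multiplier that brings one die to `power_scale` of its baseline power."""
    if voltage_tracks_frequency:
        # Assumption: voltage is lowered in proportion to frequency, so P ~ f^3.
        return power_scale ** (1 / 3)
    # Fixed voltage: P ~ f.
    return power_scale


# Two dies sharing the power budget of one die: each gets half.
print(round(freq_scale_for_budget(0.5), 3))                        # 0.794
print(freq_scale_for_budget(0.5, voltage_tracks_frequency=False))  # 0.5
```

Under these simplified assumptions, halving the power per die costs roughly 20 percent of the clock speed if voltage can drop along with frequency, and half the clock speed if it cannot.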
Even though the clock frequencies will fall, Magny-Cours chips will deliver more performance than existing Opteron chips, Conway said. The larger cache and higher core count will make servers faster, Conway said. For example, with more cores a server will be able to execute tasks faster in virtualized environments and host a larger number of virtual machines.
Conway also talked about finer details in the Magny-Cours chip. The two six-core chips are connected by four HyperTransport interconnects, and the package is targeted at two- and four-socket servers, Conway said. It includes a total of 12MB of L3 cache, with each core supporting 512KB of L2 cache. The chips will be manufactured by AMD's spinoff, GlobalFoundries, using existing 45-nanometer technology.
AMD is also working on a new x86 chip architecture code-named Bulldozer. The architecture will be used in chips manufactured using the 32-nm process in 2011. The company has scheduled a 16-core chip code-named Interlagos for release in 2011.
Intel sees PCs spreading, becoming more desirable in future
In the same way telephones moved from one per house to one per room to one for each person, PCs are also becoming personalized devices, said Mooly Eden, vice president and general manager of Intel's Mobile Platforms Group, at a meeting in Bangalore.
"It could be a notebook or a netbook or a mobile Internet device (MID)," Eden said.
Selling PCs has become a "consumer game," focused as much on the elegance and sleekness of the device as on its performance and other specifications, Eden said.
In most markets, the issue for marketers of PCs is not affordability but the "desirability" of the computers, Eden said. To become desirable to a large segment of potential users who can afford PCs, the devices need content, a user interface, and applications that those users can relate to, he added.
This optimistic view of the market may hold true for mature markets, but not for emerging markets like India, where 10,000 rupees (US$200) is seen as the magic price level at which a reasonably configured PC could take off in large volumes.
Netbook prices are likely to come down because of economies of scale, said Eden, but he was not willing to forecast when the price would be below US$200.
Eden expects that demand in emerging markets could get a push through subsidies, for example if telecommunications service providers offer netbooks at a discount or free as part of a service plan.
Intel originally thought demand for netbooks powered by its Atom processor would first ramp up in emerging markets, because of the low cost of these devices, Eden said. Currently about 85 percent of the sales of netbooks are in mature markets, Eden said.
A lot of netbook customers in these markets want to go beyond basic browsing and communications to applications like storing movies, which led vendors to include large hard-disk drives, and in some cases Microsoft's Windows operating system, Eden said.
To cut netbook prices, vendors in emerging markets can, for example, use the Moblin Linux operating system, backed by Intel, which is good enough for basic applications like browsing and communications, he added.
IBM zooms into molecule for power-efficient chip research
IBM's image maps the anatomy of a molecule at an atomic scale, which could help researchers understand and manipulate molecules and atoms in chips.
"Basically it's a pioneering science achievement that helps open up exciting new possibilities for exploring electronic building blocks and devices at the ultimate atomic and molecular scale -- devices that might be vastly smaller, faster and more energy-efficient than today's processors and memory devices," said an IBM spokeswoman in an e-mail.
For decades chip makers have been etching smaller patterns on chip surfaces to boost performance and reduce power consumption. But as chips get smaller, assembling and fabricating them becomes far more difficult and expensive. Many experiments in IBM's nanotechnology research initiative focus on technology that could make chips smaller, faster and more power-efficient in the future.
In the experiment, IBM scientists were able to map the chemical structure of a pentacene molecule using atomic force microscopy (AFM). The probe microscope was able to provide scientists with an atomic-level view of the molecule as viewed in chemistry textbooks, which could be an important step in understanding future chip structures. Pentacene molecules could be deposited in transistors that are used in semiconductors, according to IBM researchers.
The role of individual molecules is still not fully known when it comes to developing chips, said Gerhard Meyer, a scientist at IBM Research in Zurich, in an e-mail. There are examples of using a single molecule as a memory element, and there is a larger question surrounding how a large number of molecules contact and connect to form molecular networks.
With this development, IBM is making progress in studying the transport of individual electrons in molecules or across molecular networks. The study also helps better understand how a charge distribution occurs across the molecular networks, which is an essential element in building smaller and more power-efficient chips.
"It will, for example, help us to understand how the molecular geometry changes if we change the charge of the molecule," Meyer said. "What we want is an atomic/molecular level understanding of these processes."
Fully understanding molecules for basic chip research could be 10 to 20 years away, Meyer said.
Earlier this month, IBM reported a research breakthrough, with researchers saying they were experimenting with the use of DNA -- one of the body's building blocks -- as a way to create tiny circuits that could form the basis of smaller, more powerful computer chips.
Parallelism needs killer app for mass adoption
Most software today is still being written for sequential execution, and programming models need to change to take advantage of faster hardware and an increasing number of cores on chips, panelists said. Programmers need to write code in a way that enables tasks to be divided up and executed simultaneously across multiple cores and threads.
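As a rough illustration of what dividing a task up looks like in practice, here is a minimal Python sketch. The function names (split, sum_of_squares, parallel_sum_of_squares) and the sum-of-squares workload are made up for this example; none of it comes from the panel. The data is cut into independent chunks, each chunk is handed to a worker, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Sequential work on one independent slice of the data."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Cut data into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4, executor_cls=ThreadPoolExecutor):
    """Run sum_of_squares on each chunk concurrently, then combine the results."""
    chunks = split(data, workers)
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


print(parallel_sum_of_squares(list(range(10)), workers=3))  # 285
```

The default thread pool shows the structure, but in standard CPython, threads do not spread CPU-bound work across cores. Passing concurrent.futures.ProcessPoolExecutor as executor_cls (from a script with the usual main-module guard) is one way to actually occupy several cores.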
A lot of focus and money have gone into building fast machines and better programming languages, said David Patterson, a computer science professor at the University of California, Berkeley, at the conference in Stanford on Monday. Comparatively little attention has been paid to writing desktop programs in parallel, but applications such as gaming and music could change that. Users of such programs demand the best real-time performance, so programmers may have to adopt models that break up tasks over multiple threads and cores.
For example, novel forms of parallelism could improve the quality of music played back on PCs and smartphones, Patterson said. Code that does a better job of separating channels and instruments could ultimately generate sound through parallel interaction.
UC-Berkeley has a parallel computing lab where researchers are trying to understand how applications are used, which could help optimize code for handheld devices. One project aims to bring desktop-quality browsing to handheld devices by optimizing code based on specific tasks like rendering and parsing of pages. Another project involves optimizing code for faster retrieval of health information. The lab is funded primarily by Intel and Microsoft.
Berkeley researchers are trying to bring in parallelism by replacing bits of code originally written using scripting languages like Python and Ruby on Rails with new low-level C code. The new code specifically focuses on particular tasks like analyzing a specific voice pattern in a speech recognition application, Patterson said in an interview Wednesday. The code is written using OpenMP or MPI, application programming interfaces designed to write machine-level parallel applications.
Experts are needed to write this highly specialized parallel code, Patterson said. Having experts write it reduces development time for programmers who would otherwise use Python and Ruby on Rails, which make application development easier but do not focus on parallelism, Patterson said in the interview. The lab has seen execution of specific tasks speed up by a factor of 20 with the low-level machine code.
The concept of parallelism is not new, and has been mostly the domain of high-performance computing. Low levels of parallelism have always been possible, but programmers have faced a daunting task with a lack of software tools and ever-changing hardware environments.
"Threads have to synchronize correctly," said Christos Kozyrakis, a professor of electrical engineering and computer science at Stanford University, during a presentation prior to the panel discussion. Code needs to be written in a form that behaves predictably and scales as more cores become available.
Compilers also need to be made smarter and be perceptive enough to break up threads at the right time so that outputs arrive in the correct sequence, Kozyrakis said. Faulty attempts to build parallelism into code could create buggy software if specific calculations are not executed in a certain order, a class of problem commonly referred to as race conditions. Coders may also need to learn how to use multiple programming tools to achieve finer levels of parallelism, panelists said.
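A race condition is easier to picture with a small example. The Python sketch below is illustrative only; the Counter class and run helper are hypothetical names, not anything the panelists showed. Several threads bump a shared counter. The unsafe version reads the value and writes it back in separate steps, so one thread can overwrite another's update. The safe version wraps the same steps in a lock.

```python
import threading


class Counter:
    """A shared counter whose increment is a read-modify-write sequence."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Another thread can run between the read and the write below,
        # in which case its update is silently overwritten.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write sequence atomic with respect
        # to every other thread that takes the same lock.
        with self._lock:
            current = self.value
            self.value = current + 1


def run(increment, threads=8, per_thread=10_000):
    """Call `increment` per_thread times from each of `threads` threads."""
    def worker():
        for _ in range(per_thread):
            increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


safe = Counter()
run(safe.safe_increment)
print(safe.value)    # 80000 with the lock

unsafe = Counter()
run(unsafe.unsafe_increment)
print(unsafe.value)  # may come out lower than 80000; varies from run to run
```

Whether the unsafe version actually loses updates on a given run depends on thread scheduling and the Python version, which is exactly what makes such bugs hard to reproduce.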
"There's no lazy-boy approach to programming," Patterson said at the conference.
Memory and network latency have created bottlenecks in data throughput, which could negate the performance achieved by parallel task execution. There are also different programming tools for different architectures, which make it difficult to take advantage of all the hardware available.
Many parallelism tools available today are designed to harness the parallel processing capabilities of CPUs and graphics processing units to improve system performance. Apple, Intel, Nvidia and Advanced Micro Devices are among the companies promoting OpenCL, a parallel programming environment that will be supported in Apple's upcoming Mac OS X 10.6 operating system, also called Snow Leopard, which is due for release Friday. OpenCL competes with tools from Microsoft, which is promoting its proprietary DirectX parallel programming tools, and from Nvidia, which offers the CUDA framework.
OpenCL includes a C-like programming language with APIs (application programming interfaces) to manage distribution of kernels across hardware such as processor cores and other resources. OpenCL could help Mac OS decode video faster by distributing pixel processing across multiple CPU and graphics processing units in a system.
All the existing tools are geared toward different software environments and take advantage of different resources, Patterson said. OpenCL, for example, is geared more toward execution of tasks on GPUs. Proprietary models like DirectX are hard to deploy across heterogeneous computing environments, while some models like OpenCL adapt only to specific environments that rely on GPUs.
"I don't think [OpenCL] is going to be embraced across all architectures," Patterson said. "We need in the meantime to be trying other things," like trying to improve on the programming models with commonly used development tools, like Ruby on Rails, he said.
While audience members pointed out that parallelism has been a problem for decades, the panelists said that universities are now taking a fresh approach to working on multiple programming tools to enable parallelism. After years of funding chip development, the government is also paying more attention to parallel processing by funding related programs.
Kozyrakis said Stanford has established a lab that aims to "make parallel application development practical for [the] masses," by 2012. The researchers are working with companies like Intel, AMD, IBM, Sun, Hewlett-Packard and Nvidia.
An immediate test for developers could be to try to convert existing legacy code for parallel execution on modern chips, Berkeley's Patterson said. A couple of companies are offering automatic parallelization, but rewriting and compiling legacy code originally written for sequential execution could be a big challenge.
"There's money to be made in those areas," Patterson said.
