Sun admits to memory problem with 'few dozen' servers

Computerworld

(IDG) -- Problems with a memory component that Sun Microsystems has been quietly trying to fix for the past several months are continuing to plague some large users of Sun's Ultra Enterprise Unix servers. And Sun has gone to extraordinary lengths to keep its customers quiet about the issue.

The problem involves an external memory cache on Sun's UltraSPARC II microprocessor module. Under certain conditions, it has been triggering system failures and frequent server reboots at dozens of customer locations.

Sun Executive Vice President John Shoemaker this week acknowledged that the company has been grappling with memory-related problems on "a few dozen" of its Ultra Enterprise servers for nearly a year.

Sun customers who have been affected by the problem are unwilling to speak openly about it because Sun has persuaded many of them to sign nondisclosure agreements, said Tom Henkel, an analyst at Gartner Group Inc. in Stamford, Conn.

The nondisclosure agreements were apparently offered with a claim that signing them would bolster Sun's commitment to resolving the problem quickly, Henkel said. Sun customers began reporting the problem as long as 18 months ago, he said.

Shoemaker this week acknowledged that it may have been a bad idea for Sun to get its users to sign nondisclosure agreements. But he said the company took that measure only because Sun itself was struggling to pinpoint a reason for the system failures. He added that Sun has stopped requiring such agreements.

The long-standing nature of the problem and Sun's handling of the issue raise troubling questions about the quality of Sun's hardware and support, Henkel said.

One high-profile customer that has had very public problems with Sun hardware is eBay Inc. The online auctioneer has suffered a series of hardware-related outages over the past year, including one this week. It is unclear whether eBay's problems are related to the memory issue, however.

Gartner plans to release an advisory on the memory component issue soon, updating one released in November, because of continued and "frequent client complaints of persistent downtime" caused by the problem.

Sun insisted this week that the problem hasn't caused any data loss for customers. But the frequency of reboots disrupts availability and can cause data loss if applications don't restart properly, users said.

In the past year, Henkel said, he has talked with at least 50 Sun customers who complained of hardware reliability issues caused by defective memory. Systems affected by the problem appear to be those based on 400-MHz UltraSPARC II CPU modules using either a 4MB or 8MB cache.

"There are a lot of very unhappy campers out there," Henkel said. "Sun has been experimenting for too long now to find a solution to this problem."

Meta Group Inc. in Stamford, Conn., also has clients that have experienced the problem.

"There was a rash of reliability issues relating to this problem in the March-to-April time frame," though none since then, said Meta Group analyst Brian Richardson. Eight out of 20 of Meta's large Sun accounts reported the problem, Richardson said.

According to Shoemaker, the issue has triggered a massive overhaul of Sun's quality processes and has already directly resulted in about eight major hardware and software changes being incorporated into Sun's Ultra Enterprise server line.

Sun has also put in place far more rigorous quality and availability testing of its products and is mandating more stringent audits of customer sites, environmental conditions and planned configurations before taking orders on its high-end servers, Shoemaker said.

By year's end, Sun will release a mirrored memory module that should address this issue once and for all, Shoemaker added. In the past several months, Sun has also been in direct contact with the CIOs at several of the affected companies to explain Sun's new quality initiative, he said.

"This has been a watershed event for Sun," Shoemaker said, adding that the company has moved from the back of the class to class leader with respect to quality.

But according to an MIS manager in North Carolina who has experienced the memory problem and who spoke on condition of anonymity, Sun has offered no explanation for the problems. "Sun has not disclosed any information to me about their memory issues - not even a brief description," the manager said.

In the past three months, all of the manager's six Sun servers have crashed because of memory-related problems, he said. In each instance, Sun swapped out entire CPU modules but offered no explanation for doing so, he said.

A user at a Midwestern manufacturing company, who also spoke on condition of anonymity, had a similar experience.

"As soon as we reported the issue to Sun, the affected processors were replaced under service contract," he said. The company was able to resolve the problem by rearranging "our data center with the express purpose of lowering system temperatures," he said. "The systems run 10 to 15 degrees Fahrenheit cooler than before, and we haven't seen a problem since."

According to Shoemaker, Sun hasn't been able to narrow the problem to any one specific cause. Sun believes the problems may have been caused by a combination of factors, including defective components from one of Sun's suppliers, poor packaging of the memory chips on the system boards and environmental factors.




© 2001 Cable News Network. All Rights Reserved.