How to optimize an audio application for PPC & Intel Macs


cjed
05-20-2008, 07:18 AM
Hi, here are some links that explain how to optimize an audio application for both PowerPC and Intel Macs with little effort.
Since Mac OS X 10.4 (Tiger), the vector engine APIs are hardware agnostic: the same code works for PPC AltiVec and Intel SSE. They live in the Accelerate framework, which previously supported only PPC AltiVec:
http://developer.apple.com/releasenotes/Performance/RN-vecLib/index.html

"Accelerate.framework is the fundamental support library for SIMD programming (both AltiVec and SSE/SSE2/SSE3/...) on MacOS X. The vecLibTypes.h header (automatically included when you #include <Accelerate/Accelerate.h>) defines unified 128-bit SIMD data types that work for both AltiVec and Intel's vector architecture. Using these types (e.g. vFloat, vSInt32), it is possible to write a single piece of vector code that compiles and runs on both the PowerPC and Intel vector engines on both 32- and 64-bit architectures using GCC."
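
To make the quote above concrete, here is a minimal sketch of what "a single piece of vector code" can look like. It assumes the GCC/Clang vector extensions are available. On a Mac, vFloat comes from Accelerate; elsewhere the typedef below is only my own stand-in so the sketch compiles, not something Apple ships. The helper names load4, store4 and add4 are hypothetical.

```c
#include <string.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h>   /* pulls in vecLibTypes.h, which defines vFloat */
#else
/* Assumption: stand-in for vFloat using the GCC/Clang vector extension. */
typedef float vFloat __attribute__((vector_size(16)));
#endif

/* Copy four floats into a 128-bit vector (memcpy avoids alignment assumptions). */
vFloat load4(const float in[4])
{
    vFloat v;
    memcpy(&v, in, sizeof v);
    return v;
}

/* Copy the four lanes of a vector back out to plain floats. */
void store4(vFloat v, float out[4])
{
    memcpy(out, &v, sizeof v);
}

/* One source line, four additions: the compiler picks AltiVec or SSE. */
vFloat add4(vFloat a, vFloat b)
{
    return a + b;
}
```

The same add4 source should build for either architecture, which is the point the release notes are making.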

Among other dedicated APIs (for image processing or heavy math), the Accelerate framework includes the vDSP library, which provides fast computations (Fast Fourier Transform, etc.) aimed at audio applications:

"Digital Signal Processing: vDSP
The vDSP library is focused primarily in the realm of Fourier Transforms, vector-to-scalar, and vector-to-vector operations. The vDSP library has a wide range of applications, including signal processing (audio, digital image, and speech), physics, statistics, and cryptography. The vDSP library can perform both one and two dimensional Fourier transforms. vDSP functions operate on both real and complex data types.
vDSP uses vectorized code to implement functions that operate on single precision data. This code uses AltiVec extensions when a PowerPC G4 or G5 is present, or the SSE extensions when an Intel microprocessor is present. On the PowerPC G3 processor, vDSP uses scalar code...

http://developer.apple.com/documentation/Darwin/Reference/ManPages/man7/SSE3.7.html

The vDSP functions have been implemented in two ways: as vectorized code, using the vector unit on the PowerPC and Intel microprocessors, and as scalar code, which runs on all machines. Vector code often has special alignment restrictions. If your data is not properly aligned it is common for vDSP to use the scalar path as a fallback. For best results, align your data to a multiple of 16 bytes. (Malloc naturally aligns memory blocks that it allocates to 16 bytes on MacOS X.)

It is noteworthy that vDSP's FFTs are one of the fastest implementations of the Discrete Fourier Transforms available anywhere."

http://developer.apple.com/documentation/Performance/Conceptual/PerformanceOverview/BasicTips/chapter_3_section_3.html#//apple_ref/doc/uid/TP40001410-CH204-DontLinkElementID_4
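
As an illustration of the alignment advice quoted above, here is a small sketch that allocates 16-byte-aligned sample buffers and adds two of them. On a Mac it calls vDSP_vadd from Accelerate; on any other system it falls back to a plain loop, so the code still builds. The names is_aligned16, alloc_samples and mix are my own, and the use of C11 aligned_alloc is an assumption (the quote says plain malloc is already 16-byte aligned on Mac OS X).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h>
#endif

/* Returns 1 if p sits on a 16-byte boundary, as the vector path prefers. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocate room for count floats on a 16-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment. */
float *alloc_samples(size_t count)
{
    size_t bytes = (count * sizeof(float) + 15u) & ~(size_t)15u;
    if (bytes == 0)
        bytes = 16;
    return (float *)aligned_alloc(16, bytes);
}

/* out[i] = a[i] + b[i] for n samples, all with stride 1. */
void mix(const float *a, const float *b, float *out, size_t n)
{
#ifdef __APPLE__
    vDSP_vadd(a, 1, b, 1, out, 1, n);    /* AltiVec or SSE chosen by vDSP */
#else
    for (size_t i = 0; i < n; ++i)       /* scalar fallback off the Mac */
        out[i] = a[i] + b[i];
#endif
}
```

Buffers returned by alloc_samples are released with free as usual.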


So, bus and memory speed considerations apart, EastWest PLAY should perform well on PowerPC. I wonder what specific Intel optimizations have been made?

cjed
05-20-2008, 08:58 AM
And here is a very useful thread in which an engineer from Apple's Core Audio team explains overloads. It led me to think that something is probably wrong with the Core Audio handling in PLAY if audio dropouts appear randomly without high CPU load:

http://lists.apple.com/archives/Coreaudio-api//2008/Mar/msg00021.html

"An overload, in the context of Core Audio, basically means that the HAL's IO thread failed to meet its real time deadline...
it is caused by either <the plugin> just trying to do too much work in the allocated time or <the code> is doing something silly like blocking on a mutex or something like that...
The number of CPUs really has no bearing on this phenomenon since there is only one IO thread and it is the amount of time this thread spends running doing the signal processing or whatever that causes overloads."
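
To illustrate the two causes named in that quote, here is a rough sketch, not taken from PLAY or from Core Audio itself. buffer_budget_ms estimates the time available per callback, assuming the deadline is roughly one buffer's duration. render shows an IO-thread routine that never waits on a lock: it uses a try-lock built from a C11 atomic_flag (my stand-in for a mutex try-lock) and keeps the previous gain if the lock is busy. GainState and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state shared between a UI thread and the audio IO thread. */
typedef struct {
    atomic_flag busy;    /* set while pending_gain is being touched */
    float pending_gain;  /* written by the UI thread */
    float active_gain;   /* used only by the IO thread */
} GainState;

void gain_init(GainState *s, float g)
{
    atomic_flag_clear(&s->busy);
    s->pending_gain = g;
    s->active_gain = g;
}

/* UI thread: has no real-time deadline, so it may spin for a moment. */
void gain_set(GainState *s, float g)
{
    while (atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire)) {
        /* wait until the IO thread releases the flag */
    }
    s->pending_gain = g;
    atomic_flag_clear_explicit(&s->busy, memory_order_release);
}

/* IO thread: try once, never block. If busy, reuse the old gain. */
void render(GainState *s, float *buf, size_t frames)
{
    if (!atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire)) {
        s->active_gain = s->pending_gain;
        atomic_flag_clear_explicit(&s->busy, memory_order_release);
    }
    for (size_t i = 0; i < frames; ++i)
        buf[i] *= s->active_gain;
}

/* Approximate time budget per callback, in milliseconds. */
double buffer_budget_ms(unsigned frames, double sample_rate)
{
    return 1000.0 * (double)frames / sample_rate;
}
```

With a 512-frame buffer at 44.1 kHz the budget is under 12 ms, so a single blocked lock or page fault on the IO thread can cause a dropout even when overall CPU load looks low.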