summaryrefslogtreecommitdiffstats
path: root/docbook/results.docbook
blob: f14ba6ca045d77801e61c16847fa58d16faebd94 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.0//EN" [
  <!ENTITY c700 SYSTEM "results-c700.docbook">
  <!ENTITY ipaq SYSTEM "results-ipaq.docbook">
  <!ENTITY z5000d SYSTEM "results-z5000d.docbook">
  <!ENTITY j720 SYSTEM "results-j720.docbook">
]>
<article>
<artheader>
<title>Fullscreen</title>
<author><firstname>Matthew</firstname>
<surname>Allum</surname>
<email>matthew@openedhand.com</email>
</author>
<copyright>
      <year>2003</year>
      <holder>Openedhand Ltd</holder>
    </copyright>
</artheader>
<section><title>Introduction</title>
<para>

This report aims to benchmark fullscreen 'blits' on handheld ARM based
devices, Specifically for the evaluation of the suitability for 
fullscreen 2D games and video playback.

There are numerous available handheld ARM computers capable of running
a Linux kernel and implementing a supported graphical display.

</para>

<para>

Graphics output is assumed to be by means of writing data to a 'dumb' kernel
framebuffer device via direct means or an XServer. 

</para>

</section>
<section><title>Tests</title>
<para>

For the tests three simple test programs were created. They are
written in C and the same binaries of each program used on each
platform. The source of the tests is <link url="tests/">available</a>.

Each performs a timed single 'full' blit or a fullscreen blit made up
of multiple 1 pixel rows or columns. Also if supported by the output
mechanism ( eg XServer with XRandR ) blits are performed at different
screen orientations. The blit method is then repeated 100 times and
timings calculated.

All of which exhibit similar behavior and code structure but output
to to the display by different means. 

<variablelist>

<varlistentry>
<term>test-fb</term>
<listitem>
Performs a blit by writing directly to the framebuffer device by means of a 
memcopy. 

</listitem>
</varlistentry>

<varlistentry>
<term>test-x</term>
<listitem>

Performs blits under X windows via client side
XImages. Once created, an XImage is transfered to the X11 server for display
by either local sockets or shared memory ( MIT-SHM extension ).

</listitem>
</varlistentry>

<varlistentry>
<term>test-sdl</term>
<listitem>
<para>

The 'Simple Direct MediaLayer' ( SDL ) is a multi-platform graphics
library that abstracts the mechanics of communication with a
particular device. For example the same binary will run both with X
and the framebuffer.

</para>
<para>

A SDL 1.2 ARM Binary from Debian Sid was used.

</para>

</listitem>
</varlistentry>

</variablelist>
<para>

Each test was run with various parameters using this <ulink
url="scripts/run-fb-tests.sh">script</ulink>. The script runs each test 10 times and averages
results.

</para>
<para>

There is a freely available popular X benchmark tool - 'X11perf'. This
could have been used rather than test-x, but it exhibits problems
running on a small display. However results from X11Perf and task-x
on a larger display were comparably similiar.

</section>
<section>
<title>Test Platforms</title>

The tests were run on the following platforms;

<variablelist>

<varlistentry>
<term>HP/Compaq Ipaq 3870</term>
<listitem>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
CPU: SA1110 StrongARM 206Mhz
</listiem>
<listitem>
RAM: 64MB
</listitem>
<listitem>
Display: 240x320x16 LCD
</listitem>
<listitem>
  
GFX Chip: Na
</listitem>
<listitem>

X11: XFree86 v4.3, kdrive Xfbdev server
</listitem>
<listitem>

kernel: 2.4.19-rmk6-pxa1-hh13
</listitem>

</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>Sharp Zaurus 5000d</term>
<listitem>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
CPU: SA1110 StrongARM 206Mhz
</listiem>
<listitem>
RAM: 16MB
</listitem>
<listitem>
Display: 240x320x16 LCD
</listitem>
<listitem>
  
GFX Chip: Na
</listitem>
<listitem>

X11: XFree86 v4.3, kdrive Xfbdev server
</listitem>
<listitem>

kernel: 2.4.6-rmk1-np2-embeddix
</listitem>

</listitem>
</itemizedlist>
</para>

</listitem>
</varlistentry>

<varlistentry>
<term>Sharp Zaurus C700</term>
<listitem>

<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
CPU: PXA 250 XSCALE ARM 400Mhz
</listiem>
<listitem>
RAM: 32MB
</listitem>
<listitem>
Display: 640x480x16 LCD
</listitem>
<listitem>
  GFX Chip: ATI IMAGEON 100
</listitem>
<listitem>

X11: XFree86 v4.3, kdrive Xfbdev server
</listitem>
<listitem>

kernel: 2.4.18-rmk1-embedix
</listitem>

</listitem>
</itemizedlist>
</para>

</listitem>
</varlistentry>

<varlistentry>
<term>HP Jornada 720</term>
<listitem>

<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
CPU: SA1110 StrongARM 206Mhz
</listiem>
<listitem>
RAM: 32MB
</listitem>
<listitem>
Display: 640x240x16 LCD
</listitem>
<listitem>
  
GFX Chip: Epson SED1356
</listitem>
<listitem>

X11: XFree86 v4.3, kdrive Xfbdev server
</listitem>
<listitem>

kernel: 2.4.19-rmk6-pxa1-hh3-j720
</listitem>

</listitem>
</itemizedlist>
</para>

</listitem>
</varlistentry>

</variablelist>
<para>
All devices have a bus speed of approximate 100Mhz. 
</para>
<section><title>XServer Notes</title>

<para>

All machines have the same XServer and X library binarys. Both of
which are from the XFree86 4.3 Distubution. The Xserver used is a
kdrive Xfbdev server, more applicable to embedded type platforms. The
XFree core server, used on many desktop machines, has a larger
and more complex driver interface.

</para>

<para>

kdrive Xfbdev is non accelerated and simply writes to the framebuffer
device.  Kdrive supports on the fly screen rotation via the XRandR
extension, but lacks support for the DGA extension which is an older
XFree extension that allows for direct fullscreen framebuffer access.

</para>

<para>

The kdrive server was compiled to support tslib, a touchscreen
abstraction library and patched to improve VT switching. Binary
packages, the build config and the applied patches are available from
<ulink
url="http://handhelds.org/~mallum/xpkgs">http://handhelds.org/~mallum/xpkgs</ulink>

</para>
</section>

</section>

</para>

</section>
<section><title>Benchmark Results</title>
<para>
&ipaq
&z5000d
&c700
&j720

</para>

</section>

<section><title>Discussion</title>

<section><title>Devices</title>

<para>

The tests produced some very interesting and varied results. There are
very many factors that effect performance including BUS Speeds, CPU
speed and features, fb driver implemtation  etc and it would seem the
results reflect this between devices.

</para>
<para>

The Ipaq and Zaurus devices are very similar pieces of hardware and
therefore produce very similar results. The C700 results are 
dissapointingly poor. The ATI chipset inside has a very impressive
feature set for acceleration but its seems the specifications for this
are not freely available to take advantage of. The GPL'd framebuffer
driver for the chipset ( as used ) seems very limited.

</para>
<para>

Linux support for the Jordana 720 is very immature, so I am untrusting
of the Jordana's results, Im sure they could be much more optimized. 

</para>


</section>

<section><title>Access Methods</title>

<para>

As what would be expected Direct framebuffer access is
fastest. However Direct framebuffer write access should theoretically
be faster than what the results have indicate, when compared to
device bus speeds.

This could likely be due to a poor framebuffer driver or not quite
optimal gcc code generation. It has been demonstrated to me that
taking advantage of XSCALE CPU extensions and replacing the mempy with
a custom assembler implementation can improve speeds by quite a large
margin.

</para>
<para>

There will always be some overhead executing blits with an XServer (
input device management, context switching etc ). However running with
an XServer does come with much more infrastructure than just a raw
framebuffer and it should be noted that X is still responsive with the
test running.

</para>
<para>

As expected Client to Server Image transfer over the MIT-SHM is on
average approximately 30% faster than a 'plain' image transfer over
Sockets. MIT-SHM is mature stable extension with
no known serious problems and it is used extensively by numerous
toolkits and applications.

</para>
<para>

MIT-SHM blit speeds under X can get very close to that of raw
framebuffer access ( See C700 and J720 results ) and this should at
least in theory be the case. 

However we see with both the Ipaq and Z5000d that X at best lags by
approximately 30% compared to that of raw framebuffer access. Because
of all the factors involved its difficult to suggest a reason for
this, however the author of kdrive has been informed of the results
and is looking into possible causes.

</para>
<para>

Running the tests on my laptop ( power PC 550Mhz, radeon FB, 'Big'
XFree with radeon driver ) I get equal speeds of approximately 50Mb/sec
across raw FB, X MIT-SHM and DGA. 

</para>
<para>

It should be noted my SDL knowledge is very little. I had some
problems with the SDL binaries used , notably they did not seem to be
able to detect the full display size correctly and I had to manually
pass dimensions to the executable.  

</para>

</section>

<section><title>Window / Application changing</title>

<para>

Apart from performance, one should also consider the various effects
each method has on interactivity with the user. X provides a rich set
of features for building GUI's whilst a raw framebuffer provides very
little. Using something like SDL running on the framebuffer is a
possible improvement to this situation.

</para>

<para>

Running an application as a fullscreen window in X in a window managed
environment shouldn't be a problem with a modern window
manager. Switching between the fullscreen application and a normal
application should be very strait forward. The window manager also
should be able to provide features such as toggling the fullscreen mode of
the application.

</para>
<para>

With the fullscreen application running directly on the fb, it
requires a 'VT Switch' for the X Server to free up the framebuffer
device. Once switched the fullscreen application is outside of both
the X Server and window managers control, which will lose flexibility. 

With my referenced patches , it should be noted that switching VT's
from X to a raw framebuffer and back seems quite stable on the tested devices.

</para>
<para>

There are other possible problems that should be considered if the
platform uses 2 different methods for rendering general
applications/GUI enviroment and for rendering software such as games
and video playback. One is constancy of look and feel and another is
how to interactions between the two environments - for example if the
user is playing a game, how is he notified the battery is running low?
Working round these problem cause code complexity and may limit
overall functionality.

</para>
<para>

Using the DGA extension would have similar problems to a VT switch, As
it seems to write fullscreen out of the windows managers control and
regular windows cannot be displayed above. 

</para>

</section>

<section><title>Network Transparency and Screen Rotation</title>

<para>

A unique feature of X Windows is its ability to run transparently
across a net work meaning the X test can be run on one machine but
output to the display of another. Toolkits like GTK2.2 and
applications like Emacs can take advantage of this to transfer running
application windows between displays.

</para>

<para>

However only a plain image transfer will work - there is no shared
memory across a network. The blitspeeds here are comparable to the
available network bandwidth with something like a wireless connection
quickly becoming saturated. Also it should be noted by default X does
no compression of XImages transfered across the network.

</para>
<para>

Direct blits to the frame buffer are incapable of network transparency. 

</para>
<para>

Also mentioned has been X Windows ability to rotate the display
on the fly, via the XRandR extension. Whilst this kind of effect would
be possible with the framebuffer, It would have to be done by the
application writing to the framebuffer or by 'physically' rotating the
framebuffer in the kernel driver.

</para>

<para>

XRandR ( rotated ) effects results considerably. Blitspeeds can drop
to one fifth of the maximum in certain orientations. With a rotated
display, X writes via an internal 'shadow framebuffer' to the real
framebuffer rather than directly. It is this which causes the
slowdown. 

</para>
<para>

The Natural orientation of the framebuffer is unlikely to be the
physical default orientation of the display. In these tests only the
j720 has this setup. This further frustrates the xrandr problem where
by X11 will be running at less than optimal speeds in the most
regularly used orientation.

</para>
<para>

It has been suggested a simple solution to this is to take advantage
of newer PDA graphics chipsets ability to rotate the framebuffer in
hardware. Though this is not a solution for platforms lacking a
dedicated graphics chipset, solving the problem there would require
complex changes to X's code base I believe.

</para>

</section>

<section><title>Improvements and Future Directions</title>

<para>

Some ideas for future tests.

</para>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
Tests with newer PXA255 XSCALE based platforms such the the Sharp C750 and Ipaq 54xx series. 
</listiem>
<listitem>
Similar tests with Pocket PC and Qt Embedded Environments.
</listiem>
<listitem>
Network Tests with very fast ( eg Gigabit ) Networking. 
</listiem>
<listitem>
Network Tests with a Compressing Proxy such as NX ( nomachine.org ). 
</listiem>
<listitem>
ASM optimised test scripts and results. 
</listitem>
<listitem>
Porting DGA to kdrive and testing.
</listitem>
</itemizedlist>
</para>

</section>


</section>
<section><title>References</title>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
<ulink url="sources/">Test Source Code</ulink>
</listitem>
<listitem>
<ulink url="http://handhelds.org/~mallum/xpkgs/">http://handhelds.org/~mallum/xpkgs/</ulink> ARM Binary X packaged used for tests
</listitem>
<listitem>
<ulink url="http://xfree86.org">http://xfree86.org</ulink> - X servers, x11perf source etc.
</listiem>
<listitem>
<ulink url="http://www.sdl.org">http://www.sdl.org</ulink> SDL library
</listitem>
<listitem>
<ulink url="http://nomachine.org">nomachine.org</ulink> NX Compressing Proxy for X11.
</listitem>
</itemizedlist>

</para>

</section>

</article>