<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<article id="fullscreen2" lang="en">
  <articleinfo>
    <title>Fullscreen 2 ( DRAFT III )</title>
    <author>
      <firstname>Matthew</firstname>
      <surname>Allum</surname>
      <affiliation>
        <orgname>OpenedHand Ltd</orgname>
      </affiliation>
      <email>mallum@openedhand.com</email>
    </author>

    <copyright>
      <year>2005</year>
      <holder>OpenedHand Ltd</holder>
    </copyright>
  </articleinfo>

<section><title>Introduction</title>
<para>

This report builds on the original fullscreen blit benchmark tests on
ARM-based handheld devices. The focus moves to font glyph rendering
speed via different mechanisms, image blitting via GDK, and rerunning
the original tests on a newer 2.6 kernel.

</para>

<para>

Graphics output is assumed to mean writing data to a 'dumb' kernel
framebuffer device, either directly or through an X server.

</para>

</section>

<section><title>Tests</title>
<para>

The tests are simple programs written in C.

</para>
<para>

The list below covers the two original tests and the new tests created
for this report:

</para>

<para>

<variablelist>

<varlistentry>
<term>test-fb</term>
<listitem>

<para>

Performs blits directly to the raw framebuffer device ( no X ). From the original tests.
</para>

</listitem>
</varlistentry>

<varlistentry>
<term>test-x</term>
<listitem>

<para>

Performs blits to an X window via shared memory ( MIT-SHM ) XImages. From the original tests.
</para>

</listitem>
</varlistentry>


<varlistentry>
<term>test-gdk</term>
<listitem>

<para>

Performs blits via GDK-pixbufs on X. Blits are performed to a GTK
drawing area widget with double buffering turned off. This makes the
test comparable to the others as they perform no double buffering.

</para>

</listitem>
</varlistentry>

<varlistentry>
<term>test-freetype</term>
<listitem>
<para>

Renders lines of glyphs to the framebuffer using the freetype library.
The original version generated each glyph bitmap at blit time. An
improved version ( test-freetype-cached ) was then created which
pregenerates ( 'caches' ) the glyph bit masks.

</para>

</listitem>
</varlistentry>

<varlistentry>
<term>test-xft</term>
<listitem>

<para>

Renders lines of glyphs to an X window using the Xft2 library.

</para>

</listitem>
</varlistentry>

<varlistentry>
<term>test-pango</term>
<listitem>
<para>

Renders lines of glyphs to an X window using the Pango-Xft library. 
No pango layout or GTK functionality is used.

</para>

</listitem>
</varlistentry>

<varlistentry>
<term>test-pango-layout</term>
<listitem>
<para>

Renders lines of glyphs to a GTK drawing area ( with double buffering
disabled ) via Pango layouts. GTK/GDK must be used because versions of
Pango before 1.8 do not expose layout rendering to 'raw Xft'. One
layout per line is used.

</para>

</listitem>
</varlistentry>

</variablelist>

</para>
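
<para>

As a rough illustration of what the blit tests measure, the sketch
below times repeated copies of a frame into a destination buffer and
converts the result to KB/sec. It is not taken from the test sources:
the destination is an ordinary heap buffer standing in for the mapped
framebuffer, and the names kb_per_sec() and blit_frames(), along with
the choice of 1 KB = 1024 bytes, are assumptions made for the example.

</para>

<programlisting><![CDATA[
#define _POSIX_C_SOURCE 199309L

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Throughput in KB/sec ( 1 KB = 1024 bytes assumed ) for a number of
 * bytes written in elapsed_ms milliseconds. Returns 0 if no time passed. */
double kb_per_sec(double bytes, double elapsed_ms)
{
    if (elapsed_ms <= 0.0)
        return 0.0;
    return (bytes / 1024.0) / (elapsed_ms / 1000.0);
}

/* Copy one frame of frame_bytes into dst 'count' times and return the
 * elapsed wall clock time in milliseconds. In a real framebuffer test
 * dst would be the mmap()ed device memory; here it is plain memory. */
double blit_frames(unsigned char *dst, const unsigned char *src,
                   size_t frame_bytes, int count)
{
    struct timespec start, end;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; i++)
        memcpy(dst, src, frame_bytes);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000.0
         + (end.tv_nsec - start.tv_nsec) / 1000000.0;
}
]]></programlisting>

<para>

For a 640x480x16 display one frame is 640 * 480 * 2 = 614400 bytes, so
100 blits move 60000 KB. Dividing that by the elapsed time gives a
figure in the same units as the results tables.

</para>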

<para>

Note that all font-based tests take similar arguments to specify what
text is rendered ( run a test with -h to see them ). By default the
Vera Sans font is used at 18 points, with 20 lines of the ASCII
alphabet ( a to z ) being rendered 200 times.

</para>

</section>
<section><title>Test Platforms</title>

<para>
The tests were run on the following platforms:
</para>

<variablelist>

<varlistentry>
 <term>Sharp Zaurus c760 ( Husky )</term>
 <listitem>
  <para>
   <itemizedlist mark="bullet" spacing="compact">
     <listitem>
     <para>CPU: XScale-PXA255 rev 6</para>
     </listitem>
     <listitem>
     <para>RAM: 64MB</para>
     </listitem>
     <listitem>
     <para>Display: 640x480x16 LCD</para>
     </listitem>
     <listitem>
     <para>GFX Chip: ATI IMAGEON W100</para>
     </listitem>
     <listitem>
     <para>X11: Freedesktop.org  kdrive Xfbdev server</para>
     </listitem>
     <listitem>
     <para>kernel: 2.6.11-rc2-openzaurus ( softfloat )</para>
     </listitem>
   </itemizedlist>
  </para>
 </listitem>
</varlistentry>


<varlistentry>
<term>Ipaq 5500</term>
<listitem>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
<para>CPU: XScale-PXA255 rev 6 </para>
</listitem>
<listitem>
<para>RAM: 128MB</para>
</listitem>
<listitem>
<para>Display: 320x240x16 LCD</para>
</listitem>
<listitem>
<para>GFX Chip: MediaQ</para>
</listitem>
<listitem>
<para>X11: Freedesktop.org  kdrive Xfbdev server</para>
</listitem>
<listitem>
<para>kernel: 2.4.19-rmk6-pxa1-hh37</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>


<varlistentry>
<term>Ipaq 3850</term>
<listitem>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
<para>CPU: StrongARM-1110 rev 9 </para>
</listitem>
<listitem>
<para>RAM: 128MB</para>
</listitem>
<listitem>
<para>Display: 320x240x16 LCD</para>
</listitem>
<listitem>
<para>GFX Chip: None</para>
</listitem>
<listitem>
<para>X11: Freedesktop.org  kdrive Xfbdev server</para>
</listitem>
<listitem>
<para>kernel: 2.4.19-rmk6-pxa1-hh37</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>IBM Thinkpad T40p</term>
<listitem>
<para>
<itemizedlist mark="bullet" spacing="compact">
<listitem>
<para>CPU: x86 Pentium M 1600MHz</para>
</listitem>
<listitem>
<para>RAM: 1GB</para>
</listitem>
<listitem>
<para>Display: 1400x1050x16 LCD</para>
</listitem>
<listitem>
<para>GFX Chip: ATI Radeon</para>
</listitem>
<listitem>
<para>X11: XFree86 4.3</para>
</listitem>
<listitem>
<para>kernel: 2.6.9</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>

</variablelist>

</section>
<section>
<title>Platform Notes</title>

<para>

All ARM machines run the same version of the X server and X libraries,
both built from recent checkouts of the freedesktop.org CVS kdrive
source. No hardware acceleration was used on any of these devices, and
each display was running in its 'natural' orientation.

</para>

<para> 

The c760 is very similar in hardware to the c700, except for a larger
battery and more internal flash storage. The binaries for the c760 are
built using the soft-float floating point emulation provided by newer
versions of gcc. This reportedly performs much better than kernel
'hardfloat' floating point emulation.

</para>

<para>

The Thinkpad is x86 hardware and has an accelerated XFree86
server.

</para>

</section>

<section><title>Benchmark Results</title>

<section><title>Blit Results</title>

<para>

<table frame='all'><title>Blit Test Results</title>
<tgroup cols='4' align='left' colsep='1' rowsep='1'>
<thead>
<row>
  <entry>Device</entry>
  <entry>test-fb</entry>
  <entry>test-x</entry>
  <entry>test-gdk</entry>
</row>
</thead>
<tbody>
<row>
  <entry>c760</entry>
  <entry align='right'>12177 KB/sec</entry>
  <entry align='right'>11015 KB/sec</entry>
  <entry align='right'>6163 KB/sec</entry>
</row>

<row>
  <entry>Ipaq 5550</entry>
  <entry align='right'>7425 KB/sec</entry>
  <entry align='right'>6412 KB/sec</entry>
  <entry align='right'>5184 KB/sec</entry>
</row>

<row>
  <entry>Ipaq 3800</entry>
  <entry align='right'>30241 KB/sec</entry>
  <entry align='right'>23547 KB/sec</entry>
  <entry align='right'>11144 KB/sec</entry>
</row>

<row>
  <entry>Thinkpad T40p</entry>
  <entry align='right'>137896 KB/sec</entry>
  <entry align='right'>370451 KB/sec</entry>
  <entry align='right'>317215 KB/sec</entry>
</row>

</tbody>
</tgroup>
</table>

</para>

</section>

<section><title>Blit Discussion</title>

<para>

Blit speeds show no marked improvement over the previous tests, with
results much the same. This is to be expected, as no major
developments have happened in this area since the tests were last run.

</para>
<para>

The c760, however, is using a 2.6 kernel and performance has actually
degraded. This is not too much of a worry though: the 2.6 kernel on
the c760 is very immature and the performance degradation has been
reported to the fb driver author. The fb driver is in fact a rewrite of
the 2.4 driver, written without access to the display chip's technical
details.

</para>
<para>

The 5500 results are very odd. It seems actual framebuffer access is
slow during heavy blits, yet font rendering was very fast in
comparison. The fb driver uses none of the acceleration functionality
provided by the MediaQ chip. Could it be that the driver or the
hardware imposes some kind of bottleneck under heavy load that is
causing strange results ? The same results appeared even after a second
separate run of the benchmarks.

</para>

<para>

The 3800 is the fastest of the ARM devices when accessing the display
directly. It has no graphics chip and therefore no graphics chip
driver. The Linux support for the hardware is very mature when
compared to the other two devices. Its CPU, however, is the slowest.

</para>

<para>

GDK pixbuf blits take a further large speed hit over pure X SHM blits.
One reason for this could be that the pixbuf internals have the extra
work of reducing 24bpp RGB to 16bpp RGB before blitting to the
server.

</para>
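
<para>

The kind of per-pixel work suspected here can be sketched as follows.
This is an illustration only, not GDK's actual code: it assumes packed
24bpp RGB input ( 3 bytes per pixel ) and an RGB565 layout for the
16bpp display, and the function names are made up for the example.

</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Pack one 8 bit per channel pixel into 16bpp RGB565 by dropping the
 * low 3 bits of red and blue and the low 2 bits of green. */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a row of packed 24bpp RGB pixels into a 16bpp row. */
void convert_row(const uint8_t *src, uint16_t *dst, size_t npixels)
{
    size_t i;

    for (i = 0; i < npixels; i++) {
        dst[i] = rgb888_to_rgb565(src[0], src[1], src[2]);
        src += 3;
    }
}
]]></programlisting>

<para>

A 640x480 frame means 307200 such conversions per blit, work that a
test supplying data already in the display format does not have to do.

</para>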

<para>

Interestingly this difference is not as large when run on an x86
system. Could there perhaps be a more serious issue with GTK on ARM ?
This needs further investigation. Version 2.4 of GTK was used for the
tests, which apparently does not suffer the previously reported SHM
bug.

</para>

<para>

The gtk blit test disabled the internal double buffering on the
drawing area widget ( via gtk_widget_set_double_buffered(FALSE) ) to
make the test similar to that of other fullscreen blit tests which use
no double buffering.

</para>

<para>

GTK double buffering works in such a way that the widget's visible
window is replaced with an offscreen pixmap before its expose
handler is called. On returning from this handler the pixmap is copied
to the visible window. To accomplish a similar test with double
buffering, the blit must happen elsewhere in the code so the double
buffering expose mechanism can still take place. It was therefore
placed in an idle handler which, after blitting, triggers the
expose handler.

</para>

<para>

Such a test was created ( test-gdk-idle ) and the results, from the Ipaq
3800, were almost identical to the test-gdk figure of 11144 KB/sec:

</para>
<para>

<literallayout class="monospaced">

 ./test-gdk-idle
test-gdk-idle: write speed: 11227 KB/sec

</literallayout>

</para>
<para>

Any performance loss in the above is likely due to the frequency of
the idle handler getting called. This assumes the cost of moving the
pixmap from off screen to on screen is made up by the time saved
blitting to an offscreen pixmap.

</para>

<para> 

On x86 test-x is roughly 2.7 times faster than test-fb ( 370451 against
137896 KB/sec ). This is the effect of having an accelerated server.

</para>

</section>

<section><title>Glyph Results</title>

<para>

<table frame='all'><title>Glyph Test Results</title>
<tgroup cols='6' align='left' colsep='1' rowsep='1'>
<thead>
<row>
  <entry>Device</entry>
  <entry>test-freetype</entry>
  <entry>test-freetype-cached</entry>
  <entry>test-xft</entry>
  <entry>test-pango</entry>
  <entry>test-pango-layout</entry>
</row>
</thead>
<tbody>
<row>
  <entry>c760</entry>
  <entry align='right'>1156 glyphs/sec</entry>
  <entry align='right'>Not Run</entry>
  <entry align='right'>9386 glyphs/sec</entry>
  <entry align='right'>6712 glyphs/sec</entry>
  <entry align='right'>5557 glyphs/sec</entry>
</row>

<row>
  <entry>Ipaq 5550</entry>
  <entry align='right'>1711 glyphs/sec</entry>
  <entry align='right'>Not Run</entry>
  <entry align='right'>18991 glyphs/sec</entry>
  <entry align='right'>12192 glyphs/sec</entry>
  <entry align='right'>9344 glyphs/sec</entry>
</row>

<row>
  <entry>Ipaq 3800</entry>
  <entry align='right'>957 glyphs/sec</entry>
  <entry align='right'>25304 glyphs/sec</entry>
  <entry align='right'>17937 glyphs/sec</entry>
  <entry align='right'>11458 glyphs/sec</entry>
  <entry align='right'>9282 glyphs/sec</entry>
</row>

<row>
  <entry>Thinkpad T40p</entry>
  <entry align='right'>28904 glyphs/sec</entry>
  <entry align='right'>28812 glyphs/sec</entry>
  <entry align='right'>16634 glyphs/sec</entry>
  <entry align='right'>15384 glyphs/sec</entry>
  <entry align='right'>15298 glyphs/sec</entry>
</row>

</tbody>
</tgroup>
</table>

</para>

</section>

<section><title>Glyph Discussion</title>

<para>

With pregenerated glyphs freetype is fastest, then xft. The plain
pango line rendering is approximately 30% slower than xft, with pango
layout rendering being approximately a further 10-20% slower.

</para>
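
<para>

These percentages can be checked against the table above. The helper
below is only an illustration ( the name percent_slower() is made up
for the example ). It expresses how much slower one rate is relative
to a baseline rate.

</para>

<programlisting><![CDATA[
/* How much slower 'rate' is than 'baseline', as a percentage of baseline. */
double percent_slower(double baseline, double rate)
{
    return (baseline - rate) / baseline * 100.0;
}
]]></programlisting>

<para>

For the c760, percent_slower(9386, 6712) is about 28.5 ( xft to pango )
and percent_slower(6712, 5557) is about 17.2 ( pango to pango layout ).

</para>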

<para>

Although total speeds vary between platforms, the relative difference
in speed between the test types stays approximately the same
( though this is less true on the Thinkpad ).

</para>

<para>

The Thinkpad results, though fast, are slower than expected when
compared to blit speeds on both fb and X. I am not sure why this is.

</para>

<para>

The non-cached freetype test is much slower than expected on ARM
platforms. On a desktop x86 system the results are much improved, with
speeds greater than those of xft, as expected. The likely reason for
the low performance on ARM is that each glyph bitmap is regenerated on
every render with no caching, and that bitmap generation makes heavy
use of floating point.

</para>
<para>

This indicates that xft caches its generated glyph bitmaps, and that
such caching is definitely required for acceptable performance.

</para>
<para>

To improve on this, a version of test-freetype (
test-freetype-cached.c ) was created that pregenerates glyph bitmaps
in a simple cache before painting them. Running it on the Ipaq 3800
gave ( including cache generation time ):

</para>
<para>

<literallayout class="monospaced">

test-freetype-cached: pre generated glyphs in 1159 ms
test-freetype-cached: Total time 2055 ms, 
                      52000 glyphs rendered = approx 25304 glyphs per second

</literallayout>

</para>
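
<para>

The caching approach can be sketched independently of FreeType. The
example below is not the code from test-freetype-cached.c: the
rasterise callback stands in for the real glyph generation, and all
struct and function names are assumptions. It pregenerates one bitmap
per character from 'a' to 'z' so that the render loop only performs
lookups, and it includes the glyphs per second arithmetic behind the
output above.

</para>

<programlisting><![CDATA[
#include <stdlib.h>
#include <string.h>

#define CACHE_FIRST 'a'
#define CACHE_LAST  'z'
#define CACHE_SIZE  (CACHE_LAST - CACHE_FIRST + 1)

/* An 8 bit coverage mask for one glyph. */
typedef struct {
    int width, height;
    unsigned char *mask;   /* width * height bytes, owned by the cache */
} Glyph;

/* Hypothetical rasteriser: fills 'out' for character c, 0 on success. */
typedef int (*RasteriseFunc)(int c, Glyph *out);

typedef struct {
    Glyph glyphs[CACHE_SIZE];
} GlyphCache;

/* Generate every glyph once, up front. Returns 0 on success, -1 on failure. */
int glyph_cache_fill(GlyphCache *cache, RasteriseFunc rasterise)
{
    int c;

    memset(cache, 0, sizeof *cache);
    for (c = CACHE_FIRST; c <= CACHE_LAST; c++)
        if (rasterise(c, &cache->glyphs[c - CACHE_FIRST]) != 0)
            return -1;
    return 0;
}

/* Look up a pregenerated glyph; NULL if c is outside the cached range. */
const Glyph *glyph_cache_get(const GlyphCache *cache, int c)
{
    if (c < CACHE_FIRST || c > CACHE_LAST)
        return NULL;
    return &cache->glyphs[c - CACHE_FIRST];
}

/* Release all masks held by the cache. */
void glyph_cache_free(GlyphCache *cache)
{
    int i;

    for (i = 0; i < CACHE_SIZE; i++) {
        free(cache->glyphs[i].mask);
        cache->glyphs[i].mask = NULL;
    }
}

/* Stand-in rasteriser for the example: a 2x2 solid mask per glyph.
 * A real test would call into FreeType here. */
int demo_rasterise(int c, Glyph *out)
{
    (void)c;
    out->width = out->height = 2;
    out->mask = malloc(4);
    if (!out->mask)
        return -1;
    memset(out->mask, 255, 4);
    return 0;
}

/* Whole glyphs per second for 'glyphs' rendered in 'ms' milliseconds. */
long glyphs_per_sec(long glyphs, long ms)
{
    return ms > 0 ? glyphs * 1000L / ms : 0;
}
]]></programlisting>

<para>

With the figures above, glyphs_per_sec(52000, 2055) gives 25304,
matching the reported rate. Of that 2055 ms total, 1159 ms is spent
filling the cache.

</para>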

<para>

It should also be noted that the test-freetype test very crudely
renders just the 8 bit mask to the display ( every pixel with a mask
value > 0 is blitted ). No subpixel or even basic anti-aliasing was
performed.

</para>
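
<para>

The crude mask rendering described above amounts to something like the
following sketch. It assumes a 16bpp destination addressed as an array
of uint16_t with a stride given in pixels, and it does no clipping. The
function name and parameters are illustrative rather than taken from
test-freetype.

</para>

<programlisting><![CDATA[
#include <stdint.h>

/* Paint an 8 bit glyph mask at ( x, y ) in a 16bpp buffer. Any mask value
 * above zero sets the pixel to 'colour'; coverage is otherwise ignored,
 * so there is no anti-aliasing. The caller must ensure the glyph fits. */
void blit_mask(uint16_t *dst, int dst_stride, int x, int y,
               const uint8_t *mask, int w, int h, uint16_t colour)
{
    int row, col;

    for (row = 0; row < h; row++) {
        uint16_t *out = dst + (y + row) * dst_stride + x;
        const uint8_t *in = mask + row * w;

        for (col = 0; col < w; col++)
            if (in[col] > 0)
                out[col] = colour;
    }
}
]]></programlisting>

<para>

Proper anti-aliasing would instead blend the colour with the existing
pixel in proportion to each mask value, at a higher cost per pixel.

</para>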

<para>

test-pango uses the low level pango xft calls to render lines of text
to an X window. No gdk/gtk calls are used. To investigate the overhead
of rendering to a gtk widget and window, two further tests were
created: test-pango-gdk, which renders to a GDK window, and
test_pango_gtk, which renders to a GTK drawing area. Benchmarks from
these on the 3800 were approximately equal. Another test was created
using gdk_draw_glyphs() instead of pango_xft_render(). Again the
results were comparable, suggesting gdk_draw_glyphs() is just a wrapper
around pango_xft_render().

</para>

<para>

test-pango-layout uses the pango layout API to render onto a gtk
drawing area. Most GTK widgets use layouts. There is an overhead
involved, which could be worse if we were rendering more than just a
simple line.

</para>


</section>

</section>

<section><title>Improvements and Future Directions</title>

<para>

Some ideas for future tests.

</para>
<para>

<itemizedlist mark="bullet" spacing="compact">
<listitem>
<para>Investigate slow GTK blits more fully.</para>
</listitem>

<listitem>
<para>Create a pango test with all lines in a single layout.</para>
</listitem>

<listitem>
<para>Investigate slow glyph speeds on x86.</para>
</listitem>


</itemizedlist>

</para>

</section>

<section><title>References</title>
<para>

<itemizedlist mark="bullet" spacing="compact">
<listitem>
<para><ulink url="fstests-0.1.tar.gz">Test Source Code</ulink></para>
</listitem>
<listitem>
<para><ulink url="http://www.freetype.org/">Freetype.org</ulink></para>
</listitem>

<listitem>
<para><ulink url="http://www.pango.org/">Pango</ulink></para>
</listitem>

<listitem>
<para><ulink url="http://www.fontconfig.org/wiki/">Xft/Fontconfig</ulink></para>
</listitem>

</itemizedlist>

</para>

</section>

</article>