advanced_computer_architecture / exercises
Commit 403f7aaa, authored Aug 11, 2016 by michael
NEON formatting
Parent: 205d56ff
Showing 1 changed file with 34 additions and 34 deletions:
aufgaben/blatt05/blatt05.md
...
...
@@ -67,49 +67,49 @@ In the program *neon_convert.c*, a function for converting
color images to grayscale was implemented once in plain C, as follows:
```c
void reference_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n){
    int i;
    for (i=0; i<n; i++){
        int r = *src++; // load red
        int g = *src++; // load green
        int b = *src++; // load blue
        // build weighted average:
        int y = (r*77)+(g*151)+(b*28);
        // undo the scale by 256 and write to memory:
        *dest++ = (y>>8);
    }
}
```
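The weights 77, 151 and 28 sum to 256, which is why the final shift by 8 keeps the result within the 8-bit range. The following is a minimal sketch for checking this on single pixels. It repeats the reference routine so that it compiles on its own, and it assumes a helper named `gray_of_pixel`, which is hypothetical and not part of *neon_convert.c*.

```c
#include <stdint.h>

/* Same arithmetic as reference_convert in the listing, repeated here
   so that the sketch is self-contained. */
void reference_convert(uint8_t * __restrict dest, uint8_t * __restrict src, int n) {
    for (int i = 0; i < n; i++) {
        int r = *src++;                           /* load red   */
        int g = *src++;                           /* load green */
        int b = *src++;                           /* load blue  */
        int y = (r * 77) + (g * 151) + (b * 28);  /* weights sum to 256 */
        *dest++ = (uint8_t)(y >> 8);              /* undo the scale by 256 */
    }
}

/* Hypothetical helper (not in the original file):
   gray value of a single RGB pixel. */
uint8_t gray_of_pixel(uint8_t r, uint8_t g, uint8_t b) {
    uint8_t rgb[3] = { r, g, b };
    uint8_t y;
    reference_convert(&y, rgb, 1);
    return y;
}
```

For example, `gray_of_pixel(255, 255, 255)` yields 255, and `gray_of_pixel(255, 0, 0)` yields 76, because 255*77 = 19635 and 19635 >> 8 = 76.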
And once implemented with NEON intrinsics:
```c
void neon_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int n){
    int i;
    // broadcast each weight into all 8 lanes
    uint8x8_t rfac = vdup_n_u8 (77);
    uint8x8_t gfac = vdup_n_u8 (151);
    uint8x8_t bfac = vdup_n_u8 (28);
    // 8 pixels per iteration; a remainder of n % 8 pixels is not processed
    n/= 8;
    for (i=0; i<n; i++) {
        uint16x8_t temp;
        // load 8 RGB pixels, de-interleaved into r, g and b vectors
        uint8x8x3_t rgb = vld3_u8 (src);
        uint8x8_t result;
        // widening multiply-accumulate into 16-bit lanes
        temp = vmull_u8 (rgb.val[0], rfac);
        temp = vmlal_u8 (temp,rgb.val[1], gfac);
        temp = vmlal_u8 (temp,rgb.val[2], bfac);
        // shift right by 8 and narrow back to 8 bits
        result = vshrn_n_u16 (temp, 8);
        vst1_u8 (dest, result);
        src += 8*3;
        dest += 8;
    }
}
```
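To compare the two implementations on a machine without NEON, the lane-wise arithmetic can be modeled in portable C. The sketch below is only a model under stated assumptions, not the NEON code itself: `neon_convert_model` is a hypothetical name, and each intrinsic is imitated by a plain loop over 8 lanes with a 16-bit accumulator (`vmull_u8` and `vmlal_u8` widen to 16 bits, `vshrn_n_u16` shifts right and narrows). Like the intrinsics version, it only processes full groups of 8 pixels.

```c
#include <stdint.h>

/* Hypothetical scalar model of neon_convert: one outer iteration
   corresponds to one iteration of the NEON loop (8 pixels). */
void neon_convert_model(uint8_t * __restrict dest, uint8_t * __restrict src, int n) {
    n /= 8;                                /* full groups of 8 pixels only */
    for (int i = 0; i < n; i++) {
        for (int lane = 0; lane < 8; lane++) {
            /* vld3_u8 de-interleaves RGB: lane k holds pixel k's r, g, b */
            uint8_t r = src[3 * lane + 0];
            uint8_t g = src[3 * lane + 1];
            uint8_t b = src[3 * lane + 2];
            /* widening multiply-accumulate into 16 bits; the maximum
               255 * (77 + 151 + 28) = 65280 still fits in uint16_t */
            uint16_t temp = (uint16_t)(r * 77);
            temp = (uint16_t)(temp + g * 151);
            temp = (uint16_t)(temp + b * 28);
            /* shift right by 8 and narrow to 8 bits */
            dest[lane] = (uint8_t)(temp >> 8);
        }
        src  += 8 * 3;
        dest += 8;
    }
}
```

Called with `n = 9`, the model writes 8 output bytes and leaves the ninth untouched, which is worth keeping in mind when comparing its results against `reference_convert`.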
Your task now is to compare the two implementations with each other and the effect of pipeline widths and compiler optimizations on
...
...