

# **Transceiver Design Basics**

**Renesas Technology Europe** 

Systems Platform Dr. Mirco Pieper

13/10/2009

©2008. Renesas Technology Europe Ltd., All rights reserved.

### Overview

- Communication chain
- Main implementation parts of TX and RX
- Details for embedded modem implementation
- The new RX microcontroller



### Communication chain

TX

- Data source / data modulator (MCU)
- Low-pass filter for image removal
- Transmission amplifier

RX

- Anti-aliasing filter
- Automatic gain control
- Data demodulator (MCU)







### Automatic Gain Control solutions



**High performance AGC**

- Variable gain amplifier (VGA) needed
- Gain range depends on VGA (linear)
- Digital interface or DAC needed to control VGA



### **Discrete AGC**

- Each gain stage requires an op amp
- General purpose op amp
- MCU must have several ADC inputs



### Automatic Gain Control solutions



**Analogue switch based discrete AGC**

- Analogue switch needed
- Digital interface needed to control the switch
- Number of gain stages depends on the analogue switch
- Only one op amp needed
- Only one ADC input needed

**Discrete AGC**

- Same performance as above
- No analogue switch needed
- No interface needed
- Number of gain stages depends on ADC inputs
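
As a rough illustration of the discrete AGC with one ADC input per gain stage, the following sketch picks the highest-gain stage that is not clipping. The function name, the 10-bit full-scale value and the 90 % headroom are assumptions made for this example, not part of the original design.

```python
def select_gain_stage(peaks, full_scale=1023, headroom=0.9):
    """Return the index of the highest-gain stage that is not clipping.

    peaks: peak ADC readings, ordered from lowest gain (index 0)
           to highest gain. full_scale and headroom are assumed values.
    """
    limit = full_scale * headroom
    best = 0  # fall back to the lowest gain if every stage clips
    for index, peak in enumerate(peaks):
        if peak <= limit:
            best = index
    return best


# Each stage amplifies by 4; the last stage saturates the ADC.
print(select_gain_stage([40, 160, 640, 1023]))
```

With an analogue switch, the same decision could be made on a single ADC input by stepping the switch up or down until the measured peak falls inside the window.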

### Data capturing

- Incoming data must be captured while previous data are processed!
- DMA based Ping-Pong buffer
- Data are written to the Ping buffer while data are read from the Pong buffer
- The Pong buffer has to be processed before the Ping buffer is full
- Once the Ping buffer is full, swap the buffer pointers
- Incoming data rate and buffer size define the available processing time
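
The ping-pong scheme can be sketched in software as follows. On the MCU the writes would be done by the DMA controller; here a `push` call stands in for it. The class name, the method names and the 480-sample / 48 kHz figures are assumptions chosen for illustration.

```python
class PingPongBuffer:
    """Software model of a double buffer (the DMA is simulated by push)."""

    def __init__(self, size):
        self.write_buf = [0] * size  # "Ping": filled with incoming data
        self.read_buf = [0] * size   # "Pong": available for processing
        self.fill = 0

    def push(self, sample):
        """Store one sample; return the full buffer after a swap, else None."""
        self.write_buf[self.fill] = sample
        self.fill += 1
        if self.fill < len(self.write_buf):
            return None
        # Ping is full: swap the buffer pointers, no data is copied.
        self.write_buf, self.read_buf = self.read_buf, self.write_buf
        self.fill = 0
        return self.read_buf


def processing_time_s(buffer_size, sample_rate_hz):
    """Time available to process one buffer before the other one is full."""
    return buffer_size / sample_rate_hz


print(processing_time_s(480, 48_000))  # seconds available per buffer
```

The returned buffer must be fully processed before the next swap, otherwise incoming data overwrite samples that are still in use.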





## General signal processing overview





### **RX** state machine

[Figure: RX state machine. Recoverable states and processing steps: Idle (wait for signal), energy detection, envelope detection, preamble detection, channel estimation, synchronization, calculate starting point of data, data demodulation. Recoverable block labels: energy detector, sync and channel estimation, data. Recoverable transition labels: energy detected, sync found, no sync found, incoming data, no more data.]

### What is needed to receive a frame?

- How to detect a frame?
- How to compensate channel impact?
- How to set up automatic gain control section?
- Add a preamble with excellent aperiodic autocorrelation properties to detect a frame in a noisy environment!
- Use preamble or additional training section for channel equalization
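
A minimal sketch of preamble detection by correlation is shown below. The 13-chip Barker code is used only as a well-known example of a sequence with good aperiodic autocorrelation; the slides do not say which preamble is used, and the function names and the threshold are assumptions.

```python
# 13-chip Barker code: autocorrelation peak of 13, sidelobes of magnitude <= 1.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def correlate(signal, preamble):
    """Sliding correlation of the signal against the known preamble."""
    n = len(preamble)
    return [
        sum(signal[i + k] * preamble[k] for k in range(n))
        for i in range(len(signal) - n + 1)
    ]


def detect_preamble(signal, preamble, threshold):
    """Return the first offset where the correlation reaches the threshold."""
    for offset, value in enumerate(correlate(signal, preamble)):
        if value >= threshold:
            return offset
    return None


received = [0] * 5 + BARKER_13 + [0] * 5
print(detect_preamble(received, BARKER_13, threshold=10))
```

The low sidelobes are what make the threshold decision robust: partially overlapping positions produce correlation values far below the peak.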







### FFT analysis

- Radix-2
- Radix-4
- Split Radix FFT (SRFFT)
- Fast Hartley Transform (FHT)
- Quick Fourier Transform (QFT)
- Decimation-in-Time-Frequency (DITF)

On CISC architectures, the number of operations is equivalent to speed.

On RISC architectures, the fastest implementation is determined by pipelining and cache usage.

Mathematical operations involved in a 1024-point complex FFT

| Algorithm | Float<br>Mults | Float<br>Adds | Int<br>Mults | Int<br>Adds | Bin<br>Shifts |
|-----------|----------------|---------------|--------------|-------------|---------------|
| Radix-2   | 20480          | 30720         | 0            | 15357       | 1024          |
| Radix-4   | 15701          | 28842         | 336          | 8877        | 2738          |
| SRFFT     | 10016          | 25488         | 502          | 12448       | 2937          |
| FHT       | 18704          | 32056         | 0            | 8367        | 4246          |
| QFT       | 8448           | 31492         | 16           | 70058       | 316           |
| DITF      | 16640          | 28800         | 1076         | 18839       | 1086          |
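
The Radix-2 row can be reproduced with a small counting sketch, assuming each butterfly costs one complex multiplication (4 real multiplications, 2 real additions) and two complex additions (4 real additions), and that trivial twiddle factors are not optimised away. The function names are made up for this example.

```python
import cmath


def fft_radix2(x, counter):
    """Recursive decimation-in-time radix-2 FFT that counts butterflies."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2], counter)
    odd = fft_radix2(x[1::2], counter)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiply
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
        counter["butterflies"] += 1
    return out


def real_operations(n):
    """Return (float mults, float adds) under the cost model stated above."""
    counter = {"butterflies": 0}
    fft_radix2([0j] * n, counter)
    return 4 * counter["butterflies"], 6 * counter["butterflies"]


print(real_operations(1024))
```

For N = 1024 there are (N/2) log2(N) = 5120 butterflies, which gives the 20480 multiplications and 30720 additions listed for Radix-2.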



### Second Order Structure-Filter

- Convert an IIR filter to a cascade of lower-order (2<sup>nd</sup> order) IIR filters
- The filter is stable if each sub-filter is stable
- Quantization errors of the filter coefficients have less impact
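
A cascade of second-order sections might look like the sketch below. It uses direct form I with the leading denominator coefficient normalised to 1; the function names and the example coefficients are assumptions for illustration only.

```python
def biquad(x, b, a):
    """Direct form I second-order section; a[0] is assumed to be 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def cascade(x, sections):
    """Run the signal through each second-order section in turn."""
    for b, a in sections:
        x = biquad(x, b, a)
    return x


def is_stable(a):
    """Stability triangle: both poles of 1 + a1 z^-1 + a2 z^-2 lie inside the unit circle."""
    return abs(a[2]) < 1 and abs(a[1]) < 1 + a[2]


# Example 4th-order filter expressed as two sections (made-up coefficients).
sections = [
    ((1.0, 0.5, 0.25), (1.0, -0.5, 0.25)),
    ((1.0, -0.3, 0.1), (1.0, 0.2, 0.5)),
]
impulse = [1.0] + [0.0] * 15
print(all(is_stable(a) for _, a in sections), cascade(impulse, sections)[:4])
```

Checking stability per section only needs the two conditions in `is_stable`, which is far simpler than locating the poles of the full higher-order polynomial.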



## Sliding window based filtering





## **RMPA - Repeat Multiply and Accumulation**

- Performs a sum-of-products operation, with
  - the multiplicand address indicated by A0,
  - the multiplier address indicated by A1, and
  - the number of operations indicated by R7R5.
- The result is stored to R3R1:R2R0 as 64-bit data.
- Cycles of word-based RMPA
  - R32C: 11 + 1.5m\* cycles
  - M32C: 7 + 2m\* cycles
  - M16C80: 7 + 2m\* cycles
  - M16C: 6 + 9m\* cycles
  - M16C Tiny: 6 + 9m\* cycles

\*m is the number of operations
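
To make the cycle formulas concrete, the sketch below evaluates them for a given number of operations and models what the instruction computes. The dictionary and function names are chosen for this example; the 32-operation case simply matches the 32-tap filter used later in the deck.

```python
# (fixed overhead, cycles per operation) taken from the list above
RMPA_CYCLES = {
    "R32C": (11, 1.5),
    "M32C": (7, 2),
    "M16C80": (7, 2),
    "M16C": (6, 9),
    "M16C Tiny": (6, 9),
}


def rmpa_cycles(core, m):
    """Cycle count of one word-based RMPA with m operations."""
    overhead, per_op = RMPA_CYCLES[core]
    return overhead + per_op * m


def sum_of_products(multiplicands, multipliers):
    """Behavioural model of RMPA: accumulate the element-wise products."""
    return sum(a * b for a, b in zip(multiplicands, multipliers))


for core in RMPA_CYCLES:
    print(core, rmpa_cycles(core, 32))
```

For m = 32 the formulas give 59 cycles on R32C, 71 on M32C and M16C80, and 294 on M16C and M16C Tiny.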







### Fast Convolution – Frequency domain vs. Time domain

Comparison of 32<sup>nd</sup> order FIR-filtering based on convolution and FFT.



### Frequency domain

- 960 input values must lead to 991 output values
- 512-point FFT (zero padding)
- FFT ⇒ N log<sub>2</sub>(N) operations ⇒ 4608 \* 2 = 9216 operations

Required operations

- FFT (9216)
- Filter taps are precalculated
- 1024 mults
- IFFT (9216)
- 32 additions
- Total: 19488 operations

### Time domain

Required operations

- 960 \* 32 mults
- 960 + 32 + 1 additions
- ⇒ 30720 + 960 + 32 + 1 = <u>31713</u> operations
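
The two operation counts can be re-derived with a few lines of arithmetic. The factor of two in the FFT count is read here as two 512-point blocks, which is an interpretation of the slide rather than something it states; the function and parameter names are illustrative.

```python
import math


def fft_operations(n):
    """N * log2(N) operations for one N-point FFT (N a power of two)."""
    return n * int(math.log2(n))


def frequency_domain_operations(fft_size=512, blocks=2, overlap_adds=32):
    transform = blocks * fft_operations(fft_size)  # FFT: 2 * 4608 = 9216
    spectrum_mults = blocks * fft_size             # 1024 mults with the taps
    # FFT + multiplication + IFFT + additions to join the blocks
    return transform + spectrum_mults + transform + overlap_adds


def time_domain_operations(inputs=960, taps=32):
    # Multiplications plus the additions as counted on the slide
    return inputs * taps + inputs + taps + 1


print(frequency_domain_operations(), time_domain_operations())
```

Under these counts the frequency-domain route needs roughly 60 % of the operations of direct convolution for this filter length.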



### **Overlap and Add Method**

- Continuous data stream is split into segments due to ping pong buffering
- Processing the first and last parts of a buffer depends on data from the previous and following buffers
- Overlap and Add method to combine adjacent data
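
A minimal overlap-add sketch is given below. Each block is convolved on its own, and the tail that extends past the block is added into the output region of the next block. Plain time-domain convolution is used per block to keep the example short; an FFT-based block convolution could be substituted. The function names and block size are assumptions.

```python
def convolve(x, h):
    """Direct linear convolution, output length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_size):
    """Filter x block by block and add the overlapping tails together."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block_out = convolve(x[start:start + block_size], h)
        for i, value in enumerate(block_out):
            y[start + i] += value  # tail overlaps the next block's output
    return y


x = [float(n) for n in range(1, 21)]
h = [0.5, 0.25, -0.125]
print(overlap_add(x, h, block_size=8) == convolve(x, h))
```

The block-wise result matches the convolution of the whole stream, so the segmentation introduced by the ping-pong buffers does not change the filter output.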





# The RX Concept

Enhance the current Renesas 16 and 32 bit CISC device families to meet future market requirements



# **RX Target Indices**



The aim is to realize higher maximum operating frequency, better performance, improved code efficiency, and lower power consumption than previous products, by enhanced design techniques and utilising 90 nm technology.





## Features of RX CPU – Improved Performance








# **RX610 Group Specifications**

Features: 165 MIPS, 50 mA and single-cycle flash access at 100 MHz operation; 10-bit ADC, 4 ch x 4 units

### High-performance CPU

- High-speed operation: single-state basic instruction execution: 10 ns (100 MHz at 3.0 to 3.6 V)
- 32-bit multiplier/divider, multiplier-accumulator and single-precision FPU

#### On-chip memory

- Flash: 2 MB/RAM 128 KB
- Flash: 1.5 MB/RAM 128 KB
- Flash: 1 MB/RAM 128 KB
- Flash: 768 KB/RAM 128 KB
- No flash memory/RAM 128 KB

### Peripheral functions

- External bus expansion: 16-bit separate bus (ROM/RAM I/F, byte control SRAM I/F)
- Transfer module: DMAC and DTC
- Timers:
  - High-performance general purpose timer: 16 bits x 12ch (TPU)
  - Timer optimized for OS and similar applications: 16 bits x 4ch (CMT)
  - General-purpose timer: 8 bits x 4ch (TMR)
- High-speed 10-bit A/D converter (conversion time: 1µs)
- 10-bit D/A converter x 2ch
- Communication functions:
  - Clock synchronous/asynchronous SCI x 7ch
  - I2C x 2ch (Fast-mode Plus)

### Development environment

- On-chip debug emulator
- Full emulator

#### Package

LQFP144, BGA176-pin

### **RX610 Group Block Diagram**






### Modulation basics



### BPSK / QPSK Scatter plot
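
As a small illustration of what such a scatter plot shows, the sketch below maps bits to ideal BPSK and QPSK constellation points. The bit-to-symbol assignment (first bit on the in-phase axis, second bit on the quadrature axis, 0 mapped to +1) is an assumption; the slides do not specify the mapping.

```python
import math

# Assumed BPSK mapping: bit 0 -> +1, bit 1 -> -1 on the in-phase axis.
BPSK = {0: complex(1, 0), 1: complex(-1, 0)}


def bpsk_map(bits):
    """One symbol per bit."""
    return [BPSK[b] for b in bits]


def qpsk_map(bits):
    """One unit-energy symbol per bit pair (assumed mapping, see lead-in)."""
    scale = 1 / math.sqrt(2)
    symbols = []
    for i in range(0, len(bits) - 1, 2):
        i_part = 1 - 2 * bits[i]      # first bit selects the I sign
        q_part = 1 - 2 * bits[i + 1]  # second bit selects the Q sign
        symbols.append(complex(i_part, q_part) * scale)
    return symbols


print(bpsk_map([0, 1]))
print(qpsk_map([0, 0, 1, 1]))
```

A received scatter plot shows these ideal points blurred by noise and channel distortion.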







DCSK

Connect short waveforms to make a long basic waveform



16 waveform patterns

