✨ ICCV 2025 ✨

SL²A-INR

Single-Layer Learnable Activation for Implicit Neural Representation

A novel hybrid INR architecture combining Chebyshev-parameterized learnable activations with a lightweight ReLU fusion network, enabling adaptive spectral-bias tuning and achieving state-of-the-art performance across image representation, 3D shape reconstruction, and neural radiance fields.

*Equal contribution, †‡Corresponding authors

University of British Columbia · University of Tehran · RWTH Aachen University · University of Regensburg

SL²A-INR Overview

SL²A-INR enables flexible tuning of spectral bias through learnable activation. The architecture combines a Learnable Activation Block (Ψ) parameterized by Chebyshev polynomials with a feature fusion network.

As polynomial degree K increases (4→64), reconstruction quality improves significantly, demonstrating enhanced capacity for detailed representations. This design addresses the convergence-capacity gap found in comparable methods.

SL²A-INR method overview showing learnable activation block and reconstruction quality
Architecture and reconstruction quality progression with increasing polynomial degree K.

Abstract

Implicit Neural Representations (INRs) excel at modeling continuous signals but often struggle with spectral bias—learning low frequencies first and missing fine high-frequency structure. Current methods using hand-crafted activation functions (SIREN, WIRE, FINER) or positional encodings still face limitations in capturing diverse signal types and high-frequency components.

We introduce SL²A-INR, a hybrid architecture that combines a learnable Chebyshev activation block with a lightweight ReLU fusion network. The Chebyshev polynomials learn higher-order coefficients directly, expanding the representable frequency range without fragile periodic initialization. The fusion block modulates feature flow through skip connections, enabling adaptive spectral control, sharper reconstructions, and faster convergence.

Through comprehensive experiments, SL²A-INR sets new benchmarks in accuracy, quality, and robustness across image representation, 3D shape reconstruction, and novel view synthesis tasks.

SL²A-INR architecture diagram showing Learnable Activation Block and Fusion Block
Figure 1: SL²A-INR architecture. The Learnable Activation Block (Ψ) parameterized by Chebyshev polynomials is followed by a feature fusion block with skip modulation. As polynomial degree K increases (4→64), reconstruction quality improves significantly.

🎯 Learnable Activation

Chebyshev polynomial activations learn higher-order coefficients directly during training, expanding the representable frequency range without fragile periodic initialization schemes.

⚡ Efficient Fusion

Low-rank ReLU layers modulated by the learnable activation output balance computational efficiency with expressive high-frequency modeling capabilities.

🚀 Superior Performance

State-of-the-art results across images, 3D shapes, and NeRF scenes with faster convergence and stable training under varied hyperparameters.

Method Overview

A two-block architecture designed for flexible spectral-bias tuning

1. Learnable Activation (LA) Block

Each activation function is parameterized using Chebyshev polynomials:

ψ_{i,j}(x) = Σ_{k=0}^{K} a_{i,j,k} T_k(tanh(x))

where Tk are Chebyshev polynomials of the first kind and ai,j,k are learnable coefficients optimized via backpropagation. Layer normalization stabilizes training of high-order polynomials.
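
The following is a minimal PyTorch sketch of this parameterization; the module name ChebyshevActivation, the coefficient initialization, and the placement of layer normalization are illustrative choices, not the released implementation.

```python
import torch
import torch.nn as nn

class ChebyshevActivation(nn.Module):
    """Sketch of the Learnable Activation (LA) Block Ψ.

    Each input/output pair (i, j) gets its own function
        ψ_{i,j}(x) = Σ_{k=0}^{K} a_{i,j,k} T_k(tanh(x)),
    where T_k are Chebyshev polynomials of the first kind and the
    coefficients a_{i,j,k} are learned by backpropagation.
    """

    def __init__(self, in_dim: int, out_dim: int, degree: int = 64):
        super().__init__()
        self.degree = degree
        # Learnable Chebyshev coefficients a_{i,j,k} (illustrative init scale).
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1) / (in_dim * (degree + 1)) ** 0.5
        )
        # Layer normalization stabilizes training with high-order polynomials.
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.tanh(x)  # squash inputs into [-1, 1], the Chebyshev domain
        # Build T_0 .. T_K with the recurrence T_k = 2x·T_{k-1} - T_{k-2}.
        cheb = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            cheb.append(2 * x * cheb[-1] - cheb[-2])
        cheb = torch.stack(cheb[: self.degree + 1], dim=-1)  # (batch, in_dim, K+1)
        # Sum over input dimension i and polynomial order k.
        out = torch.einsum("bik,iok->bo", cheb, self.coeffs)
        return self.norm(out)
```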

2. Fusion Block with Skip Modulation

The output of the LA Block modulates each layer via element-wise products:

z_1 = Ψ(x)
z_l = ReLU(W_l (z_{l-1} ⊙ z_1) + b_l),  l = 2, …, L

This persistent modulation preserves high-frequency information throughout the network while maintaining computational efficiency through low-rank linear layers.
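
A companion sketch of the fusion network follows; the low-rank factorization in LowRankLinear and the final linear head are illustrative stand-ins for the paper's lightweight ReLU layers.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear map factored through a rank-r bottleneck to keep the fusion block light."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class FusionBlock(nn.Module):
    """Sketch of the ReLU fusion network with persistent skip modulation:
        z_1 = Ψ(x),   z_l = ReLU(W_l (z_{l-1} ⊙ z_1) + b_l).
    Multiplying by z_1 at every layer re-injects the high-frequency features
    produced by the learnable activation block."""

    def __init__(self, dim: int, depth: int, rank: int, out_dim: int):
        super().__init__()
        self.layers = nn.ModuleList([LowRankLinear(dim, rank) for _ in range(depth)])
        self.head = nn.Linear(dim, out_dim)  # final projection to the signal value

    def forward(self, z1: torch.Tensor) -> torch.Tensor:
        z = z1
        for layer in self.layers:
            z = torch.relu(layer(z * z1))    # modulate by z_1, project, activate
        return self.head(z)
```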

3. Adaptive Spectral Control

The polynomial degree K controls the spectral spread. Higher K values enable the network to capture finer details and higher-frequency components. Unlike fixed activation functions, our learnable approach adapts to the data, mitigating spectral bias without manual tuning.
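
Building on the two sketches above, a hypothetical wrapper makes the role of K concrete; the hidden width, depth, and rank defaults below are placeholders rather than the paper's settings.

```python
import torch.nn as nn

class SL2AINR(nn.Module):
    """Two-block model sketch: one learnable-activation layer, then the ReLU fusion network."""

    def __init__(self, in_dim=2, hidden=256, out_dim=3, degree=64, depth=4, rank=64):
        super().__init__()
        self.la_block = ChebyshevActivation(in_dim, hidden, degree)  # defined in the sketch above
        self.fusion = FusionBlock(hidden, depth, rank, out_dim)      # defined in the sketch above

    def forward(self, coords):
        return self.fusion(self.la_block(coords))

# The polynomial degree K is the knob that widens the representable spectrum:
model_lo = SL2AINR(degree=4)    # stronger low-frequency bias, smoother fits
model_hi = SL2AINR(degree=64)   # captures finer, higher-frequency detail
```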

Spectral Bias Analysis

Demonstrating reduced spectral bias on 1D function approximation

Frequency approximation error comparison across training steps
Figure 2: Convergence and frequency approximation error on a 1D periodic function with four dominant frequencies. SL²A-INR (e) shows significantly lower frequency approximation error across all frequencies compared to ReLU (b), SIREN (c), and FINER (d), demonstrating effective mitigation of spectral bias.

Key Observations

• ReLU exhibits strong spectral bias, learning higher frequencies very slowly
• SIREN mitigates some bias but still struggles with high-frequency approximation
• FINER shows improved performance but with slower convergence on some frequencies
• SL²A-INR maintains consistently low error across all frequencies from early training
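
As a point of reference, the per-frequency error plotted in Figure 2 can be estimated by comparing the spectra of the fitted and target signals; the sketch below uses a simple FFT-magnitude difference on selected frequency bins, which may differ from the paper's exact metric.

```python
import torch

@torch.no_grad()
def frequency_error(model, coords, target, freq_bins):
    """Approximation error per frequency bin for a fitted 1D INR.

    coords: (N, 1) sample locations, target: (N,) ground-truth signal,
    freq_bins: indices of the dominant frequencies to report.
    """
    pred = model(coords).squeeze(-1)
    spec_pred = torch.fft.rfft(pred)
    spec_true = torch.fft.rfft(target)
    err = (spec_pred - spec_true).abs()
    return {k: err[k].item() for k in freq_bins}
```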

Experimental Results

State-of-the-art performance across diverse signal representation tasks

2D Image Representation

Image representation comparison showing text detail preservation
Figure 3: Qualitative comparison on image representation. SL²A-INR produces sharper text and preserves fine details better than FINER, SIREN, Gauss, WIRE, and ReLU+P.E.

Table 1: PSNR (dB) / SSIM comparison on DIV2K images (512×512). Best results in bold, second-best underlined.

| Method | #Params | Image 00 | Image 05 | Image 10 | Image 15 | Average |
|---|---|---|---|---|---|---|
| FINER | 198.9K | 32.00 / 0.862 | 32.92 / 0.889 | 40.08 / 0.965 | 36.29 / 0.932 | 36.35 / 0.924 |
| Gauss | 198.9K | 30.08 / 0.847 | 31.33 / 0.862 | 39.74 / 0.961 | 35.59 / 0.938 | 34.96 / 0.914 |
| ReLU+P.E. | 204.0K | 30.59 / 0.851 | 31.22 / 0.854 | 40.27 / 0.973 | 34.59 / 0.947 | 35.27 / 0.916 |
| SIREN | 198.9K | 29.29 / 0.831 | 30.73 / 0.836 | 37.25 / 0.950 | 32.23 / 0.915 | 33.47 / 0.896 |
| WIRE | 91.6K | 28.00 / 0.773 | 29.26 / 0.821 | 33.77 / 0.862 | 30.49 / 0.805 | 30.63 / 0.818 |
| SL²A (Ours) | 330.2K | 33.40 / 0.892 | 34.02 / 0.903 | 41.04 / 0.974 | 36.70 / 0.951 | 36.88 / 0.933 |
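
Concretely, image representation reduces to regressing RGB values from normalized pixel coordinates. The loop below is a generic fitting sketch with placeholder hyperparameters (step count, learning rate), not the paper's exact training recipe.

```python
import torch

def fit_image(model, image, steps=2000, lr=1e-3, device="cuda"):
    """Fit an INR to a single image tensor of shape (H, W, 3) with values in [0, 1]."""
    h, w, _ = image.shape
    # Pixel-center coordinates normalized to [-1, 1]^2.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2).to(device)
    target = image.reshape(-1, 3).to(device)

    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(coords)
        loss = ((pred - target) ** 2).mean()   # MSE; PSNR = -10 * log10(MSE)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```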

3D Shape Reconstruction (Occupancy Fields)

For 3D shape representation, we keep the same architectural settings as in image representation and map 3D coordinates to signed distance function (SDF) values. We evaluate on five shapes from the Stanford 3D Scanning Repository.

Figure 4 shows the Dragon model reconstruction. SL²A-INR (0.9989 IoU) achieves superior quality with well-preserved details in both smooth low-frequency regions (body curves) and rough high-frequency areas (face details).

Dragon occupancy field reconstruction comparison
Figure 4: Occupancy volume representation comparison on the Dragon model.
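
Reconstructions like Figure 4 can be rendered by querying the fitted network on a dense grid and extracting an iso-surface. The sketch below assumes a scalar SDF (or occupancy) output and uses scikit-image's marching cubes as an illustrative post-processing step, not the paper's exact pipeline.

```python
import torch
from skimage.measure import marching_cubes  # assumes scikit-image is installed

@torch.no_grad()
def extract_mesh(model, resolution=256, device="cuda"):
    """Query a fitted 3D INR on a dense grid and extract its zero level set."""
    lin = torch.linspace(-1, 1, resolution)
    grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
    coords = grid.reshape(-1, 3).to(device)

    values = []
    for chunk in coords.split(65536):                  # chunk queries to bound memory
        values.append(model(chunk).squeeze(-1).cpu())
    volume = torch.cat(values).reshape(resolution, resolution, resolution).numpy()

    # Zero iso-surface for SDFs; use level=0.5 for binary occupancy fields.
    verts, faces, _, _ = marching_cubes(volume, level=0.0)
    return verts, faces
```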

Table 2: IoU comparison on signed distance field representation (Stanford 3D Scanning Repository)

| Method | Armadillo | Dragon | Lucy | Thai Statue | Bearded Man |
|---|---|---|---|---|---|
| FINER | 0.9899 | 0.9895 | 0.9832 | 0.9848 | 0.9943 |
| Gauss | 0.9768 | 0.9968 | 0.9601 | 0.9900 | 0.9932 |
| ReLU+P.E. | 0.9870 | 0.9763 | 0.9760 | 0.9406 | 0.9939 |
| SIREN | 0.9895 | 0.9409 | 0.9721 | 0.9799 | 0.9948 |
| WIRE | 0.9893 | 0.9921 | 0.9707 | 0.9900 | 0.9911 |
| SL²A (Ours) | 0.9983 | 0.9989 | 0.9988 | 0.9986 | 0.9987 |

Novel View Synthesis (NeRF)

Table 3: PSNR (dB) on Blender dataset with 25 training images (reduced from standard 100 to test high-frequency detail capture)

| Method | Chair | Drums | Ficus | Hotdog | Lego | Materials | Mic | Ship |
|---|---|---|---|---|---|---|---|---|
| ReLU+P.E. | 31.32 | 26.38 | 21.46 | 20.18 | 24.49 | 30.59 | 25.90 | 25.16 |
| Gauss | 32.68 | 33.59 | 22.28 | 23.16 | 26.10 | 32.17 | 28.29 | 26.19 |
| SIREN | 33.31 | 33.28 | 22.25 | 24.89 | 27.26 | 32.85 | 29.60 | 27.13 |
| WIRE | 29.31 | 32.35 | 21.15 | 22.22 | 25.91 | 30.11 | 25.76 | 25.05 |
| FINER | 33.90 | 33.96 | 22.47 | 24.90 | 28.70 | 33.05 | 30.04 | 27.05 |
| SL²A (Ours) | 34.70 | 33.88 | 23.43 | 24.33 | 28.31 | 33.83 | 30.63 | 28.62 |

Analysis & Ablations

Understanding the design choices and robustness of SL²A-INR

Heatmap showing PSNR across different batch sizes and learning rates
Figure 5: Hyperparameter robustness analysis. SL²A-INR demonstrates greater stability across learning rates and batch sizes compared to FINER, SIREN, Gauss, and other methods, requiring less careful hyperparameter tuning.

🔄 Block Synergy

Removing the ReLU fusion block results in a PSNR drop of up to 6.22 dB, demonstrating the critical importance of coupling learnable activations with modulated ReLU layers for maintaining expressive power.

📊 Polynomial Degree Effect

Increasing the Chebyshev polynomial degree K from 4 to 512 progressively improves performance. Skip connections (modulation) significantly enhance results by preserving high-frequency information throughout the network.

⚡ Computational Efficiency

Despite a modest parameter increase (0.33M vs 0.20M for baselines), SL²A-INR fits a 512² image in just 0.77 minutes, faster than Gauss (3.08 min) and ReLU+P.E. (3.43 min).

Neural Tangent Kernel (NTK) Perspective

The eigenvalue distribution of the Neural Tangent Kernel provides insights into training dynamics. Components corresponding to larger eigenvalues are learned faster, which is crucial for overcoming spectral bias.

Increasing K in SL²A-INR slows the decay of the NTK eigenvalues, yielding larger values that enhance the model's ability to capture high-frequency components. Figure 6 shows a clear hierarchy: ReLU exhibits the most rapid decay, followed by SIREN, then FINER, while SL²A-INR maintains the slowest decay and thus preserves spectral properties most effectively.

NTK eigenvalue distribution comparison
Figure 6: NTK eigenvalue distribution showing SL²A-INR's superior spectral properties with slower decay, enabling better high-frequency learning.
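
For a small batch of inputs, the empirical NTK can be formed directly from per-sample parameter gradients; the brute-force sketch below illustrates the eigenvalue computation behind Figure 6 and is not the paper's exact procedure.

```python
import torch

def empirical_ntk_eigenvalues(model, coords):
    """Eigenvalues of the empirical NTK K[a, b] = <J_a, J_b>, where J_a is the
    gradient of the (scalar) output at input a with respect to all parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for i in range(coords.shape[0]):
        out = model(coords[i : i + 1]).sum()
        g = torch.autograd.grad(out, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    J = torch.stack(grads)              # (N, num_params)
    ntk = J @ J.T                       # (N, N) Gram matrix of per-sample gradients
    return torch.linalg.eigvalsh(ntk)   # eigenvalues in ascending order

# Slower eigenvalue decay (larger trailing eigenvalues) means the corresponding
# function components, typically higher frequencies, are learned faster.
```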

Single Image Super-Resolution

To demonstrate generalization to inverse problems, we evaluated SL²A-INR on single-image super-resolution (×2, ×4, ×6 upsampling). Our method consistently outperforms FINER with higher PSNR and SSIM while producing less noisy, sharper results—particularly visible in the ×6 setting where texture preservation is critical.

Super-resolution comparison showing better detail preservation
Figure 7: Single-image super-resolution comparison on a parrot image (1356×2040×3) demonstrates superior detail preservation and noise reduction.
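
Because the representation is continuous, upsampling amounts to querying the fitted INR on a denser coordinate grid. The sketch below shows only this rendering step and leaves out how the INR is fitted from the low-resolution observations.

```python
import torch

@torch.no_grad()
def upsample(model, h, w, scale=4, device="cuda"):
    """Render a fitted image INR at `scale`-times the original resolution."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h * scale),
        torch.linspace(-1, 1, w * scale),
        indexing="ij",
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2).to(device)
    rgb = torch.cat([model(c) for c in coords.split(65536)])
    return rgb.reshape(h * scale, w * scale, 3).clamp(0, 1).cpu()
```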

🎲 Initialization Robustness

Unlike SIREN, which is highly sensitive to initialization schemes, SL²A-INR maintains stable performance across Xavier uniform, Kaiming uniform/normal, and orthogonal initializations (PSNR variance < 1.5 dB).

🔧 Design Rationale

Chebyshev polynomials offer superior convergence, numerical stability, and minimax approximation properties compared to B-splines. They efficiently capture high-frequency components with fewer parameters and better stability.

📈 Scalability

Using learnable activations only in the first layer with low-rank MLPs in subsequent layers provides an optimal trade-off between expressivity and computational efficiency, avoiding the scalability issues of full KAN architectures.

Resources

📄 Paper

Read the full paper with detailed experiments, theory, and supplementary materials.

arXiv PDF

💻 Code

PyTorch implementation with training scripts, SL²A module, and dataset loaders.

GitHub Repository

🎥 Video

Watch the video presentation explaining SL²A-INR and our experimental results.

YouTube

📧 Contact

For questions, collaborations, or discussions about the work.

moein.heidari@ubc.ca

Citation

@article{heidari2024sl2a,
  title={SL$^{2}$A-INR: Single-Layer Learnable Activation for Implicit Neural Representation},
  author={Heidari, Moein and Rezaeian, Reza and Azad, Reza and 
          Merhof, Dorit and Soltanian-Zadeh, Hamid and Hacihaliloglu, Ilker},
  journal={arXiv preprint arXiv:2409.10836},
  year={2024},
  note={Accepted to ICCV 2025}
}

Acknowledgements

This work was supported by the Canadian Foundation for Innovation-John R. Evans Leaders Fund (CFI-JELF) program grant number 42816, Mitacs Accelerate program grant number AWD024298-IT33280, and the Natural Sciences and Engineering Research Council of Canada (NSERC), RGPIN-2023-03575.

We thank the authors of ChebyKAN, WIRE, and FINER for their publicly available code.