I am a first-year Ph.D. candidate at the SNU Computer Vision Lab, advised by Prof. Kyoung Mu Lee (Editor-in-Chief of TPAMI). Before joining SNU (Seoul National University), I was an undergraduate student at The Pennsylvania State University - University Park, majoring in Statistical Modeling Data Science. I completed a Visiting Student Researcher position at the KAIST Vision & Learning Lab, where I had a wonderful experience working with Prof. Seunghoon Hong and received close advising from MS candidate Jinwoo Kim. During my junior year, I worked as an Undergraduate Research Assistant at The Pike Group (advised by Prof. Dongwon Lee), assisting PhD candidate Hye Joon Park.

My primary research goals are 1) to apply inverse graphics mechanisms to computer vision models, 2) to develop brain-inspired artificial neural network designs with reference to the GLOM model by Geoffrey Hinton, and 3) to advance graph neural network models so that they perform better on medical applications such as protein structure prediction and drug discovery.

The Capsule Network proposed by Geoffrey Hinton highlighted a serious but often overlooked problem in current Convolutional Neural Networks (CNNs). Because CNNs are built to be invariant, they demand more data than would be needed if the network were instead equivariant to geometric properties such as angle, position, and pose, as the Capsule Network is.
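The difference between invariance and equivariance can be illustrated with a toy 1D example. This is only a minimal sketch in NumPy, not a Capsule Network or a real CNN: the helper `circular_conv`, the signal, and the kernel are hypothetical choices made for illustration. A convolution-style feature map moves with the input (equivariance), while global max pooling returns the same value wherever the pattern sits and so discards its position (invariance).

```python
import numpy as np

def circular_conv(signal, kernel):
    """1D circular cross-correlation: out[i] = sum_j kernel[j] * signal[(i + j) % n]."""
    n = len(signal)
    return np.array([
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])

# Hypothetical toy input: a small bump, and the same bump moved 3 steps to the right.
signal = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0])
kernel = np.array([1.0, -1.0])
shift = 3
shifted = np.roll(signal, shift)

features = circular_conv(signal, kernel)
features_shifted = circular_conv(shifted, kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
is_equivariant = np.allclose(features_shifted, np.roll(features, shift))

# Invariance: global max pooling gives the same value for both inputs,
# so the pooled output no longer says where the bump was.
is_invariant = np.isclose(features.max(), features_shifted.max())

# The equivariant feature map still carries the position of the strongest response.
peak = int(np.argmax(features))
peak_shifted = int(np.argmax(features_shifted))

print(is_equivariant, is_invariant, peak, peak_shifted)
```

In this sketch the position information survives in the feature map and is lost only after pooling, which is the kind of geometric information an equivariant representation is meant to keep.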

The Vision Transformer proposed by the Google Brain Team extends the Transformer model from NLP to images, building on the self-attention mechanism. However, because self-attention lacks translational equivariance, it is even more susceptible to a lack of data, and it also has high computational complexity.
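Both points can be seen in a toy single-head self-attention layer. The sketch below is an assumption-laden illustration in NumPy, not the Vision Transformer implementation: the function `self_attention`, the random weights, and the random absolute position embeddings `pos_embed` are all hypothetical. It shows that the attention weight matrix has one entry per pair of tokens, and that once absolute position embeddings are added, shifting the input tokens no longer simply shifts the output.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (n_tokens, dim) array."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])            # shape (n_tokens, n_tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, dim, shift = 16, 8, 4
tokens = rng.normal(size=(n_tokens, dim))      # stand-in for patch embeddings
pos_embed = rng.normal(size=(n_tokens, dim))   # stand-in for absolute position embeddings
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

# With absolute position embeddings, shifting the tokens does not just shift the output.
out, weights = self_attention(tokens + pos_embed, w_q, w_k, w_v)
out_shifted, _ = self_attention(np.roll(tokens, shift, axis=0) + pos_embed, w_q, w_k, w_v)
breaks_equivariance = not np.allclose(out_shifted, np.roll(out, shift, axis=0))

# Without any position information the layer treats tokens as an unordered set:
# any reordering of the input (including a shift) only reorders the output.
out_plain, _ = self_attention(tokens, w_q, w_k, w_v)
out_plain_shifted, _ = self_attention(np.roll(tokens, shift, axis=0), w_q, w_k, w_v)
order_agnostic = np.allclose(out_plain_shifted, np.roll(out_plain, shift, axis=0))

print(weights.shape, breaks_equivariance, order_agnostic)
```

The `(n_tokens, n_tokens)` weight matrix is where the quadratic cost in the number of tokens comes from. The second check suggests why spatial structure has to be learned from data here: without position embeddings the layer has no notion of which patches are neighbors, and with them the translation behavior is not built in.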

The physics of the real world (e.g., Newton’s laws of motion) has not been well applied to vision models, with only a few comparable works such as Neural Radiance Fields, which added the viewing direction (𝜃, 𝜙) as a major feature of the input. Interactions among the masses of objects, the various forces applied to them, and the relative movement or deformation caused by such forces are well researched in the field of Computer Graphics. To fully incorporate real-world effects into intelligent systems, vision models should pursue inverse graphics mechanisms (e.g., Capsule Network) that build on these advances in Computer Graphics, rather than analyzing the distribution among pixels in many different combinations (e.g., CNN, Vision Transformer).

It has always been a privilege to continue my studies in machine learning under the supervision of remarkable faculty and colleagues. My broad long-term goal is to perform research that benefits our society and makes it a better place to live.