HPC systems administration
Operate Linux/Slurm research computing systems, monitor queues and node health, manage access, troubleshoot user issues, and keep shared cluster environments usable for research groups.
Technical Profile
I build and maintain Linux computing environments, software stacks, and automation that make research workflows easier to run, debug, and reproduce.
At CSU Northridge, I managed two NSF-funded HPC clusters for the College of Science and Mathematics, supporting Slurm scheduling, Linux administration, software environments, GPU-capable nodes, and day-to-day user troubleshooting.
Install, configure, and debug software stacks for computational chemistry, Python analysis, GPU-enabled workloads, and shared research workflows.
Write shell and Python tools for job setup, filesystem organization, log inspection, data parsing, and repeatable command-line workflows.
Build and run PyTorch training and inference workflows, manage checkpoints and logs, prepare tensor data, and analyze model outputs on GPU-backed systems.
Research computing operations
Software support
Automation
I treat computing infrastructure, software environments, automation, and documentation as part of the research workflow: the system should be reliable enough that the science can stay in focus.