COM7098

Research Methods and Study Skills

Module Coursework Reflection  ·  MSc Computer Science (Conversion)  ·  Author: Orville Fernandes

CW1: Research Proposal

The Task and the Topic

CW1 for COM7098 Research Methods and Study Skills required the submission of a research proposal that could serve as the foundation for a dissertation. At the time, I decided to investigate the relationship between IQ-style reasoning benchmarks and performance transfer in Large Language Models (LLMs). I was fascinated by the comparison between LLM and human intelligence and found myself asking a specific question: would improvements on IQ-style tests actually reflect broader reasoning capability in LLMs, or would they just be an artefact of benchmark-specific learning? That felt like a question worth exploring.

The proposal was structured across eight sections as required by the brief: introduction, rationale, aim, objectives, initial literature review, methodology, references, and a Gantt chart appendix.

Approach and Strengths

Given that this would be a preliminary research proposal, I invested a good amount of effort in the literature review, exploring existing work on the subject and assessing whether this would be a viable piece of research to pursue. I drew on a range of recent and established sources: early benchmark frameworks such as GLUE (Wang et al., 2018), holistic evaluation approaches including HELM (Liang et al., 2022) and BIG-Bench (Srivastava et al., 2022), and work specifically applying IQ-style measurement to LLMs (Abdelkarim et al., 2025; Jas, 2025; Pham et al., 2025). My attention was also drawn to the issue of benchmark contamination, where the risk is that models perform well on evaluation sets because similar items appeared in training data, and to the resulting need for strict train/test separation to produce interpretable results (Xu et al., 2024).
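To make the contamination concern concrete, the sketch below shows one simple way a train/test overlap check could look. It is an illustration only: word-level n-gram overlap is a heuristic I am assuming here, not a method specified in the proposal or taken from Xu et al. (2024), and the function names and example items are hypothetical.

```python
def ngrams(text, n=3):
    """Return the set of word-level n-grams in a piece of text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(test_item, train_items, n=3):
    """Fraction of the test item's n-grams that also occur in any training item."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return 0.0  # item too short to form a single n-gram
    train_grams = set()
    for item in train_items:
        train_grams |= ngrams(item, n)
    return len(test_grams & train_grams) / len(test_grams)


# Hypothetical items, invented purely for illustration.
train_items = ["which number comes next in the series 2 4 8 16"]
similar_item = "which number comes next in the series 3 9 27"
unrelated_item = "select the shape that completes the pattern"

similar_score = overlap_ratio(similar_item, train_items)
unrelated_score = overlap_ratio(unrelated_item, train_items)
print(similar_score, unrelated_score)
```

A high ratio would flag an evaluation item for manual review or removal before testing; where to set the threshold is a judgement call that this sketch does not settle.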

The methodology I proposed was quantitative and experimental: baseline evaluation of open-source LLMs on IQ-style and established benchmarks, followed by supervised fine-tuning using parameter-efficient methods such as LoRA (Hu et al., 2022) and QLoRA (Dettmers et al., 2023) to enable controlled adaptation within realistic computational constraints. Post-training re-evaluation would then allow analysis of transfer effects. I was satisfied with how the methodology held together logically, and the Gantt chart laid out a realistic seven-month plan from February to September.
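The re-evaluation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposal's actual analysis: the benchmark names and accuracy figures are hypothetical placeholders, and the raw accuracy difference is just one possible measure of transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    baseline_acc: float  # accuracy before fine-tuning
    post_acc: float      # accuracy after fine-tuning


def transfer_deltas(results):
    """Map each benchmark name to post-training accuracy minus baseline accuracy."""
    return {r.name: round(r.post_acc - r.baseline_acc, 4) for r in results}


# Hypothetical numbers for illustration only; these are not experimental results.
results = [
    BenchmarkResult("iq_style_heldout", 0.40, 0.55),   # same family as the training data
    BenchmarkResult("general_reasoning", 0.60, 0.61),  # benchmark not used in training
]

deltas = transfer_deltas(results)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

With figures like these, a large gain on the IQ-style set alongside a negligible gain on the untouched benchmark would point towards benchmark-specific learning rather than broader transfer, although a real analysis would also need multiple runs and a significance test.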

Feedback and Grade

I received a grade of 58/100. The overall comment acknowledged that the project had potential and could demonstrate genuine analytical skills. Several of the specific feedback points were fair. I used vague quantifiers such as "numerous" in the Rationale rather than citing specific figures, and a couple of factual claims lacked in-text citations where they were required. A diagram illustrating the experimental process would have strengthened the methodology section, and more explicit contingency discussion beyond the Gantt chart was expected.

However, one piece of feedback I want to address directly is the comment requesting engagement with the ESRC Ethical Framework. Reviewing the assignment brief, the required sections were: Introduction, Rationale, Aim, Objectives, Initial Literature Review, Methodology, References, and Appendix. There was no ethics section listed, and the ESRC framework is not referenced anywhere in the brief's instructions. Given the tight anticipated word count of approximately 1,000 words, I made a deliberate decision to focus on what was explicitly required. My proposal involved no human participants; it was entirely computational, working with open-source models and publicly available datasets. In that context, I judged the ethical considerations to be minimal and not warranting dedicated discussion at the expense of other sections.

In hindsight, at postgraduate level the expectation appears to be that ethical reasoning is embedded in methodology sections as standard practice, regardless of whether it is explicitly listed in the brief. Even a brief acknowledgement of ethical concerns may have addressed the marker's concern. A lesson I have taken forward is that at Level 7, some expectations are assumed rather than stated, and it is the student's responsibility to anticipate them. That said, I maintain that greater clarity in the brief would have helped all students, not just me.

What I Learned

Despite the grade, I found genuine value in this coursework. Writing the proposal forced me to engage seriously with the LLM evaluation literature at a depth I had not previously reached. It also sharpened my thinking about what it actually means to measure intelligence in a non-human system – a question my psychology background made particularly compelling. The tension between psychometric validity and computational benchmarking is not a settled question, and that is the kind of problem I find intellectually interesting.

The experience of designing an experimental methodology has also given me frameworks I am now applying to my dissertation project, where I need to evaluate AI model performance in a shoplifting detection context before integrating the model into a software solution. The discipline of thinking through baseline evaluation, controlled fine-tuning, and post-training analysis has direct relevance to how I will approach model assessment in that work.

PR1: Critical Reflection Portfolio

I will be honest: when the Critical Reflection coursework (PR1) first appeared on the horizon, my heart sank a little. The idea of producing a long, sprawling reflective document felt like a marathon I had not trained for. Reflective writing, at its worst, can become an exercise in eloquent rambling – pages of words that say very little, submitted to a tired marker who may not read past the first section. I could already picture it: Arial 12, double-spaced, going on forever. The thought of it horrified me.

What softened the blow was knowing I was not starting from nothing. Throughout the year, our lecturer, Gavin, had structured the RMS module around regular tasks: news watch entries, smaller reflective exercises, and at the end, formative assessments in the form of reflections on each module. I had engaged with them, not always enthusiastically, but consistently enough. As Moon (2013) argues, reflective practice is most effective when it is embedded into ongoing learning rather than bolted on at the end as a separate chore, and that is exactly what those tasks had achieved. By the time I sat down to write up this coursework, I already had content. That was a relief.

But having content and being motivated are entirely different things. I still was not particularly excited. I kept returning to the same question: would a marker actually read all of it? I knew I would not, or would at least have to push myself to. A long static document feels passive – it asks nothing of the reader and rewards curiosity with more text. I wanted something that would hold someone's attention, that would feel like it had been made with care rather than typed under obligation.

The assignment brief offered some creative latitude for its method of delivery and even suggested a piece to camera. I considered it, but the same problem applied: sitting in front of a camera and talking about my reflections for twenty minutes would only move the boredom from the page to the screen. Then, somewhere between waking and sleeping – which, in my experience, is when most of my better ideas arrive – it clicked: a website, with a short, fun video to introduce myself as a person rather than just text. This was a product of what Wallas (1926) would describe as the 'incubation' stage, where the mind continues working below conscious awareness; I cannot claim I was thinking about it deliberately, but the idea arrived fully formed nonetheless.

The moment the idea landed, so did the motivation. Ryan and Deci (2000) describe intrinsic motivation as arising when an activity aligns with genuine personal interest and offers real autonomy. I can confirm, empirically, that this is accurate. Ideas for pages, features, and content started flowing in a way that drafting a Word document never would have prompted. I made notes obsessively. I had to stop myself getting ahead of the actual work.

Before committing fully, I needed Gavin's sign-off. I wrote up a draft of my reflections first – partly to show him something concrete, and partly as a contingency: if he said no, I would have some semblance of a report ready to submit anyway. I had mentally prepared to make a case for the idea if he was hesitant, and equally prepared to accept it if the answer was still no. As it turned out, he said he was impressed with the idea and was excited to see how it would turn out. That endorsement was quite encouraging.

What followed was probably more effort than the grade strictly required. Designing the website and scripting, filming, and editing the video: none of it was obligatory. A solid written report would have earned a decent mark. But I wanted to build something I could feel proud of, something that pushed me to hone my skills and showcased what I am capable of.

The deeper lesson is about motivation itself. When I find a task uninteresting, my motivation does not diminish – it disappears entirely. When I am engaged, it is practically unlimited. The gap between those two states is enormous, and learning to bridge it – learning to find the angle on a dull task that makes it worth doing – is a skill I am still developing. Zimmerman (2002) describes self-regulated learners as those who strategically direct effort towards personally meaningful goals; I would add that part of that strategy is being honest with yourself about what you actually find meaningful, and engineering your way towards it wherever possible. This coursework taught me that the format of a task is not fixed, and that changing the format can change everything.

References

Abdelkarim, S., Lu, D., Flores, D.L., Jaeggi, S. and Baldi, P. (2025) 'Evaluating the intelligence of large language models: a comparative study using verbal and visual IQ tests', Computers in Human Behavior: Artificial Humans, p. 100170.

Dettmers, T., Pagnoni, A., Holtzman, A. and Zettlemoyer, L. (2023) 'QLoRA: efficient finetuning of quantized LLMs', Advances in Neural Information Processing Systems, 36, pp. 10088–10115.

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W. (2022) 'LoRA: low-rank adaptation of large language models', ICLR, 1(2), p. 3.

Jas, J. (2025) IQ Progression of Large Language Models. Available at: ijs.si.

Liang, P. et al. (2022) 'Holistic evaluation of language models', arXiv preprint arXiv:2211.09110.

Moon, J.A. (2013) A Handbook of Reflective and Experiential Learning: Theory and Practice. Abingdon: Routledge.

Pham, T.H. et al. (2025) 'IQBench: how "smart" are vision-language models? A study with human IQ tests', arXiv preprint arXiv:2505.12000.

Ryan, R.M. and Deci, E.L. (2000) 'Intrinsic and extrinsic motivations: classic definitions and new directions', Contemporary Educational Psychology, 25(1), pp. 54–67.

Srivastava, A. et al. (2022) 'Beyond the imitation game: quantifying and extrapolating the capabilities of language models', Transactions on Machine Learning Research.

Wallas, G. (1926) The Art of Thought. London: Jonathan Cape.

Wang, A. et al. (2018) 'GLUE: a multi-task benchmark and analysis platform for natural language understanding', Proceedings of the 2018 EMNLP Workshop BlackboxNLP, pp. 353–355.

Xu, C., Guan, S., Greene, D. and Kechadi, M. (2024) 'Benchmark data contamination of large language models: a survey', arXiv preprint arXiv:2406.04244.

Zimmerman, B.J. (2002) 'Becoming a self-regulated learner: an overview', Theory into Practice, 41(2), pp. 64–70.

Marks

CW1 – Research Proposal: 58 / 100
PR1 – Critical Reflection Portfolio: Pending
Overall: – %