Friday, November 13, 2009

Caleb, the Data Mining Undergrad

Below is an email from a Jacobs School undergraduate who is taking advantage of the many research opportunities available to undergraduates at the Jacobs School and at UC San Diego, more generally.

My name is Caleb Sotelo; I'm a third year in pursuit of a BS in Computer Science, and an undergrad member of the Gordon Engineering Leadership Center. This past summer I was funded by CAMP (California Alliance for Minority Participation) through UCSD's Academic Enrichment Programs office to find and engage in a research project for [CSE] 199 credit.

I decided I wanted to pursue data mining, even though I had no substantial knowledge of the discipline. Finding a project was as easy as searching the CSE website for professors by their research interests. I met with computer science professor Charles Elkan, who recommended a project based on my experience with Java and software engineering.

I was privileged enough to receive a travel scholarship to present my research at the SACNAS National Conference in Dallas, TX, where I received an award for Outstanding Contribution and Research Presentation in the area of Computer and Information Technology. The software is now nearly release-ready, and we're hoping to submit a paper to a machine learning journal by the end of the month. My experience researching as a UCSD undergrad has been excellent.

My mentor, Charles Elkan, has entrusted me with much responsibility and is eager to provide me with advice, resources, and more, and programs like AEP and SACNAS are invested in my success. The Jacobs School of Engineering is a top-notch engineering school and a rewarding place to invest your time; I would highly recommend taking 199 [Independent Study for Undergraduates] to any engineering student who wants to get their feet wet in research.

Abstract below:


Caleb D. Sotelo, Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA.
A typical challenge in data mining is the discovery of patterns that are actually interesting to the user, as opposed to patterns that are coincidental or already well known. One solution is a system that lets the user fluidly explore the rule space and so facilitates the discovery of significant patterns. One such system was proposed and implemented by Lei Zhang and others [1] for Motorola Inc., but to our knowledge no similar system exists as open source software. RapidMiner is a widely used, highly functional, and robust open source data mining environment written in Java. We are developing a graphical interface extension to RapidMiner that will allow for intuitive and user-friendly pattern exploration, inspired by the Motorola system. Future work will add novel interactive capabilities.
[1] Zhang, L., Liu, B., Benkler, J., Zhou, C. “Finding actionable knowledge via automated comparison.” IEEE ICDE, 2009.
