
How USC's Person Data Integration Project Went Enterprise-Wide

Guest Author

Like other institutions, USC has many different legacy and modern systems that keep operations running on a daily basis. In some cases, we have the same kind of data in different systems; in other cases, we have different data in different systems. The majority of our student data is in the Student Information System, which is over 30 years old, whereas the majority of our staff data is now in a cloud-based system with APIs. At the same time, the majority of our financial data is in another system, and the faculty research data is in yet another. You get the idea.

Naturally, all of these different systems make it hard to combine the data to make better decisions. The many ways to extract data from these systems, at different time intervals, make it easy to make mistakes each time a report needs to be created. Even though many groups on campus need to extract and combine the same kind of data, each group has to do that work on its own, which wastes time and resources across the entire organization.

There is no single system that will be able to accommodate all of the different functions USC needs to operate; USC simply does too much: Academics, Athletics, Healthcare, Research, Construction, HR, etc. Because of this, the solution is to systematically integrate the data from the different systems one time, so every department on campus can access the same data while the different systems continue to operate.

The Person Entity Project  
At a high level, the Person Entity (PE) Project is essentially a database of everything applicable to a Person. This includes every kind of person: pre-applicant, applicant, admit, student, alum, donor, faculty, staff, etc. It aims to be a centralized, high-integrity database, encompassing data from multiple systems, that can supply data to every department at USC. Such complete, high-quality data covering every Trojan's entire academic and professional life at USC can then inform decision-making in recruiting, admissions, financial aid, advancement, advising, and many other domains.
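As a rough illustration of the idea, here is a minimal Python sketch of combining records from several systems into one record per person. It is not the PE Project's actual design: the function name, the system names, the field names, and above all the assumption that every system already shares a common `person_id` are hypothetical.

```python
from collections import defaultdict


def merge_person_records(sources):
    """Combine per-system records into one record per person.

    `sources` maps a (hypothetical) system name to a list of dicts,
    each carrying a shared "person_id" key. This shared key is an
    assumption; real systems often need a matching step first.
    Each field keeps one value per source system, so conflicting
    values stay visible instead of being silently overwritten.
    """
    people = defaultdict(dict)
    for system, records in sources.items():
        for record in records:
            person = people[record["person_id"]]
            for field, value in record.items():
                if field == "person_id":
                    continue
                # Store the value under the system it came from.
                person.setdefault(field, {})[system] = value
    return dict(people)
```

Keeping the source system next to each value is one possible design choice: it makes disagreements between systems easy to spot, which matters for a database that aims to be high-integrity.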

As one of the original members of the team that started the PE as a skunk works project, I have seen some of the challenges of transforming a small project into an enterprise-backed service.

Where It All Began
The project came about shortly after Dr. Douglas Shook became the USC Registrar. He wanted the data at USC to be more easily accessible to the academic units on campus as well as to the university for its business processes. The initial business processes for getting and using data were filled with manual FTP transfers of text files, Excel files emailed back and forth, nightly data dumps, etc., and providing delayed, incomplete, costly data was the norm.

The odds were against us because numerous past attempts to transition off our custom-coded legacy student information system had died mid-project. Our project was not identical to the failed ones, but it was closely related: if we were able to move the data into a relational database and make it more accessible to other systems, the eventual transition would be a lot easier. There was not a lot of documentation on the student information system, and there was 30 years' worth of custom code and fields in the system. We were also using a message broker for the first time to connect to a legacy system like ours.

Working on this project was rewarding and worthwhile because we were able not only to overcome technical hurdles but also to get some early wins, such as providing data for a widely used applicant portal and providing updated data faster than the existing method. We steadily grew our list of data consumers, and our service now provides data for the entire University, both schools and administrative units.

If you are currently working on a small experimental project in a large organization, here are some key drivers to keep in mind as you embark on your journey.  

1. Create your own success criteria.
Good success criteria for a small project are the small proofs of concept you can complete for people on campus who are interested in your services or whose needs have not been fully met by one of the enterprise services. The initial ROI will not be as high as that of other projects, so don’t measure success by the monetary amount. For us, the success criteria were things like successfully moving data from one system to another and reducing the number of hours a process took to reach the same result. We also kept a timeline of the project's “Firsts”: first successful push of data, first database tables created, first triggers, first procedures, first production database, first integration with another system, first data client, etc. We could show the sponsors that we were making progress, and we had a growing list of people who were excited about the problems we could solve for them. By keeping the lines of communication open about our progress, we were given the time and funding we needed to keep working on the project.

2. Minimize spending to maximize your budget.
Don’t be afraid to ask others in your organization for help. In my organization, we piggybacked on other internal organizations' licenses to lock in lower renewal prices instead of signing on as a new customer. Another way to cut costs is to use student workers or interns to do research and work. Student workers bring a different set of skills along with their own challenges, but they have been instrumental in our project. In our case, the project started with only one full-time member and three to four student workers; I was one of them. We did the data modeling at first, some requirements gathering, and the coding as well. Though our work was far from perfect and we needed some guidance from full-time employees, the team was able to gain traction on the project by accomplishing tasks with the help of the students at a very low cost.

I would also suggest doing a cost-benefit analysis for software purchases. For example, we decided to purchase a professional data modeling tool even though we could have continued using Visio, because the money and time we would spend fiddling with Visio would be higher than the license fee in the long run. Lastly, find others willing to partially sponsor the project with equipment. For example, you can ask other groups if they can give you a slice of their virtual machine while you are just getting started with development.

3. Work in bite-sized chunks.
It’s easy to be overwhelmed if your project scope is large. That’s why it’s extremely useful to do proofs of concept and pilot projects. For us, the project was so large that if we had tried to plan everything out at the very beginning, we would have been too overwhelmed to even start. The goal of the project was to migrate all of the data over to the Oracle relational database. We needed to divide the system into different data domains and start with one or two. We chose to do Admissions and Person first. This is a little unconventional, I think, but we simply deployed some tables and created some procedures to populate them so we could get started before the model was completely finished. On a small project it is usually okay to be wrong, because you can just start over!
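To make the "deploy a few tables and a load routine first" approach concrete, here is a small Python sketch. It uses SQLite from the standard library purely as a stand-in for Oracle, and a Python function as a stand-in for a stored procedure. The table names, column names, and functions are hypothetical and are not the project's real schema.

```python
import sqlite3


def create_starter_schema(conn):
    """Create a first-pass Person and Admissions schema (hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS person (
            person_id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS admission_application (
            application_id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            term TEXT NOT NULL
        );
        """
    )


def load_applications(conn, rows):
    """Populate both tables from (person_id, full_name, application_id, term) rows.

    Safe to re-run: existing people are left alone and existing
    applications are replaced, so repeated loads do not duplicate data.
    """
    with conn:  # commits on success, rolls back on error
        for person_id, full_name, application_id, term in rows:
            conn.execute(
                "INSERT OR IGNORE INTO person (person_id, full_name) VALUES (?, ?)",
                (person_id, full_name),
            )
            conn.execute(
                "INSERT OR REPLACE INTO admission_application "
                "(application_id, person_id, term) VALUES (?, ?, ?)",
                (application_id, person_id, term),
            )
```

Because the schema uses `CREATE TABLE IF NOT EXISTS` and the load is repeatable, a sketch like this can be thrown away and rebuilt cheaply, which fits the "it is okay to be wrong" spirit of an early pilot.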

4. Balance selling/promoting the project with working on the project.
The project sponsor is an important member of the team for this point. If there is no interest in your project, it will die; but if there is too much interest and too many expectations, you will fall short of them and people will lose faith in the project. Both promoting and building are important, but too much of either could be fatal. Our sponsor pitched the project as a solution to the problems USC was trying to solve, explained how much faster and easier things would be, and got people excited about using it and lining up to be the next customer. We started partnering with other groups on campus and doing proofs of concept together.

5. Base your design choices on mass adoption and impact.
When working on the project, you need to think about what the best design choice will be once the project is in production. This is easier said than done because the future is unknown. Essentially, don’t knowingly make design choices that will cripple your project once it is in production. Think of the potential benefits to the organization if what you are working on is implemented at scale. It is a balancing act between getting things out and getting things perfect. Find the balance that your organization finds acceptable; I think it is different for each feature and, in our case, each data field. In other words, prioritize the features so you can pay special attention to the high-priority ones.

6. Document everything.
Documentation speeds up onboarding of new members and helps you remember the rationale behind the decisions you made at the time. This will save you a lot of time down the road. Now that we are in year five of the project, we sometimes come back to old code and tables and wonder why we designed them the way we did; this is not ideal, and we should have done a better job at documentation. This is less of an issue with the newer portions of the project, but in the beginning we did not do much documentation, and this has slowed us down recently. Members of the team could leave during a critical stage of the project, and without documentation it is very hard to progress at the same rate. At the very least, version your documentation, so that even if you don’t make it a habit to update it after things change in production, you will know when it was last updated.

7. Embrace change.
There’s always a lot of change to be expected when working on a small project. Decisions about the future of the project can be made at almost any time, especially since very few services are being provided in the beginning. Those decisions could be made without you in the room as well. Funding and resources can be reallocated, which can severely impact your project. We were lucky that none of this happened to us, but I think that is because we were able to overcome the major technical roadblocks in the beginning.

Now that we are an enterprise-backed service, things have definitely changed, and though I am proud that the project has progressed so far, I am a little nostalgic for the time when we were accomplishing milestones what felt like every week. You should definitely enjoy the time when the project is just starting and small, because it will never get smaller after it gains momentum.

The Person Entity project team now has around three full-time members, including one member focusing strictly on data quality. The project is now part of a larger data team, with a portfolio of data systems to manage, like Tableau and Cognos. We provide some or all of the data and dashboards for the different schools and units on campus, such as the Financial Aid, Registrar, and Admissions dashboards. We now have integrations with cVent, Campaign!, the mandatory education modules, and myUSC, with plenty more on the way. We are still far from completing our task of extracting all the data from our legacy student information system, and we will steadily continue to work on it, as well as on moving our entire infrastructure to the cloud.

Guest author Stanley Su is currently a data architect on the Enterprise Data and Analytics team at USC. Stan was one of the original members of the Person Entity project, an enterprise data layer with integrated data from multiple systems at USC, and is the current lead on the project. During the Fall semester, he also TAs for a database class at the Marshall School of Business. Stan is interested in using technology to increase business efficiency and reduce repetitive tasks at work.
