
Linked Representations of Data

Blog for Julia Schmidt, Arizona State University

April 30th, 2018 - May 4th, 2018

We completed our final report this week and I made final attempts to get the backend of the application working. Exams and moving out made it difficult to work on the project for long periods of time.

April 23rd, 2018 - April 29th, 2018

This week, all of us focused on getting ready for the Innovation Showcase at ASU and participating in presenting our work thus far. Being able to showcase our work and explain its impact to other people was really fulfilling, and I actually was able to get some more ideas for future work by talking to people in other disciplines. It was great to talk to people about the intersections between software & sustainability, and help people understand the different definitions of sustainability that we had to sort through during our initial research.

We met as a team earlier in the week, and determined that the program is correctly importing the ontology but is having trouble creating a queryable model. I was unable to get it working before the showcase.
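For my own reference, here is a minimal sketch of the kind of check I have been attempting with Jena, assuming the ontology lives in a local OWL file; the file name is a placeholder, and the query just counts triples to confirm the model is actually queryable.

```java
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.ModelFactory;

public class OntologyCheck {
    public static void main(String[] args) {
        // OWL_MEM is an in-memory ontology model with no reasoner attached,
        // which keeps things simple while debugging model creation.
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

        // Placeholder path for our ontology file.
        model.read("sustainability.owl");

        // Trivial query: count all triples to confirm the model is queryable at all.
        String queryString = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(queryString), model)) {
            ResultSet results = qe.execSelect();
            ResultSetFormatter.out(System.out, results);
        }
    }
}
```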

April 16th, 2018 - April 22nd, 2018

I continued to work on the SPARQL queries this week, and am beginning to get worried that there is a greater problem that I’m not understanding. I have been kind of working off of the assumption (dangerous and bad, I know) that the program would be functional once everything was integrated, and am definitely feeling the time crunch.

April 9th, 2018 - April 15th, 2018

We were able to get our poster completed and submitted for the showcase this week, and experienced some challenges with communication. Our documentation has been somewhat lacking during this process, as we move between different iterations and technologies. The division of labor has resulted in each team member having an understanding of a specific part of the project, and we quickly realized that we needed to better communicate about decisions and changes that would affect all of us.

I began integration efforts this week as well, and am running into some issues that I think are stemming from my SPARQL queries. I think it is because of the different structure of the data from the tutorial I have been working with, but I am not sure. I am going to consult with my teammates next week to see if we can figure out what the underlying problem is.

April 2nd, 2018 - April 8th, 2018

As noted last week, I have not had a lot of time this week due to familial obligations. However, I have been able to familiarize myself with the mocked data and create a plan for how to integrate it into the application. Next week, I will begin integration and work on our team poster for ASU’s Innovation Showcase.

March 26, 2018 - April 1st, 2018

We decided to mock the data for now, as there have been barriers to getting everything set. Once this data is prepared, I can start working on it. I have spent some time this week cleaning up the program as it is currently a sort of mashup of examples, tutorials, and my own code. This way, I’ll be able to better integrate the data later.

My family is here from Wisconsin and will be here for the majority of next week, so my available time will be less. I look forward to making up the time by sitting down for a dedicated period of time and working on integrating everything together.

March 19, 2018 - March 25, 2018

We met this past week and figured out which SPARQL queries will be necessary for the functions we have planned for our application. However, we cannot test them or the completed backend without data.

I have been looking for other conferences to apply to without much luck. However, we are preparing to present at ASU’s Innovation Showcase at the end of the year to raise awareness of the project and hopefully conduct general usability studies. This feedback would allow us to improve the application in the future.

March 12, 2018 - March 18, 2018

The resources that I was able to find helped me fix all of the bugs I had identified as far as I can tell, but it was a large task. At this point, I cannot progress further on the application without having data to fill the ontology.

March 5, 2018 - March 11, 2018

I am about 75% finished with a backend framework in Java for our eventual web application. There are a couple of bugs that have taken up an unreasonable amount of time this week, but I was able to locate some resources that I expect to help in the near future.

We were able to finish the GHC paper and get it submitted under Vatricia’s name, as she has not attended the conference before and both Cecilia and I have. However, credit will be given to all three of us if she is accepted.

Feb 26, 2018 - March 4, 2018

This week has been slow due to midterms, but I plan to make it up over break. However, I did finish the paper outline and look forward to submitting the finished paper before the deadline.

Feb 19, 2018 - Feb 25, 2018

I began to modify the resources from a couple of weeks ago to better fit our application. I am running into various problems because I don’t have data to put into the ontology, but I expect this to be rectified by the end of Spring Break.

I have finished refreshing my knowledge of the research, and have detailed notes on each of the papers we have used information from. At this point, I am about halfway through an outline of what I hope the paper for the Grace Hopper Conference will specifically look like. In order to give our group the best chance of being accepted to present, I am using past GHC papers as guidelines for what to include.

Feb 12, 2018 - Feb 18, 2018

This week, I was able to get what feels like a pretty strong knowledge base regarding Jena and Fuseki after our group met. I was able to get our current version of the ontology uploaded to the Fuseki server and queryable via SPARQL. I am optimistic that by the end of Spring Break, I will be able to complete a semi-functional backend for the project.
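As a note for when I start the backend proper, this is roughly how a Java program could talk to the Fuseki server using Jena's RDFConnection. It is only a sketch: the dataset URL is a placeholder for our local setup, and the query just lists a few OWL classes from the uploaded ontology.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class FusekiQuery {
    public static void main(String[] args) {
        // Placeholder URL: a local Fuseki instance with a dataset named "sustainability".
        String endpoint = "http://localhost:3030/sustainability";

        // List a few classes from the uploaded ontology to confirm it is queryable.
        String query =
            "PREFIX owl: <http://www.w3.org/2002/07/owl#> " +
            "SELECT ?cls WHERE { ?cls a owl:Class } LIMIT 10";

        try (RDFConnection conn = RDFConnectionFactory.connect(endpoint);
             QueryExecution qe = conn.query(query)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("cls"));
            }
        }
    }
}
```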

In addition, I continued retaking notes on our research so that I can eventually condense them into a literature review for our paper.

Feb 5, 2018 - Feb 11, 2018

Dr. Bansal provided us with resources on Apache Jena and Fuseki Server that were really helpful in clearing up my confusion from last week, and I have spent the majority of my time working through those. I am having some issues getting Jena working with Eclipse and with running Fuseki, but I am pretty sure that I have a file named incorrectly and I just have to figure out which one.

UPDATE Feb 11: I had a .jar file and a properties file incorrectly named. Both are now working.

I also am continuing to work on our paper, but am realizing that all of the research we completed is kind of blurring together. My plan is to go back through the research and condense our notes into a literature review.

Jan 29, 2018 - Feb 4, 2018

This week, I am attempting to do independent research into getting Jena set up on my machine, but I am having trouble figuring out where to start. There are a lot of different applications of Jena and the related Fuseki server, but I am not sure which route is the best way to go for our particular project. During our meeting, Dr. Bansal indicated that she might have some additional resources that I hope will clear some things up for me.

As a team, we have all continued to look for data that would contribute to our indexes. So far, the main category that we are having issues with is the Environmental Sustainability section, but we are realizing that we might have to manually search for and enter the data. Although this is outside the scope of our project, a future endeavor might consist of creating a queryable database of environmental data, similar to the U.S. Census.

Jan 22, 2018 - Jan 28, 2018

We met as a group this week to discuss how we want to move ahead. During our meeting, we were able to complete mockups of the UI for both the web and mobile versions that the entire group was happy and comfortable with. Having these mockups will help us structure the backend of our program to best make the transition between the ontology and the display.

We also discussed various conferences that we could apply to in order to share our work once it is complete. In the coming weeks, I will work on getting a paper draft put together that summarizes our research and progress up to this point.

At the suggestion of Dr. Bansal, I looked further into Apache Jena and hope that we will be able to use it as a method of connecting our ontology to a Java backend. This is something that I am excited about working on, because it is something I have little experience with & a lot of interest in.

Jan 15, 2018 - Jan 21, 2018

I finished the JavaScript tutorial that I was working through but still feel like there are gaps in my knowledge, similar to how I felt when I finished AP Computer Science. I understand syntax and overarching concepts, but really have no idea how to apply them in a meaningful way. I will continue to look for resources and potentially build something so that I can identify and rectify gaps sooner rather than later.

Jan 8, 2018 - Jan 14, 2018

This week, I continued to work through JavaScript tutorials and kept looking for ways to integrate our ontology with the application we will eventually create. At this point, it seems to make sense to develop both a mobile and web application so that we can reach the widest audience possible.

I’m still struggling with exactly how we are going to extract and use the data we have found. The variety in format, completeness, and scale of data seems to mandate a manual approach for most U.S. cities, which seems prohibitive (especially when combined with possible webcrawler data). It is clearly something that merits more research.

Dec 11, 2017 - Jan 7, 2018

Per the post on the CREU Piazza board, I did not complete any work over break (except for making up for lost work over finals; it mostly consisted of working through tutorials). The semester starts this week, so I plan to pick up where I left off with JavaScript and application planning.

Nov 27, 2017 - Dec 10, 2017 (2-week period)

I’ve combined these two weeks because I haven’t been able to get much done due to studying for and taking finals. I plan to make up the lost time over break.

I’ve continued to work through the JavaScript tutorial - no issues there as of yet - and look for more Java/OWL information (not just tutorials). I’ve only found The OWL API so far, but I will continue looking over break and hopefully start to make headway into one of the tutorials.

Nov 20, 2017 - Nov 26, 2017

This week, I’ve been trying to find tutorials bringing OWL and Java together so that we can start the application itself, but information in tutorial form has been surprisingly difficult to find. So far, I’ve found and skimmed through the following:

- The rough guide to the OWL API
- Jena

I’m not entirely sure how we are actually going to be applying either of these, so I will continue to look for more resources so the project is supported no matter the route we take.

I’ve also spent a few hours working through the JavaScript tutorial that I mentioned previously. There’s nothing really interesting to note there, as it’s mostly just basic material so far.

Nov 13, 2017 - Nov 19, 2017

I’ve spent about five hours this week continuing to look for alternate databases with API keys to replace some of the sets we’ve already found that would require manual updates. I have unfortunately not had much luck, but Vatricia is working on a webcrawler that will hopefully be able to fill in some knowledge gaps.

Because we are nearing the completion of our ontology, I think it would be beneficial to put together some ideas for the other half of the project (creating a web/mobile application to display data). I spent the other five hours of my CREU-dedicated time looking through tutorials that will hopefully provide a foundation for web/mobile development, as I have not completed any courses in that area. I was able to gain free access to a Udemy course on JavaScript aimed at people with some sort of background in development, and I will try to complete it by the end of winter break.

Nov 6, 2017 - Nov 12, 2017

This week, I worked on locating the datasets that will inform the calculated sustainability rating. The main problem that we will need to deal with is preventing data from becoming outdated. Many datasets are not maintained in a form conducive to frequent updates, and require that data be downloaded in .csv format or something similar. While we can make this work, it will mandate redownloading and processing the data every time an update is published.

So far, I have had some success with data.gov, which is essentially a catalog of federal, state, and local datasets. While there is a wealth of information available, much of it is subject to the problems outlined above and may not fully serve our purposes. Much of the information on population, demographics, income, and related statistics will come from the U.S. Census and the related American Community Survey. These reports also provide information on facilities and resources in various regions.
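To give an idea of what that manual refresh would involve, here is a minimal sketch of reprocessing a re-downloaded .csv extract; the file name and column layout are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvRefresh {
    public static void main(String[] args) throws IOException {
        // Hypothetical extract with the layout: city,indicator,value
        List<String> lines = Files.readAllLines(Paths.get("census_extract.csv"));

        // Collect the newest value for each (city, indicator) pair, skipping the header row.
        Map<String, Double> latestValues = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split(",");
            latestValues.put(cols[0] + ":" + cols[1], Double.parseDouble(cols[2]));
        }

        // In the real application these values would replace the old ones in our data store.
        latestValues.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```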

We still have not settled on a way to normalize data in a meaningful and consistent way, but we hope to find a viable solution by the end of the year.

Oct 29, 2017 - Nov 5, 2017

Our team met this week to go through possible factors and indicators that could help to create our ontology. We decided that the best way to go about creating a single, all-encompassing index was to split data into three sub-indexes: Social, Economic, and Ecological. These sub-indexes are made up of indicators, which derive their scores from factors backed by datasets. Indicators include concepts like health, the amount of green space in a neighborhood, or the number of entertainment venues in an area. We also spent time searching for datasets that would support each factor: for example, CDC datasets indicating suicide and life expectancy rates in different areas.

When creating the ontology, the biggest challenge will be normalizing the data in order to create consistent indicators and indexes. We will need to find a way to bring together unlike data such as per capita rates and location counts in a meaningful way that does not unequally weight one over the other.
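One simple option would be min-max scaling, which puts every raw value on the same 0 to 1 range before indicators are combined. The sketch below uses made-up numbers and dataset names purely to show the idea, not our actual indicators.

```java
public class IndicatorNormalization {
    // Scale a raw value onto [0, 1] given the minimum and maximum observed across all areas.
    static double minMax(double value, double min, double max) {
        return (value - min) / (max - min);
    }

    public static void main(String[] args) {
        // Made-up example: a per-capita rate and a raw location count for one neighborhood.
        double parksPerCapita = 0.8;   // assumed observed range across areas: 0.1 to 2.5
        double transitStops = 42;      // assumed observed range across areas: 3 to 120

        double parksScore = minMax(parksPerCapita, 0.1, 2.5);
        double transitScore = minMax(transitStops, 3, 120);

        // Once both are on the same scale, averaging no longer favors either unit.
        double indicatorScore = (parksScore + transitScore) / 2;
        System.out.printf("indicator score: %.2f%n", indicatorScore);
    }
}
```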

I continued to search for research on sustainability and ontologies this week, but it really seems like we have exhausted meaningful possibilities.

Oct 22, 2017 - Oct 28, 2017

Unfortunately, I was not able to complete much work this week due to schoolwork. However, I was able to brainstorm some ideas for ways to create our ontology. We will most likely use a multilayered approach to build a main index from other indexes, which will be made up of multiple datasets.

Oct 15, 2017 - Oct 21, 2017

This week, I continued to focus on creating a definition of sustainability that would give our application the greatest relevance in day-to-day use across the widest audience. As a team, we decided that we would pursue a thin version of sustainability so that we could address environmental concerns alongside financial and societal ones.

In order for thin sustainability to remain sustainable, a safe minimum standard must be established to ensure that the tradeoff between different capitals does not result in an irreversible loss. Bryan Norton theorized that a safe minimum standard exists that separates low-cost but irreversible tradeoffs from high-cost but reversible tradeoffs. This standard would provide a bit more nuance to the idea of what are and are not acceptable losses.

If we are to develop a standard index that defines the relative sustainability of an area, it would make sense to have a relative idea of what defines an acceptable or unacceptable rating. This safe minimum standard could help us define a baseline.

Oct 8, 2017 - Oct 14, 2017

This week, I tried to find research articles that would provide more guidance, structure, and reputability to the focus areas defined two weeks ago. Two papers, The Difficulty in Defining Sustainability (M.A. Toman, 1992) and Defining Sustainability: A Conceptual Orientation (R.O. Vos, 2007), really stood out to me because they broke down the many definitions of sustainability into concrete parts.

Core Definitions

Both papers provided a core idea of sustainability that seems to be present no matter the other factors taken into account. Toman defines the core of sustainable development as “development that meets the needs of the present without compromising the ability of future generations to meet their own needs” (3), while Vos lists a few important elements that must be addressed in any definition: economy, environment, and society. Vos also notes that sustainability separates itself from other ways of looking at these elements by emphasizing intergenerational equity and going above and beyond mere compliance with laws and regulations.

It will be important for our team to take these core ideas into account when creating a definition of ‘sustainable’ for our end project.

‘Thin’ and ‘Weak’ vs. ‘Thick’ and ‘Strong’

Based on these papers and other research done this week, sustainability has been split into two schools of thought: ‘Thin’ or ‘Weak’ sustainability, which prioritizes the relative well-being of people in society, and ‘Thick’ or ‘Strong’ sustainability, which prioritizes maintaining the environment over all else. While some would argue that these are the same, the line between the two is something that really interested me.

Vos posits that “in terms of the ontology of nature, the difference in thickness for definitions of sustainability is how much of nature is valued intrinsically”. Basically, thin sustainability seeks to ensure that the overall capital value present in society (natural, financial, technological, etc.) remains undiminished for future generations, even if the balance of these resources changes. Thick sustainability views any diminution in the value of natural capital as a negative, even if it results in growth of other areas.

For the purposes of our project, it seems that the thinness or thickness of our definition will differ based on the audience. Families looking to buy homes will look for stability; corporations will look for growth in financial areas; environmentalists will look for growth in nature. Perhaps we could implement separate sustainability ratings in order to appeal to a wider audience.

Oct 1, 2017 - Oct 7, 2017

This week, I spent the majority of my time at the Grace Hopper Celebration of Women in Computing. While I was there, I had the opportunity to look at the research others were presenting in order to get an idea of what our team should be looking to complete for conferences next fall. I also reviewed papers and tutorials from the rest of the project in order to refresh my memory. During this review, I looked for different ways that our team could define sustainability. An important step will be to give ourselves a defined scope in order to avoid attempting to go too deep or do too much with the time & labor resources that we have. In addition to the focus areas defined in last week’s post, beginning with a list of parameters for our database will prevent excessive information gathering and redundant or irrelevant work. Next week’s tasks will include defining some of these parameters and brainstorming specific areas of sustainability that must be included in the project.

Sept 24, 2017 - Sept 30, 2017

I continued reading papers discussing ontology creation and use this week, including Building Urban LOD for Solving Illegally Parked Bicycles in Tokyo, Linked Data Analytics in Interdisciplinary Studies: The Health Impact of Air Pollution in Open Areas, and A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns. The last paper revolved around ASP (Answer Set Programming), an acronym with which I was not immediately familiar. Once I refreshed my knowledge of the framework, it became easier for me to process the paper. Based on these papers and our team discussion on Monday, I began to put together some general areas of ‘sustainability’ that it might be beneficial to focus on during the project.

Focus Areas in Sustainability Ratings

When selecting focus areas, I felt that there were some key factors that should be considered in order to maximize functionality and applicability.

Relevancy

The first factor was relevancy, or whether or not the data would actually be used in a public app or website. This is important because the large amount of available data cannot feasibly be connected or represented using our current resources. Cutting out data that will not often be used by our target client base will allow the team to focus on creating quality references and a usable interface.

Potential for Effect

Another important factor to consider is the data’s potential to affect the neighborhood in the long-term. It will be important to include data that has direct consequences on the sustainability of a neighborhood, even if it might not be specifically looked for.

Audience

A final important factor that must be taken into account is the viewer of the end product. Data such as school quality and social opportunities may not be directly related to ‘sustainability’, but site users looking for information on living in a given area will likely be interested in this data.

Sept 17, 2017 - Sept 23, 2017

This week, I focused on reading through research on ontology creation, including Clinga: Bringing Chinese Physical and Human Geography in Linked Open Data, FOOD: FOod in Open Data, and An Ontology of Soil Properties and Processes. For the sake of brevity, I’ll only summarize one here. I also reviewed various tutorials on important technologies and literature from the last two weeks.

Clinga: Bringing Chinese Physical and Human Geography in Linked Open Data

Before this project was completed, there was a severe lack of geographical data based on Chinese names. Only 37% of China-based geographical features were referenced by Chinese names. Clinga was developed in order to better identify Chinese geographical features, their relations, and their Chinese names.

In order to create ontologies, Clinga used data gained from Baidu Baike. This database was chosen for the depth and breadth of information, but inconsistencies in its infrastructure and linking required a great effort when crawling for data. First, information was acquired from concrete points, such as titles, infoboxes, and disambiguations. The high levels of inconsistency required that a new ontology be created. The foundation for the ontology was created through heuristic rules and then refined through a machine learning algorithm.

Once entities were identified, properties for each entity were extracted from infobox keys. Finally, Clinga was linked to other databases to promote knowledge sharing. This new ontology was determined necessary because while other work has been done in this area, it has not gone as in-depth and has not provided as complete a database as Clinga does.

Sept 10, 2017 - Sept 16, 2017

I wasn’t able to get much done this week due to career fair and class obligations. I plan to make up this time over the next few weeks.

Sept 3, 2017 - Sept 9, 2017

This week, I worked on setting up this blog and working through tutorials for XML/XML Schema and RDF. I also read through research pertaining to our project that focused on Linked Data, Big Data, and the Semantic Web. These papers led me to do additional research on topics that I did not entirely understand and build a library of resources for the project.

The next steps will be to figure out a plan of action for the group and assign roles and tasks.