The Legislative Data Web Viewer

Publishing data in structured data form often brings many possibilities. For example, we can build a fully-featured web viewer for inquiry and hansard records. The application would make legislative data more accessible for users. We built a demo viewer to show some possibilities, based on the parsed data.

Backend

The project consists of many components. The backend makes the data accessible as a set of APIs. Then, we used django and django-rest-framework to build them. The legislative data is then stored to a PostgreSQL database through django-orm. So the first step is to replicate the schema into the ORM, as shown in the example below:

class Hansard(models.Model):
 present = models.ManyToManyField(
     Person, related_name="hansard_presents", related_query_name="hansard_present"
 )
 absent = models.ManyToManyField(
     Person, related_name="hansard_absents", related_query_name="hansard_absent"
 )
 guest = models.ManyToManyField(
     Person, related_name="hansard_guests", related_query_name="hansard_guest"
 )
 officer = models.ManyToManyField(
     Person, related_name="hansard_officers", related_query_name="hansard_officer"
 )
 akn = models.TextField()

class Inquiry(models.Model):
 is_oral = models.BooleanField()
 inquirer = models.ForeignKey(
     Person,
     related_name="inquirers",
     related_query_name="inquirer",
     on_delete=models.PROTECT,
 )
 respondent = models.ForeignKey(
     Person,
     related_name="respondents",
     related_query_name="respondent",
     on_delete=models.PROTECT,
 )
 number = models.IntegerField()
 title = models.CharField(null=True)
 akn = models.TextField()

 class Meta:
     ordering = ["number"]

One of the notable models is the Hansard, where it can contain both question sessions, and speeches. We simplify the saving of both types by using a computed property named debate, as shown below.

 @property
 def debate(self) -> list[Speech | QuestionSession]:
     return list(
         sorted(
             chain(
                 self.speeches.all(),  # type: ignore
                 self.sessions.all(),  # type: ignore
             ),
             key=lambda item: item.idx,
         )
     )

The corresponding serializer for Hansard and debate field would then look like,

class HansardSerializer(FlexFieldsModelSerializer):
 class Meta:
     model = Hansard
     fields = ["id"]
     expandable_fields = {
         "present": (PersonSerializer, {"many": True, "read_only": True}),
         "absent": (PersonSerializer, {"many": True, "read_only": True}),
         "guest": (PersonSerializer, {"many": True, "read_only": True}),
         "debate": (DebateSerializer, {"many": True, "read_only": True}),
     }


class DebateSerializer(serializers.BaseSerializer):
 def to_representation(self, instance: Speech | QuestionSession) -> dict[Any, Any]:
     return {
         "type": type(instance).__name__,
         "value": (
             SpeechSerializer(instance, many=False, read_only=True)
             if isinstance(instance, Speech)
             else QuestionSessionSerializer(instance, many=False, read_only=True)
         ).data,
     }

Then we implement API endpoints through rest-framework viewset, as follows

class InquiryViewSet(ReadOnlyModelViewSet):
 queryset = Inquiry.objects.all()
 serializer_class = InquirySerializer


class HansardViewSet(ReadOnlyModelViewSet):
 queryset = Hansard.objects.all()
 serializer_class = HansardSerializer

Lastly, we import the data through a management command, as shown in the documentation.

Frontend

The frontend is built using ReactJS. The web viewer is a simple application, so the implementation is straightforward. We used react-router-dom's BrowserRouter to handle routing. API calls are also managed there, as each view only requires an API call.

const router = createBrowserRouter([
 {
     path: "/",
     element: <Root />,
     children: [
         {
             path: "hansard",
             element: <HansardList />,
             loader: async ({ request, param }) => fetch("/api/hansard.json"),
         },
         {
             path: "hansard/:hansardId",
             element: <Hansard />,
             loader: async ({ request, params }) =>
                 fetch(
                     "/api/hansard/".concat(
                         params.hansardId || "",
                         ".json?",
                         new URLSearchParams({ expand: "~all" }).toString(),
                     ),
                 ),
         },
    …
]);

We then use react-toolkit to manage states. In our application, sometimes a table has both text and image representation. Users can then click on a toggle button to switch between the two.

Sharing is also made easier through a button. Users can then click on it and copy the link to any snippets.

The application uses Bootstrap for the general look and feel. For our simple proof-of-concept application, it works well.

Search

The application offers search through OpenSearch. On the backend side, we implemented it through the use of django-opensearch-dsl. Since the application doesn't require much configuration yet, the structure of the documents stored in OpenSearch looks like this:

@registry.register_document
class InquiryContentDocument(Document):
 inquirer = fields.ObjectField(
     properties={"name": fields.TextField(), "raw": fields.TextField()}
 )
 inquiry = fields.ObjectField(
     properties={
         "title": fields.TextField(),
         "id": fields.IntegerField(),
         "number": fields.IntegerField(),
         "is_oral": fields.BooleanField(),
     }
 )

 class Index:
     name = "inquiry"

 class Django:
     model = InquiryContent
     fields = ["id", "value"]

The corresponding serializer and view would look like this

class InquiryContentSearchSerializer(ContentElementSearchSerializer):
 parent_type = "inquiry"
 document_type = "inquiry"

@api_view(["GET"])
def search(request: Request, format=None) -> Response:
 result = None
 data = SearchData(
     **{key: request.query_params.get(key) for key in request.query_params.keys()}
 )

 if not data.query:
     return Response("Bad search request", status=status.HTTP_400_BAD_REQUEST)

 match data.document_type:
     case "inquiry":
         hits = (
             InquiryContentDocument.search()
             .query(
                 "multi_match",
                 query=data.query,
                 fields=["value", "inquirer.raw"],
             )
             .highlight("value")
         )
         serializer = InquiryContentSearchSerializer(hits, many=True)
         result = Response(serializer.data)

      ...
     case None | _:
         result = Response("Bad search request", status=status.HTTP_400_BAD_REQUEST)

 return result

Serving the application

We use gunicorn to serve the application backend, as most django applications. On the frontend side, vite first build the final application. Caddy then serves the built website, and forward all the application to gunicorn.

:8080 {
 encode zstd gzip

 root * /data
 file_server

 reverse_proxy /api backend:8000
 reverse_proxy /api/* backend:8000
}

Container

Deployment is hard, especially when we want to ensure a consistent execution environment. Shipping applications in containers is one of the solutions. Building the frontend and backend is a trivial task. The Dockerfiles are straightforward and self-explanatory.

The DBMS, which is PostgreSQL, does not need much configuration. OpenSearch though, require some configurations. We created a script to generate certificates for the server configurations. Then we generate an internal-users.yaml file for the admin user. As the file contains passwords, we exclude the file from the code repository.

A sample docker compose file would look like follows:

services:
  database:
 image: "postgres:16"
 env_file:
   - .env.docker
 networks:
   - legisdata

  frontend:
 build:
   context: .
   dockerfile: podman/frontend/Dockerfile
   name: legisdata_frontend
   no_cache: true
   pull: true
 ports:
   - 0.0.0.0:8080:8080
 env_file:
   - .env.docker
 networks:
   - legisdata

  backend:
 build:
   context: .
   dockerfile: podman/backend/Dockerfile
   name: legisdata_backend
   no_cache: true
   pull: true
 volumes:
   - ./certificates/root/root-ca.pem:/app/root-ca.pem
   - ./certificates/admin/admin.pem:/app/admin.pem
   - ./certificates/admin/admin-key.pem:/app/admin-key.pem
   - ./certificates/node/node.pem:/app/node.pem
   - ./certificates/node/node-key.pem:/app/node-key.pem
 env_file:
   - .env.docker
 networks:
   - legisdata

  search-node:
 image: opensearchproject/opensearch:latest
 env_file:
   - .env.docker
 environment:
   - node.name=search-node
 ulimits:
   memlock:
     soft: -1 # Set memlock to unlimited (no soft or hard limit)
     hard: -1
   nofile:
     soft: 65536 # Maximum number of open files for the opensearch user - set to at least 65536
     hard: 65536
 volumes:
   - search-data:/usr/share/opensearch/data
   - ./podman/opensearch/usr/share/opensearch/config/opensearch-dev.yml:/usr/share/opensearch/config/opensearch.yml
   - ./podman/opensearch/usr/share/opensearch/config/opensearch-security/internal_users.yml:/usr/share/opensearch/config/opensearch-security/internal_users.yml
   - ./certificates/root/root-ca.pem:/usr/share/opensearch/config/root-ca.pem
   - ./certificates/admin/admin.pem:/usr/share/opensearch/config/admin.pem
   - ./certificates/admin/admin-key.pem:/usr/share/opensearch/config/admin-key.pem
   - ./certificates/node/node.pem:/usr/share/opensearch/config/node.pem
   - ./certificates/node/node-key.pem:/usr/share/opensearch/config/node-key.pem
 ports:
   - 9200:9200
   - 9600:9600
 networks:
   - legisdata

volumes:
  search-data:

networks:
  legisdata:
 name: legisdata

Closing thoughts

The website is a proof-of-concept, though it shows what is possible when data is available. With a more accessible interface, it could ease data navigation and discovery. As the data is also published as a set of API, it is also possible to integrate into other applications. This further extends the possibilities and helpfulness.