
Miniature Distributed Systems

Note:- This readme is still half-baked :(

So WTF is this?

  • It's a mini version of a distributed computing system. More specifically, it comprises a webapp, a server, and worker units. These units work together to provide compute power to clients.

Who is it targeted at?

  • What you need to know is that this was a fun project architected by me (Tejas Udupa), and we found a place where it could be implemented, even though it's not quite the proper use case for our product.
  • At the moment it's aimed at hospitals that generate data from various devices but cannot afford to buy servers, yet have large amounts of data which, if processed, can help them analyze patient problems more quickly.

So what is it that we are processing?

  • We are processing logs/data generated by ECGs and various other machines (btw I have not seen this data, it's just a use case, so spare me).
  • These machines generate a lot of records in a single file for every poll (these machines use polling to get data from sensors on the patient's body). A record may, for example, contain a timestamp, heart rate, blood pressure, breathing rate, and blood composition parameters, and these could be logged every second.
  • These are huge files: even though the file size looks small to us humans (on the order of MBs), good luck going through one by hand :).
  • So these files need processing to produce charts and various other analysis information.
  • Now this could be solved with a single Python script, but when there are multiple patients and multiple files per patient it becomes a mess on the local system.
  • So move everything to the cloud :). That's where we come in!
  • We have a webapp accessible via browser (still WIP). Clients log into their accounts, which are managed by the administrators. Once logged in they can perform a set of operations:
    • Upload new data file
    • Delete old data file
    • Process uploaded data file
    • View analysis result of processed data
    • Delete analysis data

So how does all this magic happen?

  • As previously mentioned, the system comprises a webapp, a server, and worker units; let's see how they communicate and work together. The words used here may or may not be present in the actual system; I am trying to explain in as simple words as possible.

WebApp

  • Step 1:
    • When the client uploads a data file to the webapp, it is pushed into an SQL Server DB (this database is subject to change in the near future. Thnx Microsoft) in the 'user data' table.
  • Step 2:
    • The database creates a record for this data with various parameters, all initially set to NULL/not present/not applicable (a sketch of the record's shape follows after these steps).
  • Step 3:
    • The client then submits the file for processing; now this is where the fun begins.
  • Step 4:
    • The database sets the file record's 'submitted for processing' flag to true.
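
To make the record shape concrete, here is a minimal sketch of what a row in the 'user data' table could map to on the server side. Every field name here is hypothetical; the real schema lives in the webapp/server code.

```cpp
// Hypothetical shape of a 'user data' row as the server might model it.
// Field names are illustrative, not the actual schema.
#include <optional>
#include <string>

struct UserDataRecord {
    int         id;                      // primary key
    std::string fileName;                // uploaded file name
    std::string uploadTimestamp;         // when the client pushed the file
    bool        submittedForProcessing;  // set true in Step 4
    // Result fields start out NULL/not present and are filled in
    // only after a worker returns processed data.
    std::optional<std::string> result;
    std::optional<std::string> cleanedData;
};
```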

Server

  • Step 1:
    • The server polls the database continuously, checking the 'user data' table for any updates.
    • The server keeps a timestamp and checks it against the latest timestamps in the database to see whether any new records were submitted, then checks their 'submitted for processing' flag. (Note: this is also somewhat overridden, because this was initially a VTU project for my final year and many features had to be purged to meet deadlines.)
    • The server pulls the names of all files to be processed and passes them to the DataExtractor, which extracts the information from each file and creates a data structure that will be used later for sending out the data.
  • Step 2:
    • The DataExtractor schedules the data to be sent out to the worker nodes.
  • Step 3:
    • The SenderCore checks for available workers and finds the ideal worker to send this data to for processing.
  • Step 4:
    • The selected worker is queued with this data, and when the assigned worker responds by locking a websocket connection, the worker is sent the data in the form of a JSON string.
  • Step 5:
    • The server polls the worker for an acknowledgement of the sent data; if none is received, it pushes the data again and waits. (A sketch of this poll-and-dispatch cycle follows after these steps.)
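
Putting those five steps together, the server's main loop might look roughly like the sketch below. All names here (fetchNewSubmissions, the sleep interval, and so on) are stand-ins, since as noted the words used in this README may not match the actual code.

```cpp
// Sketch of the server's poll -> extract -> dispatch cycle.
// Every name below is a stand-in; the real modules may differ.
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Stub: would query the 'user data' table for rows newer than `since`
// whose 'submitted for processing' flag is true.
std::vector<std::string> fetchNewSubmissions(const std::string& since) {
    return {};  // the real version runs a SELECT against the DB
}

int main() {
    std::string lastSeen = "1970-01-01T00:00:00";
    while (true) {
        for (const std::string& file : fetchNewSubmissions(lastSeen)) {
            // DataExtractor would parse the file here and hand the
            // resulting structure to SenderCore, which picks the ideal
            // worker and queues the data as a JSON string.
            std::cout << "extracting " << file << " and queueing it\n";
        }
        // advance the timestamp watermark, then poll again
        std::this_thread::sleep_for(std::chrono::seconds(5));
    }
}
```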

Worker

  • The worker units are also known as volunteer units, as they are hosted by users who have volunteered to donate a part of their PC for processing. The volunteer host has the worker program running on their system. It is configured to use a certain amount of resources on the host system, so the program doesn't overrun its resource allocation.
  • Step 1:
    • The worker has a websocket always running.
    • When it receives data from the server, it passes it to the Receiver unit, which pre-processes the packet (a JSON string). If the packet is corrupted, it is dropped: an ERROR is sent out if it is only partially corrupted, otherwise it is abandoned.
    • If the packet is in the required format, an acknowledgement is sent out to the server.
  • Step 2:
    • The received data is analysed by the Receiver and checked for whether it is user data or a template (a template is the format the user data needs to follow, plus rules for processing that data). If it is found to be user data (i.e. sent by a client via the webapp), the Receiver extracts the fields from the JSON and sends them to the DataProcessor unit. (A sketch of this receive path follows after these steps.)
  • Step 3:
    • The DataProcessor checks the validity of the data against the rules (if the data violates the rules, an ERROR is sent out to the server) and deletes duplicate data for ease of processing.
    • Once the DataProcessor is done with its job, it pushes the data to the final unit, Algorithm.
  • Step 4:
    • The Algorithm unit is the final unit. Every piece of user data has an algorithm specification; the Algorithm module checks this and uses the correct algorithm for processing the data. The final processed result is generated and sent out to the Sender unit.
  • Step 5:
    • The Sender unit creates a packet and sends it out to the server.
    • The server updates the database with the result and the cleaned data from the worker.
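
The receive path in Steps 1-2 boils down to: parse the JSON, drop or abandon corrupt packets, acknowledge good ones, then route by packet type. A minimal sketch, assuming hypothetical names and a deliberately simplified classification:

```cpp
// Sketch of the worker's receive path. Names and the exact error
// protocol are illustrative, not the actual implementation.
#include <iostream>
#include <string>

enum class PacketKind { UserData, Template, Corrupt };

// Stub: the real Receiver parses and validates the JSON string.
PacketKind classify(const std::string& json) {
    if (json.empty()) return PacketKind::Corrupt;
    return json.find("\"template\"") != std::string::npos
               ? PacketKind::Template
               : PacketKind::UserData;
}

void onPacket(const std::string& json) {
    switch (classify(json)) {
        case PacketKind::Corrupt:
            std::cout << "drop; send ERROR if partially readable, else abandon\n";
            break;
        case PacketKind::Template:
            std::cout << "ACK, then store template (format + processing rules)\n";
            break;
        case PacketKind::UserData:
            std::cout << "ACK, extract fields, hand off to DataProcessor -> Algorithm\n";
            break;
    }
}

int main() {
    onPacket(R"({"template": {"fields": ["hr", "bp"]}})");
    onPacket(R"({"data": [72, 120, 16]})");
}
```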

Server Again

  • Step 1:
    • The server connects back to the worker and receives the result from the worker.
    • The results are pre-validated.
    • The packet is pushed into the receiver sink.
  • Step 2:
    • The packet is popped from the receiver sink by the Packet Processor module.
    • The Packet Processor extracts the JSON fields from the packet.
    • The packet is identified as either a Message or Result data. Messages can be acknowledgement or failure types.
  • Step 3:
    • If the packet's Message type is ACK, the ACK mask is analysed and the Outgoing Data Registry is updated for this packet/data.
    • If the packet's Message type is ERROR, the error mask is analysed and the necessary actions are taken (packet transfer failure, or the data has an error).
    • If the packet is data, the data is validated and updated into the Database repository; later, the Outgoing Data Registry is also updated.
  • This marks the end of the whole Webapp -> Server -> Worker -> Server Again pipeline.

  • The Receiver, DataProcessor and Algorithm form a pipeline, also known as the User data pipeline. These units run one after another.

  • Note:- The worker has an internal scheduler to manage the Receiver, DataProcessor and Algorithm units. They are not executed continuously; rather, they are time-slotted: each unit gets a slice of CPU time and is re-queued if it overruns its time slot. This allows other processes/pipelines a chance at the CPU, so large data doesn't hog the CPU and higher-priority tasks get completed quicker (see the sketch below).

  • The Server and Worker each have their own docs; please refer to them for more information. (Even these are WIP and subject to change as new changes are introduced, since the project is still in its early stages.)
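
The time-slotted scheduling described in the note above can be pictured as a round-robin queue where a unit that doesn't finish within its slice goes to the back of the queue. This is a sketch under that assumption, not the worker's actual scheduler:

```cpp
// Sketch of a time-sliced round-robin scheduler like the one the note
// describes: each pipeline unit gets a slice of CPU time and is
// re-queued if it still has work left. Illustrative only.
#include <chrono>
#include <deque>
#include <functional>

struct Task {
    // Runs for at most one slice; returns true when finished.
    std::function<bool(std::chrono::milliseconds)> runFor;
};

void schedulerLoop(std::deque<Task> queue) {
    const auto slice = std::chrono::milliseconds(50);
    while (!queue.empty()) {
        Task task = queue.front();
        queue.pop_front();
        if (!task.runFor(slice)) {
            queue.push_back(task);  // overran its slot: back of the line
        }
    }
}

int main() {
    int remaining = 3;
    schedulerLoop({ Task{ [&](std::chrono::milliseconds) {
        return --remaining == 0;  // pretend each call consumes one slice
    } } });
}
```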

Setup/Requirements

  • Ubuntu 20.04 or lower / Debian 9 or lower (this is due to an mssql libs support issue on newer builds).
  • mssql library for MS SQL Server (look up online forums for an install guide). Note:- the password and database name should match those in configs.cpp (an illustrative example follows this list), or you need to edit configs.cpp or enter them during server start-up; otherwise the server won't initialize and will throw an error.
  • boost libs (1.66 is recommended, but the latest should work without any issues).
  • sqlite3 libs for the worker (this is no longer used and will soon be removed; we have moved to a flat-file system).
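
Since the server reads its database credentials from configs.cpp, the values you may need to edit look roughly like this. The variable names and defaults below are guesses for illustration; check the actual configs.cpp for the real ones.

```cpp
// Illustrative shape of configs.cpp: the DB settings the server reads
// at startup. Names/values here are hypothetical; edit the real file
// to match your MS SQL Server setup or the server won't initialize.
#include <string>

const std::string DB_HOST     = "localhost";
const std::string DB_NAME     = "mcu_db";    // must match your database name
const std::string DB_USER     = "sa";
const std::string DB_PASSWORD = "changeme";  // must match your DB password
```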

Build & Run

  • Download the source code.

Worker

  • Go to the working directory: cd worker_unit
  • Type make clean followed by make.
  • After a successful build, navigate to /bin where the binary is created.
  • Launch the binary in a terminal: ./worker
  • Enter the IP address and port number of the server, and the number of threads the worker can use.
  • Once done, the worker will start logging and stand by for the server to send data.

Server

  • Go to the working directory: cd server_unit
  • Type make clean followed by make.
  • After a successful build, navigate to /bin where the binary is created.
  • Launch the binary in a terminal: ./server
  • Enter the IP address the server will be using (use 0.0.0.0 if server and worker are on localhost, basically for testing purposes).
  • Enter the port number the server will be listening on.
  • Enter the number of threads the server is allowed to use.
  • Enter the database information; if left empty, the defaults from configs.cpp will be used.
  • Once done, the server will initialize and wait for worker units to connect and for the webapp to populate the database.

Immediate Changes

Things that need to be done now, critical stuff


  • [Worker] Remove global objects and use dependency injection.
  • [Worker] Clean up JSON libs and use heapsy.
  • [Server] Remove global objects and use dependency injection.
  • Encryption (AES/DES/SHA etc.) of packets and CRC checks for data.
  • Move from Makefile to CMake.
  • CI/CD flows for automated building/testing.
  • Update the existing test modules.
  • Check that all modules of Server/Worker are working as expected.
  • Move from MSSQL to MongoDB (thanks to Microsoft for not updating its libs, which has broken installs on new Linux releases).
  • Custom test methods and env.
  • Fix/update existing test modules and make use of DI.
  • File database access code improvements.

Future Changes

Things that will be done once above is complete and ideas for future


  • [ASAP] Think of a better use case for the project (shifting later would cause a lot of hassle).
  • [ASAP] Separate queues for different task priorities.
  • [ASAP] More refined test modules, and tests for all modules in various scenarios, so as to test the whole infrastructure.
  • [ASAP] CI/CD test logging and reporting.
  • [Near Future] Logger module; this module will log to file using sockets or another IPC construct.
  • [Near Future] CLI interface and interpreter for various commands.
  • [Near Future] Introduction of new metrics for better worker cost calculation in the Server.
  • [Near Future] Introduction of better metrics that calculate worker load more accurately in the Worker.
  • [Future] Add support for both serial/parallel algorithm processing.
  • [Future] Build the WebApp in either ASP.NET or another framework.
  • [Far Future] Multi-server support.
  • [Far Future] Algorithm building in the WebApp and processing of the algorithm in the Worker, i.e. Templates will carry not only data but also the algorithm logic for processing it.

Yeeey I wanna contribute :))!

  • Contact me on Gmail
  • Contact me on Telegram: @trax85
