Distributed Computing
Distribution, Part II: The History of Distributed Computing
In the beginning...
●
    The first thing you probably think of is
    mainframe computing.
    –   That’s distributed, right?
    –   The computer’s over there, my terminal is over
        here…
    –   There are many terminals, so something’s gotta be
        distributed, right?
●
    But this isn’t distributed computing, as all the
    compute is in one place.
In the beginning...
●
    Distribution first arose when a single organisation
    could have multiple computers.
●
    The problem was one of resource sharing (on
    ARPANET, circa 1976 no less).
●
    This actually predates the TCP/IP stack.
    –   It used NCP, the Network Control Program.
●
    Most RPC stacks were hack jobs for single-purpose
    systems.
Scalability? Who needs that?
Xerox PARC
●
    Special projects research lab owned by Xerox (you’ll
    likely know them for their printers)
●
    Invented a Xerox-specific RPC system for Xerox
    machines.
●
    This was based on Xerox’s understanding that the
    future of computing would have many computers per
    organisation.
●
    They also invented one of the first GUIs.
    –   A little company called Apple stole it though.
Sun ONC RPC
●
    The year is 1984, and Sun Microsystems has
    not invented Java yet.
●
    They do have very cool Unix workstations though
    (the RISC-based SPARC machines arrive in 1987).
●
    And they have a problem:
    –   Hey, I need that file from over there, but I don’t want
        to pay for it to be shipped in floppy disk format to
        me. Surely I can use the company network to get it,
        right?
Sun ONC RPC
●
    Sadly no, you could not, as remote file mounts had
    not been invented.
    –   Actually, remote anything was a bit of a far-fetched idea.
●
    So naturally, Sun invented the Network File System
    (NFS)
    –   The descendant of this system runs the home directory
        shares in the labs!
●
    NFS was built on a general-purpose RPC mechanism
    called Open Network Computing Remote Procedure
    Call (ONC RPC).
Sun ONC RPC
●
    ONC RPC was wildly popular, as it was open
    source (BSD), generic and well structured.
●
    The problem was that this only defined an RPC
    protocol (with a C implementation), not libraries
    for other languages.
●
    So unless you were using C, you were going to
    have a bad time.
DCE/RPC
●
    In the early 1990s, IBM got around to doing
    RPC properly
    –   Of course they couldn’t do it themselves, so they
        got HP, DEC, and others together to help out.
●
    The Open Software Foundation defined a new
    RPC framework called the Distributed
    Computing Environment (DCE).
DCE/RPC
●
    DCE was super cool:
    –   Included a common way of doing authentication
    –   Had the first built-in time service
    –   Integrated DNS
    –   Distributed File System
    –   And a Remote Procedure Call system
●
    Wait a minute… That reminds me of Windows
    Domains…
DCE/RPC
●
    DCE is still just a guideline though
    –   Didn’t really have anything beyond a C
        implementation
    –   That said, these were the days of Unix…
●
    It was hugely popular with larger organisations,
    especially now that the IBM PC was gaining
    serious traction.
CORBA
●
    Common Object Request Broker Architecture
●
    What if you’re not using C?
●
    What if you don’t like using these big, bloated
    frameworks?
●
    What if you just want two dang programs to
    communicate on two computers?
●
    You use CORBA, that’s what you do!
CORBA
●
    Directly competed with DCE
●
    Didn’t have any of that fancy-pants
    authentication/time/file system add-ons
●
    Just let you define interfaces for computer
    programs to use.
●
    Actually was an integrated system for doing so
    (not just a set of guidelines and some C
    integrations)
CORBA
●
    CORBA is built around the idea of Object
    Request Brokers (or ORBs)
●
    ORBs are a middleware service that allows programs
    written in different languages to communicate over the network
●
    ORBs are designed to be cross compatible,
    regardless of architecture or underlying
    language.
●
    ORBs are represented as objects, which allows
    the system to hide nasty code inside classes.
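To make the stub/skeleton idea behind ORBs concrete, here is a minimal in-process sketch in Java, assuming a calculator `add` call as the remote operation. The class names (`Adder`, `AdderStub`, `AdderSkeleton`) and the byte layout (method name, then two big-endian ints) are invented for illustration; this is not CORBA's real wire format, and handing a byte array to the skeleton stands in for the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// The shared contract: the role an IDL definition plays in CORBA.
interface Adder {
    int add(int a, int b);
}

// The real server object: plain logic, no networking code at all.
class AdderImpl implements Adder {
    public int add(int a, int b) {
        return a + b;
    }
}

// Server-side skeleton: unmarshals a request, calls the real object, marshals the reply.
class AdderSkeleton {
    private final Adder target;

    AdderSkeleton(Adder target) {
        this.target = target;
    }

    byte[] handle(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        String method = in.readUTF();
        if (!method.equals("add")) {
            throw new IOException("unknown method: " + method);
        }
        int result = target.add(in.readInt(), in.readInt());
        ByteArrayOutputStream reply = new ByteArrayOutputStream();
        new DataOutputStream(reply).writeInt(result);
        return reply.toByteArray();
    }
}

// Client-side stub: looks like a local Adder, but turns every call into bytes.
class AdderStub implements Adder {
    private final AdderSkeleton connection; // stands in for a network connection

    AdderStub(AdderSkeleton connection) {
        this.connection = connection;
    }

    public int add(int a, int b) {
        try {
            ByteArrayOutputStream request = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(request);
            out.writeUTF("add"); // which operation to run
            out.writeInt(a);     // arguments in a fixed big-endian layout
            out.writeInt(b);
            byte[] reply = connection.handle(request.toByteArray());
            return new DataInputStream(new ByteArrayInputStream(reply)).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StubDemo {
    public static void main(String[] args) {
        // The client only ever sees the Adder interface.
        Adder calculator = new AdderStub(new AdderSkeleton(new AdderImpl()));
        System.out.println(calculator.add(2, 3)); // prints 5
    }
}
```

The client code never touches bytes: swapping the in-memory handoff for a socket would change only the stub and skeleton, which is exactly the "hide nasty code inside classes" point.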
CORBA
●
    Objects that define interfaces to the internet?
    That sounds like a Component!
    –   And CORBA agreed… eventually
    –   Added support for all the bloat-features of DCE, but
        they were optional.
●
    CORBA had ORBs for each OO language
    –   C++, Java, etc
    –   Your connection objects simply inherited from
        whichever ORB class was present.
CORBA
●
    Why was this all so cool?
    –   There was still no standardised format for passing
        data around the internet.
    –   CORBA provided one that was language and
        system independent.
    –   It also provided a language-independent means of
        writing the interfaces, meaning clients and servers
        were implementation independent!
    –   CORBA is still around today (although not popular)
CORBA IDL
CORBA IDL Process
CORBA Today
●
    Good idea, but there were problems
    –   The spec was hugely complicated, and the ORBs
        were written by different vendors
        ●
            Who charged a lot
        ●
            ORBs turned out to not be as interoperable as promised
    –   Competing, less expensive frameworks killed off the
        project
        ●
            Java RMI was free and did the same thing by 1999
        ●
            Also, Microsoft
We’ve forgotten someone important
Microsoft is Distributed Computing
●
    Microsoft has dominated distributed computing
    since the mid-90s.
●
    This is because Microsoft has based their entire
    OS line around the idea of many computers in
    enormous distributed systems since Windows for
    Workgroups 3.11.
●
    This idea has been the key to Microsoft’s
    success throughout the years.
DLLs
●
    The DLL is the fundamental building block of
    modern Windows systems
●
    Very similar to Unix Shared Object libraries
    –   They are linked at run time
    –   Can also be linked at compile time
    –   Language neutral
●
    However, DLLs support Late Binding.
DLL Late Binding
●
    The “killer feature” of DLLs is that functions can be
    bound by name
    –   At run time, the OS can search the DLL for a specific
        function name
●
    This means that applications can check for missing
    DLLs and DLL compatibility issues at run time.
    This can avoid crashes and allows for dynamic
    coding.
●
    However, this is slower and there are no compile
    time checks.
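Java has no DLLs, but its reflection API gives a rough analogue of binding a function by name at run time (on Windows the real mechanism is `LoadLibrary` plus `GetProcAddress`). In this sketch `MathLibrary` is a hypothetical stand-in for a DLL: `Class.forName` plays the part of loading it and `getMethod` plays the part of looking up the exported function by name. It shows both sides of the trade-off above: a missing function can be handled gracefully, but a misspelled name still compiles.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a library whose functions we only know by name at run time.
class MathLibrary {
    public static int add(int a, int b) {
        return a + b;
    }
}

class LateBindingDemo {
    // Looks up a static int(int, int) function by name; returns null if it cannot be found.
    static Integer callByName(String libraryName, String functionName, int a, int b) {
        try {
            Class<?> library = Class.forName(libraryName);                      // like loading the DLL
            Method fn = library.getMethod(functionName, int.class, int.class);  // like finding the export
            return (Integer) fn.invoke(null, a, b);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // Missing "library" or "function": the caller can degrade gracefully instead of crashing.
            return null;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String library = MathLibrary.class.getName();
        System.out.println(callByName(library, "add", 2, 3));      // prints 5
        // No compile-time check: this typo-style lookup compiles fine and only fails at run time.
        System.out.println(callByName(library, "subtract", 2, 3)); // prints null
    }
}
```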
DLL Functions in C or C++
●
    All exported declarations in DLLs are prefixed with
    __declspec(dllexport)
    –   This applies to classes as well as functions
●
    An alternative is a module-definition (.def) file
    –   This allows functions to be exported by ordinal
        position
    –   But this is rarely used, and so not very popular
DLL Definitions in C#
●
    Are just class libraries
    –   i.e. groups of classes that work together
●
    These have no special rules and can simply be
    compiled via Visual Studio.
Calling Functions in DLLs
●
    Using C++/C, compile against the header file and
    link against the .lib (import library) file
    –   The .lib file contains a stub to perform the DLL lookup
●
    Otherwise you need to use the Windows API
    (LoadLibrary and GetProcAddress)
    –   Example of this on the next slide
    –   Different languages do this differently
    –   COM DLLs must be handled differently
    –   .NET DLLs need the .NET common language runtime
DLL pros
●
    Exe files are smaller as DLLs are incorporated
    at run time
    –   Disk space use is lower too, as you only need one
        DLL for many applications
●
    In-memory DLL code can be shared amongst all
    applications that use the DLL
●
    Upgrading a DLL upgrades all client
    applications
DLL cons
●
    Versions of the DLLs used by an application
    must be compatible with each other and the
    application
    –   Bad upgrades can break every app that uses it
●
    Dependencies are outside of the compiled
    application
●
    Security issues exist with “by name” access
    –   Name clashes?
DLLs today
●
    Very old by component standards
    –   Have existed since early Windows and OS/2.
●
    They are more a component container system
    DLLs can be normal, COM, or .NET
    components.
    –   Modern .NET systems allow all compiled code to
        act as DLLs. Even EXEs!
●
    So you will probably use DLLs in industry.
What is COM?
●
    COM: Component Object Model
●
    Also known by its cool rebranded name
    ActiveX
●
    Developed out of Microsoft’s Object Linking and
    Embedding architecture (OLE)
    –   OLE allowed one application to host objects from
        another
    –   This is what lets you embed Excel spreadsheets in
        Word.
What is COM?
●
    COM is OLE extended along CORBA lines
    –   Interfaces defined by Microsoft’s IDL, MIDL
    –   Interface-based RPC (called DCOM)
    –   Name server (the Windows registry)
        ●
            Allows for lookup by GUID rather than name
        ●
            This is hideous, but allows for unique component lookup
            by version/system/machine.
        ●
            E.g.: f943b44a-0d95-45e3-90c5-34e841c531b2
        ●
            Separated into Interface GUIDs (IIDs) and Class GUIDs
            (CLSIDs)
COM GUIDs
●
    Interfaces via their IID are unbreakable
    contracts
    –   This guarantees that clients can rely on them
        forever.
●
    Problem: Interfaces change all the time
    –   Every change of any kind needs a new IID.
    –   This results in huge logistical problems in COM
        projects.
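The IID rule can be sketched in Java, assuming a simplified QueryInterface-style lookup. Real COM does this in C/C++ through `IUnknown::QueryInterface`; the Java interfaces, the `queryInterface` method and the second UUID below are hypothetical, and the first UUID is simply the example GUID quoted earlier. The point is that version 1 of a contract is never edited: adding a method means publishing a new interface with a new ID, and the component answers to both.

```java
import java.util.UUID;

// Version 1 of the contract. Once published, its ID and methods never change.
interface ICalculator {
    UUID IID = UUID.fromString("f943b44a-0d95-45e3-90c5-34e841c531b2");

    int add(int a, int b);
}

// Adding a method means a brand-new interface with a brand-new (hypothetical) ID.
interface ICalculator2 extends ICalculator {
    UUID IID = UUID.fromString("3c1e9a52-7d4b-4f0e-9a61-2b8d5c7e4f10");

    int subtract(int a, int b);
}

// One component can implement both versions side by side.
class CalculatorComponent implements ICalculator2 {
    public int add(int a, int b) {
        return a + b;
    }

    public int subtract(int a, int b) {
        return a - b;
    }

    // Rough analogue of QueryInterface: ask for a contract by ID, get null if unsupported.
    Object queryInterface(UUID iid) {
        if (iid.equals(ICalculator.IID) || iid.equals(ICalculator2.IID)) {
            return this;
        }
        return null;
    }
}

class IidDemo {
    public static void main(String[] args) {
        CalculatorComponent component = new CalculatorComponent();

        // An old client that only knows version 1 keeps working unchanged.
        ICalculator v1 = (ICalculator) component.queryInterface(ICalculator.IID);
        System.out.println(v1.add(2, 3)); // prints 5

        // A newer client asks for the newer contract by its own ID.
        ICalculator2 v2 = (ICalculator2) component.queryInterface(ICalculator2.IID);
        System.out.println(v2.subtract(5, 3)); // prints 2

        // An ID the component has never heard of is refused.
        System.out.println(component.queryInterface(new UUID(0L, 0L))); // prints null
    }
}
```

Multiply this by every small interface change in a large project and the logistical problem described above becomes clear.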
DCOM
●
    Distributed computing was added to COM
    –   COM was just initially for OLE use.
●
    DCOM works much like COM; it just uses DCE/RPC
    to perform COM requests over a network
    interface.
●
    DCOM completely dominated DCE via Microsoft’s
    ever-popular “Embrace, Extend, Extinguish” (EEE)
    approach.
●
    This is still the underlying system behind all
    Windows Networks today.
COM GUIs
●
    Microsoft used COM to allow users to embed
    GUI elements into other applications.
●
    This allows for really easy extensibility of
    Microsoft programs, without needing to know
    how the underlying code works.
●
    This could be generalised to any component in
    a container.
●
    This was eventually renamed to ActiveX
ActiveX
●
    ActiveX directly competed with Java applets.
●
    Microsoft allowed ActiveX integration with IE
    –   This was a terrible, terrible idea.
●
    ActiveX implements a standard component
    interface
    –   IOleObject – defines parameters of GUI controls
    –   IDispatch – allows functions to be called by name.
         ●
             This was also a terrible idea.
COM Today
●
    Still the core of Windows networks.
●
    Very outdated; .NET is the king of the Windows
    environment these days.
    –   However, lots of COM still exists, so .NET and COM
        have a very well-defined interface
●
    Microsoft continues to push .NET and the general
    concept of Web Services out into the world.
    –   However, Google/Amazon have stolen their ideas and
        taken their crown.
Java RMI
●
    In the late 1990s, Java arrived, and brought
    with it Sun Microsystem’s RPC knowledge.
●
    Java had a thing called RMI.
    –   It has shipped with standard Java since JDK 1.1
        (1997)
●
    Remote Method Invocation allows for RPC calls
    without any tools or languages outside Java (no
    separate IDL).
Java RMI
●
    Like CORBA, uses a defined interface.
●
    Unlike CORBA, this is entirely defined in Java
    –   Using an…. Interface.
    –   Needs to extend java.rmi.Remote interface.
    –   Then create stub classes from that (historically
        with the rmic tool; since Java 5 stubs can be
        generated dynamically), and follow the CORBA
        process from that point.
Java RMI
●
    Like CORBA, inheritance is used to hide the nasty
    stuff.
    –   Server object inherits from UnicastRemoteObject
    –   Again, no IDL file required.
●
    Java also has a name service for finding components
    –   Called rmiregistry.
    –   It’s a command line program.
●
    Problem: RMI has no inbuilt security integration.
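A minimal sketch of the RMI pieces described above (remote interface, `UnicastRemoteObject` server, registry lookup) for the calculator `add` call. Assumptions: it runs registry, server and client in a single JVM purely for demonstration, starts the registry in-process with `LocateRegistry.createRegistry` instead of the `rmiregistry` command, and relies on Java 5+ dynamically generated stubs, so there is no separate stub-generation step. The binding name `"Calculator"` and the class names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: plain Java standing in for an IDL definition.
interface RemoteCalculator extends Remote {
    int add(int a, int b) throws RemoteException;
}

// The server object. Extending UnicastRemoteObject exports it when constructed.
class RemoteCalculatorImpl extends UnicastRemoteObject implements RemoteCalculator {
    RemoteCalculatorImpl() throws RemoteException {
        super(); // export on an anonymous port
    }

    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

class RmiCalculatorDemo {
    // Runs registry, server and client in one JVM purely for demonstration.
    static int addRemotely(int a, int b) {
        // Advertise the loopback address so the demo does not depend on host DNS.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = null;
        RemoteCalculatorImpl server = null;
        try {
            int port;
            try (ServerSocket probe = new ServerSocket(0)) {
                port = probe.getLocalPort(); // grab a free port for the registry
            }

            // Server side: start a registry and bind the server object under a name.
            registry = LocateRegistry.createRegistry(port);
            server = new RemoteCalculatorImpl();
            registry.rebind("Calculator", server);

            // Client side: find the registry, look up the name, call through the stub.
            Registry remoteRegistry = LocateRegistry.getRegistry("127.0.0.1", port);
            RemoteCalculator calculator = (RemoteCalculator) remoteRegistry.lookup("Calculator");
            return calculator.add(a, b);
        } catch (IOException | NotBoundException e) {
            throw new RuntimeException(e);
        } finally {
            // Unexport everything so the RMI threads let the JVM exit.
            try {
                if (server != null) UnicastRemoteObject.unexportObject(server, true);
                if (registry != null) UnicastRemoteObject.unexportObject(registry, true);
            } catch (NoSuchObjectException ignored) {
                // already unexported
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(addRemotely(2, 3)); // prints 5
    }
}
```

In a real deployment the server half and the client half would be separate programs, with the client knowing only the host, port and binding name.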
Java RMI Today
●
    Java RMI is still used today
●
    It works pretty well, and provides an all-in-one,
    no-frills approach to component distribution.
●
    The only problem is, it’s Java.
    –   And therefore kind of stands alone.
.NET
●
    Microsoft very much liked the idea of Java’s
    VM-based, universally compatible features.
    –   Microsoft tried to make a Java implementation in
        1996.
    –   Sun actually sued Microsoft for not following the
        spec.
●
    Eventually though, Microsoft decided to build
    their own Java-like system.
    –   This was named .NET, with C# as its native language
The .NET CLR
●
    Works like the Java VM
    –   Source code is compiled to machine-independent
        byte code (the Common Intermediate Language)
    –   Performs memory management and integrates with the
        underlying OS.
    –   Converts byte code into platform-specific
        executable code via a JIT (Just in Time) compiler.
    –   Both allow multiple languages, provided they can
        compile to the VM’s byte code (CIL for .NET).
CLR CIL
●
    Code that compiles to CIL is called managed
    code and is managed by the .NET framework
    –   Better security because there are no raw pointers
    –   Platform independence via the .NET VM
    –   However, slower due to JIT compilation
        ●
            This is very nearly not a problem these days thanks
            to much better JIT compilers and ahead-of-time
            compilation options such as NGen.
CLR Non-CIL
●
    Code that does not compile to CIL is called
    unmanaged code (also called native code)
    –   Less security
    –   Generally speaking limited in languages (mostly C/C++)
    –   C++ and C# can both allow for managed and
        unmanaged code in the same application
        ●
            Although this is discouraged and will be penalised if you
            do it in this unit.
        ●
            Basically there should be very nearly no reason to do
            this.
.NET Remoting
●
    .NET Remoting is a system that essentially
    replaces DCOM for .NET
●
    Is, unsurprisingly, very similar to RMI
●
    However, there is no IDL or visible proxy code
    –   It’s all hidden in the .NET backend.
    –   Remotely-callable server objects must derive from
        MarshalByRefObject.
    –   The server object’s public methods are the RPC
        interface. (very cool)
.NET Remoting
●
    The client must reference the server assembly
    (EXE/DLL)
    –   The client needs access to the metadata of the
        object (kind of like IDL).
    –   .NET does this by referencing the server object.
        ●
            This is kind of like including a header file, but with a lot of
            background magic
    –   This can be avoided with class factories.
.NET Remoting Today
●
    Mostly a legacy system, as Microsoft has a
    newer Web Services compatible .NET RPC
    framework called WCF.
●
    Remoting is still relevant because:
    –   Remoting does not require a web server
    –   Remoting supports binary message formats (which
        are generally more efficient than XML/JSON systems)
●
    WCF combines Remoting with Web Services
    –   And a healthy dose of automagic coding.
.NET WCF
●
    The Windows Communication Foundation
    (WCF) is the successor to .NET Remoting.
●
    More like RMI as it uses an interface class.
●
    MarshalByRefObject now replaced by
    [ServiceContract] and [OperationContract]
    attributes.
●
    Tons more automatic code generation.
●
    Still pretty much the same as older RPC
    frameworks.
Examples!
●
    For completeness’ sake, let’s look at some
    examples.
●
    These could be useful in a tutorial or
    something….
What are we building?
●
    A Calculator!
    –   More specifically, a calculator add function.
●
    Why on earth are we distributing this?
    –   This may be dumb, but makes the code simple and
        lets us focus on the similarities and differences
    –   Also gives you an idea how easy it is.
●
    Examples of code are very useful as you
    progress through industry! Keep these
    somewhere!
Some Generic IDL
C++ Server DLL
C++ Client
COM Component
●
    We’re not going to include COM.
●
    COM is for all practical purposes deprecated
    –   Has been since before Windows XP.
    –   It’s very ugly in implementation
    –   We’ll be using .NET exclusively… Soooo…
●
    Moving on.
CORBA – Java (Server)
CORBA – Java (Client)
Java RMI Interface
Java RMI Server
Java RMI Client
Fun fact about RMI
●
    Java RMI’s biggest problem is that it is super
    tightly integrated with Java
●
    For example:
    –   The RMI client actually doesn’t have the stub code
        for the server (in classic RMI; since Java 5, stubs
        can also be generated dynamically).
    –   Instead, it downloads it from the server on first
        connect.
        ●
            The client and server Java versions must be compatible.
        ●
            This has implications for security too, as it must trust the
            code it downloads.
.NET Remoting Server
.NET Remoting Client
Fun facts about .NET Remoting
●
    You may have noticed that we didn’t explicitly create
    an instance of the server object.
●
    Instead we quite lazily register the server’s class.
●
    .NET loves the idea of making object creation an RPC
    too!
●
    This is cool and all, but can result in code errors where
    you create a client-side version of the server-side
    object.
    –   This is very hard to detect
.NET WCF Server
.NET WCF Client
Some useful things for WCF
●
    You’ll have noticed [ServiceContract]
    [OperationContract] and [ServiceBehavior]
    attributes.
    –   Just remember, Contracts for the Interface, Behavior
        for the implementation.
●
    You need to build a class factory to use a WCF
    interface
    –   Factories are classes that build other classes
    –   Really just here because Microsoft found it was a popular
        approach to RPC.
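WCF's `ChannelFactory<T>` builds a client proxy from a service-contract interface. A rough Java analogue, shown as a sketch rather than WCF's actual design, is `java.lang.reflect.Proxy`, the same mechanism modern RMI uses for its dynamic stubs. `ClientFactory`, `CalculatorContract` and the `BiFunction` "transport" are hypothetical; the transport is an in-memory stand-in for sending the method name and arguments over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.BiFunction;

// The service contract the client programs against.
interface CalculatorContract {
    int add(int a, int b);
}

// A hypothetical factory that builds client proxies for any contract interface.
class ClientFactory<T> {
    private final Class<T> contract;
    // Stand-in for the network: takes a method name and arguments, returns the result.
    private final BiFunction<String, Object[], Object> transport;

    ClientFactory(Class<T> contract, BiFunction<String, Object[], Object> transport) {
        this.contract = contract;
        this.transport = transport;
    }

    T createClient() {
        // Every call on the proxy is forwarded to the transport by method name.
        // (Object methods such as toString are not special-cased in this sketch.)
        InvocationHandler handler = (proxy, method, args) -> transport.apply(method.getName(), args);
        Object instance = Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, handler);
        return contract.cast(instance);
    }
}

class FactoryDemo {
    static CalculatorContract connect() {
        // Pretend server: dispatches the named call to real logic.
        BiFunction<String, Object[], Object> fakeNetwork = (name, callArgs) -> {
            if (name.equals("add")) {
                return (Integer) callArgs[0] + (Integer) callArgs[1];
            }
            throw new UnsupportedOperationException(name);
        };
        return new ClientFactory<>(CalculatorContract.class, fakeNetwork).createClient();
    }

    public static void main(String[] args) {
        CalculatorContract client = connect();
        System.out.println(client.add(2, 3)); // prints 5
    }
}
```

The factory is written once and works for any contract interface, which is why the pattern shows up in so many RPC frameworks.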
Some More Useful Things for WCF
●
    ServiceBehavior has a lot of fields.
●
    What we’re doing is overriding Microsoft’s default
    single-threaded, automatically synchronised behaviour.
    –   Why? Because it’s really inefficient. And because we like
        taking our lives into our hands.
●
    Basically, Microsoft will often assume that you mean
    single-threaded by default
    –   This is very important, as a lot of programmers come to
        Windows first, and single-threaded defaults are safer for them.
    –   But it sucks for us, so we’ll be overriding a lot.
More WCF stuff?
●
    Also, you can’t pass RPC objects by reference.
    –   Why? Because WCF is service oriented, and so it
        wants to force you as the client to come to it.
    –   This fixes a lot of OO problems over the network.
    –   Objects can be passed by value though.
        ●
            These aren’t server objects though, they’re data objects.
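Passing by value means the receiver gets a copy, not a handle back to the sender's object. A Java sketch of that, assuming built-in Java serialization as a stand-in for WCF's own data-contract serialization: `AddRequest` is a hypothetical data object and the byte array stands in for the wire. Changing the copy on the "server" leaves the client's original untouched, which is why these are data objects rather than server objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A hypothetical data object: no server-side identity, just values.
class AddRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    int a;
    int b;

    AddRequest(int a, int b) {
        this.a = a;
        this.b = b;
    }
}

class ByValueDemo {
    // Simulates sending an object over the network: serialize to bytes, then rebuild a fresh copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T sendByValue(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        AddRequest clientCopy = new AddRequest(2, 3);
        AddRequest serverCopy = sendByValue(clientCopy);
        serverCopy.a = 100;                            // the "server" changes its copy...
        System.out.println(clientCopy.a);              // prints 2: the client's object is untouched
        System.out.println(serverCopy == clientCopy);  // prints false: two distinct objects
    }
}
```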
Why do people hate old systems?
●
    Why are these older systems falling out of
    favor?
    –   Firewalls (block a lot of ports to stop hackers)
    –   Configuration overheads (gotta tell clients where
        servers are, and COM’s GUIDs make changes very
        expensive)
    –   Proprietary
    –   And because the Internet
        ●
            Seriously, why don’t we just use HTTP?
Next Week
●
    The tiering system of basic distributed systems!
    –   You will have some idea of this from this week’s
        tutorial
●
    Asynchronous Communications
●
    Statelessness!